Requirements:
Basic understanding of distributed systems
Familiarity with a programming language
Some familiarity with log analytics
Description:
Distributed systems are often hard to troubleshoot. With many log sources and a vast array of potential issues arising from hardware, network, and configuration, finding the root cause of a problem can be time-consuming. This summer project will collect the various information sources in a single location and provide a framework for diagnosing problems.
An applicant interested in this project should have a working knowledge of distributed systems, some experience with Linux troubleshooting and log analysis, and knowledge of a high-level programming language for implementation (C++ or Python preferred).
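To give a flavor of the aggregation step, here is a minimal sketch that tallies warning and error lines across Ceph-style log files; the glob path and the WRN/ERR severity markers are assumptions about the deployment's log layout, not a fixed interface:

    # Sketch: tally warning/error lines per log file, a first step toward
    # collecting all information sources in one place. The path pattern and
    # the WRN/ERR markers are assumptions about the log format.
    from collections import Counter
    import glob

    def summarize_logs(pattern="/var/log/ceph/*.log"):
        counts = Counter()
        for path in glob.glob(pattern):
            with open(path, errors="replace") as f:
                for line in f:
                    for severity in ("WRN", "ERR"):
                        if severity in line:
                            counts[(path, severity)] += 1
        return counts

    for (path, severity), count in sorted(summarize_logs().items()):
        print(path, severity, count)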
Milestones:
GSoC 2017 community bonding:
GSoC 2017 midterm:
GSoC 2017 final:
Requirements:
Description:
Ceph uses the CRUSH algorithm to distribute data objects among storage devices. The storage devices are weighted, so CRUSH can maintain a statistically balanced utilization of storage and bandwidth resources. Initially, the administrator is likely to assign each device a weight relative to its total capacity, but over time the utilization of the storage devices can become unbalanced. This hurts the availability of the storage pool: once any storage device assigned to a pool is full, the whole pool becomes unwriteable, even if there is abundant space on other devices. We also want to minimize the performance impact caused by rebalancing. A smarter reweight algorithm would therefore be very helpful.
It is hard to evaluate the performance of such an algorithm, so the participant should first build a model or tool to evaluate performance, and then come up with a reweight algorithm that addresses the problems listed above.
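As one possible starting point for both the evaluation tool and the algorithm itself, a small Python sketch; the dampening factor and the example utilization figures are illustrative assumptions:

    # Sketch: nudge each OSD's weight toward the mean utilization. A
    # dampening factor keeps each step incremental, limiting the data
    # movement (and performance impact) of any single rebalance.
    def propose_reweights(weights, utilizations, damp=0.25):
        avg = sum(utilizations.values()) / len(utilizations)
        new_weights = {}
        for osd, util in utilizations.items():
            if util <= 0:
                new_weights[osd] = weights[osd]
                continue
            # A full correction would scale the weight by avg/util;
            # apply only a fraction of it.
            factor = 1.0 + damp * (avg / util - 1.0)
            new_weights[osd] = weights[osd] * factor
        return new_weights

    # Example: osd.2 is over-utilized relative to its peers.
    weights = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 1.0}
    utils = {"osd.0": 0.55, "osd.1": 0.60, "osd.2": 0.85}
    print(propose_reweights(weights, utils))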
Milestones:
GSoC 2017 community bonding:
GSoC 2017 midterm:
GSoC 2017 final:
Requirements:
Description:
Ceph-mgr is a daemon that collects real-time statistics from a Ceph cluster. It embeds a Python interpreter and exposes a set of Python APIs that can be consumed by the hosted Python modules. This project involves designing and prototyping a status dashboard that visualizes the cluster's statistics at different levels and from different perspectives.
This project can be divided into three phases:
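Whatever the phases look like, the dashboard would live in a hosted ceph-mgr module. A minimal sketch of such a module's skeleton; MgrModule and get("osd_map") follow the module interface of the 2017-era releases, so treat the exact names as assumptions to verify against the target tree:

    # Sketch of a hosted ceph-mgr module skeleton. Verify MgrModule and
    # get("osd_map") against the target release.
    from mgr_module import MgrModule
    import time

    class Dashboard(MgrModule):
        def serve(self):
            # Poll cluster state periodically; a real dashboard would feed
            # this to a web UI instead of the log.
            while True:
                osd_map = self.get("osd_map")
                self.log.info("osdmap epoch %s, %d OSDs",
                              osd_map["epoch"], len(osd_map["osds"]))
                time.sleep(5)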
Milestones:
GSoC 2017 community bonding:
GSoC 2017 midterm:
GSoC 2017 final:
Additional Mentors:
Requirements:
Description:
Ceph-mgr is a daemon that collects real-time statistics from a Ceph cluster. It embeds a Python interpreter and exposes a set of Python APIs that can be consumed by the hosted Python modules. The OSD is the daemon that manages the storage devices. It would be very helpful if we could identify slow OSDs and take measures before the performance degradation has a visible impact on users.
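One plausible first cut, sketched below: compare each OSD's recent op latency against the median of its peers and flag the outliers. How the latency figures would be fetched from ceph-mgr is left open, and the 3x-median threshold is an illustrative assumption:

    # Sketch: flag OSDs whose op latency stands out from their peers.
    # latencies_ms maps an OSD name to a recent average op latency; the
    # 3x-median cutoff is an arbitrary starting point for experimentation.
    import statistics

    def find_slow_osds(latencies_ms, factor=3.0):
        median = statistics.median(latencies_ms.values())
        return [osd for osd, lat in latencies_ms.items()
                if lat > factor * median]

    # Example: osd.3 is an obvious outlier.
    print(find_slow_osds({"osd.0": 4.1, "osd.1": 3.8,
                          "osd.2": 4.5, "osd.3": 40.0}))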
Milestones:
GSoC 2017 community bonding:
GSoC 2017 midterm:
GSoC 2017 final:
Requirements:
Description:
Ceph-mgr is a daemon that collects real-time statistics from a Ceph cluster. It embeds a Python interpreter and exposes a set of Python APIs that can be consumed by the hosted Python modules. In Ceph, data is distributed among storage devices in the form of "objects", which are in turn aggregated by placement groups within a storage pool. Tracking object placement and object metadata per object is computationally expensive; a pool with millions of objects cannot realistically do it. Instead, the Ceph client hashes the object ID to calculate which placement group an object belongs to, so every object ID maps to a certain placement group. Better data durability and a more even distribution call for more placement groups, but each placement group is served by a set of OSDs, and the more placement groups an OSD serves, the more CPU and memory it demands, so their number should be kept to the minimum necessary. In short, it is a tradeoff, and as more objects are stored in a given pool, the number of placement groups should be tuned. For more information, see http://docs.ceph.com/docs/master/rados/operations/placement-groups/.
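The linked documentation suggests a rule of thumb of roughly 100 placement groups per OSD, divided by the pool's replica count and rounded up to a power of two; a tuning module could start from something like this sketch:

    # Sketch: rule-of-thumb pg_num, per the placement-groups docs: about
    # 100 PGs per OSD, divided by the pool's replica count, rounded up to
    # the next power of two.
    def suggest_pg_num(num_osds, pool_size, target_pgs_per_osd=100):
        raw = num_osds * target_pgs_per_osd / pool_size
        pg_num = 1
        while pg_num < raw:
            pg_num *= 2
        return pg_num

    print(suggest_pg_num(num_osds=40, pool_size=3))   # -> 2048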
Milestones:
GSoC 2017 community bonding:
GSoC 2017 midterm:
GSoC 2017 final:
Requirements:
Description:
Ceph-mgr is a daemon that collects real-time statistics from a Ceph cluster. It embeds a Python interpreter and exposes a set of Python APIs that can be consumed by the hosted Python modules. Caps (short for "capabilities") are how Ceph describes the authorization of an authenticated user to exercise the functionality of Ceph's different components. To grant a CephFS client access to a certain directory, one needs to use the ceph command line to create the corresponding caps for it. If we could leverage ceph-mgr to do this job, it would greatly simplify the process.
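For illustration, a sketch of what such a module would automate, shelling out to the ceph CLI; the client name, directory, and data pool are placeholders, and a real ceph-mgr module would go through the daemon's internal command interface rather than subprocess:

    # Sketch: build the caps a CephFS client needs for one directory, then
    # create the client key. Client name, path, and pool are placeholders.
    import subprocess

    def authorize_dir(client, path, data_pool):
        caps = [
            "mon", "allow r",
            "mds", "allow rw path={}".format(path),
            "osd", "allow rw pool={}".format(data_pool),
        ]
        subprocess.check_call(
            ["ceph", "auth", "get-or-create", "client." + client] + caps)

    authorize_dir("webapp", "/webapp", "cephfs_data")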
Milestones:
GSoC 2017 community bonding:
GSoC 2017 midterm:
GSoC 2017 final:
Requirements:
Description:
This project involves designing and prototyping a simplified database backend for the Ceph Object Gateway (RGW). This backend will make it possible for developers to set up an RGW without the rest of Ceph. The abstraction for the backend would be useful for adding other backends in the future.
The first half of the project involves working with mentors to understand RGW internals and architecture, writing up a multi-page design document, and setting up a standalone prototype in C++ that implements some object operations on a database. The design document will show how such a backend would fit into the rest of the RGW architecture.
The second half of the project involves implementing a prototype of the design using a database as the backend. Participants should have made substantial progress on, or completed, a prototype by the end of coding.
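To make the shape of the abstraction concrete, a sketch in Python (the actual prototype would be in C++); SQLite stands in for "a database", and the interface and table layout are illustrative assumptions:

    # Sketch of the backend abstraction, with SQLite as the database.
    # The method names and schema are illustrative, not RGW's interface.
    import sqlite3

    class DBStore:
        def __init__(self, path):
            self.db = sqlite3.connect(path)
            self.db.execute(
                "CREATE TABLE IF NOT EXISTS objects "
                "(bucket TEXT, name TEXT, data BLOB, "
                "PRIMARY KEY (bucket, name))")

        def put_object(self, bucket, name, data):
            self.db.execute(
                "INSERT OR REPLACE INTO objects VALUES (?, ?, ?)",
                (bucket, name, data))
            self.db.commit()

        def get_object(self, bucket, name):
            row = self.db.execute(
                "SELECT data FROM objects WHERE bucket = ? AND name = ?",
                (bucket, name)).fetchone()
            return row[0] if row else None

    store = DBStore(":memory:")
    store.put_object("photos", "cat.jpg", b"\xff\xd8")
    print(store.get_object("photos", "cat.jpg"))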
Milestones:
GSoC 2017 community bonding:
GSoC 2017 midterm:
GSoC 2017 final:
Additional Mentors:
Requirements:
Description:
The Ceph testing framework is almost entirely written in Python. To test the Amazon S3 storage protocol and OpenStack Swift storage, there are several major SDKs in use that are written in languages other than Python.
Bugs have shown up that only occur with these non-Python SDKs. This project would involve porting the existing tests to these other SDKs, adding tests to them, and making them ready for use in Ceph's upstream testing framework: teuthology.
One of the main challenges of this project will be working within a multi-host, multi-OS, multi-language systems environment.
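The sort of round-trip test that would need to exist in each SDK looks roughly like this boto3 version; the endpoint and credentials are placeholders for a local RGW instance:

    # Sketch: a minimal S3 round-trip against a local RGW, the kind of
    # test to reproduce across SDKs. Endpoint/credentials are placeholders.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="http://localhost:8000",
        aws_access_key_id="test",
        aws_secret_access_key="test",
    )

    s3.create_bucket(Bucket="gsoc-test")
    s3.put_object(Bucket="gsoc-test", Key="hello.txt", Body=b"hello rgw")
    body = s3.get_object(Bucket="gsoc-test", Key="hello.txt")["Body"].read()
    assert body == b"hello rgw"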
Milestones:
GSoC 2017 community bonding:
GSoC 2017 midterm:
GSoC 2017 final:
Additional Mentors: