Ceph in Outreachy 2017 - December Round

Ceph Project Ideas

Applications

In order to apply to an Outreachy project this round you should:

  1. Read the Outreachy Eligibility requirements and make sure you are eligible for this round
  2. Read through the Ceph project ideas on this page
  3. Fill out this form
  4. Start up a Ceph cluster using vstart.sh (after #3)
  5. Work on tasks related to the project ideas from Ceph mentors (after #4)
  6. Fill out the Project Details section in your Outreachy application before the deadline on October 23.

Communication channels for more information:

IRC: #ceph channel at irc.oftc.net.

Mailing list: ceph-users@ceph.com.


SMART data for OSDs

Requirements:

  • Python
  • Basic knowledge of C++
  • Experience with basic Linux administration

Description:

HDDs and SSDs expose internal metrics about usage, wear, and hardware health in the form of SMART metrics that can be queried from the host. This information can be used to build a model of expected device longevity, so that impending failures can be predicted, data can be proactively replicated, and failing devices can be removed from the system before they cause availability problems.

Ceph is an open source distributed storage system that uses replication and erasure coding to distribute data across many HDDs and/or SSDs. The system aims to be self-managing, which should include predicting the failures of constituent devices before they happen to improve overall data safety.

This project will include integration of low-level tools to extract SMART data from devices on a regular basis (e.g., by modifying the smartctl(8) utility to dump its results in structured form), feeding that data back to the central Ceph cluster “manager” daemons, and implementing a simple mathematical model to predict failures and preemptively remove failing devices from the system. Several simple existing models are available that can be used as-is once the SMART data is centrally stored and monitored, although once this infrastructure is in place we’ll eventually look to improve the accuracy of the predictive models based on additional data.
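
One possible starting point for the wrapper approach is sketched below. This is a minimal sketch, assuming smartmontools is installed and an ATA-style attribute table; the smart_attributes function and its output layout are illustrative, not part of any existing Ceph tool, and a real implementation would also need to handle NVMe/SCSI devices and error reporting.

    #!/usr/bin/env python
    # Minimal sketch: run smartctl(8) and parse its ATA attribute table
    # into a machine-readable dict. Illustrative only.
    import json
    import subprocess

    def smart_attributes(device):
        out = subprocess.check_output(['smartctl', '-A', device])
        attrs = {}
        in_table = False
        for line in out.decode('utf-8', 'replace').splitlines():
            if line.startswith('ID#'):
                in_table = True        # header row of the attribute table
                continue
            if not in_table:
                continue
            fields = line.split()
            if len(fields) < 10:
                break                  # a short/blank line ends the table
            try:
                attrs[fields[1]] = {
                    'id': int(fields[0]),
                    'value': int(fields[3]),
                    'worst': int(fields[4]),
                    'thresh': int(fields[5]),
                    'raw': fields[9],
                }
            except ValueError:
                continue               # skip rows with non-numeric columns
        return attrs

    if __name__ == '__main__':
        print(json.dumps(smart_attributes('/dev/sda'), indent=2))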

Milestones:

  1. Extract structured SMART data from raw devices. The current smartctl(8) utility dumps its results in human-readable rather than machine-readable form, so either a smartctl modification or a wrapper to parse its output is needed (one possible wrapper is sketched above).
  2. The OSD (storage) daemons will need to gather SMART information for their backing devices on a regular basis and feed that information back to the ceph manager daemons.
  3. <midpoint>
  4. The ceph manager (ceph-mgr) daemon will store the SMART data in a small database.
  5. The ceph manager will expose SMART metrics via CLI commands.
  6. <stretch goals>
  7. The ceph manager daemon will periodically run a predictive model to check for devices with a high probability of failing in the near future. Health alerts can be raised when probabilities exceed a configured threshold (a toy version of this check is sketched after this list).
  8. The ceph manager can be configured to proactively and automatically remove devices with a high probability of failure.
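
As a rough illustration of milestones 7 and 8, the sketch below applies a trivial threshold model to accumulated per-device SMART records. The record layout, the failure_probability model, and the alert/remove hooks are all hypothetical placeholders rather than existing ceph-mgr interfaces; in the real project this loop would run periodically inside the manager. The attributes checked reflect the common observation that pending and reallocated sector counts are strong failure signals.

    # Hypothetical sketch of the periodic prediction pass.
    # 'store' maps a device to its SMART attribute history, e.g.
    # {'osd.3/sda': [{'Reallocated_Sector_Ct': 0, ...}, ...]}.

    FAIL_THRESHOLD = 0.5   # alert when predicted probability exceeds this

    def failure_probability(history):
        """Toy model: nonzero pending sectors and a growing reallocated
        sector count are treated as signals of impending failure."""
        first, latest = history[0], history[-1]
        score = 0.0
        if latest.get('Current_Pending_Sector', 0) > 0:
            score += 0.4
        if latest.get('Reallocated_Sector_Ct', 0) > \
                first.get('Reallocated_Sector_Ct', 0):
            score += 0.4
        return min(score, 1.0)

    def check_devices(store, alert, remove, auto_remove=False):
        for device, history in store.items():
            p = failure_probability(history)
            if p >= FAIL_THRESHOLD:
                alert(device, p)       # raise a health warning
                if auto_remove:
                    remove(device)     # proactively drain the device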


Radosgw-admin improvements

Requirements:

  • C++ programming experience
  • Python programming experience
  • Comfortable using a Linux development environment

Description:

Ceph is a highly available, distributed, software-defined storage system providing object, key/value, and file-system interfaces. Ceph RADOS Gateway (radosgw) provides an HTTP REST API compatible with AWS S3 and OpenStack Swift.
radosgw-admin is a command-line tool for configuring and controlling this service. It also allows querying service status, user data, and geo-replication state.
Today, adding or updating commands is a complex process, a consequence of the current implementation.

We would like to improve the implementation:

  • Better command-line option parsing, e.g., via a library like boost::program_options
  • A structured programming model
  • Code reusability
  • Coverage testing

The project will consist of three parts:

  1. Writing a coverage test to validate the commands (in Python; see the sketch below)
  2. Refactoring radosgw-admin
  3. Integrating the coverage test into Teuthology (Ceph's automated testing framework)
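
As an illustration of part 1, the sketch below exercises a small subset of radosgw-admin's user commands against a running cluster and checks their JSON output. The subset shown is illustrative; a real coverage test would enumerate the full command surface and record which commands and options were exercised.

    # Sketch of a coverage-style test driving radosgw-admin user commands.
    # Assumes a running cluster (e.g. via vstart.sh) with radosgw.
    import json
    import subprocess

    def admin_json(*args):
        """Run radosgw-admin and parse its JSON output."""
        out = subprocess.check_output(('radosgw-admin',) + args)
        return json.loads(out)

    def test_user_lifecycle():
        created = admin_json('user', 'create', '--uid=coverage-test',
                             '--display-name=Coverage Test')
        assert created['user_id'] == 'coverage-test'

        info = admin_json('user', 'info', '--uid=coverage-test')
        assert info['user_id'] == 'coverage-test'

        subprocess.check_call(['radosgw-admin', 'user', 'rm',
                               '--uid=coverage-test'])

    if __name__ == '__main__':
        test_user_lifecycle()
        print('user command subset: OK')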

Milestones:

Midpoint:

  • Set up a Ceph cluster with radosgw and radosgw-admin
  • Develop a coverage test for the radosgw-admin API
  • Detailed design of the new radosgw-admin implementation

Final:

  • An improved radosgw-admin that passes the coverage test
  • Stretch goal: integration of the new coverage test into Teuthology

CBT output visualization

Requirements:

  • Comfortable with Python
  • Knowledge of graphing tools
  • Experience with data visualization
  • Comfortable using a Linux development environment

Preferred:

  • Experience running, documenting, or reading performance benchmarks
  • Experience with data storage systems
  • Knowledge of JSON, YAML, or similar data-serialization formats

Description:

The Ceph Benchmarking Tool (CBT) is a Python-based framework used to automate distributed data storage performance tests. Lately, we have been working on the integration of CBT with our nightly regression testing framework, Teuthology. The goal is to automate comprehensive performance testing for Ceph. We need a candidate with a keen interest in data visualization and knowledge of web programming (especially web-based graphing frameworks) to help us take the nightly performance test results and system monitoring data and present them in intuitive, user-friendly ways.
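
As a small proof of concept in the spirit of the midterm milestone, the sketch below plots throughput across nightly runs using matplotlib. The nightly_results.json file and its schema are hypothetical stand-ins for CBT's actual output, and the project itself would target web-based graphing frameworks rather than static images.

    # Illustrative sketch: plot nightly benchmark throughput over time.
    # Assumed (hypothetical) input schema:
    #   [{"date": "2017-10-01", "mbps": 412.0}, ...]
    import json
    import matplotlib.pyplot as plt

    with open('nightly_results.json') as f:
        runs = json.load(f)

    dates = [r['date'] for r in runs]
    throughput = [r['mbps'] for r in runs]

    xs = range(len(dates))
    plt.plot(xs, throughput, marker='o')
    plt.xticks(xs, dates, rotation=45)
    plt.xlabel('Nightly run date')
    plt.ylabel('Throughput (MB/s)')
    plt.title('Ceph benchmark throughput over time')
    plt.tight_layout()
    plt.savefig('throughput.png')   # regressions show up as sudden dips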

Milestones:

Midterm:

  • Proof of concept of the design
  • Clear idea of the tools to be used, along with a functional model with smaller datasets

Final:

  • Integration of data visualization into CBT and Teuthology

Ceph-mgr dashboard enhancements

Requirements

  • Python (Intermediate)
  • Javascript/HTML/CSS (Basic)
  • Experience with basic Linux administration

Description

Ceph includes a lightweight web dashboard that enables users to see the health of the system and explore the status of the various services within the cluster.

The dashboard was added recently in the Ceph 12.x (Luminous) release and has significant scope for enhancement: there is currently much more data available in ceph-mgr than is exposed in the dashboard code.

This project will include making a variety of improvements to the dashboard, such as:

  • A generic view to explore the available performance counters for all Ceph services in tabular form, including exposing the documentation metadata for the counters.
  • A built-in “help” view for configuration settings in Ceph, including filtering them by service and exposing the built-in documentation strings for the settings.
  • Greater information density by exposing additional statistics inline with status information, using compact representations such as sparklines.

There is some flexibility in exactly which features/pages to work on.
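
As a toy illustration of the compact representations mentioned above, the snippet below renders a numeric series as a Unicode sparkline. The input values are made up; in the dashboard the series would come from ceph-mgr's performance counter data.

    # Toy sketch: render a counter series as a compact sparkline, the
    # kind of inline glyph the dashboard could embed next to status rows.
    BARS = u'▁▂▃▄▅▆▇█'

    def sparkline(values):
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1          # avoid division by zero on flat data
        return u''.join(BARS[int((v - lo) * (len(BARS) - 1) / span)]
                        for v in values)

    print(sparkline([5, 9, 30, 22, 65, 40, 88, 70]))   # prints ▁▁▃▂▆▃█▆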

Milestones

  1. Addition of more information to one or more existing views
  2. Creation of one or more of the proposed new pages
  3. <midpoint>
  4. Completion of both the configuration and performance counter browsers
  5. Writing a blog post for ceph.com to raise awareness of the new features