The Ceph Blog

Featured Post

v0.94.1 Hammer released

This bug fix release fixes a few critical issues with CRUSH. The most important addresses a bug in feature bit enforcement that may prevent pre-hammer clients from communicating with the cluster during an upgrade. This only manifests in some cases (for example, when the ‘rack’ type is in use in the CRUSH map, and possibly other cases), but for safety we strongly recommend that all users use 0.94.1 instead of 0.94 when upgrading.

There is also a fix in the new straw2 buckets when OSD weights are 0.
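To show why a zero weight needs special handling, here is a minimal Python sketch of the straw2 selection rule. Each item computes a draw of ln(u)/w, where u is a hash-derived value in (0, 1] and w is the item's weight, and the highest draw wins. With w = 0 that division fails. The SHA-256-based _unit_hash stands in for CRUSH's own hash, and straw2_choose and the OSD names are hypothetical. The real implementation is C code using fixed-point integer arithmetic. This sketch simply skips zero-weight items so they can never be chosen, which mirrors the intent of the fix rather than its exact code.

```python
import hashlib
import math
from typing import Dict, Optional


def _unit_hash(key: str, item: str) -> float:
    """Deterministic pseudo-random value in (0, 1] for a (key, item) pair.

    Hypothetical stand-in for CRUSH's own hash; any stable hash works here.
    """
    digest = hashlib.sha256(f"{key}/{item}".encode()).digest()
    n = int.from_bytes(digest[:8], "big")  # 0 <= n < 2**64
    return (n + 1) / 2**64  # shift into (0, 1] so log() is always defined


def straw2_choose(key: str, weights: Dict[str, float]) -> Optional[str]:
    """Pick one item for key using a straw2-style draw; None if all weights are 0."""
    best_item: Optional[str] = None
    best_draw = -math.inf
    for item, weight in weights.items():
        if weight <= 0:
            # A zero-weight item must never be chosen, and ln(u) / 0 would
            # raise ZeroDivisionError, so it gets no draw at all.
            continue
        # ln(u) <= 0; dividing by a larger weight pulls the draw toward 0,
        # so heavier items win proportionally more often.
        draw = math.log(_unit_hash(key, item)) / weight
        if draw > best_draw:
            best_item, best_draw = item, draw
    return best_item
```

For example, straw2_choose("rbd_data.1", {"osd.0": 1.0, "osd.1": 0.0}) always returns "osd.0". Each item's draw depends only on its own weight. As a result, changing one OSD's weight can move data only to or from that OSD.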

We recommend that all v0.94 users upgrade.


  • crush: fix divide-by-0 in straw2 (#11357 Sage Weil)
  • crush: fix has_v4_buckets (#11364 Sage Weil)
  • osd: fix negative degraded objects during backfilling (#7737 Guang Yang)

For more detailed information, see the complete changelog.



Earlier Posts

v0.94 Hammer released

This major release is expected to form the basis of the next long-term stable series. It is intended to supersede v0.80.x Firefly.

Highlights since Giant include:

  • RADOS Performance: a range of improvements have been made in the OSD and client-side librados code that improve the throughput on flash backends and improve parallelism and scaling on fast machines.
  • Simplified RGW deployment: the ceph-deploy tool now has a new ‘ceph-deploy rgw create HOST’ command that quickly deploys an instance of the S3/Swift gateway using the embedded Civetweb server. This is vastly simpler than the previous Apache-based deployment. There are a few rough edges (e.g., around SSL support), but we encourage users to try the new method.
  • RGW object versioning: RGW now supports the S3 object versioning API, which preserves old versions of objects instead of overwriting them.
  • RGW bucket sharding: RGW can now shard the bucket index of large buckets across multiple RADOS objects, improving performance for very large buckets.
  • RBD object maps: RBD now has an object map function that tracks which parts of the image are allocated, improving performance for clones and for commands like export and delete.
  • RBD mandatory locking: RBD has a new mandatory locking framework (still disabled by default) that adds additional safeguards to prevent multiple clients from using the same image at the same time.
  • RBD copy-on-read: RBD now supports copy-on-read for image clones, improving performance for some workloads.
  • CephFS snapshot improvements: Many bugs have been fixed with CephFS snapshots. Although they are still disabled by default, stability has improved significantly.
  • CephFS Recovery tools: We have built some journal recovery and diagnostic tools. Stability and performance of single-MDS systems improved vastly in Giant, and more improvements have been made in Hammer. Although we still recommend caution when storing important data in CephFS, we do encourage testing for non-critical workloads so that we can better gauge the feature, usability, performance, and stability gaps.
  • CRUSH improvements: We have added a new straw2 bucket algorithm that reduces the amount of data migration required when changes are made to the cluster.
  • Shingled erasure codes (SHEC): The OSDs now have experimental support for shingled erasure codes, which allow a small amount of additional storage to be traded for improved recovery performance.
  • RADOS cache tiering: A series of changes have been made in the cache tiering code that improve performance and reduce latency.
  • Experimental RDMA support: There is now experimental support for RDMA via the Accelio (libxio) library.
  • New administrator commands: The ‘ceph osd df’ command shows pertinent details on OSD disk utilization. The ‘ceph pg ls …’ command makes it much simpler to query PG states while diagnosing cluster issues.
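This rough Python sketch shows why an object map speeds up export and delete. It keeps one "exists" bit per backing object of an image, so an operation can visit only the objects that were actually written instead of probing every possible object. The 4 MiB object size, the ObjectMap class, and its method names are assumptions for illustration. The real RBD object map is stored in RADOS and tracks additional per-object states.

```python
from typing import List

# Assumed object size: RBD's default of 4 MiB (object size order 22).
DEFAULT_OBJECT_SIZE = 4 * 1024 * 1024


class ObjectMap:
    """Hypothetical, simplified object map: one "exists" bit per backing object."""

    def __init__(self, image_size: int, object_size: int = DEFAULT_OBJECT_SIZE):
        self.object_size = object_size
        self.num_objects = -(-image_size // object_size)  # ceiling division
        self.bits = bytearray((self.num_objects + 7) // 8)

    def mark_written(self, offset: int, length: int) -> None:
        """Record a write: every object the byte range touches now exists."""
        if length <= 0:
            return
        first = offset // self.object_size
        last = (offset + length - 1) // self.object_size
        for idx in range(first, last + 1):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def exists(self, idx: int) -> bool:
        return bool(self.bits[idx // 8] & (1 << (idx % 8)))

    def existing_objects(self) -> List[int]:
        """Objects an export or delete actually needs to visit."""
        return [i for i in range(self.num_objects) if self.exists(i)]
```

Consider a 10 GiB image, which has 2,560 objects. Writing 1 MiB at offset 0 and 8 MiB at offset 1 GiB leaves only objects 0, 256, and 257 marked. A delete would therefore touch 3 objects instead of all 2,560.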

Other highlights since Firefly include:
read more…

v0.80.9 Firefly released

This is a bugfix release for firefly. It fixes a performance regression in librbd, an important CRUSH misbehavior (see below), and several RGW bugs. We have also backported support for flock/fcntl locks to ceph-fuse and libcephfs.

We recommend that all Firefly users upgrade.

For more detailed information, see the complete changelog.


  • This point release fixes several issues with CRUSH that trigger excessive data migration when adjusting OSD weights. These are most obvious when a very small weight change (e.g., a change from 0 to .01) triggers a large amount of movement, but the same set of bugs can also lead to excessive (though less noticeable) movement in other cases.

    However, because the bug may already have affected your cluster, fixing it may trigger movement back to the more correct location. For this reason, you must manually opt-in to the fixed behavior.

    To set the new tunable and correct the behavior:

    ceph osd crush set-tunable straw_calc_version 1

    Note that this change will have no immediate effect. However, from this point forward, any ‘straw’ bucket in your CRUSH map that is adjusted will get non-buggy internal weights, and that transition may trigger some rebalancing.

    read more…

I recently had the opportunity to work on a Firefly cluster (0.80.8) in which power outages caused two OSDs to fail. As with lots of things in technology, that’s not the whole story. The manner in which the power outages and OSD failures occurred left 5 placement groups (PGs) in an incomplete state. Before I got involved, the failed OSDs had been ejected from the cluster and new OSDs re-deployed in their place.

The good news is that one of the ‘failed’ OSDs was still readable for the most part and this allowed us to use a new tool to recover the PG contents.

WARNING: THIS IS A RISKY PROCESS! Do not attempt this on a production cluster without engaging Red Hat Ceph support. You could cause irreversible data loss in your cluster.
read more…

v0.93 Hammer release candidate released

This is the first release candidate for Hammer, and includes all of the features that will be present in the final release. We welcome and encourage any and all testing in non-production clusters to identify any problems with functionality, stability, or performance before the final Hammer release.

We suggest some caution in one area: librbd. There is a lot of new functionality around object maps and locking that is disabled by default but may still affect stability for existing images. We are continuing to shake out those bugs so that the final Hammer release (probably v0.94) will be rock solid.

Major features since Giant include:
read more…

v0.87.1 Giant released

This is the first (and possibly final) point release for Giant. Our focus on stability fixes will be directed towards Hammer and Firefly.

We recommend that all v0.87 Giant users upgrade to this release.


  • Due to a change in Linux kernel 3.18 and the limits of the FUSE interface, ceph-fuse needs to be mounted as root on at least some systems. See issues #9997, #10277, and #10542 for details.


read more…

Ceph Developer Summit: Infernalis

Hey Cephers, it’s that time again…time for another Ceph Developer Summit! As Hammer winds its way through the maze of QA and release procedures, we need to start looking forward to what will come with Infernalis (which is a cool lookin’ squid if you haven’t seen it yet). Blueprint submissions are now open for any and all work that you would like to contribute or request of community developers. Please submit as soon as possible to ensure that your blueprint gets a CDS slot.

There will be one slight change this time around in an attempt to further centralize information. While blueprint submissions will still occur via the usual method on the wiki, all of that information will be captured in the etherpad which will be the canonical document going forward. If people like this method we’ll probably shift to a completely etherpad-based blueprint process to make it easier to capture and evolve the work for each item.

The rough schedule of CDS and Infernalis in general should look something like this:

Date         Milestone
16 FEB       Blueprint submissions begin
27 FEB       Blueprint submissions end
02 MAR       Summit agenda announced
03 MAR       Ceph Developer Summit: Day 1
04 MAR       Ceph Developer Summit: Day 2 (if needed)
July 2015    Infernalis Release

As always, this event will be an online event (utilizing the BlueJeans system) so that everyone can attend from their own timezone. If you are interested in submitting a blueprint or collaborating on an existing blueprint, please click the big red button below!


Submit Blueprint

scuttlemonkey out

v0.92 released

This is the second-to-last chunk of new stuff before Hammer. Big items include additional checksums on OSD objects, proxied reads in the cache tier, image locking in RBD, optimized OSD Transaction and replication messages, and a big pile of RGW and MDS bug fixes.


  • The experimental ‘keyvaluestore-dev’ OSD backend has been renamed ‘keyvaluestore’ (for simplicity) and marked as experimental. To enable this untested feature and acknowledge that you understand that it is untested and may destroy data, you need to add the following to your ceph.conf:
    enable experimental unrecoverable data corrupting features = keyvaluestore
  • The following librados C API function calls take a ‘flags’ argument whose value is now correctly interpreted:

    rados_write_op_operate() rados_aio_write_op_operate() rados_read_op_operate() rados_aio_read_op_operate()

    The flags were not correctly being translated from the librados constants to the internal values. Now they are. Any code that is passing flags to these methods should be audited to ensure that they are using the correct LIBRADOS_OP_FLAG_* constants.

  • The ‘rados’ CLI ‘copy’ and ‘cppool’ commands now use the copy-from operation, which means the latest CLI cannot run these commands against pre-firefly OSDs.
  • The librados watch/notify API now includes a watch_flush() operation to flush the async queue of notify operations. This should be called by any watch/notify user prior to rados_shutdown().


read more…

v0.91 released

We are quickly approaching the Hammer feature freeze but have a few more dev releases to go before we get there. The headline items are subtree-based quota support in CephFS (ceph-fuse/libcephfs client support only for now), a rewrite of the watch/notify librados API used by RBD and RGW, OSDMap checksums to ensure that maps are always consistent inside the cluster, new API calls in librados and librbd for IO hinting modeled after posix_fadvise, and improved storage of per-PG state.

We expect two more releases before the first Hammer release candidate (v0.93).


  • The ‘category’ field for objects has been removed. This was originally added to track PG stat summations over different categories of objects for use by radosgw. It no longer has any known users and is prone to abuse because it can lead to an unbounded pg_stat_t structure. The librados API calls that accept this field now ignore it, and the OSD no longer tracks the per-category summations.
  • The output for ‘rados df’ has changed. The ‘category’ level has been eliminated, so there is now a single stat object per pool. The structure of the JSON output is different, and the plaintext output has one fewer column.
  • The ‘rados create <objectname> [category]’ optional category argument is no longer supported or recognized.
  • librados.py’s Rados class no longer has a __del__ method; it was causing problems on interpreter shutdown and when threads were in use. If your code has Rados objects with limited lifetimes and you’re concerned about locked resources, call Rados.shutdown() explicitly.
  • There is a new version of the librados watch/notify API with vastly improved semantics. Any applications using this interface are encouraged to migrate to the new API. The old API calls are marked as deprecated and will eventually be removed.
  • The librados rados_unwatch() call used to be safe to call on an invalid handle. The new version has undefined behavior when passed a bogus value (for example, when rados_watch() returns an error and the handle is not defined).
  • The structure of the formatted ‘pg stat’ output has changed in the portion that counts states by name. It no longer uses the ‘+’ character (which appears in state names) as part of an XML token, where it is not legal.
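The Python Rados class no longer cleans up in __del__. One way to guarantee that shutdown() always runs is a small context manager. This is a sketch: rados_session is a hypothetical helper, not part of the binding. It assumes it receives an unconnected rados.Rados handle, although any object with connect() and shutdown() methods works.

```python
from contextlib import contextmanager


@contextmanager
def rados_session(cluster):
    """Connect cluster, yield it, and always call shutdown() afterwards.

    Hypothetical helper: cluster is assumed to be an unconnected rados.Rados
    instance (or any object providing connect() and shutdown()).
    """
    cluster.connect()
    try:
        yield cluster
    finally:
        # Rados no longer releases its resources from __del__, so do it here,
        # even when the body of the with-block raises.
        cluster.shutdown()
```

Typical use would be with rados_session(rados.Rados(conffile="/etc/ceph/ceph.conf")) as cluster:, followed by calls such as cluster.list_pools(). The finally clause ties the handle's lifetime to the with-block rather than to garbage collection.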


read more…

v0.80.8 Firefly released

This is a long-awaited bugfix release for firefly. It includes several important (but relatively rare) OSD peering fixes, fixes for performance issues when snapshots are trimmed, several RGW fixes, a Paxos corner-case fix, and some packaging updates.

We recommend that all v0.80.x Firefly users upgrade when it is convenient to do so.


read more…

© 2015, Inktank Storage, Inc.. All rights reserved.