The Ceph Blog

Featured Post

v0.80.9 Firefly released

This is a bugfix release for firefly. It fixes a performance regression in librbd, an important CRUSH misbehavior (see below), and several RGW bugs. We have also backported support for flock/fcntl locks to ceph-fuse and libcephfs.

We recommend that all Firefly users upgrade.

For more detailed information, see the complete changelog.


  • This point release fixes several issues with CRUSH that trigger excessive data migration when adjusting OSD weights. These are most obvious when a very small weight change (e.g., a change from 0 to .01) triggers a large amount of movement, but the same set of bugs can also lead to excessive (though less noticeable) movement in other cases.

    However, because the bug may already have affected your cluster, fixing it may trigger movement back to the more correct location. For this reason, you must manually opt-in to the fixed behavior.

    In order to set the new tunable to correct the behavior:

    ceph osd crush set-tunable straw_calc_version 1

    Note that this change will have no immediate effect. However, from this point forward, any ‘straw’ bucket in your CRUSH map that is adjusted will get non-buggy internal weights, and that transition may trigger some rebalancing.

    read more…

Earlier Posts

I recently had the opportunity to work on a Firefly cluster (0.80.8) in which power outages caused a failure of two OSDs. As with lots of things in technology, that’s not the whole story. The manner in which the power outages and OSD failures occurred put the cluster into a state with 5 placement groups (PGs) into an incomplete state. Before I got involved, the failed OSDs had been ejected from the cluster and new OSDs re-deployed in their place.

The good news is that one of the ‘failed’ OSDs was still readable for the most part and this allowed us to use a new tool to recover the PG contents.

WARNING: THIS IS A RISKY PROCESS! Do not attempt this on a production cluster without engaging Red Hat Ceph support. You could cause irreversible data loss in your cluster.
read more…

v0.93 Hammer release candidate released

This is the first release candidate for Hammer, and includes all of the features that will be present in the final release. We welcome and encourage any and all testing in non-production clusters to identify any problems with functionality, stability, or performance before the final Hammer release.

We suggest some caution in one area: librbd. There is a lot of new functionality around object maps and locking that is disabled by default but may still affect stability for existing images. We are continuing to shake out those bugs so that the final Hammer release (probably v0.94) will be rock solid.

Major features since Giant include:
read more…

v0.87.1 Giant released

This is the first (and possibly final) point release for Giant. Our focus on stability fixes will be directed towards Hammer and Firefly.

We recommend that all v0.87 Giant users upgrade to this release.


  • Due to a change in the Linux kernel version 3.18 and the limits of the FUSE interface, ceph-fuse needs be mounted as root on at least some systems. See issues #9997, #10277, and #10542 for details.


read more…

Ceph Developer Summit: Infernalis

Hey Cephers, it’s that time again…time for another Ceph Developer Summit! As Hammer winds its way through the maze of QA and release procedures we need to start looking forward to what will come with Infernalis (which is a cool lookin’ squid if you haven’t seen it yet). Blueprint submissions are now open for any and all work that that you would like to contribute or request of community developers. Please submit as soon as possible to ensure that it gets a CDS slot.

There will be one slight change this time around in an attempt to further centralize information. While blueprint submissions will still occur via the usual method on the wiki, all of that information will be captured in the etherpad which will be the canonical document going forward. If people like this method we’ll probably shift to a completely etherpad-based blueprint process to make it easier to capture and evolve the work for each item.

The rough schedule of CDS and Infernalis in general should look something like this:

Date Milestone
16 FEB Blueprint submissions begin
27 FEB Blueprint submissions end
02 MAR Summit agenda announced
03 MAR Ceph Developer Summit: Day 1
04 MAR Ceph Developer Summit: Day 2 (if needed)
July 2015 Infernalis Release

As always, this event will be an online event (utilizing the BlueJeans system) so that everyone can attend from their own timezone. If you are interested in submitting a blueprint or collaborating on an existing blueprint, please click the big red button below!


Submit Blueprint

scuttlemonkey out

v0.92 released

This is the second-to-last chunk of new stuff before Hammer. Big items include additional checksums on OSD objects, proxied reads in the cache tier, image locking in RBD, optimized OSD Transaction and replication messages, and a big pile of RGW and MDS bug fixes.


  • The experimental ‘keyvaluestore-dev’ OSD backend has been renamed ‘keyvaluestore’ (for simplicity) and marked as experimental. To enable this untested feature and acknowledge that you understand that it is untested and may destroy data, you need to add the following to your ceph.conf:
    enable experimental unrecoverable data corrupting features = keyvaluestore
  • The following librados C API function calls take a ‘flags’ argument whose value is now correctly interpreted:

    rados_write_op_operate() rados_aio_write_op_operate() rados_read_op_operate() rados_aio_read_op_operate()

    The flags were not correctly being translated from the librados constants to the internal values. Now they are. Any code that is passing flags to these methods should be audited to ensure that they are using the correct LIBRADOS_OP_FLAG_* constants.

  • The ‘rados’ CLI ‘copy’ and ‘cppool’ commands now use the copy-from operation, which means the latest CLI cannot run these commands against pre-firefly OSDs.
  • The librados watch/notify API now includes a watch_flush() operation to flush the async queue of notify operations. This should be called by any watch/notify user prior to rados_shutdown().


read more…

v0.91 released

We are quickly approaching the Hammer feature freeze but have a few more dev releases to go before we get there. The headline items are subtree-based quota support in CephFS (ceph-fuse/libcephfs client support only for now), a rewrite of the watch/notify librados API used by RBD and RGW, OSDMap checksums to ensure that maps are always consistent inside the cluster, new API calls in librados and librbd for IO hinting modeled after posix_fadvise, and improved storage of per-PG state.

We expect two more releases before the first Hammer release candidate (v0.93).


  • The ‘category’ field for objects has been removed. This was originally added to track PG stat summations over different categories of objects for use by radosgw. It is no longer has any known users and is prone to abuse because it can lead to a pg_stat_t structure that is unbounded. The librados API calls that accept this field now ignore it, and the OSD no longers tracks the per-category summations.
  • The output for ‘rados df’ has changed. The ‘category’ level has been eliminated, so there is now a single stat object per pool. The structure of the JSON output is different, and the plaintext output has one less column.
  • The ‘rados create <objectname> [category]’ optional category argument is no longer supported or recognized.
  •’s Rados class no longer has a __del__ method; it was causing problems on interpreter shutdown and use of threads. If your code has Rados objects with limited lifetimes and you’re concerned about locked resources, call Rados.shutdown() explicitly.
  • There is a new version of the librados watch/notify API with vastly improved semantics. Any applications using this interface are encouraged to migrate to the new API. The old API calls are marked as deprecated and will eventually be removed.
  • The librados rados_unwatch() call used to be safe to call on an invalid handle. The new version has undefined behavior when passed a bogus value (for example, when rados_watch() returns an error and handle is not defined).
  • The structure of the formatted ‘pg stat’ command is changed for the portion that counts states by name to avoid using the ‘+’ character (which appears in state names) as part of the XML token (it is not legal).


read more…

v0.80.8 Firefly released

This is a long-awaited bugfix release for firefly. It several imporant (but relatively rare) OSD peering fixes, performance issues when snapshots are trimmed, several RGW fixes, a paxos corner case fix, and some packaging updates.

We recommend that all users for v0.80.x firefly upgrade when it is convenient to do so.


read more…

v0.90 released

This is the last development release before Christmas. There are some API cleanups for librados and librbd, and lots of bug fixes across the board for the OSD, MDS, RGW, and CRUSH. The OSD also gets support for discard (potentially helpful on SSDs, although it is off by default), and there are several improvements to ceph-disk.

The next two development releases will be getting a slew of new functionality for hammer. Stay tuned!


  • Previously, the formatted output of ‘ceph pg stat -f …’ was a full pg dump that included all metadata about all PGs in the system. It is now a concise summary of high-level PG stats, just like the unformatted ‘ceph pg stat’ command.
  • All JSON dumps of floating point values were incorrecting surrounding the value with quotes. These quotes have been removed. Any consumer of structured JSON output that was consuming the floating point values was previously having to interpret the quoted string and will most likely need to be fixed to take the unquoted number.


read more…

v0.88 released

This is the first development release after Giant. The two main features merged this round are the new AsyncMessenger (an alternative implementation of the network layer) from Haomai Wang at UnitedStack, and support for POSIX file locks in ceph-fuse and libcephfs from Yan, Zheng. There is also a big pile of smaller items that re merged while we were stabilizing Giant, including a range of smaller performance and bug fixes and some new tracepoints for LTTNG.


read more…

Page 1 of 1512345...10...Last »
© 2015, Inktank Storage, Inc.. All rights reserved.