The Ceph Blog

Featured Post

v0.90 released

This is the last development release before Christmas. There are some API cleanups for librados and librbd, and lots of bug fixes across the board for the OSD, MDS, RGW, and CRUSH. The OSD also gets support for discard (potentially helpful on SSDs, although it is off by default), and there are several improvements to ceph-disk.

The next two development releases will be getting a slew of new functionality for hammer. Stay tuned!


  • Previously, the formatted output of ‘ceph pg stat -f …’ was a full pg dump that included all metadata about all PGs in the system. It is now a concise summary of high-level PG stats, just like the unformatted ‘ceph pg stat’ command.
  • All JSON dumps of floating point values were incorrectly surrounding the value with quotes. These quotes have been removed. Any consumer of structured JSON output that read the floating point values previously had to interpret the quoted string and will most likely need to be fixed to take the unquoted number.
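
For consumers that have to work against both older and newer releases, one option is to accept either form while parsing. The following is a minimal sketch, not part of Ceph itself; the helper name `as_float` and the key `example_ratio` are hypothetical and stand in for whatever floating point field a tool actually reads.

```python
import json


def as_float(value):
    """Return a float for a JSON value that may be quoted (old) or bare (new)."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError("expected a number, got a boolean")
    if isinstance(value, (int, float)):
        return float(value)   # unquoted number, as emitted after this change
    if isinstance(value, str):
        return float(value)   # quoted string, as emitted before this change
    raise TypeError("expected a number or numeric string")


# Hypothetical dumps; "example_ratio" is an illustrative key, not a real field.
old_dump = json.loads('{"example_ratio": "0.95"}')
new_dump = json.loads('{"example_ratio": 0.95}')

old_value = as_float(old_dump["example_ratio"])
new_value = as_float(new_dump["example_ratio"])
```

Both calls yield the same Python float, so the rest of the consumer does not need to know which release produced the dump.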


  • arch: fix NEON feature detection (#10185 Loic Dachary)
  • build: adjust build deps for yasm, virtualenv (Jianpeng Ma)
  • build: improve build dependency tooling (Loic Dachary)
  • ceph-disk: call partx/partprobe consistently (#9721 Loic Dachary)
  • ceph-disk: fix dmcrypt key permissions (Loic Dachary)
  • ceph-disk: fix umount race condition (#10096 Blaine Gardner)
  • ceph-disk: init=none option (Loic Dachary)
  • ceph-monstore-tool: fix shutdown (#10093 Loic Dachary)
  • ceph-objectstore-tool: fix import (#10090 David Zafman)
  • ceph-objectstore-tool: many improvements and tests (David Zafman)
  • ceph.spec: package rbd-replay-prep (Ken Dreyer)
  • common: add ‘perf reset …’ admin command (Jianpeng Ma)
  • common: do not unlock rwlock on destruction (Federico Simoncelli)
  • common: fix block device discard check (#10296 Sage Weil)
  • common: remove broken CEPH_LOCKDEP option (Kefu Chai)
  • crush: fix tree bucket behavior (Rongze Zhu)
  • doc: add build-doc guidelines for Fedora and CentOS/RHEL (Nilamdyuti Goswami)
  • doc: enable rbd cache on openstack deployments (Sebastien Han)
  • doc: improved installation notes on CentOS/RHEL installs (John Wilkins)
  • doc: misc cleanups (Adam Spiers, Sebastien Han, Nilamdyuti Goswami, Ken Dreyer, John Wilkins)
  • doc: new man pages (Nilamdyuti Goswami)
  • doc: update release descriptions (Ken Dreyer)
  • doc: update sepia hardware inventory (Sandon Van Ness)
  • librados: only export public API symbols (Jason Dillaman)
  • libradosstriper: fix stat strtoll (Dongmao Zhang)
  • libradosstriper: fix trunc method (#10129 Sebastien Ponce)
  • librbd: fix list_children from invalid pool ioctxs (#10123 Jason Dillaman)
  • librbd: only export public API symbols (Jason Dillaman)
  • many coverity fixes (Danny Al-Gaaf)
  • mds: ‘flush journal’ admin command (John Spray)
  • mds: fix MDLog IO callback deadlock (John Spray)
  • mds: fix deadlock during journal probe vs purge (#10229 Yan, Zheng)
  • mds: fix race trimming log segments (Yan, Zheng)
  • mds: store backtrace for stray dir (Yan, Zheng)
  • mds: verify backtrace when fetching dirfrag (#9557 Yan, Zheng)
  • mon: add max pgs per osd warning (Sage Weil)
  • mon: fix *_ratio units and types (Sage Weil)
  • mon: fix JSON dumps to dump floats as floats and not strings (Sage Weil)
  • mon: fix formatter ‘pg stat’ command output (Sage Weil)
  • msgr: async: several fixes (Haomai Wang)
  • msgr: simple: fix rare deadlock (Greg Farnum)
  • osd: batch pg log trim (Xinze Chi)
  • osd: clean up internal ObjectStore interface (Sage Weil)
  • osd: do not abort deep scrub on missing hinfo (#10018 Loic Dachary)
  • osd: fix ghobject_t formatted output to include shard (#10063 Loic Dachary)
  • osd: fix osd peer check on scrub messages (#9555 Sage Weil)
  • osd: fix pgls filter ops (#9439 David Zafman)
  • osd: flush snapshots from cache tier immediately (Sage Weil)
  • osd: keyvaluestore: fix getattr semantics (Haomai Wang)
  • osd: keyvaluestore: fix key ordering (#10119 Haomai Wang)
  • osd: limit in-flight read requests (Jason Dillaman)
  • osd: log when scrub or repair starts (Loic Dachary)
  • osd: support for discard for journal trim (Jianpeng Ma)
  • qa: fix osd create dup tests (#10083 Loic Dachary)
  • rgw: add location header when object is in another region (VRan Liu)
  • rgw: check timestamp on s3 keystone auth (#10062 Abhishek Lekshmanan)
  • rgw: make sysvinit script set ulimit -n properly (Sage Weil)
  • systemd: better systemd unit files (Owen Synge)
  • tests: ability to run unit tests under docker (Loic Dachary)


Earlier Posts

v0.88 released

This is the first development release after Giant. The two main features merged this round are the new AsyncMessenger (an alternative implementation of the network layer) from Haomai Wang at UnitedStack, and support for POSIX file locks in ceph-fuse and libcephfs from Yan, Zheng. There is also a big pile of smaller items that were merged while we were stabilizing Giant, including a range of smaller performance and bug fixes and some new tracepoints for LTTNG.


  • ceph-disk: Scientific Linux support (Dan van der Ster)
  • ceph-disk: respect --statedir for keyring (Loic Dachary)
  • ceph-fuse, libcephfs: POSIX file lock support (Yan, Zheng)
  • ceph-fuse, libcephfs: fix cap flush overflow (Greg Farnum, Yan, Zheng)
  • ceph-fuse, libcephfs: fix root inode xattrs (Yan, Zheng)
  • ceph-fuse, libcephfs: preserve dir ordering (#9178 Yan, Zheng)
  • ceph-fuse, libcephfs: trim inodes before reconnecting to MDS (Yan, Zheng)
  • ceph: do not parse injectargs twice (Loic Dachary)
  • ceph: make ‘ceph -s’ output more readable (Sage Weil)
  • ceph: new ‘ceph tell mds.$name_or_rank_or_gid’ (John Spray)
  • ceph: test robustness (Joao Eduardo Luis)
  • ceph_objectstore_tool: behave with sharded flag (#9661 David Zafman)
  • cephfs-journal-tool: fix journal import (#10025 John Spray)
  • cephfs-journal-tool: skip up to expire_pos (#9977 John Spray)
  • cleanup rados.h definitions with macros (Ilya Dryomov)
  • common: shared_cache unit tests (Cheng Cheng)
  • config: add $cctid meta variable (Adam Crume)
  • crush: fix buffer overrun for poorly formed rules (#9492 Johnu George)
  • crush: improve constness (Loic Dachary)
  • crushtool: add --location <id> command (Sage Weil, Loic Dachary)
  • default to libnss instead of crypto++ (Federico Gimenez)
  • doc: ceph osd reweight vs crush weight (Laurent Guerby)
  • doc: document the LRC per-layer plugin configuration (Yuan Zhou)
  • doc: erasure code doc updates (Loic Dachary)
  • doc: misc updates (Alfredo Deza, VRan Liu)
  • doc: preflight doc fixes (John Wilkins)
  • doc: update PG count guide (Gerben Meijer, Laurent Guerby, Loic Dachary)
  • keyvaluestore: misc fixes (Haomai Wang)
  • keyvaluestore: performance improvements (Haomai Wang)
  • librados: add rados_pool_get_base_tier() call (Adam Crume)
  • librados: cap buffer length (Loic Dachary)
  • librados: fix objecter races (#9617 Josh Durgin)
  • libradosstriper: misc fixes (Sebastien Ponce)
  • librbd: add missing python docstrings (Jason Dillaman)
  • librbd: add readahead (Adam Crume)
  • librbd: fix cache tiers in list_children and snap_unprotect (Adam Crume)
  • librbd: fix performance regression in ObjectCacher (#9513 Adam Crume)
  • librbd: lttng tracepoints (Adam Crume)
  • librbd: misc fixes (Xinxin Shu, Jason Dillaman)
  • mds: fix sessionmap lifecycle bugs (Yan, Zheng)
  • mds: initialize root inode xattr version (Yan, Zheng)
  • mds: introduce auth caps (John Spray)
  • mds: misc bugs (Greg Farnum, John Spray, Yan, Zheng, Henry Chang)
  • misc coverity fixes (Danny Al-Gaaf)
  • mon: add ‘ceph osd rename-bucket …’ command (Loic Dachary)
  • mon: clean up auth list output (Loic Dachary)
  • mon: fix ‘osd crush link’ id resolution (John Spray)
  • mon: fix misc error paths (Joao Eduardo Luis)
  • mon: fix paxos off-by-one corner case (#9301 Sage Weil)
  • mon: new ‘ceph pool ls [detail]’ command (Sage Weil)
  • mon: wait for writeable before cross-proposing (#9794 Joao Eduardo Luis)
  • msgr: avoid useless new/delete (Haomai Wang)
  • msgr: fix delay injection bug (#9910 Sage Weil, Greg Farnum)
  • msgr: new AsyncMessenger alternative implementation (Haomai Wang)
  • msgr: prefetch data when doing recv (Yehuda Sadeh)
  • osd: add erasure code corpus (Loic Dachary)
  • osd: add misc tests (Loic Dachary, Danny Al-Gaaf)
  • osd: cleanup boost optionals (William Kennington)
  • osd: expose non-journal backends via ceph-osd CLI (Haomai Wang)
  • osd: fix JSON output for stray OSDs (Loic Dachary)
  • osd: fix ioprio options (Loic Dachary)
  • osd: fix transaction accounting (Jianpeng Ma)
  • osd: misc optimizations (Xinxin Shu, Zhiqiang Wang, Xinze Chi)
  • osd: use FIEMAP_FLAG_SYNC instead of fsync (Jianpeng Ma)
  • rados: fix put of /dev/null (Loic Dachary)
  • rados: parse command-line arguments more strictly (#8983 Adam Crume)
  • rbd-fuse: fix memory leak (Adam Crume)
  • rbd-replay-many (Adam Crume)
  • rbd-replay: --anonymize flag to rbd-replay-prep (Adam Crume)
  • rbd: fix ‘rbd diff’ for non-existent objects (Adam Crume)
  • rbd: fix error when striping with format 1 (Sebastien Han)
  • rbd: fix export for image sizes over 2GB (Vicente Cheng)
  • rbd: use rolling average for rbd bench-write throughput (Jason Dillaman)
  • rgw: send explicit HTTP status string (Yehuda Sadeh)
  • rgw: set length for keystone token validation request (#7796 Yehuda Sadeh, Mark Kirkwood)
  • udev: fix rules for CentOS7/RHEL7 (Loic Dachary)
  • use clock_gettime instead of gettimeofday (Jianpeng Ma)
  • set up environment for s3-tests (Luis Pabon)


v0.87 Giant released

This release will form the basis for the stable release Giant, v0.87.x. Highlights for Giant include:

  • RADOS Performance: a range of improvements have been made in the OSD and client-side librados code that improve the throughput on flash backends and improve parallelism and scaling on fast machines.
  • CephFS: we have fixed a raft of bugs in CephFS and built some basic journal recovery and diagnostic tools. Stability and performance of single-MDS systems is vastly improved in Giant. Although we do not yet recommend CephFS for production deployments, we do encourage testing for non-critical workloads so that we can better gauge the feature, usability, performance, and stability gaps.
  • Local Recovery Codes: the OSDs now support an erasure-coding scheme that stores some additional data blocks to reduce the IO required to recover from single OSD failures.
  • Degraded vs misplaced: the Ceph health reports from ‘ceph -s’ and related commands now make a distinction between data that is degraded (there are fewer than the desired number of copies) and data that is misplaced (stored in the wrong location in the cluster). The distinction is important because the latter does not compromise data safety.
  • Tiering improvements: we have made several improvements to the cache tiering implementation that improve performance. Most notably, objects are not promoted into the cache tier by a single read; they must be found to be sufficiently hot before that happens.
  • Monitor performance: the monitors now perform writes to the local data store asynchronously, improving overall responsiveness.
  • Recovery tools: the ceph_objectstore_tool is greatly expanded to allow manipulation of an individual OSD’s data store for debugging and repair purposes. This is most heavily used by our QA infrastructure to exercise recovery code.


v0.80.7 Firefly released

This release fixes a few critical issues with v0.80.6, particularly with clusters running mixed versions.

We recommend that all v0.80.x Firefly users upgrade to this release.

For more detailed information, see the complete changelog.


  • osd: fix invalid memory reference in log trimming (#9731 Samuel Just)
  • osd: fix use-after-free in cache tiering code (#7588 Sage Weil)
  • osd: remove bad backfill assertion for mixed-version clusters (#9696 Samuel Just)


A few days ago I made a challenge to the open storage community to support the Ada Initiative’s work to improve gender diversity in open source and open data communities, and offered to match contributions if we can reach $8192. I’m pleased to say that the response so far has been great, and we’re now over half way there! I’d like to thank those who have contributed so far, including

Josef Bacik
Dave McAllister
Eric Sandeen
Zulah and Carlos
Geoff Arnold
Andreas Dilger
Tom Lyon
Greg Farnum
Travis Rhoden
Garrett D’Amore
Erin M. Evans
Travis Rhoden
Dan V
Peter Tribble
Bryan Horstmann-Allen
Patrick McGarry
7 anonymous donors

I’m quite pleased to see Linux, Lustre, GlusterFS, and OpenZFS / Illumos represented on this list! It’s also great to see that this is an issue that the Illumos community has already identified and recently called out:



Increasing awareness of the issue and showing broad support for these campaigns is just as important as the money raised, so please contribute or help spread the word even if it is a token amount!

I would love to see some similar attention to this issue in the OpenStack Cinder and Swift communities. What do you think?

Donate now

v0.80.6 Firefly released

This is a major bugfix release for firefly, fixing a range of issues in the OSD and monitor, particularly with cache tiering. There are also important fixes in librados, with the watch/notify mechanism used by librbd, and in radosgw.

A few pieces of new functionality have been backported, including improved ‘ceph df’ output (view amount of writeable space per pool), support for non-default cluster names when using sysvinit or systemd, and improved (and fixed) support for dmcrypt.

We recommend that all v0.80.x Firefly users upgrade to this release.

For more detailed information, see the complete changelog.



I’d like to take a moment away from your regularly scheduled storage revolution to talk about the Ada Initiative: who they are, what they do, and why it is important to open source storage communities. I’m also going to challenge you to raise $8192 for them, and I’ll match that dollar for dollar if you do. (If you already know who they are and support their work, go ahead and donate now.)

Research, experience, and common sense have demonstrated that diverse communities perform better: they are more dynamic, they generate better ideas, and they are more pleasant to be a part of. On the gender axis, however, most open source communities–including storage–are extremely homogeneous, with far less than 10% participation from women. It surprised me to learn that this is significantly lower than software engineering in general, where you find women make up about 30% of the community.

This frustrates me on two levels.  First, open source is about building communities around code that bring together diverse organizations and interests in pursuit of a common goal.  Why is something as fundamental as being inclusive of women a problem here?  Our “open” communities should be setting the standard, not lagging behind when it comes to diversity.

Second, I am passionate about Ceph not because I think it is the end-all solution to everyone’s storage problems, but because it is playing an important role in breaking the stranglehold that proprietary vendors have over a large and increasingly critical industry.  If we are going to win the larger battle of making the industry-leading, state of the art, de facto choice for storage an open platform, we will need all the help we can get: the incumbents are well-entrenched, they are better funded, and (it turns out) they are doing a better job of attracting diverse talent.

The Ada Initiative is one of the few organizations that is addressing gender diversity head-on, with a specific focus on open source and open data communities. You might already know about their successful campaign to get most open source conferences to adopt codes of conduct, which make women and other marginalized groups more likely to attend. Their AdaCamp conferences for women in open tech/culture and their Ally Skills Workshops, which teach men how to support women in everyday ways, have proven to be extremely popular and effective in both welcoming and empowering women. These programs have proven so successful, in fact, that (for lack of staff and funding) they are currently unable to meet the full demand for them: all three AdaCamps this year sold out months early, and they are booking solid for Ally Skills Workshops for the next 3 months.

As Inktank and as DreamHost we were proud to be early supporters of the Ada Initiative.  Today, I am proud to continue that support with a personal challenge to the open source storage community:

If you raise $8192 by next Wednesday, I will match that contribution dollar for dollar.

This challenge applies to the larger open source storage world, including the Ceph, Gluster, Swift, and Linux storage and file systems communities, and goes until Wednesday, October 8th, when the Ada Initiative’s fundraising drive ends.

Donate now

Ceph Developer Summit: Hammer

As many of you Ceph Day attendees are no doubt aware, we’re fast approaching the release date for the ‘Giant’ release of Ceph. With that, it’s time to get together at another virtual Ceph Developer Summit and chat about what development work is going into the ‘Hammer’ release. Blueprint submissions are open now, so if you have any work you would like to contribute or request of our community developers, please submit it as soon as possible to ensure it gets a CDS slot.

The rough schedule of CDS and Hammer in general should look something like this:

Date            Milestone
30 SEP          Blueprint submissions begin
17 OCT          Blueprint submissions end
21 OCT          Summit agenda announced
28 OCT          Ceph Developer Summit: Day 1
29 OCT          Ceph Developer Summit: Day 2 (if needed)
January 2015    Hammer Release

If there are enough sessions we are exploring the possibility of expanding our event into three days, but that will be predicated on the blueprint workload. As always, this event will be an online event (utilizing the BlueJeans system) so that everyone can attend from their own timezone. If you are interested in submitting a blueprint or collaborating on an existing blueprint, please click the big red button below!


Submit Blueprint

scuttlemonkey out

v0.67.11 Dumpling released

This stable update for Dumpling fixes several important bugs that affect a small set of users.

We recommend that all Dumpling users upgrade at their convenience. If none of these issues are affecting your deployment there is no urgency.


  • common: fix sending dup cluster log items (#9080 Sage Weil)
  • doc: several doc updates (Alfredo Deza)
  • libcephfs-java: fix build against older JNI headers (Greg Farnum)
  • librados: fix crash in op timeout path (#9362 Matthias Kiefer, Sage Weil)
  • librbd: fix crash using clone of flattened image (#8845 Josh Durgin)
  • librbd: fix error path cleanup when failing to open image (#8912 Josh Durgin)
  • mon: fix crash when adjusting pg_num before any OSDs are added (#9052 Sage Weil)
  • mon: reduce log noise from paxos (Aanchal Agrawal, Sage Weil)
  • osd: allow scrub and snap trim thread pool IO priority to be adjusted (Sage Weil)
  • osd: fix mount/remount sync race (#9144 Sage Weil)


v0.85 released

This is the second-to-last development release before Giant that contains new functionality. The big items to land during this cycle are the messenger refactoring from Matt Benjamin that lays some groundwork for RDMA support, a performance improvement series from SanDisk that improves performance on SSDs, lots of improvements to our new standalone civetweb-based RGW frontend, and a new ‘osd blocked-by’ mon command that allows admins to easily identify which OSDs are blocking peering progress. The other big change is that the OSDs and Monitors now distinguish between “misplaced” and “degraded” objects: the latter means there are fewer copies than we’d like, while the former simply means they are not stored in the locations where we want them to be.
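
To make the misplaced/degraded distinction concrete, here is a deliberately simplified sketch. It is not how the OSDs or Monitors actually account for objects; the function `classify` and its three parameters are hypothetical and only encode the two definitions given in the text.

```python
def classify(desired_copies, copies_found, copies_in_place):
    """Classify an object using the two definitions from the text.

    desired_copies  -- how many copies we would like to have
    copies_found    -- how many copies exist anywhere in the cluster
    copies_in_place -- how many copies sit in the locations we want
    """
    if copies_found < desired_copies:
        # Fewer copies than we'd like: data safety is reduced.
        return "degraded"
    if copies_in_place < desired_copies:
        # All copies exist, but some are stored in the wrong location.
        return "misplaced"
    return "clean"


# Three copies wanted, three exist, but only two are where they belong.
state = classify(desired_copies=3, copies_found=3, copies_in_place=2)
```

In this sketch a missing copy takes precedence over a wrongly placed one, since only the former affects data safety.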

Also of note is a change to librbd that enables client-side caching by default. This is coupled with another option that makes the cache write-through until a “flush” operation is observed: this implies that the librbd user (usually a VM guest OS) supports barriers and flush and that it is safe for the cache to switch into writeback mode without compromising data safety or integrity. It has long been recommended practice that these options be enabled (e.g., in OpenStack environments) but until now it has not been the default.
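
The “write-through until flush” behavior can be pictured as a small state machine. The sketch below is a toy model under stated assumptions, not librbd code: the class `CacheModeSketch` and its attributes are hypothetical, and real caching involves far more than two lists.

```python
class CacheModeSketch:
    """Toy model: write-through until the first flush, writeback afterwards."""

    def __init__(self):
        self.flush_seen = False
        self.dirty = []    # writes buffered in the cache (writeback mode only)
        self.backend = []  # writes that have reached the backing store

    @property
    def mode(self):
        return "writeback" if self.flush_seen else "writethrough"

    def write(self, data):
        if self.flush_seen:
            # The user has shown it issues flushes, so buffering is safe.
            self.dirty.append(data)
        else:
            # No flush observed yet: persist every write immediately.
            self.backend.append(data)

    def flush(self):
        # Persist anything buffered, then remember that a flush was seen.
        self.backend.extend(self.dirty)
        self.dirty.clear()
        self.flush_seen = True


cache = CacheModeSketch()
cache.write("a")   # persisted immediately while still write-through
cache.flush()      # first flush observed: the model switches to writeback
cache.write("b")   # now buffered until the next flush
```

A user that never flushes stays in write-through mode forever in this model, which is the safety property the paragraph describes.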

We have frozen the tree for the looming Giant release, and the next development release will be a release candidate with a final batch of new functionality.


© 2014, Inktank Storage, Inc. All rights reserved.