Archives: April 2012

Last month Inktank launched a community help program called “Office Hours” to provide set hours when an engineer would be available to answer questions from the community. The program has made it easier to get questions answered, and it lets our engineers focus on development when they are not on duty. Our hope was that the community would jump in and participate, just as it has with development efforts. We were not disappointed!

Three different non-Inktank groups have stepped up and volunteered to be resident “super-geeks” and help the community. Since we had such a great response we are relaunching this effort as “Geek on Duty” and tweaking our help page a bit to make it easier for people to get the type of assistance that is most appropriate to their needs.

Ceph Developer Summit

Come one, come all, to the world’s first (virtual) Ceph Developer Summit! Now that the Ceph project has moved to a regular release schedule we are trying to be more transparent about the planning process. To that end, we would like to invite participation from any interested parties as the next release of Ceph (and beyond) is planned.

Starting today we will be accepting blueprint submissions via the relaunched Ceph wiki. The timeline for submissions and announcements is as follows:

Date     Milestone
11 APR   Summit announced, blueprint submissions begin
29 APR   Blueprint submission closed
01 MAY   Summit agenda announced
07 MAY   Ceph Developer Summit
08 JUL   Dumpling Feature Freeze
August   Dumpling Release

Interested in submitting a blueprint? Submissions go through the relaunched Ceph wiki.

Ceph support in OpenNebula 4.0

The Ceph team has been extremely blessed with the number of new people who choose to become involved with our community in some way. Even more exciting are the sheer numbers of people committing code and integration work, and the folks from OpenNebula are a great example of this in action.

At the end of February, one of the OpenNebula developers reached out to let us know that their integration work with Ceph was nearly complete. Below you can find a brief overview of how Ceph behaves in an OpenNebula environment, as well as a link to how to get it set up. Read on for details!

v0.60 released

Another sprint and another release! This is the last development release
before v0.61 Cuttlefish, which is due out in 4 weeks (around May 1). The
next few weeks will be focused on making sure everything we’ve built over
the last few months is rock solid and ready for you. We will have an -rc
release ready for you in a couple of weeks. In the meantime, v0.60 has a
few goodies:

  • osd: make tracking of object snapshot metadata more efficient (Sam Just)
  • osd: misc fixes to PG split (Sam Just)
  • osd: improve journal corruption detection (Sam Just)
  • osd: improve handling when disk fills up (David Zafman)
  • osd: add ‘noscrub’, ‘nodeepscrub’ osdmap flags (David Zafman)
  • osd: fix hang in ‘journal aio = true’ mode (Sage Weil)
  • ceph-disk-prepare: fix mkfs args on old distros (Alexandre Marangone)
  • ceph-disk-activate: improve multicluster support, error handling (Sage Weil)
  • librbd: optionally wait for flush before enabling writeback (Josh Durgin)
  • crush: update weights for all instances of an item, not just the first (Sage Weil)
  • mon: shut down safely if disk approaches full (Joao Luis)
  • rgw: fix Content-Length on 32-bit machines (Jan Harkes)
  • mds: store and update backpointers/traces on directory, file objects (Sam Lang)
  • mds: improve session cleanup (Sage Weil)
  • mds, ceph-fuse: fix bugs with replayed requests after MDS restart (Sage Weil)
  • ceph-fuse: enable kernel cache invalidation (Sam Lang)
  • libcephfs: new topo API requests for Hadoop (Noah Watkins)
  • ceph-fuse: session handling cleanup, bug fixes (Sage Weil)
  • much code cleanup and optimization (Danny Al-Gaaf)
  • use less memory for logging by default
  • upstart: automatically set osd weight based on df (Guilhem Lettron)
  • init-ceph, mkcephfs: close a few security holes with -a (Sage Weil)
  • rpm/deb: do not remove /var/lib/ceph on purge (v0.59 was the only release to do so)

You can get v0.60 from the usual places:

v0.46 released

Another sprint, and v0.46 is ready.  Big items in this release include:

  • rbd: new caching mode (see below)
  • rbd: trim/discard support
  • cluster naming
  • osd: new default locations (slimmer .conf files, see below)
  • osd: various journal replay fixes for non-btrfs file systems
  • log: async and multi-level logging (see below)

The biggest new item here is the new RBD (librbd) cache mode that Josh has been working on.  This reuses a caching module that ceph-fuse and libcephfs have used for ages, so the cache portion of the code is well-tested, but the integration with librbd is new, and there are some (rare) failure cases that are not yet handled in this version.  We recommend it for performance and failure testing at this stage, but not for production use just yet; wait for v0.47.  librbd also got trim/discard support.  Patches to wire it up to qemu are still working their way upstream (and won’t work for virtio until virtio gets discard support).

We’ve revamped some of the default locations for data directories and log files and incorporated a configurable cluster name.  By default, the cluster name is ‘ceph’, and the config file is /etc/ceph/$cluster.conf (so ceph.conf is still the default).  The $cluster substitution variable is used in the other default locations, allowing the same host to contain daemons participating in different clusters.  All data defaults to /var/lib/ceph/$type/$cluster-$id (e.g., /var/lib/ceph/osd/ceph-123 for osd_data), and logs go to /var/log/ceph/$cluster.$type.$id.  You can, of course, still override these with your own locations as before.
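To make the substitution concrete, here is a minimal Python sketch of how the default locations expand.  It is an illustration only, not how the daemons do it: the function name expand_defaults and the dictionary keys are made up for this example, and only the three path patterns come from the text above.

```python
from string import Template

# Default location patterns as described in the post. The dictionary keys are
# labels chosen for this example, not Ceph option names.
DEFAULTS = {
    "conf": "/etc/ceph/$cluster.conf",
    "data": "/var/lib/ceph/$type/$cluster-$id",
    "log": "/var/log/ceph/$cluster.$type.$id",
}


def expand_defaults(daemon_type, daemon_id, cluster="ceph"):
    """Return the default paths for one daemon with $cluster, $type and $id filled in."""
    values = {"cluster": cluster, "type": daemon_type, "id": str(daemon_id)}
    return {
        name: Template(pattern).substitute(values)
        for name, pattern in DEFAULTS.items()
    }


if __name__ == "__main__":
    # Two daemons with the same type and id on one host, in different clusters,
    # end up with different paths because $cluster differs.
    print(expand_defaults("osd", 123))
    print(expand_defaults("osd", 123, cluster="backup"))
```

Because every pattern includes $cluster, the two calls at the bottom never produce the same path, which is what lets one host carry daemons from several clusters.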

There is also new logging code that allows the daemons to gather debug information in memory at a different (higher) level than what is actually written to the log, with the writes themselves done asynchronously.  In the event of a crash (seg fault, failed assertion), the full set of gathered entries is dumped to the log for our reading pleasure.  The general syntax looks like:

debug foo = 1/10

where ‘foo’ is the subsystem name (e.g., “osd”, “filestore”, etc.), the first number is the debug level that is written to the log, and the second number is the level that is gathered in memory (we keep many thousands of past entries around by default).  The hope is that people can gather debug information in memory with a lower performance impact and avoid eating their disk space.  We’ll need some more operational experience to find out how expensive that will really be.
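The two-number setting can be modelled with a small Python sketch.  This is a toy under stated assumptions, not the daemon's logging code: the names parse_debug_level and TwoLevelLog are hypothetical, the 10000-entry buffer size is an arbitrary stand-in for "many thousands", and treating a bare number as setting both levels is an assumption.

```python
from collections import deque


def parse_debug_level(value):
    """Split a setting such as '1/10' into (log_level, gather_level)."""
    log_part, _, gather_part = value.partition("/")
    log_level = int(log_part)
    # Assumption: a bare number (no slash) sets both levels to the same value.
    gather_level = int(gather_part) if gather_part else log_level
    return log_level, gather_level


class TwoLevelLog:
    """Toy model: write low-level entries right away, keep more detail in memory."""

    def __init__(self, setting, max_recent=10000):
        self.log_level, self.gather_level = parse_debug_level(setting)
        self.written = []                       # stands in for the log file
        self.recent = deque(maxlen=max_recent)  # bounded in-memory history

    def log(self, level, message):
        if level <= self.gather_level:
            self.recent.append((level, message))
        if level <= self.log_level:
            self.written.append(message)

    def dump_recent(self):
        """On a crash, append everything gathered in memory to the log."""
        # Entries that were already written appear a second time here.
        self.written.extend(message for _, message in self.recent)
        self.recent.clear()
```

With a setting of 1/10, a level-5 message never reaches the log during normal operation but is still there to be dumped after a crash.  The bounded deque is what keeps memory use fixed no matter how chatty the subsystem is.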

You can get v0.46 from the usual locations:

 

Released v0.45

v0.45 is ready!  Notable changes include:

  • osd: large xattrs stored in leveldb, allowing XFS and ext4 to be used with radosgw
  • osd: new heartbeat code (simpler, more robust)
  • osd: fixed some glaring journal performance problems
  • fixed encoding performance regression
  • ceph: less noisy output by default
  • msgr: code cleanups
  • doc: misc cleanups
  • qa: improved testing coverage

In short, some performance and bug fixes but no huge functionality.  v0.46 will be a bit more exciting on that front.

You can get packages from the usual locations:

v0.27 released

v0.27 is done!  This is mostly bugfixes, cleanups, and incremental improvements.  Notably:

  • lots of cleanups in config file loading, handling, to make library behavior sane, warn on config file errors, etc.
  • osd: fix out of order ack bug
  • mount.ceph: uses kernel keys interface (when available) to pass secrets
  • osd, mon: use new syncfs() syscall where available
  • librados: compound object operation support
  • librbd: snapshot images no longer writeable
  • librbd: rollback to snapshot and other misc fixes
  • mds: journal replay cleanups, performance, bug fixes
  • mds: many clustered mds fixes (mostly with rename and recovery)
  • mds: standby-replay mode fixes
  • mds: robust lookuphash for better nfs reexport support
  • mon: bugfixes with mds takeover
  • obsync: synchronize object buckets between s3, directory, swift, rados
  • osd: misc recovery fixes
  • radosgw: dup bucket creation fixes
  • radosgw: many small protocol fixes

As part of the radosgw work we’ve created s3-tests.git, which includes a bunch of simple tests to verify implementations of the s3 protocol.  See:

  • git://ceph.newdream.net/git/s3-tests.git
  • http://ceph.newdream.net/git/?p=s3-tests.git;a=summary

For v0.28 we’re focusing on the OSD cluster, radosgw, and continuing with the MDS clustering fixes.  Sam and Josh are working on a refactor in the OSD peering code that will make peering more understandable, verifiable, and (we hope) less buggy.

Relevant URLs:

  • Direct download at: http://ceph.newdream.net/download/ceph-0.27.tar.gz
  • For Debian and Ubuntu packages, see http://ceph.newdream.net/wiki/Debian

v0.26 released

We tagged v0.26 a few days ago.  Changes since the last release include:

  • misc build, configure, rpm build fixes
  • crypto: support for libnss (which exists in RHEL environments)
  • osd: improved throttling
  • osd: scrub no longer blocks requests
  • osd: vastly improved map update performance
  • osd: recovery fixes
  • librados, osd: support for object locator strings
  • librados: API fixes, extensions
  • mds: recovery fix for large directories
  • mds: journaling fixes
  • mds: rstats fixes
  • radosgw: Swift API support.  many fixes

For v0.27 we’re continuing to focus on stabilizing the OSD and radosgw.  There has also been a flurry of bugs found (and fixed!) in the MDS with fsstress from LTP (which, BTW, is a pretty great tool).  As part of this we’re chipping away at the clustered MDS problems as well.  See the current roadmap for the next few intermediate releases and current set of desired 1.0 features.

Relevant URLs:

v0.20 released

After a long few weeks of debugging, we’re releasing v0.20.  The goal here is to get something out prior to the v2.6.34 kernel release (which includes the Ceph client) with most of the pending improvements.  Changes since v0.19 include:

  • osd: new filestore, journaling infrastructure.  (lower latency writes, btrfs no longer strictly required)
  • msgr: wire protocol improvements
  • mds: reduced memory utilization (still more to do!)
  • auth: many auth_x cleanups and improvements
  • librados: some cleanup; C++ API now usable
  • many bug fixes throughout

There are a handful of bugs that we’ve seen but haven’t been able to reproduce reliably.  As those are fixed there will be a v0.20.1 point release.  In the meantime, work continues on v0.21.  Upcoming changes include:

  • performance improvements
  • rbd: rados block device (kvm and native linux drivers)
  • flock/fcntl lock support
  • lazy io
  • allow client reconnect even after mds has restarted (useful for clients temporarily disconnected during mds restarts)
  • cluster mds fixes

To get it:

RPMs will be included in the soon to be released Fedora 13.  There is also a ceph.spec file in git to build your own.

Kernel client update

The Linux kernel client has stabilized to the point where you can untar and build a kernel source tree (and unmount it cleanly) without any problems.  Yay!

© 2013, Inktank Storage, Inc. All rights reserved.