The Ceph Blog

Featured Post

v0.79 released

This release is intended to serve as a release candidate for firefly, which will hopefully be v0.80. No changes are being made to the code base at this point except those that fix bugs. Please test this release if you intend to make use of the new erasure-coded pools or cache tiers in firefly.

This release fixes a range of bugs found in v0.78 and streamlines the user experience when creating erasure-coded pools. There is also a raft of fixes for the MDS (multi-mds, directory fragmentation, and large directories). The main notable new piece of functionality is a small change to allow radosgw to use an erasure-coded pool for object data.

UPGRADING

  • Erasure pools created with v0.78 will no longer function with v0.79. You will need to delete the old pool and create a new one.
  • A bug was fixed in the authentication handshake on big-endian architectures that prevented authentication between big- and little-endian machines in the same cluster. If you have a cluster that consists entirely of big-endian machines, you will need to upgrade all daemons and clients and restart.
  • The ‘ceph.file.layout’ and ‘ceph.dir.layout’ extended attributes are no longer included in the listxattr(2) results to prevent problems with ‘cp -a’ and similar tools.
  • Monitor ‘auth’ read-only commands now expect the user to have ‘rx’ caps. This is the same behavior that was present in dumpling, but in emperor and more recent development releases the ‘r’ cap was sufficient. The affected commands are:
    ceph auth export
    ceph auth get
    ceph auth get-key
    ceph auth print-key
    ceph auth list
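
To find keys affected by the ‘rx’ requirement above, one might check each entity's mon cap string. The following is a minimal sketch, assuming simple caps of the form ‘allow <flags>’ (for example ‘allow r’, ‘allow rwx’, or ‘allow *’). The helper name is hypothetical, and the real cap grammar is richer than what this handles.

```python
def mon_cap_allows_auth_reads(cap):
    """Return True if a simple 'allow <flags>' mon cap includes both r and x.

    Assumption: only the plain 'allow <flags>' form is handled here; anything
    more complex is rejected rather than guessed at.
    """
    parts = cap.strip().split()
    if len(parts) != 2 or parts[0] != "allow":
        raise ValueError("unsupported cap syntax in this sketch: %r" % cap)
    flags = parts[1]
    # '*' grants everything; otherwise both 'r' and 'x' must be present.
    return flags == "*" or ("r" in flags and "x" in flags)


if __name__ == "__main__":
    for cap in ("allow r", "allow rx", "allow rwx", "allow *"):
        print(cap, "->", mon_cap_allows_auth_reads(cap))
```

A key whose mon cap is only ‘allow r’ is reported as insufficient and would need its caps updated before it can run the commands listed above.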

NOTABLE CHANGES

  • ceph-conf: stop creating bogus log files (Josh Durgin, Sage Weil)
  • common: fix authentication on big-endian architectures (Dan Mick)
  • debian: change directory ownership between ceph and ceph-common (Sage Weil)
  • init: fix startup ordering/timeout problem with OSDs (Dmitry Smirnov)
  • librbd: skip zeroes/holes when copying sparse images (Josh Durgin)
  • mds: cope with MDS failure during creation (John Spray)
  • mds: fix crash from client sleep/resume (Zheng Yan)
  • mds: misc fixes for directory fragments (Zheng Yan)
  • mds: misc fixes for larger directories (Zheng Yan)
  • mds: misc fixes for multiple MDSs (Zheng Yan)
  • mds: remove .ceph directory (John Spray)
  • misc coverity fixes, cleanups (Danny Al-Gaaf)
  • mon: add erasure profiles and improve erasure pool creation (Loic Dachary)
  • mon: ‘ceph osd pg-temp …’ and primary-temp commands (Ilya Dryomov)
  • mon: fix pool count in ‘ceph -s’ output (Sage Weil)
  • msgr: improve connection error detection between clients and monitors (Greg Farnum, Sage Weil)
  • osd: add/fix CPU feature detection for jerasure (Loic Dachary)
  • osd: improved scrub checks on clones (Sage Weil, Sam Just)
  • osd: many erasure fixes (Sam Just)
  • osd: move to jerasure2 library (Loic Dachary)
  • osd: new tests for erasure pools (David Zafman)
  • osd: reduce scrub lock contention (Guang Yang)
  • rgw: allow use of an erasure data pool (Yehuda Sadeh)

Earlier Posts

v0.78 released

This development release includes two key features: erasure coding and cache tiering. A huge amount of code was merged for this release and several additional weeks were spent stabilizing the code base, and it is now in a state where it is ready to be tested by a broader user base.

This is not the firefly release. Firefly will be delayed for at least another sprint so that we can get some operational experience with the new code and do some additional testing before committing to long term support.

Please note that while it is possible to create and test erasure coded pools in this release, the pools will not be usable when you upgrade to v0.79 as the OSDMap encoding will subtly change. Please do not populate your test pools with important data that can’t be reloaded.

UPGRADING

  • Upgrade daemons in the following order:
    1. Monitors
    2. OSDs
    3. MDSs and/or radosgw

    If the ceph-mds daemon is restarted first, it will wait until all OSDs have been upgraded before finishing its startup sequence. If the ceph-mon daemons are not restarted prior to the ceph-osd daemons, the ceph-osd daemons will not correctly register their new capabilities with the cluster, and new features may not be usable until the ceph-osd daemons are restarted a second time.

  • Upgrade radosgw daemons together. There is a subtle change in behavior for multipart uploads that prevents a multipart request that was initiated with a new radosgw from being completed by an old radosgw.
  • CephFS recently added support for a new ‘backtrace’ attribute on file data objects that is used for lookup by inode number (i.e., NFS reexport and hard links), and will later be used by fsck repair. This replaces the existing anchor table mechanism that is used for hard link resolution. In order to completely phase that out, any inode that has an outdated backtrace attribute will get updated when the inode itself is modified. This will result in some extra workload after a legacy CephFS file system is upgraded.
  • The per-op return code in librados’ ObjectWriteOperation interface is now filled in.
  • The librados cmpxattr operation now handles xattrs containing null bytes as data rather than null-terminated strings.
  • Compound operations in librados that create and then delete the same object are now explicitly disallowed (they fail with -EINVAL).
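
The daemon upgrade order given at the top of this list can be sketched as a simple sort. This is only an illustration: the daemon names below are hypothetical, and the function does not talk to a cluster.

```python
# Rank of each daemon type in the recommended upgrade order:
# monitors first, then OSDs, then MDSs and/or radosgw.
UPGRADE_ORDER = {"mon": 0, "osd": 1, "mds": 2, "radosgw": 2}


def upgrade_sequence(daemons):
    """Sort names like 'osd.3' into the recommended upgrade order.

    Assumption: each name starts with its daemon type followed by a dot.
    sorted() is stable, so daemons of the same rank keep their input order.
    """
    def rank(name):
        kind = name.split(".", 1)[0]
        return UPGRADE_ORDER[kind]

    return sorted(daemons, key=rank)


if __name__ == "__main__":
    cluster = ["mds.a", "osd.1", "mon.a", "osd.0", "radosgw.gw1"]
    print(upgrade_sequence(cluster))
```

An unknown daemon type raises a KeyError, which seems preferable to silently placing it somewhere in the sequence.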

NOTABLE CHANGES

  • ceph-brag: new client and server tools (Sebastien Han, Babu Shanmugam)
  • ceph-disk: use partx on RHEL or CentOS instead of partprobe (Alfredo Deza)
  • ceph: fix combination of ‘tell’ and interactive mode (Joao Eduardo Luis)
  • ceph-fuse: fix bugs with inline data and multiple MDSs (Zheng Yan)
  • client: fix getcwd() to use new LOOKUPPARENT operation (Zheng Yan)
  • common: fall back to json-pretty for admin socket (Loic Dachary)
  • common: fix ‘config dump’ debug prefix (Danny Al-Gaaf)
  • common: misc coverity fixes (Danny Al-Gaaf)
  • common: throttler, shared_cache performance improvements, TrackedOp (Greg Farnum, Samuel Just)
  • crush: fix JSON schema for dump (John Spray)
  • crush: misc cleanups, tests (Loic Dachary)
  • crush: new vary_r tunable (Sage Weil)
  • crush: prevent invalid buckets of type 0 (Sage Weil)
  • keyvaluestore: add perfcounters, misc bug fixes (Haomai Wang)
  • keyvaluestore: portability improvements (Noah Watkins)
  • libcephfs: API changes to better support NFS reexport via Ganesha (Matt Benjamin, Adam Emerson, Andrey Kuznetsov, Casey Bodley, David Zafman)
  • librados: API documentation improvements (John Wilkins, Josh Durgin)
  • librados: fix object enumeration bugs; allow iterator assignment (Josh Durgin)
  • librados: streamline tests (Josh Durgin)
  • librados: support for atomic read and omap operations for C API (Josh Durgin)
  • librados: support for osd and mon command timeouts (Josh Durgin)
  • librbd: pass allocation hints to OSD (Ilya Dryomov)
  • logrotate: fix bug that prevented rotation for some daemons (Loic Dachary)
  • mds: avoid duplicated discovers during recovery (Zheng Yan)
  • mds: fix file lock owner checks (Zheng Yan)
  • mds: fix LOOKUPPARENT, new LOOKUPNAME ops for reliable NFS reexport (Zheng Yan)
  • mds: fix xattr handling on setxattr (Zheng Yan)
  • mds: fix xattrs in getattr replies (Sage Weil)
  • mds: force backtrace updates for old inodes on update (Zheng Yan)
  • mds: several multi-mds and dirfrag bug fixes (Zheng Yan)
  • mon: encode erasure stripe width in pool metadata (Loic Dachary)
  • mon: erasure code crush rule creation (Loic Dachary)
  • mon: erasure code plugin support (Loic Dachary)
  • mon: fix bugs in initial post-mkfs quorum creation (Sage Weil)
  • mon: fix error output to terminal during startup (Joao Eduardo Luis)
  • mon: fix legacy CRUSH tunables warning (Sage Weil)
  • mon: fix osd_epochs lower bound tracking for map trimming (Sage Weil)
  • mon: fix OSDMap encoding features (Sage Weil, Aaron Ten Clay)
  • mon: fix ‘pg dump’ JSON output (John Spray)
  • mon: include dirty stats in ‘ceph df detail’ (Sage Weil)
  • mon: list quorum member names in quorum order (Sage Weil)
  • mon: prevent addition of non-empty cache tier (Sage Weil)
  • mon: prevent deletion of CephFS pools (John Spray)
  • mon: warn when cache tier approaches ‘full’ (Sage Weil)
  • osd: allocation hint, with XFS support (Ilya Dryomov)
  • osd: erasure coded pool support (Samuel Just)
  • osd: fix bug causing slow/stalled recovery (#7706) (Samuel Just)
  • osd: fix bugs in log merging (Samuel Just)
  • osd: fix/clarify end-of-object handling on read (Loic Dachary)
  • osd: fix impolite mon session backoff, reconnect behavior (Greg Farnum)
  • osd: fix SnapContext cache id bug (Samuel Just)
  • osd: increase default leveldb cache size and write buffer (Sage Weil, Dmitry Smirnov)
  • osd: limit size of ‘osd bench …’ arguments (Joao Eduardo Luis)
  • osdmaptool: new --test-map-pgs mode (Sage Weil, Ilya Dryomov)
  • osd, mon: add primary-affinity to adjust selection of primaries (Sage Weil)
  • osd: new ‘status’ admin socket command (Sage Weil)
  • osd: simple tiering agent (Sage Weil)
  • osd: store checksums for erasure coded object stripes (Samuel Just)
  • osd: tests for objectstore backends (Haomai Wang)
  • osd: various refactoring and bug fixes (Samuel Just, David Zafman)
  • rados: add ‘set-alloc-hint’ command (Ilya Dryomov)
  • rbd-fuse: fix enumerate_images overflow, memory leak (Ilya Dryomov)
  • rbdmap: fix upstart script (Stephan Renatus)
  • rgw: avoid logging system events to usage log (Yehuda Sadeh)
  • rgw: fix Swift range response (Yehuda Sadeh)
  • rgw: improve scalability for manifest objects (Yehuda Sadeh)
  • rgw: misc fixes for multipart objects, policies (Yehuda Sadeh)
  • rgw: support non-standard MultipartUpload command (Yehuda Sadeh)

You can get v0.78 from the usual locations:

The past couple of weeks have been a veritable sharknado of activity for the Ceph Community, from our most successful Ceph Day yet last week in Frankfurt, Germany, to another great quarterly developer summit as work begins on the “Giant” release. It is great to see that the engagement and adoption trends are continuing, and we are definitely enjoying the fruits of a rich and productive community.

Read on for details of these Ceph community events.

read more…

There is only a little over a week left to vote for OpenStack Summit talks for the upcoming Atlanta event.

While it can be hard to narrow down the list since there are so many great talks, we thought it might be helpful to create a short list of talks that touch on Ceph or something closely related. If any of these topics interest you please stop on by and give them a vote.

While you are there you can peruse some of the other great talks and perhaps find a few others to endorse. At the very least you should book your tickets now as the OpenStack events are always jam packed with useful information and great people.

See you there!

scuttlemonkey out

v0.77 released

This is the final development release before the Firefly feature freeze. The main items in this release include some additional refactoring work in the OSD IO path (including some locking improvements), per-user quotas for the radosgw, a switch to civetweb from mongoose for the prototype radosgw standalone mode, and a prototype leveldb-based backend for the OSD. The C librados API also got support for atomic write operations (read side transactions will appear in v0.78).

UPGRADING

  • The ‘ceph -s’ or ‘ceph status’ command’s ‘num_in_osds’ field in the JSON and XML output has been changed from a string to an int.
  • The recently added ‘ceph mds set allow_new_snaps’ command’s syntax has changed slightly; it is now ‘ceph mds set allow_new_snaps true’. The ‘unset’ command has been removed; instead, set the value to ‘false’.
  • The syntax for allowing snapshots is now ‘mds set allow_new_snaps <true|false>’ instead of ‘mds <set,unset> allow_new_snaps’.
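
Scripts that parse the ‘num_in_osds’ field may need to cope with both the old string form and the new int form during an upgrade. The following is a minimal sketch. The JSON fragments are hypothetical, stripped-down stand-ins, since the real status output nests this field inside a larger structure.

```python
import json


def num_in_osds(value):
    """Accept the field as either the old string form or the new int form."""
    return int(value)


# Hypothetical fragments: the pre-v0.77 string form and the v0.77 int form.
old_status = json.loads('{"num_in_osds": "3"}')
new_status = json.loads('{"num_in_osds": 3}')

if __name__ == "__main__":
    print(num_in_osds(old_status["num_in_osds"]))
    print(num_in_osds(new_status["num_in_osds"]))
```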

read more…

v0.67.7 Dumpling released

This Dumpling point release fixes a few critical issues in v0.67.6.

All v0.67.6 users are urgently encouraged to upgrade. We also recommend that all v0.67.5 (or older) users upgrade.

The v0.67.7 point release contains a number of important fixes for the OSD, monitor, and radosgw. Most significantly, a backported change forces large object attributes to spill over into leveldb, which can prevent objects and the cluster from being damaged by large attributes (which can be induced via the radosgw). There is also a set of fixes that improves data safety and RADOS semantics when the cluster becomes full and then non-full.

UPGRADING

  • Once you have upgraded a radosgw instance or OSD to v0.67.7, you should not downgrade to a previous version.
  • The OSD has long contained a feature that allows large xattrs to spill over into the leveldb backing store in situations where not all local file systems are able to store them reliably. This option is now enabled unconditionally in order to avoid rare cases where storing large xattrs renders the object unreadable. This is known to be triggered by very large multipart objects, but could be caused by other workloads as well. Although there is some small risk that performance for certain workloads will degrade, it is more important that data be retrievable. Note that newer versions of Ceph (e.g., firefly) do some additional work to avoid the potential performance regression in this case, but that is currently considered too complex for backport to the Dumpling stable series.
  • It is very dangerous to downgrade from v0.67.6 to a prior version of Dumpling. If the old version does not have ‘filestore xattr use omap = true’ it may not be able to read all xattrs for an object and can cause undefined behavior.

read more…
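
The downgrade warning above hinges on whether ‘filestore xattr use omap = true’ is present in the configuration. One way to check a configuration file for it is sketched below, assuming an INI-style file that Python's configparser can read. The function name and the choice to look only at the [osd] and [global] sections are assumptions of this sketch, not part of Ceph.

```python
import configparser

OPTION = "filestore xattr use omap"


def xattr_omap_explicitly_enabled(conf_text):
    """Return True only if the option is explicitly set to a true value.

    Assumptions: the more specific [osd] section takes precedence over
    [global], and spaces and underscores in option names are treated alike.
    Per-daemon sections and inline comments are not handled.
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    for section in ("osd", "global"):
        if not parser.has_section(section):
            continue
        for key, value in parser.items(section):
            if key.replace("_", " ").strip() == OPTION:
                return value.strip().lower() in ("true", "1", "yes")
    # Not set anywhere we looked.
    return False


if __name__ == "__main__":
    sample = "[global]\nfsid = example\n\n[osd]\nfilestore xattr use omap = true\n"
    print(xattr_omap_explicitly_enabled(sample))
```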

Yesterday Mirantis announced their efforts towards Open Source vendor certifications for OpenStack that seek to build and accelerate some of the great work that has been going on in the Cinder community. This is huge, and in more ways than immediately obvious. Unfortunately, in recent history “The Cloud” has been such an overused buzzword, and encompasses so many things, that it has become almost meaningless to a wide swath of consumers.

So many people look at OpenStack and just see “software” to make “one of those cloud things” for a very specific use. They miss the point entirely that OpenStack (and others like it) are simply part of the commoditization of, and a paradigm shift in the way we think about, infrastructure.

With an open certification program we’ll be able to see advantages like:

  1. Creating a common lexicon
  2. Accelerating adoption
  3. Embracing the Open Source model
  4. Leveling the playing field for innovation
  5. A strongly supported ecosystem

read more…

v0.76 released

This release includes another batch of updates for firefly functionality. Most notably, the cache pool infrastructure now supports snapshots, the OSD backfill functionality has been generalized to include multiple targets (necessary for the coming erasure pools), and there were performance improvements to the erasure code plugin on capable processors. The MDS now properly utilizes (and seamlessly migrates to) the OSD key/value interface (aka omap) for storing directory objects. There continue to be many other fixes and improvements for usability and code portability across the tree.

UPGRADING

  • ‘rbd ls’ on a pool which never held rbd images now exits with code 0. It outputs nothing in plain format, or an empty list in non-plain format. This is consistent with the behavior for a pool which used to hold images, but contains none. Scripts relying on the previous behavior should be updated.
  • The MDS requires a new OSD operation TMAP2OMAP, added in this release. When upgrading, be sure to upgrade and restart the ceph-osd daemons before the ceph-mds daemon. The MDS will refuse to start if any up OSDs do not support the new feature.
  • The ‘ceph mds set_max_mds N’ command is now deprecated in favor of ‘ceph mds set max_mds N’.

read more…
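
A script that consumes ‘rbd ls’ output could handle the exit code and empty results described above along these lines. This is a sketch, assuming the caller has already run the command with JSON output requested and captured its exit code and stdout; the function name is hypothetical.

```python
import json


def parse_rbd_ls(returncode, stdout):
    """Turn captured 'rbd ls' results into a list of image names.

    Assumption: JSON output was requested, so a pool without images yields
    an empty list. Completely empty output is also treated as "no images".
    """
    if returncode != 0:
        raise RuntimeError("rbd ls failed with exit code %d" % returncode)
    text = stdout.strip()
    return json.loads(text) if text else []


if __name__ == "__main__":
    print(parse_rbd_ls(0, "[]"))
    print(parse_rbd_ls(0, '["image-a", "image-b"]'))
```

With this shape, an empty pool is an ordinary empty result and only a non-zero exit code is treated as a failure.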

Ceph Developer Summit: Giant

The “Giant” Ceph Developer Summit looms….giantly… on the horizon and our wiki is ready for blueprint authors! We want your ideas, brainstorms, plans for work, or anything else you can dream up.

If you don’t already have an account on the wiki please bear with us as we work through a few kinks of account creation that have cropped up with an upgraded plugin (it couldn’t come at a worse time!). Creating an account will send you to wikilogin.ceph.com and ask for Google credentials. Once you enter them and select a username for the forum it may dump you to the old wiki at wikilogin.ceph.com instead of redirecting you back to wiki.ceph.com. If that happens please just head back to wiki.ceph.com by hand and it should let you in the door. [edit: all auth components should be working now]

If you have issues please send them to Scuttlemonkey.

Now, on with the summit details!

Date         Milestone
03 FEB       Blueprint submissions begin
21 FEB       Blueprint submissions end
24 FEB       Summit agenda announced
04 MAR       Ceph Developer Summit: Day 1
05 MAR       Ceph Developer Summit: Day 2
June 2014    Giant Release

If you are interested in submitting a blueprint, collaborating on an existing blueprint, or just attending to learn more about Ceph, read on!

 

read more…

© 2013, Inktank Storage, Inc. All rights reserved.