The Ceph Blog

Featured Post

Ceph Developer Summit: Hammer

As many of you Ceph Day attendees are no doubt aware, we’re fast approaching the release date for the ‘Giant’ release of Ceph. With that, it’s time to get together at another virtual Ceph Developer Summit and chat about what development work is going into the ‘Hammer’ release. Blueprint submissions are open now, so if you have any work you would like to contribute or request of our community developers, please submit it as soon as possible to ensure it gets a CDS slot.

The rough schedule of CDS and Hammer in general should look something like this:

Date            Milestone
30 SEP          Blueprint submissions begin
17 OCT          Blueprint submissions end
21 OCT          Summit agenda announced
28 OCT          Ceph Developer Summit: Day 1
29 OCT          Ceph Developer Summit: Day 2 (if needed)
January 2015    Hammer Release

If there are enough sessions we are exploring the possibility of expanding our event into three days, but that will be predicated on the blueprint workload. As always, this event will be an online event (utilizing the BlueJeans system) so that everyone can attend from their own timezone. If you are interested in submitting a blueprint or collaborating on an existing blueprint, please click the big red button below!

 

Submit Blueprint

scuttlemonkey out
Earlier Posts

v0.67.11 Dumpling released

This stable update for Dumpling fixes several important bugs that affect a small set of users.

We recommend that all Dumpling users upgrade at their convenience. If none of these issues are affecting your deployment there is no urgency.

NOTABLE CHANGES

  • common: fix sending dup cluster log items (#9080, Sage Weil)
  • doc: several doc updates (Alfredo Deza)
  • libcephfs-java: fix build against older JNI headers (Greg Farnum)
  • librados: fix crash in op timeout path (#9362, Matthias Kiefer, Sage Weil)
  • librbd: fix crash using clone of flattened image (#8845, Josh Durgin)
  • librbd: fix error path cleanup when failing to open image (#8912, Josh Durgin)
  • mon: fix crash when adjusting pg_num before any OSDs are added (#9052, Sage Weil)
  • mon: reduce log noise from paxos (Aanchal Agrawal, Sage Weil)
  • osd: allow scrub and snap trim thread pool IO priority to be adjusted (Sage Weil)
  • osd: fix mount/remount sync race (#9144, Sage Weil)

GETTING CEPH

v0.85 released

This is the second-to-last development release before Giant that contains new functionality. The big items to land during this cycle are the messenger refactoring from Matt Benjamin that lays some groundwork for RDMA support, a series of changes from SanDisk that improves performance on SSDs, lots of improvements to our new standalone civetweb-based RGW frontend, and a new ‘osd blocked-by’ mon command that allows admins to easily identify which OSDs are blocking peering progress. The other big change is that the OSDs and Monitors now distinguish between “misplaced” and “degraded” objects: the latter means there are fewer copies than we’d like, while the former simply means they are not stored in the locations where we want them to be.
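To make the “misplaced” versus “degraded” distinction concrete, here is a minimal toy sketch in Python. It is not how the OSDs or Monitors compute these states; the function name and the use of plain OSD id lists are assumptions made purely for illustration.

```python
def classify_object(desired_osds, actual_osds):
    """Classify one object's replica placement (toy model, not Ceph's logic)."""
    desired = set(desired_osds)
    actual = set(actual_osds)
    if len(actual) < len(desired):
        return "degraded"   # fewer copies than we would like
    if actual != desired:
        return "misplaced"  # enough copies, but not where we want them
    return "clean"


# Hypothetical OSD ids, chosen only for the example.
print(classify_object([0, 1, 2], [0, 1]))
print(classify_object([0, 1, 2], [0, 1, 5]))
print(classify_object([0, 1, 2], [0, 1, 2]))
```

In this model a degraded object takes priority over a misplaced one, since missing copies are a durability concern while misplaced copies are only a placement concern.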

Also of note is a change to librbd that enables client-side caching by default. This is coupled with another option that makes the cache write-through until a “flush” operation is observed: this implies that the librbd user (usually a VM guest OS) supports barriers and flush and that it is safe for the cache to switch into writeback mode without compromising data safety or integrity. It has long been recommended practice that these options be enabled (e.g., in OpenStack environments) but until now it has not been the default.

We have frozen the tree for the looming Giant release, and the next development release will be a release candidate with a final batch of new functionality.

UPGRADING

  • The client-side caching for librbd is now enabled by default (rbd cache = true). A safety option (rbd cache writethrough until flush = true) is also enabled so that writeback caching is not used until the library observes a ‘flush’ command, indicating that the librbd user is passing that operation through from the guest VM. This avoids potential data loss when used with older versions of qemu that do not support flush.

    leveldb_write_buffer_size = 32*1024*1024 = 33554432 // 32MB
    leveldb_cache_size = 512*1024*1024 = 536870912 // 512MB
    leveldb_block_size = 64*1024 = 65536 // 64KB
    leveldb_compression = false
    leveldb_log = ""

    OSDs will still maintain the following osd-specific defaults:

    leveldb_log = ""

  • The ‘rados getxattr …’ command used to add a gratuitous newline to the attr value; it now does not.
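The “writethrough until flush” behaviour described in the librbd caching item above can be pictured with a small toy model. This is only a sketch of the idea, not librbd’s implementation; the class name, attributes, and list-based “backend” are all hypothetical.

```python
class ToyCache:
    """Toy model of 'writethrough until flush'; not librbd's implementation."""

    def __init__(self, writethrough_until_flush=True):
        # Start in write-through mode when the safety option is on.
        self.mode = "writethrough" if writethrough_until_flush else "writeback"
        self.dirty = []    # writes held in the cache, not yet on the backend
        self.backend = []  # writes that have reached the backing store

    def write(self, data):
        if self.mode == "writethrough":
            self.backend.append(data)  # persisted before the write returns
        else:
            self.dirty.append(data)    # persisted later, on flush

    def flush(self):
        # Push cached writes out. Seeing a flush shows the user sends them,
        # so it is now considered safe to cache writes.
        self.backend.extend(self.dirty)
        self.dirty.clear()
        self.mode = "writeback"


cache = ToyCache()
cache.write("a")   # goes straight to the backend
cache.flush()      # first flush observed: switch to writeback
cache.write("b")   # held in the cache until the next flush
print(cache.mode, cache.backend, cache.dirty)
```

A guest that never issues a flush stays in write-through mode for the lifetime of the cache, which is the property that protects users of older qemu versions.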

NOTABLE CHANGES

  • ceph-disk: do not inadvertently create directories (Owen Synge)
  • ceph-disk: fix dmcrypt support (Sage Weil)
  • ceph-disk: linter cleanup, logging improvements (Alfredo Deza)
  • ceph-disk: show information about dmcrypt in ‘ceph-disk list’ output (Sage Weil)
  • ceph-disk: use partition type UUIDs and blkid (Sage Weil)
  • ceph: fix for non-default cluster names (#8944, Dan Mick)
  • doc: document new upstream wireshark dissector (Kevin Cox)
  • doc: many install doc updates (John Wilkins)
  • librados: fix lock leaks in error paths (#9022, Pavan Rallabhandi)
  • librados: fix pool existence check (#8835, Pavan Rallabhandi)
  • librbd: enable caching by default (Sage Weil)
  • librbd: fix crash using clone of flattened image (#8845, Josh Durgin)
  • librbd: store and retrieve snapshot metadata based on id (Josh Durgin)
  • mailmap: many updates (Loic Dachary)
  • mds: add min/max UID for snapshot creation/deletion (#9029, Wido den Hollander)
  • misc build errors/warnings for Fedora 20 (Boris Ranto)
  • mon: add ‘osd blocked-by’ command to easily see which OSDs are blocking peering progress (Sage Weil)
  • mon: add perfcounters for paxos operations (Sage Weil)
  • mon: create default EC profile if needed (Loic Dachary)
  • mon: fix crash on loopback messages and paxos timeouts (#9062, Sage Weil)
  • mon: fix divide by zero when pg_num is adjusted before OSDs are added (#9101, Sage Weil)
  • mon: fix occasional memory leak after session reset (#9176, Sage Weil)
  • mon: fix ruleset/ruleid bugs (#9044, Loic Dachary)
  • mon: make usage dumps in terms of bytes, not kB (Sage Weil)
  • mon: prevent implicit destruction of OSDs with ‘osd setmaxosd …’ (#8865, Anand Bhat)
  • mon: verify all quorum members are contiguous at end of Paxos round (#9053, Sage Weil)
  • msgr: refactor to cleanly separate SimpleMessenger implementation, move toward Connection-based calls (Matt Benjamin, Sage Weil)
  • objectstore: clean up KeyValueDB interface for key/value backends (Sage Weil)
  • osd: add local_mtime for use by cache agent (Zhiqiang Wang)
  • osd: add superblock for KeyValueStore backend (Haomai Wang)
  • osd: add support for Intel ISA-L erasure code library (Andreas-Joachim Peters)
  • osd: do not skip promote for write-ordered reads (#9064, Samuel Just)
  • osd: fix ambiguous encoding order for blacklisted clients (#9211, Sage Weil)
  • osd: fix cache flush corner case for snapshotted objects (#9054, Samuel Just)
  • osd: fix discard of old/obsolete subop replies (#9259, Samuel Just)
  • osd: fix discard of peer messages from previous intervals (Greg Farnum)
  • osd: fix dump of open fds on EMFILE (Sage Weil)
  • osd: fix journal dump (Ma Jianpeng)
  • osd: fix mon feature bit requirements bug and resulting log spam (Sage Weil)
  • osd: fix recovery chunk size usage during EC recovery (Ma Jianpeng)
  • osd: fix recovery reservation deadlock for EC pools (Samuel Just)
  • osd: fix removal of old xattrs when overwriting chained xattrs (Ma Jianpeng)
  • osd: fix request queueing on PG split (Samuel Just)
  • osd: force new xattrs into leveldb if fs returns E2BIG (#7779, Sage Weil)
  • osd: implement alignment on chunk sizes (Loic Dachary)
  • osd: improve prioritization of recovery of degraded over misplaced objects (Sage Weil)
  • osd: locking, sharding, caching improvements in FileStore’s FDCache (Somnath Roy, Greg Farnum)
  • osd: many important bug fixes (Samuel Just)
  • osd, mon: add rocksdb support (Xinxin Shu, Sage Weil)
  • osd, mon: distinguish between “misplaced” and “degraded” objects in cluster health and PG state reporting (Sage Weil)
  • osd: refactor some ErasureCode functionality into common parent class (Loic Dachary)
  • osd: set rollback_info_completed on create (#8625, Samuel Just)
  • rados: allow setxattr value to be read from stdin (Sage Weil)
  • rados: drop gratuitous \n from getxattr command (Sage Weil)
  • rgw: add --min-rewrite-stripe-size for object restriper (Yehuda Sadeh)
  • rgw: add powerdns hook for dynamic DNS for global clusters (Wido den Hollander)
  • rgw: copy object data if target bucket is in a different pool (#9039, Yehuda Sadeh)
  • rgw: do not try to authenticate CORS preflight requests (#8718, Robert Hubbard, Yehuda Sadeh)
  • rgw: fix civetweb URL decoding (#8621, Yehuda Sadeh)
  • rgw: fix removal of objects during object creation (Patrycja Szablowska, Yehuda Sadeh)
  • rgw: fix striping for copied objects (#9089, Yehuda Sadeh)
  • rgw: fix test for identifying whether an object has a tail (#9226, Yehuda Sadeh)
  • rgw: fix when stripe size is not a multiple of chunk size (#8937, Yehuda Sadeh)
  • rgw: improve civetweb logging (Yehuda Sadeh)
  • rgw: misc civetweb frontend fixes (Yehuda Sadeh)
  • sysvinit: add support for non-default cluster names (Alfredo Deza)


v0.84 released

The next Ceph development release is here! This release contains several meaty items, including some MDS improvements for journaling, the ability to remove the CephFS file system (and name it), several mon cleanups with tiered pools, several OSD performance branches, a new “read forward” RADOS caching mode, a prototype Kinetic OSD backend, and various radosgw improvements (especially with the new standalone civetweb frontend). And there are a zillion OSD bug fixes. Things are looking pretty good for the Giant release that is coming up in the next month.

UPGRADING

  • The *_kb perf counters on the monitor have been removed. These are replaced with a new set of *_bytes counters (e.g., cluster_osd_kb is replaced by cluster_osd_bytes).
  • The rd_kb and wr_kb fields in the JSON dumps for pool stats (accessed via the ‘ceph df detail -f json-pretty’ and related commands) have been replaced with corresponding *_bytes fields. Similarly, the ‘total_space’, ‘total_used’, and ‘total_avail’ fields are replaced with ‘total_bytes’, ‘total_used_bytes’, and ‘total_avail_bytes’ fields.
  • The ‘rados df --format=json’ output ‘read_bytes’ and ‘write_bytes’ fields were incorrectly reporting ops; this is now fixed.
  • The ‘rados df --format=json’ output previously included ‘read_kb’ and ‘write_kb’ fields; these have been removed. Please use ‘read_bytes’ and ‘write_bytes’ instead (and divide by 1024 if appropriate).
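For scripts that still expect the removed ‘read_kb’ and ‘write_kb’ values, the conversion suggested above might look like the following sketch. The helper name and the sample dictionary are hypothetical; the real ‘rados df’ JSON output has more fields and its own nesting, so adapt the lookup to what your version actually emits.

```python
import json


def add_kb_fields(pool):
    """Return a copy of a pool-stats dict with derived *_kb fields."""
    out = dict(pool)
    for field in ("read_bytes", "write_bytes"):
        if field in pool:
            # Integer division rounds down to whole kB.
            out[field.replace("_bytes", "_kb")] = pool[field] // 1024
    return out


# Hypothetical sample record, not captured from a real cluster.
sample = json.loads('{"name": "rbd", "read_bytes": 4096, "write_bytes": 10240}')
print(add_kb_fields(sample))
```

Deriving the kB values at the edge of a script keeps the rest of the tooling working on bytes, which matches the direction of the monitor counter changes listed above.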

NOTABLE CHANGES

  • ceph-conf: flush log on exit (Sage Weil)
  • ceph-dencoder: refactor build a bit to limit dependencies (Sage Weil, Dan Mick)
  • ceph.spec: split out ceph-common package, other fixes (Sandon Van Ness)
  • ceph_test_librbd_fsx: fix RNG, make deterministic (Ilya Dryomov)
  • cephtool: refactor and improve CLI tests (Joao Eduardo Luis)
  • client: improved MDS session dumps (John Spray)
  • common: fix dup log messages (#9080, Sage Weil)
  • crush: include new tunables in dump (Sage Weil)
  • crush: only require rule features if the rule is used (#8963, Sage Weil)
  • crushtool: send output to stdout, not stderr (Wido den Hollander)
  • fix i386 builds (Sage Weil)
  • fix struct vs class inconsistencies (Thorsten Behrens)
  • hadoop: update hadoop tests for Hadoop 2.0 (Huamin Chen)
  • librbd, ceph-fuse: reduce cache flush overhead (Haomai Wang)
  • librbd: fix error path when opening image (#8912, Josh Durgin)
  • mds: add file system name, enabled flag (John Spray)
  • mds: boot refactor, cleanup (John Spray)
  • mds: fix journal conversion with standby-replay (John Spray)
  • mds: separate inode recovery queue (John Spray)
  • mds: session ls, evict commands (John Spray)
  • mds: submit log events in async thread (Yan, Zheng)
  • mds: use client-provided timestamp for user-visible file metadata (Yan, Zheng)
  • mds: validate journal header on load and save (John Spray)
  • misc build fixes for OS X (John Spray)
  • misc integer size cleanups (Kevin Cox)
  • mon: add get-quota commands (Joao Eduardo Luis)
  • mon: do not create file system by default (John Spray)
  • mon: fix ‘ceph df’ output for available space (Xiaoxi Chen)
  • mon: fix bug when no auth keys are present (#8851, Joao Eduardo Luis)
  • mon: fix compat version for MForward (Joao Eduardo Luis)
  • mon: restrict some pool properties to tiered pools (Joao Eduardo Luis)
  • msgr: misc locking fixes for fast dispatch (#8891, Sage Weil)
  • osd: add ‘dump_reservations’ admin socket command (Sage Weil)
  • osd: add READFORWARD caching mode (Luis Pabon)
  • osd: add header cache for KeyValueStore (Haomai Wang)
  • osd: add prototype KineticStore based on Seagate Kinetic (Josh Durgin)
  • osd: allow map cache size to be adjusted at runtime (Sage Weil)
  • osd: avoid refcounting overhead by passing a few things by ref (Somnath Roy)
  • osd: avoid sharing PG info that is not durable (Samuel Just)
  • osd: clear slow request latency info on osd up/down (Sage Weil)
  • osd: fix PG object listing/ordering bug (Guang Yang)
  • osd: fix PG stat errors with tiering (#9082, Sage Weil)
  • osd: fix bug with long object names and rename (#8701, Sage Weil)
  • osd: fix cache full -> not full requeueing (#8931, Sage Weil)
  • osd: fix gating of messages from old OSD instances (Greg Farnum)
  • osd: fix memstore bugs with collection_move_rename, lock ordering (Sage Weil)
  • osd: improve locking for KeyValueStore (Haomai Wang)
  • osd: make tiering behave if hit_sets aren’t enabled (Sage Weil)
  • osd: mark pools with incomplete clones (Sage Weil)
  • osd: misc locking fixes for fast dispatch (Samuel Just, Ma Jianpeng)
  • osd: prevent old rados clients from using tiered pools (#8714, Sage Weil)
  • osd: reduce OpTracker overhead (Somnath Roy)
  • osd: set configurable hard limits on object and xattr names (Sage Weil, Haomai Wang)
  • osd: trim old EC objects quickly; verify on scrub (Samuel Just)
  • osd: work around GCC 4.8 bug in journal code (Matt Benjamin)
  • rados bench: fix arg order (Kevin Dalley)
  • rados: fix {read,write}_ops values for df output (Sage Weil)
  • rbd: add rbdmap pre- and post- hooks, fix misc bugs (Dmitry Smirnov)
  • rbd: improve option default behavior (Josh Durgin)
  • rgw: automatically align writes to EC pool (#8442, Yehuda Sadeh)
  • rgw: fix crash on swift CORS preflight request (#8586, Yehuda Sadeh)
  • rgw: fix memory leaks (Andrey Kuznetsov)
  • rgw: fix multipart upload (#8846, Sylvain Munaut, Yehuda Sadeh)
  • rgw: improve -h (Abhishek Lekshmanan)
  • rgw: improve delimited listing of bucket, misc fixes (Yehuda Sadeh)
  • rgw: misc civetweb fixes (Yehuda Sadeh)
  • rgw: powerdns backend for global namespaces (Wido den Hollander)
  • systemd: initial systemd config files (Federico Simoncelli)


v0.67.10 Dumpling released

This stable update release for Dumpling includes primarily fixes for RGW, including several issues with bucket listings and a potential data corruption problem when multiple multi-part uploads race. There is also some throttling capability added in the OSD for scrub that can mitigate the performance impact on production clusters.

We recommend that all Dumpling users upgrade at their convenience.

NOTABLE CHANGES

  • ceph-disk: partprobe before settle, fixing dm-crypt (#6966, Eric Eastman)
  • librbd: add invalidate cache interface (Josh Durgin)
  • librbd: close image if remove_child fails (Ilya Dryomov)
  • librbd: fix potential null pointer dereference (Danny Al-Gaaf)
  • librbd: improve writeback checks, performance (Haomai Wang)
  • librbd: skip zeroes when copying image (#6257, Josh Durgin)
  • mon: fix rule(set) check on ‘ceph pool set … crush_ruleset …’ (#8599, John Spray)
  • mon: shut down if mon is removed from cluster (#6789, Joao Eduardo Luis)
  • osd: fix filestore perf reports to mon (Sage Weil)
  • osd: force any new or updated xattr into leveldb if E2BIG from XFS (#7779, Sage Weil)
  • osd: lock snapdir object during write to fix race with backfill (Samuel Just)
  • osd: optional sleep during scrub (Sage Weil)
  • osd: set io priority on scrub and snap trim threads (Sage Weil)
  • osd: ‘status’ admin socket command (Sage Weil)
  • rbd: tolerate missing NULL terminator on block_name_prefix (#7577, Dan Mick)
  • rgw: calculate user manifest (#8169, Yehuda Sadeh)
  • rgw: fix abort on chunk read error, avoid using extra memory (#8289, Yehuda Sadeh)
  • rgw: fix buffer overflow on bucket instance id (#8608, Yehuda Sadeh)
  • rgw: fix crash in swift CORS preflight request (#8586, Yehuda Sadeh)
  • rgw: fix implicit removal of old objects on object creation (#8972, Patrycja Szablowska, Yehuda Sadeh)
  • rgw: fix MaxKeys in bucket listing (Yehuda Sadeh)
  • rgw: fix race with multiple updates to a single multipart object (#8269, Yehuda Sadeh)
  • rgw: improve bucket listing with delimiter (Yehuda Sadeh)
  • rgw: include NextMarker in bucket listing (#8858, Yehuda Sadeh)
  • rgw: return error early on non-existent bucket (#7064, Yehuda Sadeh)
  • rgw: set truncation flag correctly in bucket listing (Yehuda Sadeh)
  • sysvinit: continue starting daemons after pre-mount error (#8554, Sage Weil)

For more detailed information, see the complete changelog.

Twitter || Facebook || Google+ || Lists/IRC



Voting for submissions is well underway for the next OpenStack summit, and this one is shaping up to be another great place to talk about Ceph. Almost fifty talks are currently available for voting on the OpenStack site! Ceph has been steadily gaining popularity in the OpenStack world, especially if you take a look at recent user survey results.

Start Voting!

The even better part is there are only three Ceph talks that were submitted by the former Inktank crew, which means there are a ton of organic submissions available. If you are interested in taking a peek at the Inktank crew specifically, they can be found at the following URLs:

read more…

v0.83 released

Another Ceph development release! This has been a longer cycle, so there has been quite a bit of bug fixing and stabilization in this round. There are also a number of packaging fixes for RPM distros (RHEL/CentOS, Fedora, and SUSE) and for systemd. We’ve also added a new librados-striper library from Sebastien Ponce that provides a generic striping API for applications to code to.

UPGRADING

  • The experimental keyvaluestore-dev OSD backend had an on-disk format change that prevents existing OSD data from being upgraded. This affects developers and testers only.
  • mon-specific and osd-specific leveldb options have been removed. From this point onward users should use the generic ‘leveldb_*’ options and add the options in the appropriate sections of their configuration files. Monitors will still maintain the following monitor-specific defaults:

    leveldb_write_buffer_size = 32*1024*1024 = 33554432 // 32MB
    leveldb_cache_size = 512*1024*1024 = 536870912 // 512MB
    leveldb_block_size = 64*1024 = 65536 // 64KB
    leveldb_compression = false
    leveldb_log = ""

    OSDs will still maintain the following osd-specific defaults:

    leveldb_log = ""
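The byte values quoted for the monitor defaults can be checked with a few lines of Python. This is only an arithmetic sanity check under the usual convention that 1 KB = 1024 bytes and 1 MB = 1024 KB; the dictionary and constant names are illustrative and it does not read or write any Ceph configuration.

```python
KIB = 1024
MIB = 1024 * KIB

# Sizes as stated in the release note: 32MB, 512MB, and 64KB.
monitor_defaults = {
    "leveldb_write_buffer_size": 32 * MIB,
    "leveldb_cache_size": 512 * MIB,
    "leveldb_block_size": 64 * KIB,
}

# Compare against the byte counts given in the note.
assert monitor_defaults["leveldb_write_buffer_size"] == 33554432
assert monitor_defaults["leveldb_cache_size"] == 536870912
assert monitor_defaults["leveldb_block_size"] == 65536

for name, value in monitor_defaults.items():
    print(f"{name} = {value}")
```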

read more…

v0.80.5 Firefly released

This release fixes a few important bugs in the radosgw and fixes several packaging and environment issues, including OSD log rotation, systemd environments, and daemon restarts on upgrade.

We recommend that all v0.80.x Firefly users upgrade, particularly if they are using upstart, systemd, or radosgw.

NOTABLE CHANGES

  • ceph-dencoder: do not needlessly link to librgw, librados, etc. (Sage Weil)
  • do not needlessly link binaries to leveldb (Sage Weil)
  • mon: fix mon crash when no auth keys are present (#8851, Joao Eduardo Luis)
  • osd: fix cleanup (and avoid occasional crash) during shutdown (#7981, Sage Weil)
  • osd: fix log rotation under upstart (Sage Weil)
  • rgw: fix multipart upload when object has irregular size (#8846, Yehuda Sadeh, Sylvain Munaut)
  • rgw: improve bucket listing S3 compatibility (#8858, Yehuda Sadeh)
  • rgw: improve delimited bucket listing (Yehuda Sadeh)
  • rpm: do not restart daemons on upgrade (#8849, Alfredo Deza)

For more detailed information, see the complete changelog.


Lots Going on with Ceph

While we knew that after the acquisition of Inktank life would accelerate again, it seems like the Ceph community is quickly approaching ludicrous speed, and it shows no sign of slowing down. We have had some amazing participation in the various endeavors, but it would be completely understandable if you had missed something amidst the avalanche of Ceph-related news.

Just in case something flew by you, I wanted to take a few minutes to recap some of the highlights of recent history. If you would like to keep a closer eye on what has been going on feel free to follow one (or all!) of our informational feeds:

Twitter || Facebook || Google+ || Lists/IRC


read more…

OSCON has arrived (although if you came in for the Ceph tutorial session that’s old news to you)! As a part of our participation in OSCON, and as a way to celebrate the fact that Ceph turned 10 years old this year, we have decided to have our party be a distributed one.

We would love to have our users send us pictures of whatever they might be doing to celebrate the 10th anniversary of Ceph. Are you busy racking in 3 petabytes of storage to add to your Ceph cluster? Did you create a culinary masterpiece in the form of a squid cake? Are you sitting alone in the middle of the OSCON show floor with a party hat and a cupcake? We want to see! As thanks for sharing your birthday celebration efforts with the community we’ll be picking one lucky winner to receive a desktop Ceph test cluster built by our very own Mark Nelson (Ceph performance guru extraordinaire!).

While the cluster won’t break any speed records, and only a madman would use it for anything even remotely production ready, it will give you a Ceph cluster to play with and can sit on your desk to invoke feelings of envy in your coworkers. For more details check out the (new) contest page on the Ceph wiki. If you have any questions please contact me or just tweet @Ceph. Thanks, and happy birthday to Ceph!

scuttlemonkey out
© 2013, Inktank Storage, Inc. All rights reserved.