v0.94.8 Hammer released

This Hammer point release fixes several bugs.

We recommend that all hammer v0.94.x users upgrade.

For more detailed information, see the complete changelog.


  • build/ops: Add -D_LARGEFILE64_SOURCE to Linux build. (issue#16611, pr#10182, Ira Cooper)
  • build/ops: boost uuid makes valgrind complain (issue#12736, pr#9741, Sage Weil, Rohan Mars)
  • build/ops: ceph-disk s/by-parttype-uuid/by-parttypeuuid/ (issue#15867, pr#9107, Nathan Cutler)
  • common: add units to rados bench output and clean up formatting (issue#12248, pr#8960, Dmitry Yatsushkevich, Brad Hubbard, Gu Zhongyan)
  • common: config set with negative value results in “error setting ‘filestore_merge_threshold’ to ‘-40’: (22) Invalid argument” (issue#13829, pr#10291, Brad Hubbard, Kefu Chai)
  • common: linking to -lrbd causes process startup times to balloon (issue#15225, pr#8538, Richard W.M. Jones)
  • doc: fix by-parttypeuuid in ceph-disk(8) nroff (issue#15867, pr#10699, Ken Dreyer)
  • fs: double decreased the count to trim caps which will cause failing to respond to cache pressure (issue#14319, pr#8804, Zhi Zhang)
  • log: do not repeat errors to stderr (issue#14616, pr#10227, Sage Weil)
  • mds: failing file operations on kernel based cephfs mount point leaves unaccessible file behind on hammer 0.94.7 (issue#16013, pr#10198, Yan, Zheng)
  • mds: fix stray purging in ‘stripe_count > 1’ case (issue#15050, pr#8042, Yan, Zheng)
  • mds: wrongly treat symlink inode as normal file/dir when symlink inode is stale on kcephfs (issue#15702, pr#9404, Zhi Zhang)
  • mon: LibRadosMiscConnectFailure.ConnectFailure (not so intermittent) failure in upgrade/hammer-x (issue#13992, pr#8806, Sage Weil)
  • mon: Monitor: validate prefix on handle_command() (issue#16297, pr#10038, You Ji)
  • mon: drop pg temps from not the current primary in OSDMonitor (issue#16127, pr#9893, Samuel Just)
  • mon: fix calculation of %USED (issue#15641, pr#9125, Ruifeng Yang, David Zafman)
  • mon: improve reweight_by_utilization() logic (issue#15686, pr#9416, xie xingguo)
  • mon: pool quota alarm is not in effect (issue#15478, pr#8593, Danny Al-Gaaf)
  • mon: wrong ceph get mdsmap assertion (issue#14681, pr#7542, Vicente Cheng)
  • msgr: ceph-osd valgrind invalid reads/writes (issue#15870, pr#9238, Samuel Just)
  • objecter: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify2Timeout/1 segv (issue#15760, pr#9400, Sage Weil)
  • osd: OSD reporting ENOTEMPTY and crashing (issue#14766, pr#9277, Samuel Just)
  • osd: When generating past intervals due to an import end at pg epoch and fix build_past_intervals_parallel (issue#12387, issue#14438, pr#8464, David Zafman)
  • osd: acting_primary not updated on split (issue#15523, pr#9001, Sage Weil)
  • osd: assert(!actingbackfill.empty()): old watch timeout tries to queue repop on replica (issue#15391, pr#8665, Sage Weil)
  • osd: assert(rollback_info_trimmed_to == head) in PGLog (issue#13965, pr#8849, Samuel Just)
  • osd: delete one of the repeated op->mark_started in ReplicatedBackend::sub_op_modify_impl (issue#16572, pr#9977, shun-s)
  • osd: fix omap digest compare when scrub (issue#16000, pr#9271, Xinze Chi)
  • osd: is_split crash in handle_pg_create (issue#15426, pr#8805, Kefu Chai)
  • osd: objects unfound after repair (fixed by repeering the pg) (issue#15006, pr#7961, Jianpeng Ma, Loic Dachary, Kefu Chai)
  • osd: rados cppool omap to ec pool crashes osd (issue#14695, pr#8845, Jianpeng Ma)
  • osd: remove all stale osdmaps in handle_osd_map() (issue#13990, pr#9090, Kefu Chai)
  • osd: send write and read sub ops on behalf of client ops at normal priority in ECBackend (issue#14313, pr#8573, Samuel Just)
  • rbd: snap rollback: restore the link to parent (issue#14512, pr#8535, Alexey Sheplyakov)
  • rgw: S3: set EncodingType in ListBucketResult (issue#15896, pr#8987, Victor Makarov, Robin H. Johnson)
  • rgw: backport rgwx-copy-if-newer for radosgw-agent (issue#16262, pr#9671, Yehuda Sadeh)
  • rgw: bucket listing following object delete is partial (issue#14826, pr#10555, Orit Wasserman)
  • rgw: convert plain object to versioned (with null version) when removing (issue#15243, pr#8755, Yehuda Sadeh)
  • rgw: fix multi-delete query param parsing. (issue#16618, pr#10189, Robin H. Johnson)
  • rgw: have a flavor of bucket deletion to bypass GC and to trigger (issue#15557, pr#10509, Pavan Rallabhandi)
  • rgw: keep track of written_objs correctly (issue#15886, pr#9240, Yehuda Sadeh)
  • rgw: multipart ListPartsResult has missing quotes on ETag (issue#15334, pr#8475, xie xingguo, Robin H. Johnson)
  • rgw: no Last-Modified, Content-Size and X-Object-Manifest headers if no segments in DLO manifest (issue#15812, pr#9402, Radoslaw Zarzynski)
  • rgw: radosgw server abort when user passed bad parameters to set quota (issue#14190, issue#14191, pr#8313, Dunrong Huang)
  • rgw: radosgw-admin region-map set is not reporting the bucket quota correctly (issue#16815, pr#10554, Yehuda Sadeh, Orit Wasserman)
  • rgw: refrain from sending Content-Type/Content-Length for 304 responses (issue#16327, issue#13582, issue#15119, issue#14005, pr#8379, Yehuda Sadeh, Nathan Cutler, Wido den Hollander)
  • rgw: remove bucket index objects when deleting the bucket (issue#16412, pr#10530, Orit Wasserman)
  • rgw: set Access-Control-Allow-Origin to an asterisk if allowed in a rule (issue#15348, pr#8528, Wido den Hollander)
  • rgw: subset of uploaded objects via radosgw are unretrievable when using EC pool (issue#15745, pr#9407, Yehuda Sadeh)
  • rgw: subuser rm fails with status 125 (issue#14375, pr#9961, Orit Wasserman)
  • rgw: the swift key remains after removing a subuser (issue#12890, issue#14375, pr#10718, Orit Wasserman, Sangdi Xu)
  • rgw: user quota may not adjust on bucket removal (issue#14507, pr#8113, Edward Yang)
  • tests: be more generous with test timeout (issue#15403, pr#8470, Loic Dachary)
  • tests: qa/workunits/rbd: respect RBD_CREATE_ARGS environment variable (issue#16289, pr#9722, Mykola Golub)


v0.94.3 Hammer released

This Hammer point release fixes a critical (though rare) data corruption bug that could be triggered when logs are rotated via SIGHUP. It also fixes a range of other important bugs in the OSD, monitor, RGW, RBD, and CephFS.

All v0.94.x Hammer users are strongly encouraged to upgrade.


  • The pg ls-by-{pool,primary,osd} commands and pg ls now take the argument recovering instead of recovery in order to include the recovering pgs in the listed pgs.


  • librbd: aio calls may block (issue#11770, pr#4875, Jason Dillaman)
  • osd: make all the osd/filestore thread pool suicide timeouts separately configurable (issue#11701, pr#5159, Samuel Just)
  • mon: ceph fails to compile with boost 1.58 (issue#11982, pr#5122, Kefu Chai)
  • tests: TEST_crush_reject_empty must not run a mon (issue#12285, issue#11975, pr#5208, Kefu Chai)
  • osd: FAILED assert(!old_value.deleted()) in upgrade:giant-x-hammer-distro-basic-multi run (issue#11983, pr#5121, Samuel Just)
  • build/ops: linking ceph to tcmalloc causes segfault on SUSE SLE11-SP3 (issue#12368, pr#5265, Thorsten Behrens)
  • common: utf8 and old gcc breakage on RHEL6.5 (issue#7387, pr#4687, Kefu Chai)
  • crush: take crashes due to invalid arg (issue#11740, pr#4891, Sage Weil)
  • rgw: need conversion tool to handle fixes following #11974 (issue#12502, pr#5384, Yehuda Sadeh)
  • rgw: Swift API: support for 202 Accepted response code on container creation (issue#12299, pr#5214, Radoslaw Zarzynski)
  • common: Log::reopen_log_file: take m_flush_mutex (issue#12520, pr#5405, Samuel Just)
  • rgw: Properly respond to the Connection header with Civetweb (issue#12398, pr#5284, Wido den Hollander)
  • rgw: multipart list part response returns incorrect field (issue#12399, pr#5285, Henry Chang)
  • build/ops: 95-ceph-osd.rules, mount.ceph, and mount.fuse.ceph not installed properly on SUSE (issue#12397, pr#5283, Nathan Cutler)
  • rgw: radosgw-admin dumps user info twice (issue#12400, pr#5286, guce)
  • doc: fix doc build (issue#12180, pr#5095, Kefu Chai)
  • tests: backport 11493 fixes, and test, preventing ec cache pools (issue#12314, pr#4961, Samuel Just)
  • rgw: does not send Date HTTP header when civetweb frontend is used (issue#11872, pr#5228, Radoslaw Zarzynski)
  • mon: pg ls is broken (issue#11910, pr#5160, Kefu Chai)
  • librbd: A client opening an image mid-resize can result in the object map being invalidated (issue#12237, pr#5279, Jason Dillaman)
  • doc: missing man pages for ceph-create-keys, ceph-disk-* (issue#11862, pr#4846, Nathan Cutler)
  • tools: ceph-post-file fails on rhel7 (issue#11876, pr#5038, Sage Weil)
  • build/ops: rcceph script is buggy (issue#12090, pr#5028, Owen Synge)
  • rgw: Bucket header is enclosed by quotes (issue#11874, pr#4862, Wido den Hollander)
  • build/ops: packaging: add SuSEfirewall2 service files (issue#12092, pr#5030, Tim Serong)
  • rgw: Keystone PKI token expiration is not enforced (issue#11722, pr#4884, Anton Aksola)
  • build/ops: debian/control: ceph-common (>> 0.94.2) must be >= 0.94.2-2 (issue#12529, issue#11998, pr#5417, Loic Dachary)
  • mon: Clock skew causes missing summary and confuses Calamari (issue#11879, pr#4868, Thorsten Behrens)
  • rgw: rados objects wrongly deleted (issue#12099, pr#5117, wuxingyi)
  • tests: kernel_untar_build fails on EL7 (issue#12098, pr#5119, Greg Farnum)
  • fs: Fh ref count will leak if readahead does not need to do read from osd (issue#12319, pr#5427, Zhi Zhang)
  • mon: OSDMonitor: allow addition of cache pool with non-empty snaps with co… (issue#12595, pr#5252, Samuel Just)
  • mon: MDSMonitor: handle MDSBeacon messages properly (issue#11979, pr#5123, Kefu Chai)
  • tools: ceph-disk: get_partition_type fails on /dev/cciss… (issue#11760, pr#4892, islepnev)
  • build/ops: max files open limit for OSD daemon is too low (issue#12087, pr#5026, Owen Synge)
  • mon: add an “osd crush tree” command (issue#11833, pr#5248, Kefu Chai)
  • mon: mon crashes when “ceph osd tree 85 --format json” (issue#11975, pr#4936, Kefu Chai)
  • build/ops: ceph / ceph-dbg steal ceph-objectstore-tool from ceph-test / ceph-test-dbg (issue#11806, pr#5069, Loic Dachary)
  • rgw: DragonDisk fails to create directories via S3: MissingContentLength (issue#12042, pr#5118, Yehuda Sadeh)
  • build/ops: /usr/bin/ceph from ceph-common is broken without installing ceph (issue#11998, pr#5206, Ken Dreyer)
  • build/ops: systemd: Increase max files open limit for OSD daemon (issue#11964, pr#5040, Owen Synge)
  • build/ops: rgw/logrotate.conf calls service with wrong init script name (issue#12044, pr#5055, wuxingyi)
  • common: OPT_INT option interprets 3221225472 as -1073741824, and crashes in Throttle::Throttle() (issue#11738, pr#4889, Kefu Chai)
  • doc: doc/release-notes: v0.94.2 (issue#11492, pr#4934, Sage Weil)
  • common: admin_socket: close socket descriptor in destructor (issue#11706, pr#4657, Jon Bernard)
  • rgw: Object copy bug (issue#11755, pr#4885, Javier M. Mellid)
  • rgw: empty json response when getting user quota (issue#12245, pr#5237, wuxingyi)
  • fs: cephfs Dumper tries to load whole journal into memory at once (issue#11999, pr#5120, John Spray)
  • rgw: Fix tool for #11442 does not correctly fix objects created via multipart uploads (issue#12242, pr#5229, Yehuda Sadeh)
  • rgw: Civetweb RGW appears to report full size of object as downloaded when only partially downloaded (issue#12243, pr#5231, Yehuda Sadeh)
  • osd: stuck incomplete (issue#12362, pr#5269, Samuel Just)
  • osd: start_flush: filter out removed snaps before determining snapc’s (issue#11911, pr#4899, Samuel Just)
  • librbd: 1967: FAILED assert(watchers.size() == 1) (issue#12239, pr#5243, Jason Dillaman)
  • librbd: new QA client upgrade tests (issue#12109, pr#5046, Jason Dillaman)
  • librbd: [ FAILED ] TestLibRBD.ExclusiveLockTransition (issue#12238, pr#5241, Jason Dillaman)
  • rgw: Swift API: XML document generated in response for GET on account does not contain account name (issue#12323, pr#5227, Radoslaw Zarzynski)
  • rgw: keystone does not support chunked input (issue#12322, pr#5226, Hervé Rousseau)
  • mds: MDS is crashed (mds/ 1391: FAILED assert(!is_complete())) (issue#11737, pr#4886, Yan, Zheng)
  • cli: ceph: cli interactive mode does not understand quotes (issue#11736, pr#4776, Kefu Chai)
  • librbd: add valgrind memory checks for unit tests (issue#12384, pr#5280, Zhiqiang Wang)
  • build/ops: admin/build-doc: script fails silently under certain circumstances (issue#11902, pr#4877, John Spray)
  • osd: Fixes for rados ops with snaps (issue#11908, pr#4902, Samuel Just)
  • build/ops: ceph-common subpackage def needs tweaking for SUSE/openSUSE (issue#12308, pr#4883, Nathan Cutler)
  • fs: client: reference counting ‘struct Fh’ (issue#12088, pr#5222, Yan, Zheng)
  • build/ops: ceph.spec: update OpenSUSE BuildRequires (issue#11611, pr#4667, Loic Dachary)

For more detailed information, see the complete changelog.


v9.0.3 released

This is the second to last batch of development work for the Infernalis cycle. The most intrusive change is an internal (non user-visible) change to the OSD’s ObjectStore interface. There are also many fixes and improvements across RGW and RBD, and another big pile of CephFS scrub/repair improvements.


  • The return code of librbd’s rbd_aio_read and Image::aio_read API methods is no longer the number of bytes read upon success. Instead, it is 0 upon success and a negative value upon failure.
  • ‘ceph scrub’, ‘ceph compact’ and ‘ceph sync force’ are now DEPRECATED. Users should instead use ‘ceph mon scrub’, ‘ceph mon compact’ and ‘ceph mon sync force’.
  • ‘ceph mon_metadata’ should now be used as ‘ceph mon metadata’. There is no need to deprecate this command (same major release since it was first introduced).
  • The --dump-json option of “osdmaptool” is replaced by --dump json.
  • The commands of “pg ls-by-{pool,primary,osd}” and “pg ls” now take “recovering” instead of “recovery”, to include the recovering pgs in the listed pgs.
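
For anyone with scripts that still use the old spellings, the renames above can be applied mechanically. Here is a minimal sketch in Python; the function name `migrate_command` and the table `RENAMED_COMMANDS` are hypothetical helpers made up for illustration, and only the renames listed in these notes are covered.

```python
# Old command prefixes from the notes above, mapped to their replacements.
RENAMED_COMMANDS = {
    "ceph scrub": "ceph mon scrub",
    "ceph compact": "ceph mon compact",
    "ceph sync force": "ceph mon sync force",
    "ceph mon_metadata": "ceph mon metadata",
}


def migrate_command(line: str) -> str:
    """Return the command line with deprecated spellings replaced."""
    stripped = line.strip()
    for old, new in RENAMED_COMMANDS.items():
        # Match the whole command or the command followed by arguments,
        # so that an unrelated word sharing the prefix is left alone.
        if stripped == old or stripped.startswith(old + " "):
            return new + stripped[len(old):]
    if stripped.startswith("osdmaptool "):
        # The single --dump-json flag became --dump with a json argument.
        words = []
        for word in stripped.split():
            if word == "--dump-json":
                words.extend(["--dump", "json"])
            else:
                words.append(word)
        return " ".join(words)
    return stripped
```

Commands that are not in the table come back unchanged, so the helper can be run over a whole script line by line.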


  • autotools: fix out of tree build (Krzysztof Kosinski)
  • autotools: improve make check output (Loic Dachary)
  • buffer: add invalidate_crc() (Piotr Dalek)
  • buffer: fix zero bug (#12252 Haomai Wang)
  • build: fix junit detection on Fedora 22 (Ira Cooper)
  • ceph-disk: install pip > 6.1 (#11952 Loic Dachary)
  • cephfs-data-scan: many additions, improvements (John Spray)
  • ceph: improve error output for ‘tell’ (#11101 Kefu Chai)
  • ceph-objectstore-tool: misc improvements (David Zafman)
  • ceph-objectstore-tool: refactoring and cleanup (John Spray)
  • ceph_test_rados: test pipelined reads (Zhiqiang Wang)
  • common: fix bit_vector extent calc (#12611 Jason Dillaman)
  • common: make work queue addition/removal thread safe (#12662 Jason Dillaman)
  • common: optracker improvements (Zhiqiang Wang, Jianpeng Ma)
  • crush: add --check to validate dangling names, max osd id (Kefu Chai)
  • crush: cleanup, sync with kernel (Ilya Dryomov)
  • crush: fix subtree base weight on adjust_subtree_weight (#11855 Sage Weil)
  • crypto: fix NSS leak (Jason Dillaman)
  • crypto: fix unbalanced init/shutdown (#12598 Zheng Yan)
  • doc: misc updates (Kefu Chai, Owen Synge, Gael Fenet-Garde, Loic Dachary, Yannick Atchy-Dalama, Jiaying Ren, Kevin Caradant, Robert Maxime, Nicolas Yong, Germain Chipaux, Arthur Gorjux, Gabriel Sentucq, Clement Lebrun, Jean-Remi Deveaux, Clair Massot, Robin Tang, Thomas Laumondais, Jordan Dorne, Yuan Zhou, Valentin Thomas, Pierre Chaumont, Benjamin Troquereau, Benjamin Sesia, Vikhyat Umrao)
  • erasure-code: cleanup (Kefu Chai)
  • erasure-code: improve tests (Loic Dachary)
  • erasure-code: shec: fix recovery bugs (Takanori Nakao, Shotaro Kawaguchi)
  • libcephfs: add pread, pwrite (Jevon Qiao)
  • libcephfs,ceph-fuse: cache cleanup (Zheng Yan)
  • librados: add src_fadvise_flags for copy-from (Jianpeng Ma)
  • librados: respect default_crush_ruleset on pool_create (#11640 Yuan Zhou)
  • librbd: fadvise for copy, export, import (Jianpeng Ma)
  • librbd: handle NOCACHE fadvise flag (Jianpeng Ma)
  • librbd: optionally disable allocation hint (Haomai Wang)
  • librbd: prevent race between resize requests (#12664 Jason Dillaman)
  • log: fix data corruption race resulting from log rotation (#12465 Samuel Just)
  • mds: expose frags via asok (John Spray)
  • mds: fix setting entire file layout in one setxattr (John Spray)
  • mds: fix shutdown (John Spray)
  • mds: handle misc corruption issues (John Spray)
  • mds: misc fixes (Jianpeng Ma, Dan van der Ster, Zhang Zhi)
  • mds: misc snap fixes (Zheng Yan)
  • mds: store layout on header object (#4161 John Spray)
  • misc performance and cleanup (Nathan Cutler, Xinxin Shu)
  • mon: add NOFORWARD, OBSOLETE, DEPRECATE flags for mon commands (Joao Eduardo Luis)
  • mon: add PG count to ‘ceph osd df’ output (Michal Jarzabek)
  • mon: clean up, reorg some mon commands (Joao Eduardo Luis)
  • mon: disallow >2 tiers (#11840 Kefu Chai)
  • mon: fix log dump crash when debugging (Mykola Golub)
  • mon: fix metadata update race (Mykola Golub)
  • mon: fix refresh (#11470 Joao Eduardo Luis)
  • mon: make blocked op messages more readable (Jianpeng Ma)
  • mon: only send mon metadata to supporting peers (Sage Weil)
  • mon: periodic background scrub (Joao Eduardo Luis)
  • mon: prevent pgp_num > pg_num (#12025 Xinxin Shu)
  • mon: reject large max_mds values (#12222 John Spray)
  • msgr: add ceph_perf_msgr tool (Haomai Wang)
  • msgr: async: fix seq handling (Haomai Wang)
  • msgr: xio: fastpath improvements (Raju Kurunkad)
  • msgr: xio: sync with accellio v1.4 (Vu Pham)
  • osd: clean up temp object if promotion fails (Jianpeng Ma)
  • osd: constrain collections to meta and PGs (normal and temp) (Sage Weil)
  • osd: filestore: clone using splice (Jianpeng Ma)
  • osd: filestore: fix recursive lock (Xinxin Shu)
  • osd: fix dup promotion lost op bug (Zhiqiang Wang)
  • osd: fix temp-clearing (David Zafman)
  • osd: include a temp namespace within each collection/pgid (Sage Weil)
  • osd: low and high speed flush modes (Mingxin Liu)
  • osd: peer_features includes self (David Zafman)
  • osd: recovery, peering fixes (#11687 Samuel Just)
  • osd: require firefly features (David Zafman)
  • osd: set initial crush weight with more precision (Sage Weil)
  • osd: use a temp object for recovery (Sage Weil)
  • osd: use blkid to collect partition information (Joseph Handzik)
  • rados: add --striper option to use libradosstriper (#10759 Sebastien Ponce)
  • radosgw-admin: fix subuser modify output (#12286 Guce)
  • rados: handle --snapid arg properly (Abhishek Lekshmanan)
  • rados: improve bench buffer handling, performance (Piotr Dalek)
  • rados: new pool import implementation (John Spray)
  • rbd: fix link issues (Jason Dillaman)
  • rbd: improve CLI arg parsing, usage (Ilya Dryomov)
  • rbd: recognize queue_depth kernel option (Ilya Dryomov)
  • rbd: support G and T units for CLI (Abhishek Lekshmanan)
  • rbd: use image-spec and snap-spec in help (Vikhyat Umrao, Ilya Dryomov)
  • rest-bench: misc fixes (Shawn Chen)
  • rest-bench: support https (#3968 Yuan Zhou)
  • rgw: add max multipart upload parts (#12146 Abhishek Dixit)
  • rgw: add Transaction-Id to response (Abhishek Dixit)
  • rgw: document layout of pools and objects (Pete Zaitcev)
  • rgw: do not preserve ACLs when copying object (#12370 Yehuda Sadeh)
  • rgw: fix Connection: header handling (#12298 Wido den Hollander)
  • rgw: fix data corruption race condition (#11749 Wuxingyi)
  • rgw: fix JSON response when getting user quota (#12117 Wuxingyi)
  • rgw: force content_type for swift bucket stats requests (#12095 Orit Wasserman)
  • rgw: improved support for swift account metadata (Radoslaw Zarzynski)
  • rgw: make max put size configurable (#6999 Yuan Zhou)
  • rgw: orphan detection tool (Yehuda Sadeh)
  • rgw: swift: do not override sent content type (#12363 Orit Wasserman)
  • rgw: swift: set Content-Length for account GET (#12158 Radoslaw Zarzynski)
  • rpm: always rebuild and install man pages for rpm (Owen Synge)
  • rpm: misc fixes (Boris Ranto, Owen Synge, Ken Dreyer, Ira Cooper)
  • systemd: logrotate fixes (Tim Serong, Lars Marowsky-Bree, Nathan Cutler)
  • sysvinit compat: misc fixes (Owen Synge)
  • test: misc fs test improvements (John Spray, Loic Dachary)
  • test: python tests, linter cleanup (Alfredo Deza)


Ceph cluster on Docker for testing

I haven’t really advertised this one much (even though I’ve been using it in some articles).
Since people are still wondering how to quickly get a full Ceph cluster up and running for testing, I believe it deserves its own article so it will get more visibility.
Re-introducing the Ceph demo container.
This is going to be a really short article :).

faster debugging of a teuthology workunit

The Ceph integration tests run via teuthology rely on workunits found in the Ceph repository. For instance:

  • the /cephtool/ workunit is modified
  • it is pushed to a wip- branch in the official Ceph git repository
  • the gitbuilder will automatically build packages for all supported distributions for this wip- branch
  • the rados/singleton/all/cephtool suite can be run with teuthology-suite --suite rados/singleton
  • the workunit task fetches the workunits directory from the Ceph git repository and runs it

There is no need for Ceph to be packaged each time the workunit script is modified. Instead it can be fetched from a pull request:

  • the cephtool/ workunit is modified
  • the pull request number 2043 is created or updated with the modified workunit
  • the workunit.yaml file is created with
          branch: refs/pull/2043/head
  • the rados/singleton/all/cephtool suite can be run with teuthology-suite --suite rados/singleton $(pwd)/workunit.yaml
  • the workunit task fetches the workunits directory in the branch refs/pull/2043/head from the Ceph git repository and runs it

For each pull request, github implicitly creates a reference in the target git repository. This reference is mirrored to where the workunit task can extract it. The teuthology-suite command accepts yaml files in argument and they are assumed to be relative to the root of a clone of the ceph-qa-suite repository. By providing an absolute path ($(pwd)/workunit.yaml) the file is read from the current directory instead and there is no need to commit it to the ceph-qa-suite repository.
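
The pull request steps above can be sketched as a small Python helper that produces the override file and the matching command line. This is only a sketch: `workunit_override` and `suite_command` are hypothetical names, and nesting the `branch:` key under `overrides` and `workunit` is an assumption, since the text above only shows the innermost line.

```python
import os


def workunit_override(pr_number: int) -> str:
    """Return YAML text pointing the workunit task at a pull request head."""
    # Assumption: the branch key lives under overrides -> workunit.
    # The article only shows the "branch: refs/pull/<n>/head" line itself.
    return (
        "overrides:\n"
        "  workunit:\n"
        f"    branch: refs/pull/{pr_number}/head\n"
    )


def suite_command(suite: str, yaml_path: str) -> list:
    """Return the teuthology-suite argument list for the given suite.

    The YAML path is made absolute so that it is read from where it is,
    not resolved relative to a clone of ceph-qa-suite.
    """
    return ["teuthology-suite", "--suite", suite, os.path.abspath(yaml_path)]
```

Writing the output of `workunit_override(2043)` to `workunit.yaml` and joining `suite_command("rados/singleton", "workunit.yaml")` with spaces gives the command from the list above, with `$(pwd)` already expanded.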

Getting started with the Docker RBD volume plugin

Docker 1.8 was released just a week ago, and with it came support for volume plugins.
Several volume plugins are available but today I will be introducing the Ceph RBD ones (yes there are currently 3 different drivers).

Downgrade LSI 9207 to P19 Firmware

After numerous problems with the P20 firmware on this card model, here are the steps I followed to flash the P19 version.

No more problems since :)

The card is an LSI 9207-8i (SAS2308 controller) with IT firmware:

lspci | grep LSI
01:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)

Using SSD drives in some part of your cluster might be useful.
Especially under read-oriented workloads.
Ceph has a mechanism called primary affinity, which allows you to put a higher affinity on your OSDs so they will likely be primary on some PGs.
The idea is to have reads served by the SSDs so clients can get faster reads.
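
To make this concrete, here is a minimal sketch that generates the `ceph osd primary-affinity` commands for a mixed cluster. The helper name `primary_affinity_commands` and the OSD ids are made up for illustration, and the approach is an assumption on my side: SSD OSDs stay at the highest affinity (1.0) while HDD OSDs are lowered, so SSDs are more likely to be picked as primaries. Depending on the release, the monitors may also need primary affinity enabled in their configuration first.

```python
def primary_affinity_commands(ssd_osds, hdd_osds, hdd_affinity=0.0):
    """Build commands that favour SSD OSDs as primaries.

    ssd_osds and hdd_osds are iterables of integer OSD ids (hypothetical
    values chosen by the caller). Affinity ranges from 0 to 1.
    """
    if not 0.0 <= hdd_affinity <= 1.0:
        raise ValueError("primary affinity must be between 0 and 1")
    commands = []
    # SSD-backed OSDs keep full affinity.
    for osd_id in sorted(ssd_osds):
        commands.append(f"ceph osd primary-affinity osd.{osd_id} 1.0")
    # HDD-backed OSDs get a lower affinity so they are rarely primary.
    for osd_id in sorted(hdd_osds):
        commands.append(f"ceph osd primary-affinity osd.{osd_id} {hdd_affinity}")
    return commands
```

The function only returns strings, so the commands can be reviewed before being run against a test cluster.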

Scaling out the Ceph community lab

Ceph integration tests are vital and expensive. Contrary to unit tests that can be run on a laptop, they require multiple machines to deploy an actual Ceph cluster. As the community of Ceph developers expands, the community lab needs to expand.

The current development workflow and its challenges

When a developer contributes to Ceph, it goes like this:

  • The Developer submits a pull request
  • After the Reviewer is satisfied with the pull request, it is scheduled for integration testing (by adding the needs-qa label)
  • A Tester merges the pull request into an integration branch, together with other pull requests labelled needs-qa, and sets a label to say so (for instance, if Kefu Chai did it, he would set the wip-kefu-testing label)
  • The Tester waits for the packages to be built for the integration branch
  • The Tester schedules a suite of integration tests in the community lab
  • When the suite finishes, the Tester analyzes the integration test results and finds the pull request responsible for each failure (which can be challenging when there are more than a handful of pull requests in the integration branch)
  • For each failure the Tester adds a comment to the faulty pull request with a link to the integration test logs, kindly asking the developer to address the issue
  • When the integration tests are clean, the Tester merges the pull requests

As the number of contributors to Ceph increases, running the integration tests and analyzing their results becomes the bottleneck, because:

  • getting the integration tests results usually takes a few days
  • only people with access to the community lab can run integration tests
  • analyzing test results is time consuming

Increasing the number of machines in the community lab would run integration tests faster. But acquiring hardware, hosting it and monitoring it not only takes months, it also requires significant system administration work. The community of Ceph developers is growing faster than the community lab can. And to make things even more complicated, as Ceph evolves the number of integration tests increases and requires even more resources.

When a developer frequently contributes to Ceph, (s)he is granted access to the VPN that allows her/him to schedule integration tests. For instance Abhishek Lekshmanan and Nathan Cutler who routinely run and analyze integration tests for backports now have access to the community lab and can do that on their own. But the process to get access to the VPN takes weeks and the learning curve to use it properly is significant.

Although it is mostly invisible to the community lab user, the system administration workload to keep it running is significant. Dan Mick, Zack Cerza and others fix problems on a daily basis. As the size of the community lab grows, this workload increases and requires skills that are difficult to acquire.

Simplifying the workflow with public OpenStack clouds

As of July 2015, it became possible to run integration tests on public OpenStack clouds. More importantly, it takes less than one hour for a new developer to register and schedule an integration test. This new facility can be leveraged to simplify the workflow as follows:

  • The Developer submits a pull request
  • The Developer is required to attach a successful run of integration tests demonstrating the feature or the bug fix
  • After the Reviewer is satisfied with the pull request, it is merged.

There is no need for a Tester because the Developer now has the ability to run integration tests and interpret the results.

The interpretation of the test results is simpler because there is only one pull request for a run. The Developer can compare her/his run to a recent run from the community lab to verify the unmodified code passes. (S)He also can debug a failed test in interactive mode.

Contrary to the community lab, the test cluster has a short life span and requires no system administration skills. It is created in the cloud, on demand, and can be destroyed as soon as the results have been analyzed.

The learning curve to schedule and interpret integration tests is reduced. The Developer needs to know about the teuthology-openstack command and how to interpret a test failure. But (s)he does not need the other teuthology-* commands nor does (s)he have to get access to the VPN of the community lab.

© 2016, Red Hat, Inc. All rights reserved.