v10.2.6 Jewel released
TheAnalyst
This point release fixes several important bugs in RBD mirroring, RGW multi-site, CephFS, and RADOS.
We recommend that all v10.2.x users upgrade.
For more detailed information, see the complete changelog.
OSDs No Longer Send ENXIO by Default
In previous versions, if a client sent an op to the wrong OSD, the OSD would reply with ENXIO. The rationale was that the client or OSD is clearly buggy in this situation, and we wanted to surface the error as clearly as possible. We now send the ENXIO reply only if the osd_enxio_on_misdirected_op option is enabled (it is off by default). This means that a VM using librbd that previously would have gotten an EIO and gone read-only will now see a blocked/hung IO instead.
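For operators who prefer the earlier behavior, the following is a minimal sketch of re-enabling the reply in ceph.conf. The option name comes from the note above; placing it in the [osd] section is an assumption, and OSDs would typically need a restart (or a runtime config injection) to pick it up.

```ini
# Sketch only: restore the pre-10.2.6 behavior of replying ENXIO
# to misdirected ops. Section placement ([osd]) is an assumption.
[osd]
osd_enxio_on_misdirected_op = true
```

With this set, a misdirected op is again answered with an error rather than left blocked, so librbd clients would surface an I/O error instead of a hung request.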
Other Notable Changes
- build/ops: add hostname sanity check to run-{c}make-check.sh (issue#18134, pr#12302, Nathan Cutler)
- build/ops: add ldap lib to rgw lib deps based on build config (issue#17313, pr#13183, Nathan Cutler)
- build/ops: ceph-create-keys loops forever (issue#17753, pr#11884, Alfredo Deza)
- build/ops: ceph daemons DUMPABLE flag is cleared by setuid preventing coredumps (issue#17650, pr#11736, Patrick Donnelly)
- build/ops: fixed compilation error when --with-radosgw=no (issue#18512, pr#12729, Pan Liu)
- build/ops: fixed compilation failure when --disable-server is used (issue#18120, pr#12239, Pan Liu)
- build/ops: fix undefined crypto references with --with-xio (issue#18133, pr#12296, Nathan Cutler)
- build/ops: install-deps.sh based on /etc/os-release (issue#18466, issue#18198, pr#12405, Jan Fajerski, Nitin A Kamble, Nathan Cutler)
- build/ops: Remove the runtime dependency on lsb_release (issue#17425, pr#11875, John Coyle, Brad Hubbard)
- build/ops: rpm: /etc/ceph/rbdmap is packaged with executable access rights (issue#17395, pr#11855, Ken Dreyer)
- build/ops: selinux: Allow ceph to manage tmp files (issue#17436, pr#13048, Boris Ranto)
- build/ops: systemd: Restart Mon after 10s in case of failure (issue#18635, pr#13058, Wido den Hollander)
- build/ops: systemd restarts Ceph Mon too quickly after failing to start (issue#18635, pr#13184, Wido den Hollander)
- ceph-disk: fix flake8 errors (issue#17898, pr#11976, Ken Dreyer)
- cephfs: fuse client crash when adding a new osd (issue#17270, pr#11860, John Spray)
- cli: ceph-disk: convert none str to str before printing it (issue#18371, pr#13187, Kefu Chai)
- client: Fix lookup of "/.." in jewel (issue#18408, pr#12766, Jeff Layton)
- client: fix stale entries in command table (issue#17974, pr#12137, John Spray)
- client: populate metadata during mount (issue#18361, pr#13085, John Spray)
- cli: implement functionality for adding, editing and removing omap values with binary keys (issue#18123, pr#12755, Jason Dillaman)
- common: Improve linux dcache hash algorithm (issue#17599, pr#11529, Yibo Cai)
- common: utime.h: fix timezone issue in round_to_* funcs. (issue#14862, pr#11508, Zhao Chao)
- doc: Python Swift client commands in Quick Developer Guide don't match configuration in vstart.sh (issue#17746, pr#13043, Ronak Jain)
- librbd: allow to open an image without opening parent image (issue#18325, pr#13130, Ricardo Dias)
- librbd: metadata_set API operation should not change global config setting (issue#18465, pr#13168, Mykola Golub)
- librbd: new API method to force break a peer's exclusive lock (issue#15632, issue#16773, issue#17188, issue#16988, issue#17210, issue#17251, issue#18429, issue#17227, issue#18327, issue#17015, pr#12890, Danny Al-Gaaf, Mykola Golub, Jason Dillaman)
- librbd: properly order concurrent updates to the object map (issue#16176, pr#12909, Jason Dillaman)
- librbd: restore journal access when force disabling mirroring (issue#17588, pr#11916, Mykola Golub)
- mds: Cannot create deep directories when caps contain path=/somepath (issue#17858, pr#12154, Patrick Donnelly)
- mds: cephfs metadata pool: deep-scrub error omap_digest != best guess omap_digest (issue#17177, pr#12380, Yan, Zheng)
- mds: cephfs test failures (ceph.com/qa is broken, should be download.ceph.com/qa) (issue#18574, pr#13023, John Spray)
- mds: ceph-fuse crash during snapshot tests (issue#18460, pr#13120, Yan, Zheng)
- mds: ceph_volume_client: fix recovery from partial auth update (issue#17216, pr#11656, Ramana Raja)
- mds: ceph_volume_client.py : Error: Can't handle arrays of non-strings (issue#17800, pr#12325, Ramana Raja)
- mds: Cleanly reject session evict command when in replay (issue#17801, pr#12153, Yan, Zheng)
- mds: client segfault on ceph_rmdir path / (issue#9935, pr#13029, Michal Jarzabek)
- mds: Clients without pool-changing caps shouldn't be allowed to change pool_namespace (issue#17798, pr#12155, John Spray)
- mds: Decode errors on backtrace will crash MDS (issue#18311, pr#12836, Nathan Cutler, John Spray)
- mds: false failing to respond to cache pressure warning (issue#17611, pr#11861, Yan, Zheng)
- mds: finish clientreplay requests before requesting active state (issue#18461, pr#13113, Yan, Zheng)
- mds: fix incorrect assertion in Server::_dir_is_nonempty() (issue#18578, pr#13459, Yan, Zheng)
- mds: fix MDSMap upgrade decoding (issue#17837, pr#13139, John Spray, Patrick Donnelly)
- mds: fix missing ll_get for ll_walk (issue#18086, pr#13125, Gui Hecheng)
- mds: Fix mount root for ceph_mount users and change tarball format (issue#18312, issue#18254, pr#12592, Jeff Layton)
- mds: fix null pointer dereference in Locker::handle_client_caps (issue#18306, pr#13060, Yan, Zheng)
- mds: lookup of "/.." in jewel returns -ENOENT (issue#18408, pr#12783, Jeff Layton)
- mds: MDS crashes on missing metadata object (issue#18179, pr#13119, Yan, Zheng)
- mds: mds fails to respawn if executable has changed (issue#17531, pr#11873, Patrick Donnelly)
- mds: MDS: false failing to respond to cache pressure warning (issue#17716, pr#11856, Yan, Zheng)
- mds: MDS goes damaged on blacklist (failed to read JournalPointer: -108 ((108) Cannot send after transport endpoint shutdown) (issue#17236, pr#11413, John Spray)
- mds: MDS long-time blocked ops. ceph-fuse locks up with getattr of file (issue#17275, pr#11858, Yan, Zheng)
- mds: speed up readdir by skipping unwanted dn (issue#18519, pr#12921, Xiaoxi Chen)
- mds: standby-replay daemons can sometimes miss events (issue#17954, pr#13126, John Spray)
- mon: cache tiering: base pool last_force_resend not respected (racing read got wrong version) (issue#18366, pr#13115, Sage Weil)
- mon: ceph osd down detection behaviour (issue#18104, pr#12677, xie xingguo)
- mon: Error EINVAL: removing mon.a at 172.21.15.16:6789/0, there will be 1 monitors (issue#17725, pr#11999, Joao Eduardo Luis)
- mon: health does not report pgs stuck in more than one state (issue#17515, pr#11660, Sage Weil)
- mon: monitor assertion failure when deactivating mds in (invalid) fscid 0 (issue#17518, pr#11862, Patrick Donnelly)
- mon: monitor cannot start because of FAILED assert(info.state == MDSMap::STATE_STANDBY) (issue#18166, pr#13123, John Spray, Patrick Donnelly)
- mon: osd flag health message is misleading (issue#18175, pr#13117, Sage Weil)
- mon: OSDMonitor: clear jewel+ feature bits when talking to Hammer OSD (issue#18582, pr#13131, Piotr Dałek)
- mon: OSDs marked OUT wrongly after monitor failover (issue#17719, pr#11947, Dong Wu)
- mon: peon wrongly deletes routed pg stats op before receiving pg stats ack (issue#18458, pr#13045, Mingxin Liu)
- mon: send updated monmap to its subscribers (issue#17558, pr#11743, Kefu Chai)
- msgr: don't truncate message sequence to 32-bits (issue#16122, pr#12416, Yan, Zheng)
- msgr: msg/simple: clear_pipe when wait() is mopping up pipes (issue#15784, pr#13062, Sage Weil)
- msgr: msg/simple/Pipe: error decoding addr (issue#18072, pr#12291, Sage Weil)
- osd: Add config option to disable new scrubs during recovery (issue#17866, pr#11944, Wido den Hollander)
- osd: collection_list shadow return value (issue#17713, pr#11737, Haomai Wang)
- osd: do not send ENXIO on misdirected op by default (issue#18751, pr#13255, Sage Weil)
- osd: FileStore: fiemap cannot be totally retrieved in xfs when the number of extents > 1364 (issue#17610, pr#11998, Kefu Chai, Ning Yao)
- osd: leveldb corruption leads to Operation not permitted not handled and assert (issue#18037, pr#12789, Nathan Cutler)
- osd: limit omap data in push op (issue#16128, pr#11991, Wanlong Gao)
- osd: osd crashes when radosgw-admin bi list --max-entries=1 command is running (issue#17745, pr#11758, weiqiaomiao)
- osd: osd_max_backfills default has changed, documentation should reflect that. (issue#17701, pr#11735, huangjun)
- osd: OSDMonitor: only reject MOSDBoot based on up_from if inst matches (issue#17899, pr#12868, Samuel Just)
- osd: osd/PG: publish PG stats when backfill-related states change (issue#18369, pr#12875, Alexey Sheplyakov, Sage Weil)
- osd: Remove extra call to reg_next_scrub() during splits (issue#16474, pr#11606, David Zafman)
- osd: Revert "Merge pull request #12978 from asheplyakov/jewel-18581" (issue#18809, pr#13280, Samuel Just)
- osd: update_log_missing does not order correctly with osd_ops (issue#17789, pr#11997, Samuel Just)
- qa/tasks: backport rbd_fio fixes to jewel (issue#13512, pr#13104, Ilya Dryomov)
- qa/tasks/workunits: backport misc fixes to jewel (issue#18336, pr#12912, Sage Weil)
- rados: crash adding snap to purged_snaps in ReplicatedPG::WaitingOnReplicas (part 2) (issue#15943, issue#18504, pr#12791, Samuel Just)
- rados: Memory leaks in object_list_begin and object_list_end (issue#18252, pr#13118, Brad Hubbard)
- rados: The request lock RPC message might be incorrectly ignored (issue#17030, pr#10865, Jason Dillaman)
- rbd: add image id block name prefix APIs (issue#18270, pr#12529, Jason Dillaman)
- rbd: add max_part and nbds_max options in rbd nbd map, in order to keep consistent with (issue#18186, pr#12426, Pan Liu)
- rbd: Attempting to remove an image w/ incompatible features results in partial removal (issue#18315, pr#13156, Dongsheng Yang)
- rbd: bench-write will crash if --io-size is 4G (issue#18422, pr#13129, Gaurav Kumar Garg)
- rbd: diff calculate can hide parent extents when examining first snapshot in clone (issue#18068, pr#12322, Jason Dillaman)
- rbd: Exclusive lock improperly initialized on read-only image when using snap_set API (issue#17618, pr#11852, Jason Dillaman)
- rbd: FAILED assert(m_processing == 0) while running test_lock_fence.sh (issue#17973, pr#12323, Venky Shankar)
- rbd: Improve error reporting from rbd feature enable/disable (issue#16985, pr#13157, Gaurav Kumar Garg)
- rbd: JournalMetadata flooding with errors when being blacklisted (issue#18243, pr#12739, Jason Dillaman)
- rbd: librbd: use proper snapshot when computing diff parent overlap (issue#18200, pr#12649, Xiaoxi Chen)
- rbd: partition func should be enabled when loading nbd.ko for rbd-nbd (issue#18115, pr#12754, Pan Liu)
- rbd: Potential race when removing two-way mirroring image (issue#18447, pr#13233, Mykola Golub)
- rbd: [qa] crash in journal-enabled fsx run (issue#18618, pr#13128, Jason Dillaman)
- rbd: 'rbd du' of missing image does not return error (issue#16987, pr#11854, Dongsheng Yang)
- rbd: rbd-mirror: gmock warnings in bootstrap request unit tests (issue#18048, issue#18012, issue#18156, issue#16991, issue#18051, pr#12425, Mykola Golub)
- rbd: rbd-mirror: image sync object map reload logs message (issue#16179, pr#12753, runsisi)
- rbd: rbd-mirror: snap protect of non-layered image results in split-brain (issue#16962, pr#11869, Mykola Golub)
- rbd: [rbd-mirror] sporadic image replayer shut down failure (issue#18441, pr#13155, Jason Dillaman)
- rbd: rbd-nbd: disallow mapping images >2TB in size (issue#17219, pr#11870, Mykola Golub)
- rbd: rbd-nbd: invalid error code for "failed to read nbd request" messages (issue#18242, pr#12756, Mykola Golub)
- rbd: status json format has duplicated/overwritten key (issue#18261, pr#12741, Mykola Golub)
- rbd: TestLibRBD.DiscardAfterWrite doesn't handle rbd_skip_partial_discard = true (issue#17750, pr#11853, Jason Dillaman)
- rbd: truncate can cause unflushed snapshot data loss (issue#17193, pr#12324, Yan, Zheng)
- ReplicatedBackend: take read locks for clone sources during recovery (issue#17831, issue#18583, pr#12978, Samuel Just)
- rgw: add option to log custom HTTP headers (rgw_log_http_headers) (issue#18891, pr#12490, Matt Benjamin)
- rgw: add support for Swift-at-root dependent features of Swift API (issue#18526, issue#16673, pr#11497, Pritha Srivastava, Radoslaw Zarzynski, Pete Zaitcev, Abhishek Lekshmanan)
- rgw: add support for the prefix parameter in account listing of Swift API (issue#17931, pr#12258, Radoslaw Zarzynski)
- rgw: Add workaround for upgrade issues for older jewel versions (issue#17820, pr#12316, Orit Wasserman)
- rgw: be aware about tenants on cls_user_bucket -> rgw_bucket conversion (issue#18364, issue#16355, pr#13276, Radoslaw Zarzynski)
- rgw: bucket check remove _multipart_ prefix (issue#13724, pr#11470, Weijun Duan)
- rgw: bucket resharding (issue#17549, issue#17550, pr#13341, Yehuda Sadeh, Robin H. Johnson)
- rgw: disable virtual hosting of buckets when no hostnames are configured (issue#17440, issue#15975, issue#17136, pr#11760, Casey Bodley, Robin H. Johnson)
- rgw: do not abort when accepting a CORS request with short origin (issue#18187, pr#12397, LiuYang)
- rgw: don't store empty chains in gc (issue#17897, pr#12174, Yehuda Sadeh)
- rgw: fix for deleting objects whose names begin and end with underscores from a bucket using the POST method of the js sdk (issue#17888, pr#12320, Casey Bodley)
- rgw: fix period update crash (issue#18631, pr#13273, Orit Wasserman)
- rgw: fix put_acls for objects starting and ending with underscore (issue#17625, pr#11675, Orit Wasserman)
- rgw: fix use of marker in List::list_objects() (issue#18331, pr#13358, Yehuda Sadeh)
- rgw: for the create_bucket api, if the input creation_time is zero, we … (issue#16597, pr#11990, weiqiaomiao)
- rgw: Have a flavor of bucket deletion in radosgw-admin to bypass garbage collection (issue#15557, pr#10661, Pavan Rallabhandi)
- rgw: json encode/decode of RGWBucketInfo missing index_type field (issue#17755, pr#11759, Yehuda Sadeh)
- rgw: ldap: enforce simple_bind w/LDAPv3 redux (issue#18339, pr#12678, Weibing Zhang)
- rgw: leak from RGWMetaSyncShardCR::incremental_sync (issue#18412, issue#18300, pr#13004, Casey Bodley, Sage Weil)
- rgw: leak in RGWFetchAllMetaCR (issue#17812, pr#11872, Casey Bodley)
- rgw: librgw: objects created from s3 apis are not visible from nfs mount point (issue#18651, pr#13177, Matt Benjamin)
- rgw: log name instead of id for SystemMetaObj on failure (issue#15776, pr#12622, Wido den Hollander, Abhishek Lekshmanan)
- rgw: multimds: mds entering up:replay and processing down mds aborts (issue#17670, pr#11857, Patrick Donnelly)
- rgw: multipart upload copy (issue#12790, pr#13068, Yehuda Sadeh, Javier M. Mellid, Matt Benjamin)
- rgw: multisite: after finishing full sync on a bucket, incremental sync starts over from the beginning (issue#17661, issue#17624, pr#11864, Zengran Zhang, Casey Bodley)
- rgw: multisite: assert(next) failed in RGWMetaSyncCR (issue#17044, pr#11477, Casey Bodley)
- rgw: multisite: coroutine deadlock assertion on error in FetchAllMetaCR (issue#17571, pr#11866, Casey Bodley)
- rgw: multisite: coroutine deadlock in RGWMetaSyncCR after ECANCELED errors (issue#17465, pr#12738, Casey Bodley)
- rgw: multisite doesn't retry RGWFetchAllMetaCR on failed lease (issue#17047, pr#11476, Casey Bodley)
- rgw: multisite: ECANCELED & 500 error on bucket delete (issue#17698, pr#12044, Casey Bodley)
- rgw: multisite: failed assertion in 'radosgw-admin bucket sync status' (issue#18083, pr#12314, Casey Bodley)
- rgw: multisite: fix ref counting of completions (issue#17792, issue#18414, issue#17793, issue#18407, pr#13001, Casey Bodley)
- rgw: multisite: metadata master can get the wrong value for 'oldest_log_period' (issue#16894, pr#11868, Casey Bodley)
- rgw: multisite: obsolete 'radosgw-admin period prepare' command (issue#17387, pr#11574, Gaurav Kumar Garg)
- rgw: multisite: race between ReadSyncStatus and InitSyncStatus leads to EIO errors (issue#17568, pr#11865, Casey Bodley)
- rgw: multisite requests failing with '400 Bad Request' with civetweb 1.8 (issue#17822, pr#12313, Casey Bodley)
- rgw: multisite: segfault after changing value of rgw_data_log_num_shards (issue#18488, pr#13180, Casey Bodley)
- rgw: multisite: sync status reports master is on a different period (issue#18064, pr#13175, Abhishek Lekshmanan)
- rgw: multisite upgrade from hammer -> jewel ignores rgw_region_root_pool (issue#17963, pr#12156, Casey Bodley)
- rgw: radosgw-admin period update reverts deleted zonegroup (issue#17239, pr#13171, Orit Wasserman)
- rgw: Realm set does not create a new period (issue#18333, pr#13182, Orit Wasserman)
- rgw: remove spurious mount entries for RGW buckets (issue#17850, pr#12045, Matt Benjamin)
- rgw: Replacing '+' with "%20" in canonical uri for s3 v4 auth. (issue#17076, pr#12542, Pritha Srivastava)
- rgw: rgw-admin: missing command to modify placement targets (issue#18078, pr#12428, Yehuda Sadeh, Casey Bodley)
- rgw: RGWRados::get_system_obj() sends unnecessary stat request before read (issue#17580, pr#11867, Casey Bodley)
- rgw: rgw_rest_s3: apply missed base64 try-catch (issue#17663, pr#11672, Matt Benjamin)
- rgw: RGW will not list Argonaut-era bucket via HTTP (but radosgw-admin works) (issue#17372, pr#11863, Yehuda Sadeh)
- rgw: sends omap_getvals with (u64)-1 limit (issue#17985, pr#12419, Yehuda Sadeh, Sage Weil)
- rgw: slave zonegroup cannot enable the bucket versioning (issue#18003, pr#13173, Orit Wasserman)
- rgw: TempURL properly handles accounts created with the implicit tenant (issue#17961, pr#12079, Radoslaw Zarzynski)
- rgw: the value of total_time is wrong in the result of 'radosgw-admin log show' opt (issue#17598, pr#11876, weiqiaomiao)
- rgw: Unable to commit period zonegroup change (issue#17364, pr#12315, Orit Wasserman)
- rgw: valgrind "invalid read size 4" RGWGetObj (issue#18071, pr#12997, Matt Benjamin)
- rgw: work around curl_multi_wait bug with non-blocking reads (issue#15915, issue#16368, issue#16695, pr#11627, John Coyle, Casey Bodley)
- tests: add require_jewel_osds before upgrading last hammer node (issue#18719, pr#13161, Nathan Cutler)
- tests: add require_jewel_osds to upgrade/hammer-x/tiering (issue#18920, pr#13404, Nathan Cutler)
- tests: assertion failure in a radosgw-admin related task (issue#17167, pr#12764, Orit Wasserman)
- tests: Cannot reserve CentOS 7.2 smithi machines (issue#18416, issue#18401, pr#13050, Nathan Cutler, Sage Weil, Yuri Weinstein)
- tests: ignore bogus ceph-objectstore-tool error in ceph_manager (issue#16263, pr#13240, Nathan Cutler, Kefu Chai)
- tests: objecter_requests workunit fails on wip branches (issue#18393, pr#12761, Sage Weil)
- tests: qa/suites/upgrade/hammer-x: break stress split ec symlinks (issue#19006, pr#13533, Nathan Cutler)
- tests: qa/suites/upgrade/hammer-x/stress-split: finish thrashing before final upgrade (issue#19004, pr#13222, Sage Weil)
- tests: qa/tasks/ceph_deploy.py: use dev option (issue#18736, pr#13106, Vasu Kulkarni)
- tests: qa/workunits/rbd: use more recent qemu-iotests that support Xenial (issue#18149, issue#10773, pr#13103, Jason Dillaman)
- tests: remove qa/suites/buildpackages (issue#18846, pr#13299, Loic Dachary)
- tests: SUSE yaml facets in qa/distros/all are out of date (issue#18856, issue#18846, pr#13331, Nathan Cutler)
- tests: update rbd/singleton/all/formatted-output.yaml to support ceph-ci (issue#18440, pr#12822, Nathan Cutler, Venky Shankar)
- tests: update Ubuntu image url after ceph.com refactor (issue#18542, pr#12959, Jason Dillaman)
- tests: upgrade:hammer-x: install firefly only on Ubuntu 14.04 (issue#18089, pr#13153, Nathan Cutler)
- tests: use ceph-jewel branch for s3tests (issue#18384, pr#12745, Nathan Cutler)
- tests: Workunits needlessly wget from git.ceph.com (issue#18336, issue#18271, issue#18388, pr#12686, Nathan Cutler, Sage Weil)
- test: temporarily disable fork()'ing tests (issue#16556, issue#17832, pr#11953, John Spray)
- test: test fails due to The UNIX domain socket path (issue#16014, pr#12151, Loic Dachary)
- tools: ceph-disk: ceph-disk@.service races with ceph-osd@.service (issue#17889, issue#17813, pr#12147, Loic Dachary)
- tools: ceph-disk --dmcrypt create must not require admin key (issue#17849, pr#12033, Loic Dachary)
- tools: ceph-disk prepare writes osd log 0 with root owner (issue#18538, pr#13025, Samuel Matzek)
- tools: crushtool --compile creates output despite missing item (issue#17306, pr#11410, Kefu Chai)
- tools: rados bench seq must verify the hostname (issue#17526, pr#13049, Loic Dachary)
- tools: snapshotted RBD extent objects can't be manually evicted from a cache tier (issue#17896, pr#11968, Mingxin Liu)
- tools: systemd/ceph-disk: reduce ceph-disk flock contention (issue#18049, issue#13160, pr#12210, David Disseldorp)