13.2.3 Mimic released
TheAnalyst
This is the third bugfix release of the Mimic v13.2.x long term stable release series. This release contains many fixes across all components of Ceph.
If you haven't yet upgraded to v13.2.3, consider upgrading to the already released v13.2.4 which has a couple of security fixes on top of this release.
- The default memory utilization for the mons has been increased somewhat. RocksDB now uses 512 MB of RAM by default, which should be sufficient for small to medium-sized clusters; large clusters should tune this up. Also, mon_osd_cache_size has been increased from 10 OSDMaps to 500, which will translate to an additional 500 MB to 1 GB of RAM for large clusters, and much less for small clusters.
- Ceph v13.2.2 includes an incorrect backport that may cause the MDS to go into the ‘damaged’ state when a Ceph cluster is upgraded from a previous version. The bug is fixed in v13.2.3. If you are already running v13.2.2, upgrading to v13.2.3 does not require special action.
- The bluestore_cache_* options are no longer needed. They are replaced by osd_memory_target, defaulting to 4GB. BlueStore will expand and contract its cache to attempt to stay within this limit. Users upgrading should note this is a higher default than the previous bluestore_cache_size of 1GB, so OSDs using BlueStore will use more memory by default. For more details, see the BlueStore docs.
- This version contains an upgrade bug, http://tracker.ceph.com/issues/36686, due to which upgrading during recovery/backfill can cause OSDs to fail. This bug can be worked around, either by restarting all the OSDs after the upgrade, or by upgrading when all PGs are in “active+clean” state. If you have already successfully upgraded to 13.2.2, this issue should not impact you. Going forward, we are working on a clean upgrade path for this feature.
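The "upgrade only when all PGs are active+clean" workaround above can be checked programmatically before starting an upgrade. The following is a minimal sketch, not an official tool. It assumes the cluster status has already been obtained as JSON (for example from `ceph status --format json`) and that it carries a `pgmap` object with `num_pgs` and a `pgs_by_state` list of `state_name`/`count` entries. The helper name and both sample documents are hypothetical and exist only for illustration.

```python
import json


def all_pgs_active_clean(status: dict) -> bool:
    """Return True only if every PG in the status document is active+clean.

    Assumed (not guaranteed) shape of `status`:
    {"pgmap": {"num_pgs": N,
               "pgs_by_state": [{"state_name": "...", "count": K}, ...]}}
    """
    pgmap = status.get("pgmap", {})
    total = pgmap.get("num_pgs", 0)
    clean = sum(
        entry.get("count", 0)
        for entry in pgmap.get("pgs_by_state", [])
        if entry.get("state_name") == "active+clean"
    )
    # Treat missing or empty PG data as "not safe" rather than trivially clean.
    return total > 0 and clean == total


# Hypothetical sample: every PG is active+clean.
SAMPLE_CLEAN = json.loads("""
{"pgmap": {"num_pgs": 128,
           "pgs_by_state": [{"state_name": "active+clean", "count": 128}]}}
""")

# Hypothetical sample: some PGs are still backfilling.
SAMPLE_BACKFILLING = json.loads("""
{"pgmap": {"num_pgs": 128,
           "pgs_by_state": [
               {"state_name": "active+clean", "count": 120},
               {"state_name": "active+remapped+backfilling", "count": 8}]}}
""")

if __name__ == "__main__":
    print("clean cluster safe to upgrade:", all_pgs_active_clean(SAMPLE_CLEAN))
    print("backfilling cluster safe to upgrade:", all_pgs_active_clean(SAMPLE_BACKFILLING))
```

In practice the dictionary would come from parsing the live cluster status instead of the embedded samples. The check deliberately returns False when PG data is missing, so a failed or empty status query is never read as a green light.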
Changelog
- build/ops: Can’t compile Ceph on Fedora 29 as it doesn’t recognize python*3*-tox as an install Tox (issue#18163, issue#37301, issue#37422, pr#25294, Nathan Cutler, Brad Hubbard)
- build/ops: debian: correct ceph-common relationship with older radosgw package (pr#25115, Matthew Vernon)
- ceph-bluestore-tool: fix set label functionality for specific keys (pr#24352, Igor Fedotov)
- ceph fs add_data_pool applies pool application metadata incorrectly (issue#36203, issue#36028, pr#24470, John Spray)
- cephfs: client: explicitly show blacklisted state via asok status command (issue#36457, issue#36352, pr#24993, Jonathan Brielmaier, Zhi Zhang)
- cephfs: client: request next osdmap for blacklisted client (issue#36668, issue#36690, pr#24987, Zhi Zhang)
- cephfs-journal-tool: wrong layout info used (issue#24933, issue#24644, pr#24583, Gu Zhongyan)
- cephfs: some tool commands silently operate on only rank 0, even if multiple ranks exist (issue#36218, pr#25036, Venky Shankar)
- ceph-fuse: add to selinux profile (issue#36103, issue#36197, pr#24439, Patrick Donnelly)
- ceph-volume: activate option --auto-detect-objectstore respects --no-systemd (issue#36249, pr#24357, Alfredo Deza)
- ceph-volume add device_id to inventory listing (pr#25349, Jan Fajerski)
- ceph-volume: add inventory command (issue#24972, pr#25013, Jan Fajerski)
- ceph-volume Additional work on ceph-volume to add some choose_disk capabilities (issue#36446, pr#24782, Erwan Velu)
- ceph-volume add new ceph-handlers role from ceph-ansible (issue#36251, pr#24337, Alfredo Deza)
- ceph-volume: adds a --prepare flag to lvm batch (issue#36363, pr#24760, Andrew Schoen)
- ceph-volume: allow to specify --cluster-fsid instead of reading from ceph.conf (issue#26953, pr#25116, Alfredo Deza)
- ceph_volume_client: py3 compatible (issue#26850, issue#17230, pr#24443, Rishabh Dave, Patrick Donnelly)
- ceph-volume custom cluster names fail on filestore trigger (issue#27210, pr#24279, Alfredo Deza)
- ceph-volume: do not send (lvm) stderr/stdout to the terminal, use the logfile (issue#36492, pr#24740, Alfredo Deza)
- ceph-volume enable --no-systemd flag for simple sub-command (issue#36470, pr#25011, Alfredo Deza)
- ceph-volume: fix journal and filestore data size in lvm batch --report (issue#36242, pr#24306, Andrew Schoen)
- ceph-volume: lsblk can fail to find PARTLABEL, must fallback to blkid (issue#36098, pr#24334, Alfredo Deza)
- ceph-volume lvm.prepare update help to indicate partitions are needed, not devices (issue#24795, pr#24449, Alfredo Deza)
- ceph-volume: make lvm batch idempotent (pr#24588, Andrew Schoen)
- ceph-volume: patch Device when testing (issue#36768, pr#25066, Alfredo Deza)
- ceph-volume: reject devices that have existing GPT headers (issue#27062, pr#25103, Andrew Schoen)
- ceph-volume: remove LVs when using zap --destroy (pr#25100, Alfredo Deza)
- ceph-volume remove version reporting from help menu (issue#36386, pr#24753, Alfredo Deza)
- ceph-volume: rename Device property valid to available (issue#36701, pr#25133, Jan Fajerski)
- ceph-volume: skip processing devices that don’t exist when scanning system disks (issue#36247, pr#24381, Alfredo Deza)
- ceph-volume systemd import main so console_scripts work for executable (issue#36648, pr#24852, Alfredo Deza)
- ceph-volume tests install ceph-ansible’s requirements.txt dependencies (issue#36672, pr#24959, Alfredo Deza)
- ceph-volume tests.systemd update imports for systemd module (issue#36704, pr#24957, Alfredo Deza)
- ceph-volume: use console_scripts (issue#36601, pr#24838, Mehdi Abaakouk)
- ceph-volume util.encryption don’t push stderr to terminal (issue#36246, pr#24826, Alfredo Deza)
- ceph-volume util.encryption robust blkid+lsblk detection of lockbox (pr#24980, Alfredo Deza)
- client: fix use-after-free in Client::link() (issue#35841, issue#24557, pr#24187, “Yan, Zheng”)
- client: statfs inode count odd (issue#35940, issue#24849, pr#24377, Rishabh Dave)
- client: two ceph-fuse client, one can not list out files created by an… (issue#27051, issue#35934, pr#24295, Peng Xie)
- client: update ctime when modifying file content (issue#35945, issue#36134, pr#24385, “Yan, Zheng”)
- common: get real hostname from container/pod environment (pr#23916, Sage Weil)
- core: _aio_log_start inflight overlap of 0x10000~1000 with [65536~4096] (issue#36754, issue#36625, pr#25062, Jonathan Brielmaier, Yang Honggang)
- core: FAILED assert(osdmap_manifest.pinned.empty()) in OSDMonitor::prune_init() (issue#24612, issue#35071, pr#24918, Joao Eduardo Luis)
- core: Interactive mode CLI prints no output since Mimic (issue#36358, issue#36432, pr#24971, John Spray, Mohamad Gebai)
- core: mgr crash on scrub of unconnected osd (issue#36110, issue#36465, pr#25029, Sage Weil)
- core: mon osdmap cache too small during upgrade to mimic (issue#36505, pr#25019, Sage Weil)
- core: monstore tool rebuild does not generate creating_pgs (issue#36306, issue#36433, pr#25016, Sage Weil)
- core: Objecter: add ignore cache flag if got redirect reply (issue#36658, pr#25075, Iain Buclaw, Jonathan Brielmaier)
- core: objecter cannot resend split-dropped op when racing with con reset (issue#22544, issue#35843, pr#24970, Sage Weil)
- core: os/bluestore: cache autotuning and memory limit (issue#37340, pr#25283, Josh Durgin, Mark Nelson)
- core: rados rm --force-full is blocked when cluster is in full status (issue#36435, pr#25017, Yang Honggang)
- crush/CrushWrapper: fix crush tree json dumper (issue#36150, pr#24481, Oshyn Song)
- debian/control: require fuse for ceph-fuse (issue#21057, pr#24037, Thomas Serlin)
- doc: add ceph-volume inventory sections (pr#25130, Jan Fajerski)
- doc: fix broken fstab url in cephfs/fuse (issue#36286, issue#36313, pr#24441, Jos Collin)
- doc: Put command template into literal block (pr#25000, Alexey Stupnikov)
- doc: remove deprecated ‘scrubq’ from ceph(8) (issue#35813, issue#35855, pr#24210, Ruben Kerkhof)
- docs: backport edit on github changes (pr#25362, Neha Ojha, Noah Watkins)
- doc: Typo error on cephfs/fuse/ (issue#36180, issue#36308, pr#24420, Karun Josy)
- ec: src/common/interval_map.h: 161: FAILED assert(len > 0) (issue#21931, issue#22330, pr#24581, Neha Ojha)
- fsck: cid is improperly matched to oid (issue#36146, issue#36551, issue#36099, issue#32731, pr#24480, Kefu Chai, Sage Weil)
- kernel_untar_build.sh: bison: command not found (issue#36121, pr#24241, Neha Ojha)
- libcephfs: expose CEPH_SETATTR_MTIME_NOW and CEPH_SETATTR_ATIME_NOW (issue#36205, issue#35961, pr#24464, Zhu Shangzhong)
- librados application’s symbol could conflict with the libceph-common (issue#26839, issue#25154, pr#24708, Kefu Chai)
- librbd: blacklisted client might not notice it lost the lock (issue#34534, pr#24401, Jason Dillaman)
- librbd: ensure exclusive lock acquired when removing sync point snaps… (issue#35714, issue#24898, pr#24137, Mykola Golub)
- librbd: fixed assert when flattening clone with zero overlap (issue#35957, issue#35702, pr#24356, Jason Dillaman)
- librbd: journaling unable request can not be sent to remote lock owner (issue#26939, issue#35712, pr#24122, Mykola Golub)
- librbd: object map improperly flagged as invalidated (issue#24516, issue#36225, pr#24413, Jason Dillaman)
- librgw: crashes in multisite configuration (issue#36302, issue#36415, pr#24908, Casey Bodley)
- mds: allows client to create .. and . dirents (issue#32104, pr#24384, Venky Shankar)
- mds: curate priority of perf counters sent to mgr (issue#35938, issue#26991, issue#32090, issue#35837, pr#24467, Patrick Donnelly, Venky Shankar)
- mds: evict cap revoke non-responding clients (pr#24661, Venky Shankar)
- mds: fix mds damaged due to unexpected journal length (issue#36199, pr#24463, Zhi Zhang)
- mds: internal op missing events time ‘throttled’, ‘all_read’, ‘dispatched’ (issue#36114, issue#36195, pr#24411, Yanhu Cao)
- mds: migrate strays part by part when shutdown mds (issue#26926, issue#32092, pr#24435, “Yan, Zheng”)
- mds: optimize the way how max export size is enforced (issue#25131, pr#23952, “Yan, Zheng”)
- mds: print is_laggy message once (issue#35250, issue#35719, pr#24161, Patrick Donnelly)
- mds: rctime may go back (issue#35916, issue#36136, pr#24379, “Yan, Zheng”)
- mds: rctime not set on system inode (root) at startup (issue#36221, issue#36461, pr#25042, Patrick Donnelly)
- mds: reset heartbeat map at potential time-consuming places (issue#26858, pr#23506, “Yan, Zheng”)
- mds: src/mds/MDLog.cc: 281: FAILED ceph_assert(!capped) during max_mds thrashing (issue#36350, issue#37093, pr#25095, “Yan, Zheng”, Jonathan Brielmaier)
- mgr/DaemonServer: fix Session leak (pr#24233, Sage Weil)
- mgr/dashboard: Add http support to dashboard (issue#36069, pr#24734, Boris Ranto, Wido den Hollander)
- mgr/dashboard: Add support for URI encode (issue#24621, issue#26856, issue#24907, pr#24488, Tiago Melo)
- mgr/dashboard: Progress bar does not stop in TableKeyValueComponent (issue#35925, pr#24258, Volker Theile)
- mgr/dashboard: Remove fieldsets when using CdTable (issue#27851, issue#26999, pr#24478, Tiago Melo)
- mgr: hold lock while accessing the request list and submitting request (pr#25113, Jerry Lee)
- mgr: [restful] deep_scrub is not a valid OSD command (issue#36720, issue#36749, pr#25040, Boris Ranto)
- mon: mgr options not parsed properly (issue#35076, issue#35836, pr#24176, Sage Weil)
- mon/OSDMonitor: invalidate max_failed_since on cancel_report (issue#35930, issue#35860, pr#24281, xie xingguo)
- mon: test if gid exists in pending for prepare_beacon (issue#35848, pr#24272, Patrick Donnelly)
- msg/async: clean up local buffers on dispatch (issue#36127, issue#35987, pr#24386, Greg Farnum)
- msg: ceph_abort() when there are enough accepter errors in msg server (issue#36219, pr#25045, penglaiyxy@gmail.com)
- msg: challenging authorizer messages appear at debug_ms=0 (issue#35251, issue#35717, pr#24113, Patrick Donnelly)
- multisite: data full sync does not limit concurrent bucket sync (issue#26897, issue#36216, pr#24536, Casey Bodley)
- multisite: data sync error repo processing does not back off on empty (issue#35979, issue#26938, pr#24319, Casey Bodley)
- multisite: incremental data sync makes unnecessary call to RGWReadRemoteDataLogShardInfoCR (issue#35977, issue#26952, pr#24710, Casey Bodley)
- multisite: intermittent test_bucket_index_log_trim failures (issue#36201, issue#36034, pr#24400, Casey Bodley)
- multisite: invalid read in RGWCloneMetaLogCoroutine (issue#36208, issue#35851, pr#24414, Casey Bodley)
- multisite: segfault on shutdown/realm reload (issue#35857, issue#35543, pr#24235, Casey Bodley)
- os/bluestore: fix bloom filter num entry miscalculation in repairer (issue#25001, pr#24339, Igor Fedotov)
- os/bluestore: handle spurious read errors (issue#22464, pr#24647, Paul Emmerich)
- osd: add creating to pg_string_state (issue#36174, issue#36298, pr#24601, Dan van der Ster)
- osd: backport recent upmap fixes (pr#25419, ningtao, xie xingguo)
- osdc/Objecter: possible race condition with connection reset (issue#36183, issue#36296, pr#24600, Jason Dillaman)
- osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_metrics (issue#24889, pr#23026, Radoslaw Zarzynski)
- osdc: reduce ObjectCacher’s memory fragments (issue#36192, issue#36643, pr#24873, “Yan, Zheng”)
- osd/ECBackend: don’t get result code of subchunk-read overwritten (issue#35959, issue#21769, pr#24298, songweibin)
- OSDMapMapping does not handle active.size() > pool size (issue#26866, issue#35936, pr#24431, Sage Weil)
- osd/PG: avoid choose_acting picking want with > pool size items (issue#35963, issue#35924, pr#24344, Sage Weil)
- osd/PrimaryLogPG: fix potential pg-log overtrimming (pr#24309, xie xingguo)
- osd: race condition opening heartbeat connection (issue#36637, issue#36602, pr#25026, Sage Weil)
- osd: RBD client IOPS pool stats are incorrect (2x higher; includes IO hints as an op) (issue#24909, issue#36557, pr#25024, Jason Dillaman)
- osd: Remove old bft= which has been superseded by backfill (issue#36292, issue#36170, pr#24573, David Zafman)
- qa: add test that builds example librados programs (issue#36228, issue#15100, pr#24537, Nathan Cutler)
- qa/ceph-ansible: Specify stable-3.2 branch (pr#25191, Brad Hubbard)
- qa: extend timeout for SessionMap flush (issue#36156, pr#24438, Patrick Donnelly)
- qa: fsstress workunit does not execute in parallel on same host without clobbering files (issue#36278, issue#24177, issue#36323, issue#36184, issue#36165, issue#36153, pr#24408, Patrick Donnelly)
- qa: increase rm timeout for workunit cleanup (issue#36501, issue#36365, pr#24684, Patrick Donnelly)
- qa: install dependencies for rbd_workunit_kernel_untar_build (issue#35074, issue#35077, pr#24240, Ilya Dryomov)
- qa: remove knfs site from future releases (issue#36075, issue#36102, pr#24269, Yuri Weinstein)
- qa/suites/rados/thrash-old-clients: exclude packages for hammer, jewel (pr#25193, Neha Ojha)
- qa/suites/rgw/verify/tasks/cls_rgw: test cls_rgw (issue#25024, pr#23197, Casey Bodley, Sage Weil)
- qa/tasks/qemu: use unique clone directory to avoid race with workunit (issue#36542, issue#36569, pr#24811, Jason Dillaman)
- qa: test_recovery_pool tries asok on wrong node (issue#24928, issue#24858, pr#23087, Patrick Donnelly)
- qa: tolerate failed rank while waiting for state (issue#36280, issue#35828, pr#24572, Patrick Donnelly)
- qa/workunits: replace ‘realpath’ with ‘readlink -f’ in fsstress.sh (issue#36409, issue#36430, issue#35538, pr#24622, Ilya Dryomov, Jason Dillaman)
- RADOS: probably missing clone location for async_recovery_targets (issue#35964, issue#35546, pr#24345, xie xingguo)
- rbd: fix error import when the input is a pipe (issue#35705, issue#34536, pr#24002, songweibin)
- [rbd-mirror] failed assertion when updating mirror status (issue#36084, issue#36120, pr#24321, Jason Dillaman)
- rbd: [rbd-mirror] forced promotion after killing remote cluster results in stuck state (issue#36659, issue#36693, pr#24952, Jonathan Brielmaier, Jason Dillaman)
- rbd: [rbd-mirror] periodic mirror status timer might fail to be scheduled (issue#36500, issue#36555, pr#24916, Jason Dillaman)
- rbd: rbd-nbd: do not ceph_abort() after print the usages (issue#36660, issue#36713, pr#24988, Shiyang Ruan)
- rbd: TokenBucketThrottle: use reference to m_blockers.front() and then update it (issue#36529, issue#36475, pr#24915, Dongsheng Yang)
- Revert “mimic: cephfs-journal-tool: enable purge_queue journal’s event commands” (issue#36346, issue#24604, pr#24485, Xuehan Xu, “Yan, Zheng”)
- rgw: abort_bucket_multiparts() ignores individual NoSuchUpload errors (issue#36129, issue#35986, pr#24388, Casey Bodley)
- rgw-admin: reshard add can add a nonexistent bucket (issue#36449, issue#36756, pr#25087, Jonathan Brielmaier, Abhishek Lekshmanan)
- rgw: async sync_object and remove_object does not access coroutine me… (issue#36138, issue#35905, pr#24417, Tianshan Qu)
- rgw/beast: drop privileges after binding ports (issue#36041, pr#24436, Paul Emmerich)
- rgw: beast frontend fails to parse ipv6 endpoints (issue#36662, issue#36734, pr#25079, Jonathan Brielmaier, Casey Bodley)
- rgw: cls_user_remove_bucket does not write the modified cls_user_stats (issue#36496, issue#36533, pr#24910, Casey Bodley)
- rgw: default quota not set in radosgw for Openstack users (issue#24595, issue#36223, pr#24907, Casey Bodley)
- rgw: fix chunked-encoding for chunks >1MiB (issue#36125, issue#35990, pr#24363, Robin H. Johnson)
- rgw: fix deadlock on RGWIndexCompletionManager::stop (issue#26949, issue#35710, pr#24101, Yao Zongyou)
- rgw: fix leak of curl handle on shutdown (issue#35715, issue#36213, pr#24518, Casey Bodley)
- rgw: list bucket can not show the object uploaded by RGWPostObj when enable bucket versioning (pr#24571, yuliyang)
- rgw: radosgw-admin user stats are incorrect when dynamic re-sharding is enabled (issue#36535, pr#24911, Casey Bodley)
- rgw: raise debug level on redundant data sync error messages (issue#35830, issue#36140, pr#24418, Casey Bodley)
- rgw: raise default rgw_curl_low_speed_time to 300 seconds (issue#35708, issue#27989, pr#24071, Casey Bodley)
- rgw: renew resharding locks to prevent expiration (issue#36687, issue#27219, issue#34307, pr#24899, Orit Wasserman, J. Eric Ivancich)
- rgw: resharding produces invalid values of bucket stats (issue#36290, issue#36381, pr#24526, Abhishek Lekshmanan)
- rgw: return x-amz-version-id: null when delete obj in versioning (issue#35814, pr#24189, yuliyang)
- rgw: RGWAsyncGetBucketInstanceInfo does not access coroutine memory (issue#36211, issue#35812, pr#24516, Casey Bodley)
- rgw: set default objecter_inflight_ops = 24576 (issue#36571, issue#25109, pr#24860, Jonathan Brielmaier, Matt Benjamin)
- rgw: support server-side encryption when SSL is terminated in a proxy (issue#36645, issue#27221, pr#24931, Jonathan Brielmaier, Casey Bodley)
- rgw: use-after-free from RGWRadosGetOmapKeysCR::~RGWRadosGetOmapKeysCR (issue#21154, issue#36537, issue#36539, pr#24912, Casey Bodley, Sage Weil)
- rpm: use updated gperftools (issue#36508, issue#35969, pr#24260, Brad Hubbard, Kefu Chai)
- segv in BlueStore::OldExtent::create (issue#36592, issue#36526, pr#24745, Sage Weil)
- test/librbd: not valid to have different parents between image snapshots (issue#36117, pr#24244, Jason Dillaman)
- [test] periodic seg faults within unittest_librbd (issue#36220, issue#36238, pr#24711, Jason Dillaman)
- test/rbd_mirror: race in WaitingOnLeaderReleaseLeader (issue#36236, issue#36276, pr#24551, Mykola Golub)
- tests: ceph-admin-commands.sh workunit does not log what it’s doing (issue#37153, issue#37089, pr#25085, Nathan Cutler)
- tests: librados api aio tests race condition (issue#24587, issue#36647, pr#25027, Josh Durgin)
- tests: make readable.sh fail if it doesn’t run anything (pr#25050, Greg Farnum)
- tests: rbd: move OpenStack devstack test to rocky release (issue#36410, issue#36428, pr#24913, Jason Dillaman)
- tests: unittest_rbd_mirror: TestMockImageMap.AddInstancePingPongImageTest: Value of: it != peer_ack_ctxs->end() (issue#36683, issue#36689, pr#24946, Mykola Golub, Jonathan Brielmaier)
- tests: use timeout for fs asok operations (issue#36335, issue#36503, pr#25332, Patrick Donnelly)
- tests: /usr/bin/ld: cannot find -lradospp in rados mimic (issue#37396, pr#25285, Nathan Cutler)
- test: Use a grep pattern that works across releases (issue#35845, issue#35909, pr#24017, David Zafman)
- tools: ceph-objectstore-tool: Allow target level as first positional … (issue#35846, issue#35992, pr#24116, David Zafman)