v12.2.10 Luminous released

TheAnalyst

This is the tenth bug fix release of the Luminous v12.2.x long-term stable release series. The previous release, v12.2.9, introduced the PG hard-limit patches, which were found to cause an issue in certain upgrade scenarios. This release was expedited to revert those patches. If you have already successfully upgraded to v12.2.9, you should not upgrade to v12.2.10, but rather wait for a release in which http://tracker.ceph.com/issues/36686 is addressed. All other users are encouraged to upgrade to this release.

Notable Changes

OSD

  • This release reverts the PG hard-limit patches added in v12.2.9 (https://tracker.ceph.com/issues/23979). With those patches, a partial upgrade during a recovery or backfill can cause OSDs still on the previous version to fail with assert(trim_to <= info.last_complete). The workaround for users running v12.2.9 is to upgrade and restart all OSDs to a version with the PG hard limit, or to upgrade only when all PGs are active+clean.

See also: http://tracker.ceph.com/issues/36686

As mentioned in the release announcement, if you have already upgraded to v12.2.9, please do NOT upgrade to v12.2.10 until the above-mentioned issue is addressed.
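The upgrade guidance above can be expressed as two small checks. This is only an illustrative sketch, not a Ceph tool: the function names and the input mapping of PG state names to counts are hypothetical, and you would need to fill that mapping from your own cluster status output. The state comparison is deliberately strict, so any state string other than exactly active+clean (including combined states) counts as not clean.

```python
from typing import Mapping


def should_upgrade_to_12_2_10(current_version: str) -> bool:
    """Per the announcement, clusters already on v12.2.9 should hold off.

    `current_version` is a plain version string such as "12.2.8"
    (hypothetical input; how you obtain it is up to you).
    """
    return current_version != "12.2.9"


def all_pgs_active_clean(pg_state_counts: Mapping[str, int]) -> bool:
    """Return True when every PG with a nonzero count is exactly 'active+clean'.

    `pg_state_counts` maps a PG state string to the number of PGs in that
    state (hypothetical structure, assumed for this sketch).
    """
    return all(
        count == 0 or state == "active+clean"
        for state, count in pg_state_counts.items()
    )


if __name__ == "__main__":
    print(should_upgrade_to_12_2_10("12.2.8"))
    print(all_pgs_active_clean({"active+clean": 120, "active+recovering": 8}))
```

The second check corresponds to the workaround condition for v12.2.9 described above: upgrading only while no recovery or backfill is in progress.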

  • The bluestore_cache_* options are no longer needed. They are replaced by osd_memory_target, defaulting to 4GB. BlueStore will expand and contract its cache to attempt to stay within this limit. Users upgrading should note this is a higher default than the previous bluestore_cache_size of 1GB, so OSDs using BlueStore will use more memory by default.

    For more details, see BlueStore docs
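To get a rough feel for what the new osd_memory_target default means for a host, the sketch below multiplies the per-OSD target by the number of BlueStore OSDs. It assumes "4GB" means 4 GiB, and the helper name and the OSD count are hypothetical. Treat the result as a planning estimate only, since BlueStore attempts to stay within the target rather than guaranteeing it.

```python
GIB = 1024 ** 3

# Default stated in the announcement (assumed here to mean 4 GiB).
DEFAULT_OSD_MEMORY_TARGET = 4 * GIB


def host_memory_target(num_osds: int,
                       osd_memory_target: int = DEFAULT_OSD_MEMORY_TARGET) -> int:
    """Aggregate memory target in bytes for all BlueStore OSDs on one host."""
    if num_osds < 0:
        raise ValueError("num_osds must be non-negative")
    return num_osds * osd_memory_target


if __name__ == "__main__":
    # Hypothetical host with 12 BlueStore OSDs.
    total = host_memory_target(12)
    print(f"{total / GIB:.0f} GiB targeted across 12 OSDs")
```

If the aggregate exceeds what the host can spare, lowering osd_memory_target per OSD is the knob the announcement points to.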

Changelog

Changelog for v12.2.9

  • build/ops: add e2fsprogs runtime dependency (pr#24663, Guillaume Abrioux, Alfredo Deza)
  • build/ops: deb: fix ceph-mgr .pyc files left behind (issue#26883, pr#23832, Dan Mick)
  • build/ops: deb: require fuse for ceph-fuse (issue#21057, pr#23693, Thomas Serlin)
  • build/ops: rpm: selinux-policy fixes (pr#24136, Brad Hubbard)
  • build/ops: rpm: use updated gperftools (issue#35969, pr#24259, Kefu Chai)
  • ceph-volume: activate option --auto-detect-objectstore respects --no-systemd (issue#36249, pr#24358, Alfredo Deza)
  • ceph-volume: lsblk can fail to find PARTLABEL, must fallback to blkid (issue#36098, pr#24335, Alfredo Deza)
  • ceph-volume: add new ceph-handlers role from ceph-ansible (issue#36251, pr#24338, Alfredo Deza)
  • ceph-volume: batch carve out lvs for bluestore (issue#34535, pr#24075, Alfredo Deza)
  • ceph-volume: batch tests for mixed-type of devices (issue#35535, issue#27210, pr#23967, Alfredo Deza)
  • ceph-volume: batch: allow --osds-per-device, default it to 1 (issue#35913, pr#24080, Alfredo Deza)
  • ceph-volume: batch: allow journal+block.db sizing on the CLI (issue#36088, pr#24209, Alfredo Deza)
  • ceph-volume: custom cluster names fail on filestore trigger (issue#27210, pr#24280, Alfredo Deza)
  • ceph-volume: do not send (lvm) stderr/stdout to the terminal, use the logfile (issue#36492, pr#24741, Alfredo Deza)
  • ceph-volume: earlier detection for --journal and --filestore flag requirements (issue#24794, pr#24206, Alfredo Deza)
  • ceph-volume: fix journal and filestore data size in lvm batch --report (issue#36242, pr#24307, Andrew Schoen)
  • ceph-volume: fix zap not working with LVs (issue#35970, pr#24082, Alfredo Deza)
  • ceph-volume: lvm.prepare update help to indicate partitions are needed, not devices (issue#24795, pr#24451, Jeffrey Zhang, Alfredo Deza)
  • ceph-volume: make lvm batch idempotent (pr#24589, Andrew Schoen)
  • ceph-volume: remove version reporting from help menu (issue#36386, pr#24754, Alfredo Deza)
  • ceph-volume: skip processing devices that don’t exist when scanning system disks (issue#36247, pr#24382, Alfredo Deza)
  • cephfs: MDSMonitor: consider raising priority of MMDSBeacons from MDS so they are processed before other client messages (issue#26899, pr#23554, Patrick Donnelly)
  • cephfs: MDSMonitor: lookup of gid in prepare_beacon that has been removed will cause exception (issue#35848, pr#23990, Patrick Donnelly)
  • cephfs: ceph-fuse: add SELinux policy (issue#36103, pr#24313, Patrick Donnelly)
  • cephfs: ceph_volume_client: allow atomic update of RADOS objects (issue#24173, pr#24084, Rishabh Dave)
  • cephfs: ceph_volume_client: delay required after adding data pool to MDSMap (issue#25141, pr#23726, Patrick Donnelly)
  • cephfs: ceph_volume_client: py3 compatible (issue#17230, pr#24083, Rishabh Dave, Patrick Donnelly)
  • cephfs: cephfs-data-scan: print the max used ino (issue#26925, pr#23881, “Yan, Zheng”)
  • cephfs: cephfs-journal-tool: wrong layout info used (issue#24644, pr#24033, Gu Zhongyan)
  • cephfs: client: check for unmounted condition before printing debug output (issue#25213, pr#23617, Jeff Layton)
  • cephfs: client: drop null child dentries before try pruning inode’s alias (issue#22293, pr#24119, “Yan, Zheng”)
  • cephfs: client: fix choose_target_mds for requests that do name lookup (issue#26860, pr#23793, “Yan, Zheng”)
  • cephfs: client: retry remount on dcache invalidation failure (issue#27657, pr#24303, Venky Shankar)
  • cephfs: client: statfs inode count odd (issue#24849, pr#24376, Rishabh Dave)
  • cephfs: client: two ceph-fuse clients, one can not list out files created by another (issue#27051, pr#24282, Peng Xie)
  • cephfs: client: update ctime when modifying file content (issue#35945, pr#24323, “Yan, Zheng”)
  • common: get real hostname from container/pod environment (pr#23915, Sage Weil)
  • core: PGPool::update optimizations (pr#23969, Zac Medico)
  • core: ceph-disk: compatibility fix for python 3 (issue#35906, pr#24347, Tim Serong)
  • core: discover_all_missing() not always called during activating (issue#22837, pr#23817, Sage Weil, David Zafman)
  • core: kv/KeyValueDB: return const char* from MergeOperator::name() (issue#26875, pr#23566, Sage Weil)
  • core: librados application’s symbol could conflict with the libceph-common (issue#25154, pr#23483, Kefu Chai)
  • core: mgr/MgrClient: guard send_pgstats() with lock (issue#23370, pr#23791, Kefu Chai)
  • core: mgr/balancer: deepcopy best plan - otherwise we get latest (issue#27000, pr#23740, Stefan Priebe)
  • core: mgrc: enable disabling stats via mgr_stats_threshold (issue#25197, pr#23461, John Spray)
  • core: mon/OSDMonitor: invalidate max_failed_since on cancel_report (issue#35860, pr#24257, xie xingguo)
  • core: object errors found in be_select_auth_object() aren’t logged the same (issue#25108, pr#23871, David Zafman)
  • core: os/bluestore: bluestore_buffer_hit_bytes perf counter doesn’t reset (pr#23773, Igor Fedotov)
  • core: os/bluestore: cache autotuning and memory limit (pr#24065, Mark Nelson)
  • core: osd,mon: increase mon_max_pg_per_osd to 250 (issue#25112, pr#23862, Neha Ojha)
  • core: osd/PG: avoid choose_acting picking want with > pool size items (issue#35924, pr#24299, Sage Weil)
  • core: osdc/Objecter: fix split vs reconnect race (issue#22544, pr#24188, Sage Weil)
  • core: rados python bindings use prval from stack (issue#25175, pr#23864, Sage Weil)
  • doc: Fix broken urls (issue#25185, pr#23621, Jos Collin)
  • doc: remove deprecated ‘scrubq’ from ceph(8) (issue#35813, pr#24211, Ruben Kerkhof)
  • doc: rgw: ldap-auth: fixed option name ‘rgw_ldap_searchfilter’ (issue#23081, pr#23761, Konstantin Shalygin)
  • mds: MDBalancer::try_rebalance() may stop prematurely (issue#26973, pr#23884, “Yan, Zheng”)
  • mds: allows client to create .. and . dirents (issue#25113, pr#24329, Venky Shankar)
  • mds: avoid using g_conf->get_val<…>(…) in hot path (issue#24820, pr#23408, “Yan, Zheng”)
  • mds: calculate load by checking self CPU usage (issue#26834, pr#23505, “Yan, Zheng”)
  • mds: configurable timeout for client eviction (issue#25188, pr#24086, Patrick Donnelly, Venky Shankar)
  • mds: crash when dumping ops in flight (issue#26894, pr#23677, “Yan, Zheng”)
  • mds: curate priority of perf counters sent to mgr (issue#22097, issue#24004, pr#24089, Guan yunfei, Venky Shankar)
  • mds: explain delayed client_request due to subtree migration (issue#24840, pr#23678, “Yan, Zheng”)
  • mds: health warning for slow metadata IO (issue#24879, pr#24171, “Yan, Zheng”)
  • mds: internal op missing events time ‘throttled’, ‘all_read’, ‘dispatched’ (issue#36114, pr#24410, Yanhu Cao)
  • mds: mds got laggy because of MDSBeacon stuck in mqueue (issue#23519, pr#23556, “Yan, Zheng”)
  • mds: optimize the way how max export size is enforced (issue#25131, pr#23789, “Yan, Zheng”)
  • mds: prevent MDSRank::evict_client from blocking finisher thread (issue#35720, pr#23946, “Yan, Zheng”)
  • mds: print is_laggy message once (issue#35250, pr#24138, Patrick Donnelly)
  • mds: rctime may go back (issue#35916, pr#24378, “Yan, Zheng”)
  • mds: reset heartbeat map at potential time-consuming places (issue#26858, pr#23507, “Yan, Zheng”)
  • mds: runs out of file descriptors after several respawns (issue#35850, pr#24310, Patrick Donnelly)
  • mds: track average session uptime (issue#25013, pr#24421, Patrick Donnelly, Venky Shankar)
  • mds: use monotonic clock for beacon message timekeeping (issue#26959, pr#24311, Patrick Donnelly)
  • mgr: Sync the prometheus module (pr#23216, Boris Ranto)
  • mon: Automatically set expected_num_objects for new pools with >=100 PGs per OSD (issue#24687, pr#24395, Douglas Fuller)
  • msg: “challenging authorizer” messages appear at debug_ms=0 (issue#35251, pr#23943, Patrick Donnelly)
  • msg: async: clean up local buffers on dispatch (issue#35987, pr#24387, Greg Farnum)
  • msg: ceph_abort() when there are enough accepter errors in msg server (issue#23649, pr#24419, penglaiyxy@gmail.com)
  • osd: EC: slow/hung ops in multimds suite test (issue#23769, pr#24393, Sage Weil)
  • osd: ECBackend: don’t get result code of subchunk-read overwritten (issue#21769, pr#24342, songweibin)
  • osd: Limit pg log length during recovery/backfill so that we don’t run out of memory (issue#21416, pr#23211, Neha Ojha, xie xingguo)
  • osd: OSDMap: fix apply upmap segfault (issue#22056, pr#23579, Brad Hubbard)
  • osd: PG: add custom_reaction Backfilled and release reservations after bac… (issue#23614, pr#23493, Neha Ojha)
  • osd: PrimaryLogPG: fix potential pg-log overtrimming (pr#24308, xie xingguo)
  • osd: backport ‘bench’ and stdout changes (issue#24022, pr#23680, Коренберг Маркr, John Spray, Kefu Chai)
  • osd: read object attrs failed at EC recovery (issue#24406, pr#24327, xiaofei cui)
  • osd: scrub livelock (issue#26890, pr#24396, Sage Weil)
  • qa/suites/rados/upgrade/jewel-x-singleton: exclude python3-rados, python3-cephfs (pr#24479, Neha Ojha)
  • rbd: [rbd-mirror] failed assertion when updating mirror status (issue#36084, pr#24320, Jason Dillaman)
  • rbd: fix error import when the input is a pipe (issue#34536, pr#24003, songweibin)
  • rbd: librbd: blacklisted client might not notice it lost the lock (issue#34534, pr#24405, Song Shun, Mykola Golub, Jason Dillaman)
  • rbd: librbd: discard should wait for in-flight cache writeback to complete (issue#23548, pr#23594, Jason Dillaman)
  • rbd: librbd: ensure exclusive lock acquired when removing sync point snaps… (issue#24898, pr#24123, Mykola Golub, Jason Dillaman)
  • rbd: librbd: fix refuse to release lock when cookie is the same at rewatch (issue#27986, pr#23758, Song Shun)
  • rbd: librbd: fixed assert when flattening clone with zero overlap (issue#35702, pr#24285, Jason Dillaman)
  • rbd: librbd: image create request should validate data pool for self-managed snapshot support (issue#24675, pr#24390, Mykola Golub)
  • rbd: librbd: journaling unable request can not be sent to remote lock owner (issue#26939, pr#24100, Mykola Golub)
  • rbd: librbd: object map improperly flagged as invalidated (issue#24516, pr#24415, Jason Dillaman)
  • rbd: librbd: potential race on image create request complete (issue#24910, pr#23892, Mykola Golub)
  • rgw: ‘radosgw-admin sync error trim’ only trims partially (issue#24873, pr#24054, Casey Bodley)
  • rgw: Fix log level of gc_iterate_entries (issue#23801, pr#23665, iliul)
  • rgw: Limit the number of lifecycle rules on one bucket (issue#24572, pr#23522, Zhang Shaowen)
  • rgw: The delete markers generated by object expiration should have owner (issue#24568, pr#23545, Zhang Shaowen)
  • rgw: abort_bucket_multiparts() ignores individual NoSuchUpload errors (issue#35986, pr#24389, Casey Bodley)
  • rgw: change default rgw_thread_pool_size to 512 (issue#25214, issue#24544, pr#24034, Douglas Fuller, Casey Bodley)
  • rgw: cls/rgw: don’t assert in decode_list_index_key() (issue#24117, pr#24391, Yehuda Sadeh)
  • rgw: cls/rgw: ready rgw_usage_log_entry for extraction via ceph-dencoder (issue#34537, pr#23974, Vaibhav Bhembre)
  • rgw: fix chunked-encoding for chunks >1MiB (issue#35990, pr#24361, Robin H. Johnson)
  • rgw: fix deadlock on RGWIndexCompletionManager::stop (issue#26949, pr#24069, Yao Zongyou)
  • rgw: incremental data sync uses truncated flag to detect end of listing (issue#26952, pr#24242, Casey Bodley)
  • rgw: multisite: data sync error repo processing does not back off on empty (issue#26938, pr#24318, Casey Bodley)
  • rgw: multisite: intermittent failures in test_bucket_sync_disable_enable (issue#26895, pr#24316, Casey Bodley)
  • rgw: multisite: intermittent test_bucket_index_log_trim failures (issue#36034, pr#24398, Casey Bodley)
  • rgw: multisite: object metadata operations are skipped by sync (issue#24367, pr#24056, Casey Bodley)
  • rgw: multisite: object name should be urlencoded when we put it into ES (issue#23216, pr#24424, Chang Liu)
  • rgw: multisite: out of order updates to sync status markers (issue#35539, pr#24317, Yehuda Sadeh)
  • rgw: multisite: segfault on shutdown/realm reload (issue#35543, pr#24231, Casey Bodley)
  • rgw: multisite: update index segfault on shutdown/realm reload (issue#35905, pr#24397, Tianshan Qu)
  • rgw: raise debug level on redundant data sync error messages (issue#35830, issue#36037, pr#24135, Casey Bodley, Matt Benjamin)
  • rgw: raise default rgw_curl_low_speed_time to 300 seconds (issue#27989, pr#24046, Casey Bodley)
  • rgw: resharding produces invalid values of bucket stats (issue#36290, pr#24527, Abhishek Lekshmanan)
  • rgw: return x-amz-version-id: null when delete obj in versioning suspended bucket (issue#35814, pr#24190, yuliyang)
  • rgw: rgw_file: deep stat handling (issue#24915, pr#23499, Matt Benjamin)
  • tests: Excluded ‘python34-cephfs’ from the install tasks (pr#24650, Yuri Weinstein)
  • tests: Use pids instead of jobspecs which were wrong (issue#27056, pr#23901, David Zafman)
  • tests: cephfs: multifs requires 4 mds but gets only 2 (issue#24899, pr#24328, Patrick Donnelly)
  • tests: cls_rgw test is only run in rados suite: add it to rgw suite as well (issue#24815, pr#24070, Casey Bodley, Sage Weil)
  • tests: librbd: not valid to have different parents between image snapshots (issue#36097, pr#24245, Jason Dillaman)
  • tests: move mds/client config to qa from teuthology ceph.conf.template (issue#26900, issue#24839, pr#23877, Patrick Donnelly)
  • tests: qa/tasks: s3a fix mirror (pr#24039, Vasu Kulkarni)
  • tests: qa/workunits: replace ‘realpath’ with ‘readlink -f’ in fsstress.sh (issue#27211, issue#36409, pr#24620, Ilya Dryomov, Jason Dillaman)
  • tests: qa: add .qa helper link (pr#24134, Patrick Donnelly)
  • tests: qa: added v12.2.8 to the mix (issue#35541, pr#23913, Yuri Weinstein)
  • tests: remove knfs qa suite from future releases (issue#36075, pr#24268, Yuri Weinstein)
  • tools: ceph-objectstore-tool: Allow target level as first positional parameter (issue#35846, pr#24115, David Zafman)