June 14, 2018

New in Mimic: centralized configuration management

One of the key new features in Ceph Mimic is the ability to manage the cluster configuration (what traditionally resides in ceph.conf) in a central fashion.  Starting in Mimic, we also store configuration information in the monitors’ internal database, and seamlessly manage the distribution of that config info to all daemons and clients in the system.

Historically, operators wanting to make a configuration change would need to edit the ceph.conf files manually, distribute them to the right nodes, and ensure that the right daemons had been restarted.  Most large-scale users relied on external tools like Ansible, Puppet, or Salt to do this, but the solution always varied, and there was always a disconnect between what the config management service thought the configuration should be and what configuration the running daemon was actually using (has its local ceph.conf been updated?  has the daemon been restarted?  has the operator injected a configuration change via the command line?).

This new feature is designed to bridge this gap, providing a robust view into what the configuration should be (and whether the running configuration matches), and avoiding the need for external tools to manage ceph.conf configuration files.  Most importantly, it provides a simplified configuration experience out of the box.

Note that the new capability is designed to interoperate with the traditional way of managing configurations via ceph.conf, so somebody upgrading to Mimic doesn’t have to make any changes at all if they don’t want to.  However, we expect that the advantages of migrating to the new mode of operation will pay off.

The basics

The monitors jointly manage a configuration database.  The database has the same semantic structure as a ceph.conf file:

  • There are option names (e.g., osd scrub load threshold) and values.
  • A setting can be associated with a “global” group, a type group that applies to all entities of a given type (e.g., “osd” or “mds”), or a specific daemon (e.g., “osd.123”).
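
As in ceph.conf, a more specific section takes precedence over a more general one. Here is a minimal Python sketch of that lookup, assuming a plain dictionary stands in for the database. The `lookup` helper and the sample values are hypothetical illustrations, not Ceph code:

```python
# Hypothetical stand-in for the config database: section -> {option: value}.
CONFIG_DB = {
    "global": {"debug_ms": "0"},
    "osd": {"debug_ms": "1"},
    "osd.123": {"debug_ms": "5"},
}


def lookup(db, daemon, option):
    """Return (value, section) from the most specific section that sets option."""
    daemon_type = daemon.split(".", 1)[0]  # "osd.123" -> "osd"
    # Most specific first: the daemon itself, then its type, then global.
    for section in (daemon, daemon_type, "global"):
        if option in db.get(section, {}):
            return db[section][option], section
    return None, None


print(lookup(CONFIG_DB, "osd.123", "debug_ms"))  # ('5', 'osd.123')
print(lookup(CONFIG_DB, "osd.7", "debug_ms"))    # ('1', 'osd')
print(lookup(CONFIG_DB, "mds.a", "debug_ms"))    # ('0', 'global')
```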

The ceph config dump command will output the equivalent of the cluster-wide ceph.conf in table format.

When a daemon or client starts up, it will look for a ceph.conf file like it always does.  In most cases a small ceph.conf is still necessary in order to identify who the monitors are.  For example, a typical minimal ceph.conf file might be:

mon host =,,

or better yet

mon host =

where ceph-mons is a DNS entry with multiple A records, one for each monitor.  This allows the number and identities of monitors to change over time without modifying any configuration files at all. More importantly, the configuration file on each host is usually static over the lifetime of the cluster, simplifying deployment and management.

You can put any other settings you like in ceph.conf as well.  The overall priority order that Ceph uses to set options is:

  1. Compiled-in default values
  2. Cluster configuration database (the new thing!)
  3. Local ceph.conf file
  4. Runtime override (via “ceph daemon <daemon> config set …” or “ceph tell <daemon> injectargs …”)
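
The priority order above amounts to a layered lookup in which the highest-priority source that sets an option wins. The following is only a sketch under that assumption. The source labels, the `effective` helper, and the sample values are made up for illustration:

```python
# Sources in increasing priority, mirroring the list above.
PRIORITY = ["default", "mon", "file", "override"]


def effective(option, layers):
    """Return (value, source) from the highest-priority layer that sets option."""
    for source in reversed(PRIORITY):
        if option in layers.get(source, {}):
            return layers[source][option], source
    raise KeyError(option)


# Illustrative values only; real defaults are compiled into Ceph.
layers = {
    "default": {"debug_ms": "0/0", "debug_mgr": "1/5"},
    "mon": {"debug_mgr": "20/20"},
    "file": {"debug_ms": "1/1"},
    "override": {"debug_mgr": "10/10"},
}
print(effective("debug_mgr", layers))  # ('10/10', 'override')
print(effective("debug_ms", layers))   # ('1/1', 'file')
```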

Command line interface

Typing ceph config -h will summarize the set of commands available:

$ ceph config -h
config assimilate-conf                          Assimilate options from a conf, and return a 
                                                 new, minimal conf file
config dump                                     Show all configuration option(s)
config get <who> {<key>}                        Show configuration option(s) for an entity
config help <key>                               Describe a configuration option
config log {<int>}                              Show recent history of config changes
config reset <int>                              Revert configuration to previous state
config rm <who> <name>                          Clear a configuration option for one or more
                                                 entities
config set <who> <name> <value>                 Set a configuration option for one or more
                                                 entities
config show <who> {<key>}                       Show running configuration
config show-with-defaults <who>                 Show running configuration (including compiled-
                                                 in defaults)

A good place to start is simply dumping the cluster configuration:

$ ceph config dump
WHO    MASK LEVEL    OPTION                         VALUE RO 
global      advanced mon_pg_warn_min_per_osd        3                                                               
global      advanced osd_pool_default_min_size      1                                                               
global      advanced osd_pool_default_size          1                                                               
  mon       advanced mon_allow_pool_delete          true                                                            

We can set an option like so:

$ ceph config set osd debug_ms 1
$ ceph config dump
WHO    MASK LEVEL    OPTION                         VALUE RO 
global      advanced mon_pg_warn_min_per_osd        3                                                               
global      advanced osd_pool_default_min_size      1                                                               
global      advanced osd_pool_default_size          1                                                               
  mon       advanced mon_allow_pool_delete          true                                                            
  osd       advanced debug_ms                       1

Note that this is all that is necessary to make the change: any daemons or clients in the system that this option applies to will be notified of the configuration change immediately. No restarting of daemons, no use of the awkward ceph tell … injectargs … command, or anything else.

In the above dump output, the MASK field is a secondary restriction on which daemons or clients the option applies to, and can match either a CRUSH location (e.g., “rack:foo”) or an OSD class (e.g., “ssd” vs “hdd”). For example, we could set a higher debug level that only applies to OSDs that are backed by SSDs (as reported by the ceph osd crush tree command):

$ ceph config set osd/class:ssd debug_ms 2
$ ceph config dump
  osd            advanced debug_ms  1
  osd  class:ssd advanced debug_ms  2
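
One way to picture the MASK check is as a small predicate over a daemon's device class and CRUSH location. This is a rough sketch, not Ceph's implementation. The `mask_matches` function and the two sample OSD descriptions are hypothetical:

```python
def mask_matches(mask, device_class, crush_location):
    """Return True if a MASK such as "class:ssd" or "rack:foo" applies to a daemon.

    device_class is a string such as "ssd"; crush_location maps a CRUSH bucket
    type to a name, e.g. {"host": "node1", "rack": "foo"}.
    """
    if not mask:
        return True  # an empty mask places no extra restriction
    kind, _, name = mask.partition(":")
    if kind == "class":
        return device_class == name
    return crush_location.get(kind) == name


# Hypothetical OSD descriptions used only for this example.
ssd_osd = {"device_class": "ssd", "crush_location": {"host": "node1", "rack": "foo"}}
hdd_osd = {"device_class": "hdd", "crush_location": {"host": "node2", "rack": "bar"}}

print(mask_matches("class:ssd", **ssd_osd))  # True
print(mask_matches("class:ssd", **hdd_osd))  # False
print(mask_matches("rack:bar", **hdd_osd))   # True
print(mask_matches("", **hdd_osd))           # True
```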

Instead of dumping the entire config database you can also inspect the config for an individual daemon in the system. For example,

$ ceph config set osd.0 debug_osd 10
$ ceph config get osd.0
WHO    MASK      LEVEL    OPTION                    VALUE       RO 
osd    class:ssd advanced debug_ms                  2/2            
osd.0            advanced debug_osd                 10/10
global           advanced mon_pg_warn_min_per_osd   3              

This output tells you which options and values apply to the daemon, as well as where the option is coming from (is it set globally, for this daemon specifically, etc.).

Naturally, a config entry can also be cleared:

$ ceph config rm osd/class:ssd debug_ms
$ ceph config get osd.0
WHO    MASK LEVEL    OPTION                    VALUE       RO 
osd         advanced debug_ms                  1/1            
global      advanced mon_pg_warn_min_per_osd   3              

Enforced configuration schema

One of the advantages of the new approach is that configuration values are validated and checked at the time they are set. The configuration schema (what options exist and what values are legal) is compiled into the system and globally known. So, if you try to set something that doesn’t make sense, you’ll get an informative error message without affecting the existing configuration. For example,

$ ceph config set osd.10 debug_osd very_high
Error EINVAL: error parsing value: value must take the form N or N/M, where N and M are integers
$ ceph config set osd.10 bluestore_compression_mode 1
Error EINVAL: error parsing value: '1' is not one of the permitted values: none, passive, aggressive, force
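
The idea of checking a value against a schema before storing it can be sketched in a few lines of Python. The `SCHEMA` table and `validate` function below are hypothetical and cover only the two options from the example. Ceph's real schema is compiled into the binaries and is much richer:

```python
import re

# Hypothetical, hand-written schema entries for two options.
SCHEMA = {
    "debug_osd": {"type": "debug_level"},
    "bluestore_compression_mode": {
        "type": "enum",
        "allowed": ["none", "passive", "aggressive", "force"],
    },
}


def validate(option, value):
    """Raise ValueError if value is not legal for option; return value otherwise."""
    if option not in SCHEMA:
        raise ValueError(f"unrecognized option: {option}")
    spec = SCHEMA[option]
    if spec["type"] == "debug_level" and not re.fullmatch(r"\d+(/\d+)?", value):
        raise ValueError("value must take the form N or N/M")
    if spec["type"] == "enum" and value not in spec["allowed"]:
        raise ValueError(f"{value!r} is not one of: {', '.join(spec['allowed'])}")
    return value


for option, value in [("debug_osd", "10"), ("debug_osd", "very_high"),
                      ("bluestore_compression_mode", "1")]:
    try:
        print(option, "=", validate(option, value))
    except ValueError as err:
        print("rejected:", err)
```

Because validation happens before anything is stored, a rejected value leaves the existing configuration untouched.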

The schema for a particular option can be queried with a help command:

$ ceph config help bluestore_compression_mode
bluestore_compression_mode - Default policy for using compression when pool does not specify
  (std::string, advanced)
  Default: none
  Possible values:  none passive aggressive force
  Can update at runtime: true

'none' means never use compression.  'passive' means use compression when clients hint that data
is compressible.  'aggressive' means use compression unless clients hint that data is not
compressible.  This option is used when the per-pool property for the compression mode is not

One thing you’ll notice is the word advanced on the second line. All options are divided into three categories: basic, advanced, and dev. The dev options are meant for development or testing, or are generally not intended to ever be modified by a user. The advanced options are, unsurprisingly, only meant for advanced users. There are relatively few basic options because, well, in general we aim not to require much in the way of configuration in order to make Ceph work.

Some numeric options include a minimum and maximum value, and will accept suffixes like K or M for large values:

$ ceph config set mon mon_data_size_warn 100G
$ ceph config get mon.a
WHO    MASK LEVEL    OPTION                         VALUE        RO 
mon         advanced mon_data_size_warn             107374182400    

Note that whether ‘K’ means 1000 or 1024 depends on the configuration option in question: some are based on SI units (base-10) and some on IEC units (base-2, like KiB and GiB).
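
To make the suffix handling concrete, here is a small sketch of how a value such as 100G might be expanded. The `parse_size` helper and its `base2` flag are assumptions for illustration. In Ceph the choice between SI and IEC multipliers is a property of each option, not a caller-supplied flag:

```python
import re

IEC = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}   # base-2 multipliers
SI = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}   # base-10 multipliers


def parse_size(text, base2=True):
    """Convert a string such as "100G" to an integer."""
    match = re.fullmatch(r"(\d+)([KMGT]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"cannot parse {text!r}")
    number, suffix = match.groups()
    multipliers = IEC if base2 else SI
    return int(number) * multipliers.get(suffix, 1)


print(parse_size("100G"))               # 107374182400, matching the output above
print(parse_size("100G", base2=False))  # 100000000000
```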

Running configuration

Because configuration can come from many places (defaults, cluster config, local ceph.conf, operator override), there is a show command that returns the active configuration options as reported by any daemon in the system. For example,

$ ceph daemon mgr.x config set debug_mgr 10  # manual override of config option
$ ceph config set mgr.x ms_type simple       # set an option normally
$ ceph config show mgr.x
NAME       VALUE       SOURCE   OVERRIDES  IGNORES
debug_mgr  10/10       override mon[20/20]
debug_mon  20/20       mon
debug_ms   1/1         file
ms_type    async+posix default             mon

The NAME and VALUE columns tell you which options and values are currently in effect. SOURCE tells you where the value came from: “override” from our ceph daemon command above, “mon” from the cluster configuration database, and “file” from a local ceph.conf file. In the case of an override source, the OVERRIDES column tells you what the value would have been (and from where); in this case debug_mgr would have been set to 20/20 by the mon if we hadn’t issued that ceph daemon … command.

The IGNORES column indicates that an option has been set to a new value (and by which source) but the daemon is still using the old value. This is true for lots of options that can only take effect when the daemon is restarted, such as ms_type (which controls which message passing implementation to use). You can also see that this is a read-only value from the RO column in config get command results:

$ ceph config get mgr.x
WHO    MASK LEVEL     OPTION     VALUE   RO
mgr         advanced  debug_mgr  20/20   *
mgr         advanced  ms_type    simple  *

You’ll also note that the help result for ms_type tells us the same thing:

$ ceph config help ms_type
  Default: async+posix
  Can update at runtime: false
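
The difference between options that update at runtime and options that wait for a restart can be modeled roughly as follows. This is a toy sketch. The `Daemon` class, the `RUNTIME_UPDATABLE` table, and the starting values are hypothetical and do not reflect how Ceph daemons are implemented:

```python
# Hypothetical flags, mirroring "Can update at runtime" in the help output.
RUNTIME_UPDATABLE = {"debug_mgr": True, "ms_type": False}


class Daemon:
    def __init__(self, running):
        self.running = dict(running)  # values currently in effect
        self.ignored = {}             # new values waiting for a restart

    def notify(self, option, value):
        """Apply a change pushed from the config database if allowed at runtime."""
        if RUNTIME_UPDATABLE.get(option, False):
            self.running[option] = value
        else:
            self.ignored[option] = value

    def restart(self):
        """Pending values take effect when the daemon starts again."""
        self.running.update(self.ignored)
        self.ignored.clear()


mgr = Daemon({"debug_mgr": "1/5", "ms_type": "async+posix"})
mgr.notify("debug_mgr", "20/20")
mgr.notify("ms_type", "simple")
print(mgr.running, mgr.ignored)  # ms_type is still async+posix, "simple" is pending
mgr.restart()
print(mgr.running, mgr.ignored)  # ms_type is now simple, nothing pending
```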

Configuration change history

One of the key advantages of using an external configuration management framework is that those tools usually store the declarative system configuration in a source control tool like Git. This provides a history of changes to the system so that, if something goes wrong, changes can be undone.

Ceph’s new configuration management provides a simple version of that capability. Every configuration change in the system is recorded and easily viewable:

$ ceph config log
--- 15 --- 2018-06-13 15:02:46.176060 ---
- mgr.x/ms_type = simple
+ mgr.x/ms_type = async
--- 14 --- 2018-06-13 14:52:51.877714 ---
+ mgr.x/ms_type = simple
--- 13 --- 2018-06-13 14:45:33.988326 ---
+ mon/mon_data_size_warn = 107374182400

The output should look familiar to anyone who has read diff output: “+” lines indicate a new configuration entry and “-” lines indicate a removed or replaced entry (and its prior value).

The configuration of the system can be reverted to a previous state based on the numeric identifier preceding each change record. For example, to undo our changes to ms_type,

$ ceph config reset 13
$ ceph config log
--- 16 --- 2018-06-13 15:05:10.960659 --- reset to 13 ---
- mgr.x/ms_type = async
--- 15 --- 2018-06-13 15:02:46.176060 ---
- mgr.x/ms_type = simple
+ mgr.x/ms_type = async
--- 14 --- 2018-06-13 14:52:51.877714 ---
+ mgr.x/ms_type = simple
--- 13 --- 2018-06-13 14:45:33.988326 ---
+ mon/mon_data_size_warn = 107374182400

(The net effect of resetting to 13 is that the ms_type entry is removed, even though it had two intermediate values since then.) Since the reset command is a configuration change like any other, you can also undo it with another reset command.
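
The key property here is that a reset is recorded as a new version rather than rewriting history. A toy model of that behavior, assuming full snapshots are kept per version (the post does not say how Ceph stores its history, and the `ConfigHistory` class is purely illustrative), might look like this:

```python
class ConfigHistory:
    """Toy versioned key/value store; every change, including a reset, is recorded."""

    def __init__(self):
        self.snapshots = [{}]  # snapshots[v] is the full config at version v

    @property
    def version(self):
        return len(self.snapshots) - 1

    @property
    def current(self):
        return self.snapshots[-1]

    def set(self, key, value):
        self.snapshots.append({**self.current, key: value})

    def rm(self, key):
        self.snapshots.append({k: v for k, v in self.current.items() if k != key})

    def reset(self, version):
        # Reverting appends a copy of the old state as a *new* version,
        # so the reset itself can later be undone.
        self.snapshots.append(dict(self.snapshots[version]))


history = ConfigHistory()
history.set("mon/mon_data_size_warn", "107374182400")  # version 1
history.set("mgr.x/ms_type", "simple")                 # version 2
history.set("mgr.x/ms_type", "async")                  # version 3
history.reset(1)                                       # version 4: ms_type is gone
print(history.version, history.current)
history.reset(3)                                       # version 5: undo the reset
print(history.version, history.current)
```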

Migrating from old configuration files

Any existing cluster is likely to have various settings in the ceph.conf files stored on each node of the system. We also provide a command to easily import these files into the configuration database.

One challenge is that not all options are suitable to be stored in the central config database. The mon_host option is a good example: it’s used to bootstrap a connection to the cluster before fetching any additional configuration options. For this reason, the import command both takes the existing config file as input and generates a (hopefully shorter) config file as output that contains any options that could not be assimilated. For example,

$ cat ceph.conf
mon host =
debug_osd = 0/0
mds invalid option = this option does not exist

$ ceph config assimilate-conf -i ceph.conf -o
        mon_host =

        mds_invalid_option = this option does not exist

$ ceph config get osd.1
WHO    MASK LEVEL    OPTION                         VALUE       RO 
osd.1       advanced debug_osd                      0/0            

In this simple example, only the debug_osd option for osd.1 was imported; mon_host was left behind (it’s needed for bootstrapping) and mds_invalid_option was left behind (it was not a recognized option).
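
The split between imported and left-behind options can be sketched as a simple classification. Everything below is an assumption for illustration: the `assimilate` function, the tiny `KNOWN_OPTIONS` and `BOOTSTRAP_ONLY` sets, and the placeholder monitor address are not taken from Ceph:

```python
KNOWN_OPTIONS = {"debug_osd", "mon_host"}  # hypothetical, tiny schema
BOOTSTRAP_ONLY = {"mon_host"}              # needed before the database is reachable


def assimilate(conf):
    """Split conf into (imported, left_behind) dictionaries."""
    imported, left_behind = {}, {}
    for name, value in conf.items():
        key = name.replace(" ", "_")  # ceph.conf treats spaces and underscores alike
        if key in KNOWN_OPTIONS and key not in BOOTSTRAP_ONLY:
            imported[key] = value
        else:
            left_behind[key] = value
    return imported, left_behind


conf = {
    "mon host": "mon.example.com",  # placeholder address
    "debug_osd": "0/0",
    "mds invalid option": "this option does not exist",
}
imported, left_behind = assimilate(conf)
print(imported)     # {'debug_osd': '0/0'}
print(left_behind)  # mon_host and mds_invalid_option stay in the file
```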

For a cluster making a transition to a cluster-managed config, the basic process would be to run an assimilate command like the above on each host to incorporate settings into the cluster’s configuration database, leaving behind only the bootstrap-related options on each host. For example,

$ cd /etc/ceph
$ ceph config assimilate-conf -i ceph.conf -o
$ cat   # make sure it looks okay!
$ mv ceph.conf

This will work in the majority of cases. However, be warned that assimilating a configuration file will overwrite any settings mentioned in the input, which means that if two hosts have config files setting the same option to different values, the end result will depend on the order in which the files are assimilated.
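
The order dependence is easy to demonstrate, and a quick pre-check for conflicting values can help before assimilating many hosts. Both helpers below are hypothetical sketches that treat each host's ceph.conf as an already-parsed dictionary. They are not part of the ceph CLI:

```python
def assimilate_all(host_files):
    """Apply each host's options in order; later files overwrite earlier ones."""
    db = {}
    for options in host_files:
        db.update(options)
    return db


def find_conflicts(host_files):
    """Return {option: sorted values} for options that differ between hosts."""
    seen = {}
    for options in host_files:
        for name, value in options.items():
            seen.setdefault(name, set()).add(value)
    return {name: sorted(values) for name, values in seen.items() if len(values) > 1}


host_a = {"osd/debug_osd": "0/0", "osd/debug_ms": "1"}
host_b = {"osd/debug_osd": "10/10", "osd/debug_ms": "1"}

print(assimilate_all([host_a, host_b]))  # debug_osd ends up as 10/10
print(assimilate_all([host_b, host_a]))  # debug_osd ends up as 0/0
print(find_conflicts([host_a, host_b]))  # {'osd/debug_osd': ['0/0', '10/10']}
```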

Next steps

Looking forward, the key next step is to surface all of these configuration options in the new management dashboard. There is an in-flight pull request that adds this functionality and will provide it for the upcoming Nautilus release.

Sage Weil