Cluster configuration
From Ceph wiki
Ceph uses a single configuration file to define cluster membership, hostnames, paths to devices, and runtime options. By default, it is located at /etc/ceph/ceph.conf.
The configuration file is designed so that you can put everything you need in a single file and share it unmodified on all hosts in the cluster. No per-node configuration file changes should be necessary.
Sections define which daemons (or daemon instances, e.g. 'osd0' or 'osd.0', 'mds.foo', etc.) exist, and what their startup and runtime options are. The '.' in the daemon instance name is optional. For any given daemon, options are searched for first in the daemon instance section, then in the daemon type section, then in the global section. For example, for osd0, we search [osd0] and [osd.0], then [osd], then [global].
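The search order can be sketched in a few lines of Python. This is only an illustration, not Ceph's own parser: the `lookup` helper and the dict-of-dicts layout are hypothetical, and trying [osd.0] before [osd0] is an assumption, since the text does not say which spelling wins.

```python
def lookup(conf, daemon_type, daemon_id, option):
    """Return the first value found for `option`, or None if it is unset."""
    search_order = [
        f"{daemon_type}.{daemon_id}",  # instance section, dotted spelling
        f"{daemon_type}{daemon_id}",   # instance section, undotted spelling
        daemon_type,                   # daemon type section, e.g. [osd]
        "global",                      # cluster-wide defaults
    ]
    for section in search_order:
        if option in conf.get(section, {}):
            return conf[section][option]
    return None


# Hypothetical in-memory view of a parsed ceph.conf.
conf = {
    "global": {"debug ms": "0", "keyring": "/etc/ceph/keyring.bin"},
    "osd": {"keyring": "/etc/ceph/keyring.$name"},
    "osd.0": {"host": "delta"},
}

print(lookup(conf, "osd", "0", "host"))      # found in [osd.0]
print(lookup(conf, "osd", "0", "keyring"))   # found in [osd], shadows [global]
print(lookup(conf, "osd", "0", "debug ms"))  # falls through to [global]
```

The more specific section always shadows the more general one, which is why a single shared file can carry both cluster-wide defaults and per-daemon overrides.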
The following variable substitutions allow you to define many options generically in [global] or the daemon type section (e.g., [osd]):
- $name - daemon name (e.g. 'osd.2', 'mds.foo', 'mon.1')
- $type - daemon type (e.g. 'osd', 'mds', 'mon')
- $id or $num - daemon number or name (e.g. '2', 'foo', '1')
- $host - the host the daemon is assigned to
Substitution follows 'sh' syntax: you can write either "$name" or "${name}". Use the "${name}" form when the variable is immediately followed by an alphanumeric character, so that the end of the variable name is unambiguous.
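As a rough illustration of how these variables expand, the sketch below uses Python's `string.Template`, which happens to accept the same "$name" and "${name}" forms. It is a stand-in for Ceph's own substitution code, and the `expand` helper is hypothetical.

```python
from string import Template


def expand(value, daemon_type, daemon_id, host):
    """Expand $name, $type, $id/$num and $host in a configuration value."""
    variables = {
        "name": f"{daemon_type}.{daemon_id}",
        "type": daemon_type,
        "id": daemon_id,
        "num": daemon_id,
        "host": host,
    }
    # safe_substitute leaves unknown variables untouched instead of raising.
    return Template(value).safe_substitute(variables)


print(expand("/srv/ceph/osd$id", "osd", "2", "delta"))
print(expand("/etc/ceph/keyring.$name", "mds", "foo", "alpha"))
# Braces are needed here because "data" directly follows the variable.
print(expand("/data/${type}${id}data", "osd", "2", "delta"))
```

Without the braces, "$iddata" would be read as a single (undefined) variable named "iddata" rather than "$id" followed by "data".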
Note that even if you define all options for a given daemon generically (e.g. in [osd] and [global]), you need to define (possibly empty) sections for each daemon instance (e.g., 'osd0', 'mds.foo') you want to create or start.
Each daemon instance should define a 'host = something' option that specifies which host that daemon lives on. By default, the init script only acts on daemon instances whose host matches the local hostname. If -a or --allhosts is specified, the script uses ssh to start, stop, or otherwise manage daemons on all hosts.
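The host matching can be pictured roughly as follows. This is a hypothetical sketch rather than the init script's actual logic: `daemons_for_host` and the dict layout are made up for illustration, and treating "has a host option" as "is a daemon instance" is an assumption.

```python
def daemons_for_host(conf, hostname, all_hosts=False):
    """List the daemon instance sections that should be acted on."""
    selected = []
    for section, options in conf.items():
        host = options.get("host")
        if host is None:
            continue  # [global], [osd], ... carry no host and are skipped
        if all_hosts or host == hostname:
            selected.append(section)
    return selected


# Hypothetical in-memory view of a parsed ceph.conf.
conf = {
    "global": {"auth supported": "cephx"},
    "mon.0": {"host": "alpha", "mon addr": "192.168.0.10:6789"},
    "mds.alpha": {"host": "alpha"},
    "osd": {"osd data": "/data/osd$id"},
    "osd.0": {"host": "delta"},
}

print(daemons_for_host(conf, "alpha"))                  # local daemons only
print(daemons_for_host(conf, "alpha", all_hosts=True))  # like -a / --allhosts
```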
See sample.ceph.conf (installed in /etc/ceph/) for a simple example.
Node specific conf files
You can also use a different conf file on each host, defining parameters only for the daemons running on that host, but this is more difficult to maintain. The one caveat is that the conf file on every host should include the monitor daemon sections and their 'mon addr' option, so that each daemon knows how to join the cluster. (All other monitor options can be excluded, except, of course, on the hosts running the monitors themselves.)
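A per-host file can be checked for this caveat with a short script. The sketch below is an assumption-laden illustration: it parses the file with Python's `configparser` (not Ceph's own parser), only recognises dotted monitor section names such as [mon.0], and the `monitor_problems` helper is hypothetical.

```python
import configparser


def monitor_problems(conf_text):
    """Return a list of problems with the monitor sections of a conf file."""
    # interpolation=None keeps values such as /data/mon$id untouched.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    mon_sections = [s for s in parser.sections() if s.startswith("mon.")]
    if not mon_sections:
        return ["no monitor sections defined"]
    return [
        f"[{section}] has no 'mon addr' option"
        for section in mon_sections
        if not parser.has_option(section, "mon addr")
    ]


sample = """
[global]
auth supported = cephx
[mon]
mon data = /data/mon$id
[mon.0]
host = alpha
mon addr = 192.168.0.10:6789
[mon.1]
host = beta
"""

print(monitor_problems(sample))
```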
Node names
MDSes and Monitors can be named however you like, using alphanumerics. OSDs *must* be named using consecutive integers with no gaps: 0, 1, 2, 3, ..., n. This is because the OSD ID number is used directly as an index into a number of data structures. Skipping numbers, or padding the numbers with zeroes, will cause the daemon to segfault or otherwise die.
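Since a bad OSD name only shows up as a crash at runtime, it can be worth checking the names beforehand. The following is a minimal sketch of such a check; `check_osd_ids` is a hypothetical helper, not a Ceph tool, and it only tests for non-integers, padding zeroes, and gaps.

```python
import re


def check_osd_ids(ids):
    """Return a list of problems found in a collection of OSD ID strings."""
    problems = []
    numbers = []
    for osd_id in ids:
        if not re.fullmatch(r"[0-9]+", osd_id):
            problems.append(f"{osd_id!r} is not an integer")
        elif osd_id != str(int(osd_id)):
            problems.append(f"{osd_id!r} has padding zeroes")
        else:
            numbers.append(int(osd_id))
    numbers.sort()
    # Compare each ID with its successor to detect skipped numbers.
    for previous, current in zip(numbers, numbers[1:]):
        if current - previous > 1:
            problems.append(f"gap between {previous} and {current}")
    return problems


print(check_osd_ids(["0", "1", "2", "3"]))
print(check_osd_ids(["0", "01", "3", "foo"]))
```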
Cephx auth
Ceph also supports cephx, which provides secure authentication between the nodes of your cluster.
To configure auth, enable it in your config:
[global]
auth supported = cephx
debug ms = 0
keyring = /etc/ceph/keyring.bin
[mds]
debug mds = 1
keyring = /etc/ceph/keyring.$name
[osd]
osd data = /srv/ceph/osd$id
keyring = /etc/ceph/keyring.$name
debug osd = 1
debug filestore = 1
Now your cluster will use secure auth.
To be able to use the ceph -s and ceph -w commands, copy admin_keyring.bin from your monitor's data directory to /etc/ceph/keyring.bin.
Mounting
With cephx enabled, mounting requires authentication. Specify the name option together with either the secret or the secretfile option:
mount -t ceph -o name=admin,secret=<secret> 1.2.3.4:/ /mnt/ceph
Or via a secretfile (recommended):
mount -t ceph -o name=admin,secretfile=<secretfile> 1.2.3.4:/ /mnt/ceph
The secretfile contains only the secret key string.
The secret can be derived from /etc/ceph/keyring.bin:
root@client02:~# ceph-authtool -l /etc/ceph/keyring.bin
client.admin
        key: AQAyuDVM5TUNABAAwTDDT9V9BY8yiCkeJ6s37w==
        auid: 0
        caps: [mds] allow
        caps: [mon] allow *
        caps: [osd] allow *
Or
ceph-authtool --print-key /etc/ceph/keyring.bin > /etc/ceph/secret
chmod 600 /etc/ceph/secret
Since /etc/fstab is world-readable, it's recommended to use a secretfile that is readable only by root.
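To confirm that a secret file is not readable by other users, its permission bits can be inspected. The sketch below is a hypothetical helper (`is_private` is not part of Ceph) that only checks the group and other bits; it does not verify that root owns the file. The demonstration uses a temporary file instead of /etc/ceph/secret.

```python
import os
import stat
import tempfile


def is_private(path):
    """True if neither group nor other has any access to the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


# Demonstration on a throwaway file rather than the real secret.
with tempfile.NamedTemporaryFile() as handle:
    os.chmod(handle.name, 0o644)
    print(is_private(handle.name))  # group/other can read
    os.chmod(handle.name, 0o600)
    print(is_private(handle.name))  # owner only, as after chmod 600
```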
If you get the message "entity client.admin not found", run:
ceph auth get client.admin -o keyring.bin
ceph-authtool --print-key keyring.bin
Managing a single, centralized conf file
You can manage a single master ceph.conf file that contains configuration information for the entire cluster, and then configure a 'fetch_config' script on each node that will pull down the configuration when needed. There is a sample /etc/ceph/sample.fetch_config script, although it is pretty trivial. The script takes a single parameter: a temporary filename where the fetched configuration should be written. If /etc/ceph/fetch_config is executable, the init script and mkcephfs will use that.
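The contract for such a script is small: it receives one argument, the temporary filename, and must write the fetched configuration there. The sketch below shows that contract in Python purely as an illustration; it is not the shipped sample.fetch_config, and the MASTER_CONF path (a shared mount) is a hypothetical assumption. A real script might use scp or an HTTP fetch instead.

```python
import shutil
import sys

# Hypothetical location of the master copy, e.g. on a shared mount.
MASTER_CONF = "/mnt/shared/ceph/ceph.conf"


def main(argv, source=MASTER_CONF):
    """Copy the master configuration to the filename given as argv[1]."""
    if len(argv) != 2:
        print("usage: fetch_config <temporary-filename>", file=sys.stderr)
        return 2
    shutil.copyfile(source, argv[1])
    return 0

# When installed as an executable script, the entry point would be:
#     sys.exit(main(sys.argv))
```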
Note that if you run a binary directly (a daemon, or a tool such as 'ceph'), the config fetcher is not used. You should use the init script to start daemons.
For command line user tools (e.g., 'ceph'), you can define an /etc/ceph/ceph.conf that contains at a minimum the monitor IP addresses, as that is usually all you need.
For a temporary/test installation
You can build and run a distributed cluster out of an NFS-mounted home directory. Simply create a 'ceph.conf' file in the ceph src/ dir that defines all hosts. You can then run './mkcephfs -a ...' and './init-ceph -a' to mkfs and start/stop the daemons. (mkcephfs and init-ceph check whether they are being run with "./", and if so, assume all configuration and binaries are in the current directory.)
Ports and Firewalls
In order to communicate, the Ceph daemons need to listen on several TCP ports:
- ceph-mon
  - 6789
- ceph-mds
  - Uses the first available port starting from 6800
- ceph-osd
  - Uses the first three available ports starting from 6800
So if you are running a monitor, MDS, and OSD on the same node, an iptables rule to open the required ports to just the 192.168.1.* subnet might look like:
iptables -A INPUT -m multiport -p tcp -s 192.168.1.0/24 --dports 6789,6800:6803 -j ACCEPT
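The port list in that rule follows from simple arithmetic: one port per MDS and three per OSD, counted upward from 6800, plus 6789 for a monitor. The sketch below reproduces the calculation; `dports` is a hypothetical helper, and it assumes nothing else on the node occupies ports in that range, so the daemons end up on consecutive ports.

```python
MON_PORT = 6789
FIRST_DYNAMIC_PORT = 6800
PORTS_PER_MDS = 1
PORTS_PER_OSD = 3


def dports(mon=False, mds_count=0, osd_count=0):
    """Build the value for iptables --dports for the daemons on one node."""
    parts = []
    if mon:
        parts.append(str(MON_PORT))
    dynamic = mds_count * PORTS_PER_MDS + osd_count * PORTS_PER_OSD
    if dynamic == 1:
        parts.append(str(FIRST_DYNAMIC_PORT))
    elif dynamic > 1:
        last = FIRST_DYNAMIC_PORT + dynamic - 1
        parts.append(f"{FIRST_DYNAMIC_PORT}:{last}")
    return ",".join(parts)


# One monitor, one MDS and one OSD, as in the rule above.
print(dports(mon=True, mds_count=1, osd_count=1))
```

A node running more OSDs needs a correspondingly wider range, for example three more ports for each additional OSD.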
Debug Logging Configuration
The debug logging configuration is controlled by a few variables:
log_file
A path to the log file we should write to. If this is empty, we will not log to a file (unless log_dir is set).
log_to_stderr
Since 0.39, this is a boolean. On means log everything to stderr, off means log nothing to stderr.
In 0.38 and earlier, it had three settings:
0: log nothing to stderr
1: log urgent messages to stderr
2: log everything to stderr
log_to_syslog
Set this to true to enable logging to syslog. Syslog may not be able to capture all log messages if too many are generated. This depends on choice of syslogd and how you configure it, of course.
log_per_instance, log_sym_dir
These are used in the vstart.sh script, but should generally not be used outside of there.
General Issues
- For daemons, log files will be written out by default, even if you don't configure anything. If you want to disable writing log files, you have to add this to your configuration:
log_file = ""
- Enabling syslog does not automatically disable anything else. If you just want to write to syslog, use this:
log_file = ""
log_to_syslog = true
- Running programs with the -d flag is equivalent to putting this into your configuration file:
log_to_stderr = 2
daemonize = false
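The way these options combine can be summarised in a small decision function. This is a hypothetical model of the rules described in this section, not Ceph's logging code: it ignores log_dir, log_per_instance and log_sym_dir, and it assumes that stderr and syslog logging are off unless explicitly enabled.

```python
def log_destinations(options):
    """Return the set of places a daemon would log to for these options."""
    destinations = set()
    # An unset log_file means the built-in default path is used; only an
    # explicit empty string turns file logging off.
    if options.get("log_file", "<default>") != "":
        destinations.add("file")
    if options.get("log_to_syslog", False):
        destinations.add("syslog")
    if options.get("log_to_stderr", False):
        destinations.add("stderr")
    return destinations


print(log_destinations({}))                                       # defaults
print(log_destinations({"log_file": "", "log_to_syslog": True}))  # syslog only
print(log_destinations({"log_file": ""}))                         # nothing
```

Each destination is switched independently, which is why enabling syslog alone still leaves the default log file in place.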
Configuration tips
Some tips:
- Ceph produces a lot of logs currently. Make sure your log partition is on a fast disk.
- Enable noatime everywhere, particularly on the OSD store.
Example configuration
(This is from src/sample.ceph.conf in ceph.git, or /etc/ceph/sample.ceph.conf.)
; global
[global]
; enable secure authentication
auth supported = cephx
; monitors
; You need at least one. You need at least three if you want to
; tolerate any node failures. Always create an odd number.
[mon]
mon data = /data/mon$id
; some minimal logging (just message traffic) to aid debugging
debug ms = 1
[mon.0]
host = alpha
mon addr = 192.168.0.10:6789
[mon.1]
host = beta
mon addr = 192.168.0.11:6789
[mon.2]
host = gamma
mon addr = 192.168.0.12:6789
; mds
; You need at least one. Define two to get a standby.
[mds]
; where the mds keeps its secret encryption keys
keyring = /data/keyring.$name
[mds.alpha]
host = alpha
[mds.beta]
host = beta
[mds.charlie]
host = charlie
mds standby replay = true
mds standby for name = alpha
; osd
; You need at least one. Two if you want data to be replicated.
; Define as many as you like.
[osd]
; This is where the btrfs volume will be mounted.
osd data = /data/osd$id
; Ideally, make this a separate disk or partition. A few GB
; is usually enough; more if you have fast disks. You can use
; a file under the osd data dir if need be
; (e.g. /data/osd$id/journal), but it will be slower than a
; separate disk or partition.
osd journal = /data/osd$id/journal
; If the OSD journal is a file, you need to specify the size. This is specified in MB.
osd journal size = 512
[osd.0]
host = delta
; if 'btrfs devs' is not specified, you're responsible for
; setting up the 'osd data' dir. if it is not btrfs, things
; will behave up until you try to recover from a crash (which
; is usually fine for basic testing).
btrfs devs = /dev/sdx
[osd.1]
host = epsilon
btrfs devs = /dev/sdy
[osd.2]
host = zeta
btrfs devs = /dev/sdx
[osd.3]
host = eta
btrfs devs = /dev/sdy
You can also use a UUID path for your drives instead of /dev/sdx. The entries under /dev/disk/by-uuid/ are symlinks to the underlying devices, for example:
ubuntu@sepia70:~$ ls -l /dev/disk/by-uuid/cef77500-29cf-472f-b068-2344f1cc59e4
lrwxrwxrwx 1 root root 10 2011-05-17 13:48 /dev/disk/by-uuid/cef77500-29cf-472f-b068-2344f1cc59e4 -> ../../sda5
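To see all UUID paths and the devices they point to at once, the symlinks can be resolved programmatically. The sketch below is one way to do it in Python; `uuid_paths` is a hypothetical helper, and it assumes a Linux system where udev populates /dev/disk/by-uuid.

```python
import os


def uuid_paths(by_uuid_dir="/dev/disk/by-uuid"):
    """Map each UUID path to the device node its symlink resolves to."""
    mapping = {}
    for entry in sorted(os.listdir(by_uuid_dir)):
        link = os.path.join(by_uuid_dir, entry)
        mapping[link] = os.path.realpath(link)
    return mapping

# Example use on a Linux host:
#     for link, device in uuid_paths().items():
#         print(device, "can be referred to as", link)
```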
Note: Make sure that alpha, beta, gamma, etc. are valid host names. The latest versions of Ceph require a dot in section names between the daemon type and the ID ([osd.3] instead of [osd3]). Older versions didn't need the dot.