The contents of this wiki are no longer actively maintained. The most current documentation is available at http://ceph.com/docs.

Cluster configuration

From Ceph wiki

Revision as of 03:16, 23 February 2012 by Dmick

Ceph uses a single configuration file to define cluster membership, hostnames, paths to devices, and runtime options. By default, it is located at /etc/ceph/ceph.conf.

The configuration file is designed so that you can put everything you need in a single file and share it unmodified on all hosts in the cluster. No per-node configuration file changes should be necessary.

Sections define which daemons (or daemon instances, e.g. 'osd0' or 'osd.0', 'mds.foo', etc.) exist, and what their startup and runtime options are. In older versions, the '.' in the daemon instance name is optional; recent versions require it (see the note at the end of this page). For any given daemon, options are searched for first in the daemon instance section, then in the daemon type section, and finally in the global section. For example, for osd0, we search [osd0] and [osd.0], then [osd], then [global].
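The search order can be illustrated with a short Python sketch. It is not Ceph's parser: it assumes the configuration has already been parsed into a dict of sections, and the helper names candidate_sections and lookup are hypothetical. Values are returned unexpanded; variable substitution is a separate step.

```python
from typing import Optional

# A parsed ceph.conf, modelled as {section name: {option: value}}.
Conf = dict[str, dict[str, str]]


def candidate_sections(dtype: str, did: str) -> list[str]:
    """Sections consulted for one daemon instance, most specific first."""
    # Both spellings of the instance section, then the type section, then [global].
    return [f"{dtype}.{did}", f"{dtype}{did}", dtype, "global"]


def lookup(conf: Conf, dtype: str, did: str, option: str) -> Optional[str]:
    """Return the first value found for `option`, or None if it is not set anywhere."""
    for section in candidate_sections(dtype, did):
        values = conf.get(section, {})
        if option in values:
            return values[option]
    return None


conf: Conf = {
    "global": {"auth supported": "cephx", "debug ms": "0"},
    "osd": {"osd data": "/srv/ceph/osd$id", "debug ms": "1"},
    "osd.0": {"host": "delta"},
}

print(lookup(conf, "osd", "0", "host"))            # → delta (from [osd.0])
print(lookup(conf, "osd", "0", "debug ms"))        # → 1 ([osd] overrides [global])
print(lookup(conf, "osd", "0", "auth supported"))  # → cephx (from [global])
```

The first match wins, so a value in [osd.0] always overrides the same option in [osd] or [global].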

The following variable substitutions allow you to define many options generically in [global] or the daemon type section (e.g., [osd]):

  • $name - daemon name (e.g. 'osd.2', 'mds.foo', 'mon.1')
  • $type - daemon type (e.g. 'osd', 'mds', 'mon')
  • $id or $num - daemon number or name (e.g. '2', 'foo', '1')
  • $host - the host the daemon is assigned to

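Substitution follows 'sh' syntax: "$name" and "${name}" are equivalent, but you must use the braced form when the variable is immediately followed by an alphanumeric character (e.g. "${id}a"), since otherwise those characters are read as part of the variable name.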
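Python's standard string.Template uses the same "$name" / "${name}" rules, so it is a convenient way to illustrate the expansion. This is only a sketch, not Ceph's own code; the helper expand is hypothetical, and it assumes dotted instance names when building $name.

```python
from string import Template


def expand(value: str, dtype: str, did: str, host: str) -> str:
    """Expand $name, $type, $id, $num and $host in a config value."""
    variables = {
        "name": f"{dtype}.{did}",
        "type": dtype,
        "id": did,
        "num": did,
        "host": host,
    }
    # safe_substitute leaves unknown variables untouched instead of raising.
    return Template(value).safe_substitute(variables)


print(expand("/srv/ceph/osd$id", "osd", "2", "delta"))           # → /srv/ceph/osd2
print(expand("/etc/ceph/keyring.$name", "mds", "foo", "alpha"))  # → /etc/ceph/keyring.mds.foo
print(expand("/data/${type}data", "osd", "2", "delta"))          # → /data/osddata
print(expand("/data/$typedata", "osd", "2", "delta"))            # → /data/$typedata
```

The last line shows why braces matter: without them the parser looks for a variable called "typedata", which does not exist.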

Note that even if you define all options for a given daemon generically (e.g. in [osd] and [global]), you need to define (possibly empty) sections for each daemon instance (e.g., 'osd0', 'mds.foo') you want to create or start.

Each daemon should define a 'host = <hostname>' option that specifies which host it runs on. By default, the init script only acts on daemon instances whose host matches its own hostname. If -a or --allhosts is specified, the script uses ssh to start, stop, or otherwise manage the daemons on all hosts.
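The init script's host filtering can be sketched in Python as follows. This is an illustration, not the init script itself: it assumes an already-parsed configuration, dotted section names, and an exact string match between the 'host' option and the local hostname. The function daemons_to_manage is hypothetical, and it only selects daemons; with -a the real script additionally runs the commands over ssh.

```python
# A parsed ceph.conf, modelled as {section name: {option: value}}.
Conf = dict[str, dict[str, str]]

DAEMON_TYPES = ("mon", "mds", "osd")


def daemons_to_manage(conf: Conf, local_host: str, all_hosts: bool = False) -> list[str]:
    """Return the daemon instance sections an init script would act on."""
    selected = []
    for section, options in conf.items():
        # Only instance sections such as [osd.0] or [mds.foo] name a daemon;
        # [global] and bare type sections such as [osd] are skipped.
        dtype, sep, did = section.partition(".")
        if not sep or not did or dtype not in DAEMON_TYPES:
            continue
        # Without -a, only daemons whose 'host' matches this machine are managed.
        if all_hosts or options.get("host") == local_host:
            selected.append(section)
    return selected


conf: Conf = {
    "global": {"auth supported": "cephx"},
    "osd": {"osd data": "/data/osd$id"},
    "mon.0": {"host": "alpha", "mon addr": "192.168.0.10:6789"},
    "mds.alpha": {"host": "alpha"},
    "osd.0": {"host": "delta"},
    "osd.1": {"host": "epsilon"},
}

print(daemons_to_manage(conf, "alpha"))                  # → ['mon.0', 'mds.alpha']
print(daemons_to_manage(conf, "alpha", all_hosts=True))  # → ['mon.0', 'mds.alpha', 'osd.0', 'osd.1']
```

This is also why every daemon needs its own (possibly empty) instance section: a daemon that exists only in [osd] or [global] never shows up as a candidate.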

See /etc/ceph/sample.ceph.conf (src/sample.ceph.conf in the source tree) for a simple example; it is also reproduced under Example configuration below.


Node-specific conf files

You can also use a different conf file on each host that defines parameters only for the daemons running on that host, but this is harder to maintain. The one caveat is that the conf file on every host should include the monitor sections with their 'mon addr' options, so that the daemons know how to reach the monitors and join the cluster. (All other monitor options can be omitted, except on the hosts that run the monitors themselves.)
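To make the caveat concrete, here is a minimal Python sketch that derives a node-specific configuration from a master one. It assumes the configuration is already parsed into a dict of sections and that instance sections use dotted names such as [osd.0]; node_conf is a hypothetical helper, not part of Ceph.

```python
# A parsed ceph.conf, modelled as {section name: {option: value}}.
Conf = dict[str, dict[str, str]]


def node_conf(master: Conf, host: str) -> Conf:
    """Derive a per-host configuration from a master configuration."""
    out: Conf = {}
    for section, options in master.items():
        if "." not in section:
            # [global] and the type sections ([mon], [osd], ...) are kept as-is.
            out[section] = dict(options)
        elif options.get("host") == host:
            # Daemons that run on this host keep their full section.
            out[section] = dict(options)
        elif section.startswith("mon."):
            # Remote monitors: only the address is needed to join the cluster.
            out[section] = {"mon addr": options["mon addr"]}
        # Every other remote daemon section is dropped.
    return out


master: Conf = {
    "global": {"auth supported": "cephx"},
    "mon.0": {"host": "alpha", "mon addr": "192.168.0.10:6789"},
    "mon.1": {"host": "beta", "mon addr": "192.168.0.11:6789"},
    "osd.0": {"host": "delta", "btrfs devs": "/dev/sdx"},
    "osd.1": {"host": "epsilon", "btrfs devs": "/dev/sdy"},
}

for section, options in node_conf(master, "delta").items():
    print(f"[{section}]", options)
```

Keeping the generic sections unchanged is a simplification; they are harmless on hosts that do not run the corresponding daemons.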

Node names

MDSes and monitors can be given any alphanumeric name. OSDs *must* be numbered with consecutive integers starting at 0 (0, 1, 2, ..., n-1), as in the sample configuration below. This is because the OSD ID is used directly as an index into a number of data structures. Skipping numbers or using zero-padded IDs will cause the daemon to segfault or otherwise die.
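Because a bad OSD ID only shows up as a crash, it can be worth checking the IDs before starting anything. The following Python sketch is a hypothetical pre-flight check, not a Ceph tool; it follows the sample configuration, where OSDs are numbered from osd.0, and takes the IDs as the strings after "osd." in the section names.

```python
import re


def check_osd_ids(ids: list[str]) -> list[str]:
    """Return a list of problems with the given OSD ids (empty if they look fine)."""
    problems = []
    numbers = []
    for osd_id in ids:
        if not re.fullmatch(r"[0-9]+", osd_id):
            problems.append(f"osd.{osd_id}: id is not a plain integer")
        elif len(osd_id) > 1 and osd_id.startswith("0"):
            problems.append(f"osd.{osd_id}: id is zero-padded")
        else:
            numbers.append(int(osd_id))
    # The ids must be consecutive from 0 with no gaps or duplicates (in any order).
    numbers.sort()
    if numbers != list(range(len(numbers))):
        problems.append(f"ids must run 0..{len(numbers) - 1} without gaps; got {numbers}")
    return problems


print(check_osd_ids(["0", "1", "2", "3"]))  # → []
print(check_osd_ids(["0", "1", "3"]))       # → ['ids must run 0..2 without gaps; got [0, 1, 3]']
```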

Cephx auth

Ceph also supports cephx secure authentication between nodes, which makes your cluster more secure.

To configure auth, enable it in your config:

[global]
	auth supported = cephx
	debug ms = 0
	keyring = /etc/ceph/keyring.bin

[mds]
	debug mds = 1
	keyring = /etc/ceph/keyring.$name

[osd]
	osd data = /srv/ceph/osd$id
	keyring = /etc/ceph/keyring.$name
	debug osd = 1
	debug filestore = 1

Now your cluster will use secure auth.

To use the ceph -s and ceph -w commands, copy admin_keyring.bin from your monitor's data directory to /etc/ceph/keyring.bin.

Mounting

When cephx is enabled, clients must authenticate. When mounting, you have to specify the name option and either the secret or the secretfile option:

mount -t ceph -o name=admin,secret=<secret> 1.2.3.4:/ /mnt/ceph

Or, preferably, via a secretfile:

mount -t ceph -o name=admin,secretfile=<secretfile> 1.2.3.4:/ /mnt/ceph

where the secret file contains only the key string.

The secret can be obtained from /etc/ceph/keyring.bin:

root@client02:~# ceph-authtool -l /etc/ceph/keyring.bin
client.admin
	 key: AQAyuDVM5TUNABAAwTDDT9V9BY8yiCkeJ6s37w==
	auid: 0
	caps: [mds] allow
	caps: [mon] allow *
	caps: [osd] allow *
root@client02:~#

Or write the key directly to a secret file that only root can read:

ceph-authtool --print-key /etc/ceph/keyring.bin > /etc/ceph/secret
chmod 600 /etc/ceph/secret

Because /etc/fstab is world-readable, it is recommended to use a secretfile that only root can read rather than putting the secret itself in the mount options.
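ceph-authtool --print-key already does the extraction; the following Python sketch only illustrates the listing format and how to create the secret file without ever making it readable by other users. The function names are hypothetical, and SAMPLE_LISTING is the example output shown above.

```python
import os

# Output of `ceph-authtool -l /etc/ceph/keyring.bin`, as shown above.
SAMPLE_LISTING = (
    "client.admin\n"
    "\t key: AQAyuDVM5TUNABAAwTDDT9V9BY8yiCkeJ6s37w==\n"
    "\tauid: 0\n"
    "\tcaps: [mds] allow\n"
    "\tcaps: [mon] allow *\n"
    "\tcaps: [osd] allow *\n"
)


def key_for(listing: str, entity: str = "client.admin") -> str:
    """Return the key of `entity` from a keyring listing."""
    current = None
    for line in listing.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if not line[0].isspace():
            current = stripped  # unindented line: an entity name such as client.admin
        elif current == entity and stripped.startswith("key:"):
            return stripped.split(":", 1)[1].strip()
    raise KeyError(f"entity {entity} not found")


def write_secret(key: str, path: str) -> None:
    """Write the bare key to `path` with mode 0600 (owner read/write only)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(key + "\n")
    # The mode passed to os.open only applies to new files, so tighten an existing one too.
    os.chmod(path, 0o600)


print(key_for(SAMPLE_LISTING))  # → AQAyuDVM5TUNABAAwTDDT9V9BY8yiCkeJ6s37w==
```

Creating the file with mode 0600 from the start avoids the short window in which a file written first and chmod'ed afterwards is readable by others.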

If you get the message "entity client.admin not found", retrieve the admin key from the cluster and print it:

ceph auth get client.admin -o keyring.bin
ceph-authtool --print-key keyring.bin

Managing a single, centralized conf file

You can manage a single master ceph.conf file that contains configuration information for the entire cluster, and then configure a 'fetch_config' script on each node that pulls down the configuration when needed. There is a sample /etc/ceph/sample.fetch_config script, although it is quite simple. The script takes a single parameter: the name of a temporary file to which the fetched configuration should be written. If /etc/ceph/fetch_config is executable, the init script and mkcephfs will use it.
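A fetch_config script only has to honor that one-argument contract. Here is a Python sketch under stated assumptions: it copies the master file from a hypothetical shared path (MASTER_CONF, e.g. an NFS mount), which you would replace with whatever source your site actually uses (scp, HTTP, and so on). It is not the shipped sample script.

```python
#!/usr/bin/env python3
import shutil
import sys

# Hypothetical location of the master ceph.conf; adjust for your site.
MASTER_CONF = "/mnt/admin/ceph/ceph.conf"


def fetch_config(dest: str, source: str = MASTER_CONF) -> None:
    """Copy the master configuration to the temporary file `dest`."""
    shutil.copyfile(source, dest)


def main(argv: list[str]) -> int:
    # The caller passes exactly one argument: the file to write.
    if len(argv) != 2:
        prog = argv[0] if argv else "fetch_config"
        print(f"usage: {prog} <tmpfile>", file=sys.stderr)
        return 2
    fetch_config(argv[1])
    return 0
```

To install it as /etc/ceph/fetch_config, append the usual entry point (`if __name__ == "__main__": sys.exit(main(sys.argv))`) and make the file executable.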

Note that if you run a binary directly (a daemon, or a tool such as 'ceph'), the config fetcher is not used. Use the init script to start daemons.

For command-line user tools (e.g., 'ceph'), you can define an /etc/ceph/ceph.conf that contains, at a minimum, the monitor IP addresses, as that is usually all they need.

For a temporary/test installation

You can build and run a distributed cluster out of an NFS-mounted home directory. Simply create a 'ceph.conf' file in the ceph src/ dir that defines all hosts. You can then run './mkcephfs -a ...' and './init-ceph -a' to create the file systems and start/stop the daemons. (mkcephfs and init-ceph check whether they are being run as "./...", and if so, assume all configuration and binaries are in the current directory.)

Ports and Firewalls

In order to communicate, the Ceph daemons need to listen on several TCP ports:

  • ceph-mon: port 6789
  • ceph-mds: the first available port starting from 6800
  • ceph-osd: the first three available ports starting from 6800

So if you are running a monitor, MDS, and OSD on the same node, an iptables rule to open the required ports to just the 192.168.1.* subnet might look like:

 iptables -A INPUT -m multiport -p tcp -s 192.168.1.0/24 --dports 6789,6800:6803 -j ACCEPT
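The port arithmetic generalizes to other daemon counts. This Python sketch builds the same kind of rule; it assumes, as the example above does, that the daemons on the node are the only things binding ports from 6800 upward, so each MDS takes one port and each OSD three, allocated contiguously. The helper names are hypothetical.

```python
MON_PORT = 6789
FIRST_DAEMON_PORT = 6800


def required_ports(monitor: bool, n_mds: int, n_osd: int) -> str:
    """Port list for iptables' multiport --dports option."""
    parts = []
    if monitor:
        parts.append(str(MON_PORT))
    # One port per MDS and three per OSD, allocated upward from 6800.
    count = n_mds + 3 * n_osd
    if count == 1:
        parts.append(str(FIRST_DAEMON_PORT))
    elif count > 1:
        parts.append(f"{FIRST_DAEMON_PORT}:{FIRST_DAEMON_PORT + count - 1}")
    return ",".join(parts)


def iptables_rule(subnet: str, monitor: bool, n_mds: int, n_osd: int) -> str:
    return (f"iptables -A INPUT -m multiport -p tcp -s {subnet} "
            f"--dports {required_ports(monitor, n_mds, n_osd)} -j ACCEPT")


print(iptables_rule("192.168.1.0/24", True, 1, 1))
# → iptables -A INPUT -m multiport -p tcp -s 192.168.1.0/24 --dports 6789,6800:6803 -j ACCEPT
```

For a host with two OSDs and nothing else, required_ports(False, 0, 2) gives 6800:6805.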

Debug Logging Configuration

The debug logging configuration is controlled by a few variables:

log_file

A path to the log file we should write to. If this is empty, we will not log to a file (unless log_dir is set).

log_to_stderr

Since 0.39, this is a boolean. On means log everything to stderr, off means log nothing to stderr.

In 0.38 and earlier, it had three settings:

0: log nothing to stderr
1: log urgent messages to stderr
2: log everything to stderr

log_to_syslog

Set this to true to enable logging to syslog. Syslog may not be able to capture all log messages if too many are generated. This depends on choice of syslogd and how you configure it, of course.

log_per_instance, log_sym_dir

These are used in the vstart.sh script, but should generally not be used outside of there.

General Issues

  • For daemons, log files are written by default, even if you don't configure anything. To disable writing log files, add this to your configuration:
	log_file = ""
  • Enabling syslog does not automatically disable anything else. If you only want to write to syslog, use this:
	log_file = ""
	log_to_syslog = true
  • Running programs with the -d flag is equivalent to putting this into your configuration file (in 0.39 and later, where log_to_stderr is a boolean, the first line becomes log_to_stderr = true):
	log_to_stderr = 2
	daemonize = false
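The interaction of the three main settings (0.39 and later) can be summarized in a small Python sketch. It models only what is described above: an empty log_file disables the file, log_to_stderr is all-or-nothing, and syslog is independent of the others. log_dir, log_per_instance and log_sym_dir are deliberately left out, and the log file path in the example is hypothetical.

```python
def log_destinations(log_file: str, log_to_stderr: bool, log_to_syslog: bool) -> set[str]:
    """Where log output goes under the 0.39+ rules described above."""
    destinations = set()
    if log_file:          # an empty log_file disables the log file
        destinations.add("file")
    if log_to_stderr:     # boolean since 0.39: everything or nothing
        destinations.add("stderr")
    if log_to_syslog:     # enabling syslog does not turn anything else off
        destinations.add("syslog")
    return destinations


# Syslog enabled on top of a (hypothetical) default log file: both are used.
print(sorted(log_destinations("/var/log/ceph/osd.0.log", False, True)))  # → ['file', 'syslog']
# Syslog only, as in the second tip above.
print(sorted(log_destinations("", False, True)))                         # → ['syslog']
```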

Configuration tips

Some tips:

  • Ceph currently produces a lot of logs. Make sure your log partition is on a fast disk.
  • Enable noatime everywhere, particularly on the OSD store.

Example configuration

(This is from src/sample.ceph.conf in ceph.git, or /etc/ceph/sample.ceph.conf.)

; global
[global]
	; enable secure authentication
	auth supported = cephx

; monitors
;  You need at least one.  You need at least three if you want to
;  tolerate any node failures.  Always create an odd number.
[mon]
	mon data = /data/mon$id

	; some minimal logging (just message traffic) to aid debugging
	debug ms = 1

[mon.0]
	host = alpha
	mon addr = 192.168.0.10:6789

[mon.1]
	host = beta
	mon addr = 192.168.0.11:6789

[mon.2]
	host = gamma
	mon addr = 192.168.0.12:6789

; mds
;  You need at least one.  Define two to get a standby.
[mds]
	; where the mds keeps its secret encryption keys
	keyring = /data/keyring.$name

[mds.alpha]
	host = alpha

[mds.beta]
	host = beta

[mds.charlie]
	host = charlie
	mds standby replay = true
	mds standby for name = alpha

; osd
;  You need at least one.  Two if you want data to be replicated.
;  Define as many as you like.
[osd]
	; This is where the btrfs volume will be mounted.
	osd data = /data/osd$id

	; Ideally, make this a separate disk or partition.  A few GB
	; is usually enough; more if you have fast disks.  You can use
	; a file under the osd data dir if need be
	; (e.g. /data/osd$id/journal), but it will be slower than a
	; separate disk or partition.
	osd journal = /data/osd$id/journal

	; If the OSD journal is a file, you need to specify the size. This is specified in MB.
	osd journal size = 512

[osd.0]
	host = delta

	; if 'btrfs devs' is not specified, you're responsible for
	; setting up the 'osd data' dir.  if it is not btrfs, things
	; will behave up until you try to recover from a crash (which
	; is usually fine for basic testing).
	btrfs devs = /dev/sdx

[osd.1]
	host = epsilon
	btrfs devs = /dev/sdy

[osd.2]
	host = zeta
	btrfs devs = /dev/sdx

[osd.3]
	host = eta
	btrfs devs = /dev/sdy

You can also use a UUID path for your drives instead of /dev/sdx. To find the UUID path for a drive, use:

ubuntu@sepia70:~$ ls -l /dev/disk/by-uuid/cef77500-29cf-472f-b068-2344f1cc59e4
lrwxrwxrwx 1 root root 10 2011-05-17 13:48 /dev/disk/by-uuid/cef77500-29cf-472f-b068-2344f1cc59e4 -> ../../sda5


Note: Make sure that alpha, beta, gamma, etc. are valid host names. The latest versions of Ceph require a dot between the daemon type and ID in section names ([osd.3] instead of [osd3]). Older versions didn't need the dot.
