Aggregated news from external sources
An update on my talk submission for the OpenStack Summit this year in Paris: my talk about Ceph performance analysis was not chosen by the committee for the official agenda. But there is at least one piece of good news: Marc’s talk will be part of t…
This option addresses a common concern with heterogeneous clusters: not all HDDs have the same performance, or the same performance-to-size ratio.
With this option, it is possible to reduce the load on a specific disk without reducing the amount of data it contains.
Furthermore, the option is cheap to change because it causes no data migration: only the primary/secondary preference is modified and propagated to clients.
Before playing with cluster options and tuning the crushmap, remember to verify that your clients are compatible with those options.
You must enable 'mon osd allow primary affinity = true' on the mons before you can adjust primary-affinity. Note that older clients will no longer be able to communicate with the cluster.
(For the kernel client, have a look at http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client.)
Check whether the monitor has the primary affinity option enabled:
# ceph --admin-daemon /var/run/ceph/ceph-mon.*.asok config show | grep 'primary_affinity'
Edit ceph.conf and add in the [mon] section:
mon osd allow primary affinity = true
Restart the mons and test.
Let's check how many PGs are primary on osd.0, and how many are secondary:
# ceph pg dump | grep active+clean | egrep "\[0," | wc -l
100
# ceph pg dump | grep active+clean | egrep ",0\]" | wc -l
80
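To make the two patterns concrete, here is a self-contained sketch on fabricated pg dump lines (the acting sets are made up, not real cluster output): "\[0," matches PGs where osd.0 heads the acting set (primary), and ",0\]" matches PGs where it is the last replica, so with two replicas the two patterns together cover every placement of osd.0.

```shell
# Fabricated "ceph pg dump" excerpt (hypothetical acting sets, 2 replicas).
cat > /tmp/pgdump.txt <<'EOF'
1.0 active+clean [0,3]
1.1 active+clean [3,0]
1.2 active+clean [4,2]
1.3 active+clean [0,1]
1.4 active+clean [2,0]
EOF

# PGs where osd.0 is primary (first in the acting set):
grep active+clean /tmp/pgdump.txt | egrep "\[0," | wc -l   # -> 2
# PGs where osd.0 is secondary (last in the acting set):
grep active+clean /tmp/pgdump.txt | egrep ",0\]" | wc -l   # -> 2
```

With three or more replicas, an OSD in the middle of the acting set would escape both patterns, so the counts above should be read as an approximation in that case.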
Try changing the primary affinity:
# ceph osd primary-affinity osd.0 0.5
set osd.0 primary-affinity to 0.5 (8327682)
# ceph pg dump | grep active+clean | egrep "\[0," | wc -l
48
# ceph pg dump | grep active+clean | egrep ",0\]" | wc -l
132
# ceph osd primary-affinity osd.0 0
set osd.0 primary-affinity to 0 (802)
# ceph pg dump | grep active+clean | egrep "\[0," | wc -l
0
# ceph pg dump | grep active+clean | egrep ",0\]" | wc -l
180
Now no reads will be served from this OSD, since it is never primary.
The next Ceph development release is here! This release contains several meaty items, including some MDS improvements for journaling, the ability to remove the CephFS file system (and name it), several mon cleanups with tiered pools, several OSD performance branches, a new “read forward” RADOS caching mode, a prototype Kinetic OSD backend, and various radosgw …
An example of using the python-ceph library for a thumbnailing service.
Thumbor is an open-source thumbnail generation tool developed by Globo.
The tool can perform a number of operations (crop, resize, filters…) directly via the URL. …
By default teuthology will clone the ceph-qa-suite repository and use the tasks it contains. If tasks have been modified locally, teuthology can be instructed to use a local directory by inserting something like: suite_path: /home/loic/software/ceph/ceph-qa-suite in the teuthology job yaml …
This stable update release for Dumpling includes primarily fixes for RGW, including several issues with bucket listings and a potential data corruption problem when multiple multi-part uploads race. There is also some throttling capability added in the OSD for scrub that can mitigate the performance impact on production clusters. We recommend that all Dumpling users …
A simple example of replication for RBD.
Based on this post from scuttlemonkey: http://ceph.com/dev-notes/incremental-snapshots-with-rbd/, here is a sample script to synchronize an rbd image to a remote cluster (e.g. for backups).
In the example below, the sync is made to an “archive” pool on the same cluster.
(For a remote host, you need to set up an SSH key.)
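The script itself follows the export-diff/import-diff pattern from the linked post. Here is a minimal sketch under assumptions (pool, image, and snapshot names are made up; DRY_RUN defaults to 1 so the commands are only printed, since actually running them requires a live cluster):

```shell
#!/bin/sh
# Sketch of incremental rbd sync to an "archive" pool (hypothetical names).
POOL=datashare
IMAGE=share1
DEST_POOL=archive
TODAY=$(date +%Y%m%d)
DRY_RUN=${DRY_RUN:-1}   # default: only print the commands

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = 1 ]; then echo "$1"; else sh -c "$1"; fi
}

# Take today's snapshot on the source image.
run "rbd snap create $POOL/$IMAGE@$TODAY"

if [ -n "$LAST_SNAP" ]; then
    # Send only the delta since the last synchronized snapshot.
    run "rbd export-diff --from-snap $LAST_SNAP $POOL/$IMAGE@$TODAY - | rbd import-diff - $DEST_POOL/$IMAGE"
else
    # First run: full copy of the snapshot.
    run "rbd export $POOL/$IMAGE@$TODAY - | rbd import - $DEST_POOL/$IMAGE"
fi
```

For a remote cluster, the export would be piped through ssh instead (rbd export-diff … - | ssh remotehost rbd import-diff - pool/image), as in scuttlemonkey's post.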
Another example: https://www.rapide.nl/blog/item/ceph_-_rbd_replication
I’m glad to announce that Ceph is now part of the mirrors iWeb provides.
It is available over both IPv4 and IPv6 via:
rsync on ceph.mirror.iweb.ca::ceph
The mirror provides 4 Gbps of connectivity and is located on the eastern coast of Canada, more precisely in Montreal, Quebec.
Feel free to give it a try and let me know if you see any problems!
The aim is to show that it is possible to build low-cost, efficient, and scalable storage using open-source solutions.
In the example below, I am using Ceph for scalability and reliability, combined with EnhanceIO to ensure very good performance.
The idea was to create storage with two parts: the storage itself (backing store) and a large cache to keep good performance for the data actually in use.
In practice, the total volume required may be large, but for office use, the data actually accessed each day is only a fraction of that storage.
In my case, I intend to use Ceph deployed on low-cost hardware to provide a scalable, reliable, and comfortable volume, based on small servers with large SATA drives. Data access will be via Samba shares on a slightly more powerful machine with a SAS array used as a big cache. (Since Firefly, it would be more interesting to use Ceph cache tiering.)
The hardware you choose should match the requirements of both performance and cost. To keep prices extremely competitive, the machines chosen here are based on Supermicro hardware without an additional disk controller. Initially, the Ceph storage is composed of 5 machines: store-01, store-02, store-03, store-04, store-05. Each node is built with a simple Core 2 CPU, 4 GB of memory, one SSD (Intel 520 60 GB) for the system and Ceph journal, and three 3 TB SATA drives (Seagate CS 3TB) with no dedicated drive controller (using the onboard controller).
The cache must be on the same server as the file server. In my case, I use a Dell server with 10 SAS 10k drives in RAID 10 for the cache.
I did not plan a dedicated network: just a gigabit switch with VLANs to separate the public and private networks for the Ceph OSDs, and interface bonding for the file server.
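With the VLANs in place, the public/private split is declared in ceph.conf; a fragment along these lines, with hypothetical subnets:

```ini
[global]
    ; client-facing traffic
    public network = 192.168.0.0/24
    ; OSD replication and heartbeat traffic
    cluster network = 192.168.1.0/24
```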
The system has been in daily use for more than a year, and it works perfectly.
Here are some details of the installation (from what I remember or noted):
The current configuration is running on Ceph Cuttlefish (v0.61) and Debian Wheezy. (Installation made in June 2013.)
The operating system is installed on the SSD, with three other partitions for the OSD journals.
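The journal partitions referenced later by ceph-deploy (sda5-7) come from this layout; an illustrative partitioning of the 60 GB SSD (the sizes are my guesses, not from the original post):

```
/dev/sda1   system (/)
/dev/sda2   swap
/dev/sda4   extended
/dev/sda5   journal for osd.0
/dev/sda6   journal for osd.1
/dev/sda7   journal for osd.2
```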
$ apt-get install vim sudo sysstat ntp smartmontools
$ vim /etc/default/smartmontools
$ vim /etc/smartd.conf
Using kernel version 3.12, available in backports, seems to improve the memory footprint on the OSD nodes.
$ echo "deb http://ftp.fr.debian.org/debian wheezy-backports main" >> /etc/apt/sources.list
$ apt-get install -t wheezy-backports linux-image-3.12-0.bpo.1-amd64
(Based on the official documentation: http://ceph.com/docs/master/start/)
On each server, create a ceph user:
$ useradd -d /home/ceph -m ceph
$ passwd ceph
$ vim /etc/hosts
192.168.0.1 store-b1-01
192.168.0.2 store-b1-02
192.168.0.3 store-b1-03
192.168.0.4 store-b1-04
192.168.0.5 store-b1-05
$ echo "ceph ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph
$ chmod 0440 /etc/sudoers.d/ceph
On store-b1-01 (deployment server)
Create a new key for SSH authentication:
$ ssh-keygen
$ cluster="store-b1-01 store-b1-02 store-b1-03 store-b1-04 store-b1-05"
$ for i in $cluster; do ssh-copy-id ceph@$i; done
$ vim /root/.ssh/config
Host store*
    User ceph
Install ceph-deploy and its dependencies
$ wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | sudo apt-key add -
$ echo deb http://eu.ceph.com/debian-cuttlefish/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
$ apt-get update
$ apt-get install python-pkg-resources python-setuptools ceph-deploy collectd
$ curl http://python-distribute.org/distribute_setup.py | python
$ easy_install pushy
Install ceph on cluster
$ ceph-deploy install $cluster
$ ceph-deploy new store-b1-01 store-b1-02 store-b1-03
$ vim ceph.conf
$ ceph-deploy mon create store-b1-01 store-b1-02 store-b1-03
$ ceph-deploy gatherkeys store-b1-01
$ ceph-deploy osd create \
    store-b1-01:/dev/sdb:/dev/sda5 \
    store-b1-01:/dev/sdc:/dev/sda6 \
    store-b1-01:/dev/sdd:/dev/sda7 \
    store-b1-02:/dev/sdb:/dev/sda5 \
    store-b1-02:/dev/sdc:/dev/sda6 \
    store-b1-02:/dev/sdd:/dev/sda7 \
    store-b1-03:/dev/sdb:/dev/sda5 \
    store-b1-03:/dev/sdc:/dev/sda6 \
    store-b1-03:/dev/sdd:/dev/sda7 \
    store-b1-04:/dev/sdb:/dev/sda5 \
    store-b1-04:/dev/sdc:/dev/sda6 \
    store-b1-04:/dev/sdd:/dev/sda7 \
    store-b1-05:/dev/sdb:/dev/sda5 \
    store-b1-05:/dev/sdc:/dev/sda6 \
    store-b1-05:/dev/sdd:/dev/sda7
Add in fstab:
/dev/sdb1 /var/lib/ceph/osd/ceph-0 xfs inode64,noatime 0 0
/dev/sdc1 /var/lib/ceph/osd/ceph-1 xfs inode64,noatime 0 0
/dev/sdd1 /var/lib/ceph/osd/ceph-2 xfs inode64,noatime 0 0
One can check that all OSDs have been created, and check the cluster status:
$ ceph osd tree
# id    weight  type name                               up/down reweight
-1      41.23   root default
-8      41.23       datacenter DC1
-7      41.23           rack b1
-2      8.24                host store-b1-01
0       2.65                    osd.0                   up      1
1       2.65                    osd.1                   up      1
2       2.65                    osd.2                   up      1
-3      8.24                host store-b1-02
3       2.65                    osd.3                   up      1
4       2.65                    osd.4                   up      1
5       2.65                    osd.5                   up      1
-4      8.24                host store-b1-03
6       2.65                    osd.6                   up      1
7       2.65                    osd.7                   up      1
8       2.65                    osd.8                   up      1
-5      8.24                host store-b1-04
9       2.65                    osd.9                   up      1
10      2.65                    osd.10                  up      1
11      2.65                    osd.11                  up      1
-6      8.24                host store-b1-05
12      2.65                    osd.12                  up      1
13      2.65                    osd.13                  up      1
14      2.65                    osd.14                  up      1
$ apt-get install ifenslave
$ vim /etc/network/interfaces
auto bond0
iface bond0 inet static
    address 192.168.0.12
    netmask 255.255.0.0
    gateway 192.168.0.1
    slaves eth0 eth1
    bond-mode 802.3ad
$ vim /etc/modprobe.d/bonding.conf
alias bond0 bonding
options bonding mode=4 miimon=100 lacp_rate=1
$ echo "bonding" >> /etc/modules
The kernel version is important when using KRBD. I recommend using kernel 3.10.26 or later.
$ apt-get install debconf-utils dpkg-dev debhelper build-essential kernel-package libncurses5-dev
$ cd /usr/src/
$ wget http://www.kernel.org/pub/linux/kernel/v3.0/linux-3.6.11.tar.bz2
$ tar xjf linux-3.6.11.tar.bz2
Create an rbd volume (format 1):
$ rbd create datashare/share1 --image-format=1 --size=1048576
$ mkdir /share1
$ echo "/dev/rbd/datashare/share1 /share1 xfs _netdev,barrier=0,nodiratime 0 0" >> /etc/fstab
The choice of cache mechanism settled on EnhanceIO because it allows enabling or disabling the cache while the source volume is in use. This is particularly useful when we want to resize a volume without interrupting service.
Build EnhanceIO:
$ apt-get install build-essential dkms
$ git clone https://github.com/stec-inc/EnhanceIO.git
$ cd EnhanceIO/
$ wget http://ftp.de.debian.org/debian/pool/main/e/enhanceio/enhanceio_0+git20130620-3.debian.tar.xz
$ tar xJf enhanceio_0+git20130620-3.debian.tar.xz
$ dpkg-buildpackage -rfakeroot -b
$ dpkg -i ../*.deb
Create the cache:
For example, in write-through mode:
(/dev/sdb2 is a local partition dedicated to the cache)
$ eio_cli create -d /dev/rbd1 -s /dev/sdb2 -p lru -m wt -b 4096 -c share1
If you want to use a write-back cache, you can prevent the file system from being mounted before the cache is enabled by using a symbolic link in the udev script. ( https://github.com/ksperis/EnhanceIO/commit/954e167fdb580d514747512ce2bd1c9c29a77418 )
$ echo "deb http://ftp.sernet.de/pub/samba/3.6/debian wheezy main" >> /etc/apt/sources.list
$ apt-get update
$ apt-get install sernet-samba sernet-winbind xfsprogs krb5-user acl attr
Update startup script
$ vim /etc/init.d/samba
# Should-Start:      slapd cups rbdmap
# Should-Stop:       slapd cups rbdmap
$ insserv -d samba
Configure and join the domain ( https://help.ubuntu.com/community/ActiveDirectoryWinbindHowto )
$ vi /etc/krb5.conf
...
$ kinit Administrator@AD.MYDOMAIN.COM
$ vim /etc/samba/smb.conf
[global]
    workgroup = MYDOMAIN
    realm = AD.MYDOMAIN.COM
    netbios name = MYNAS
    wins server = 192.168.0.4
    server string = %h server
    dns proxy = no
    log file = /var/log/samba/log.%m
    log level = 1
    max log size = 1000
    syslog = 0
    panic action = /usr/share/samba/panic-action %d
    security = ADS
    winbind separator = +
    client use spnego = yes
    winbind use default domain = yes
    domain master = no
    local master = no
    preferred master = no
    encrypt passwords = true
    passdb backend = tdbsam
    obey pam restrictions = yes
    unix password sync = yes
    passwd program = /usr/bin/passwd %u
    passwd chat = *Enter\snew\s*\spassword:* %n\n *Retype\snew\s*\spassword:* %n\n *password\supdated\ssuccessfully* .
    pam password change = yes
    idmap uid = 10000-20000
    idmap gid = 10000-20000
    template shell = /bin/bash
    template homedir = /share4/home/%D/%U
    winbind enum groups = yes
    winbind enum users = yes
    map acl inherit = yes
    vfs objects = acl_xattr recycle shadow_copy2
    recycle:repository = .recycle/%u
    recycle:keeptree = yes
    recycle:exclude = *.tmp
    recycle:touch = yes
    shadow:snapdir = .snapshots
    shadow:sort = desc
    ea support = yes
    map hidden = no
    map system = no
    map archive = no
    map readonly = no
    store dos attributes = yes
    load printers = no
    printing = bsd
    printcap name = /dev/null
    disable spoolss = yes
    guest account = invité
    map to guest = bad user

[share0]
    comment = My first share
    path = /share0
    writable = yes
    valid users = @"MYDOMAIN+Domain Admins" "MYDOMAIN+laurent"

[share1]
    comment = Other share
    path = /share1
    writable = yes
    valid users = @"MYDOMAIN+Domain Admins" "MYDOMAIN+laurent"
...
$ /etc/init.d/samba restart
$ net join -U Administrator
$ wbinfo -u
$ wbinfo -g
$ vi /etc/nsswitch.conf
passwd: compat winbind
group:  compat winbind
Using this script:
First, create an rbd image named timemachine and mount it on /mnt/timemachine with the noatime and uquota options if you want per-user quotas. (Do not add it to the autosnap script.)
$ apt-get install avahi-daemon avahi-utils
$ vim /etc/avahi/services/smb.service
<?xml version="1.0" standalone='no'?>
<!DOCTYPE service-group SYSTEM "avahi-service.dtd">
<service-group>
  <name replace-wildcards="yes">%h (File Server)</name>
  <service>
    <type>_smb._tcp</type>
    <port>445</port>
  </service>
  <service>
    <type>_device-info._tcp</type>
    <port>0</port>
    <txt-record>model=RackMac</txt-record>
  </service>
</service-group>
$ vim /etc/avahi/services/timemachine.service
<?xml version="1.0" standalone="no"?>
<!DOCTYPE service-group SYSTEM "avahi-service.dtd">
<service-group>
  <name replace-wildcards="yes">%h (Time Machine)</name>
  <service>
    <type>_afpovertcp._tcp</type>
    <port>548</port>
  </service>
  <service>
    <type>_device-info._tcp</type>
    <port>0</port>
    <txt-record>model=TimeCapsule</txt-record>
  </service>
  <service>
    <type>_adisk._tcp</type>
    <port>9</port>
    <txt-record>sys=waMA=00:1d:09:63:87:e0,adVF=0x100</txt-record>
    <txt-record>dk0=adVF=0x83,adVN=TimeMachine</txt-record>
  </service>
</service-group>
Install netatalk and configure it like this:
$ apt-get install netatalk
$ vim /etc/netatalk/afpd.conf
- -tcp -noddp -nozeroconf -uamlist uams_dhx.so,uams_dhx2.so -nosavepassword -setuplog "default log_info /var/log/afpd.log"
$ vim /etc/netatalk/AppleVolumes.default
(Remove the home directory line and add:)
/mnt/timemachine/ "TimeMachine" cnidscheme:dbd options:usedots,upriv,tm allow:@"MYDOMAIN+Domain Users"
$ vim /etc/pam.d/netatalk
auth    required    pam_winbind.so
account required    pam_winbind.so
session required    pam_unix.so
Setting a quota for a user: (Do not use soft limits; they will not be recognized by Time Machine.)
$ xfs_quota -x -c 'limit bhard=1024g user1' /mnt/timemachine
(For sequential IO, bandwidth is limited by the network card on the client side.)
Voting for submissions is well underway for the next OpenStack summit, and this one is shaping up to be another great place to talk about Ceph. Almost fifty talks are currently available for voting on the OpenStack site! Ceph has been steadily gaining popularity in the OpenStack world, …
The Call for Speakers period for the OpenStack Summit, November 3-7, 2014 in Paris, ended this week. Voting on the submitted talks has now started and ends at 11:59 pm CDT on August 6 (6:59 am CEST on August 7). I’ve submitted a talk to the stor…
In a Ceph cluster with low bandwidth, the root disk of an OpenStack instance became extremely slow for days. When an OSD is scrubbing a placement group, it has a significant impact on performance, and this is expected, for a …