Planet Ceph

Aggregated news from external sources

  • December 10, 2013
    Profiling CPU usage of a ceph command (gperftools)

    After compiling Ceph from sources with: ./configure --with-debug CFLAGS='-g' CXXFLAGS='-g' The crushtool test mode is used to profile the crush implementation with: LD_PRELOAD=/usr/lib/libprofiler.so.0 \ CPUPROFILE=crush.prof src/crushtool \ -i src/test/cli/crushtool/one-hundered-devices.crushmap \ --test --show-bad-mappings as instructed in the cpu profiler documentation. The … Continue reading
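
    The resulting profile can then be inspected with gperftools' pprof. A minimal sketch, assuming the profile was written to crush.prof as above (the tool is packaged as google-pprof on some distributions, and the path to the crushtool binary may differ in your build):

    $ pprof --text src/crushtool crush.prof | head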

  • December 9, 2013
    Ceph Osd Reweight
    ceph health
    HEALTH_WARN 1 near full osd(s)
    

    Argh. Trying to tweak the weight given to an OSD.
    Rebalancing load between OSDs seems easy, but it does not always go as we would like…

    Increase osd weight

    Before the operation, get the map of placement groups.

    $ ceph pg dump > /tmp/pg_dump.1
    

    Let’s go slowly: we will increase the weight of osd.13 in steps of 0.05.

    $ ceph osd tree | grep osd.13
    13  3                   osd.13  up  1   
    
    $ ceph osd crush reweight osd.13 3.05
    reweighted item id 13 name 'osd.13' to 3.05 in crush map
    
    $ ceph osd tree | grep osd.13
    13  3.05                osd.13  up  1
    

    The new weight has been set in the crushmap. Let’s look at what is happening in the cluster.

    $ ceph health detail
    HEALTH_WARN 2 pgs backfilling; 2 pgs stuck unclean; recovery 16884/9154554 degraded (0.184%)
    pg 3.183 is stuck unclean for 434.029986, current state active+remapped+backfilling, last acting [1,13,5]
    pg 3.83 is stuck unclean for 2479.504088, current state active+remapped+backfilling, last acting [5,13,12]
    pg 3.183 is active+remapped+backfilling, acting [1,13,5]
    pg 3.83 is active+remapped+backfilling, acting [5,13,12]
    recovery 16884/9154554 degraded (0.184%)
    

    Well, pg 3.183 and 3.83 are in the active+remapped+backfilling state:

    $ ceph pg map 3.183
    osdmap e4588 pg 3.183 (3.183) -> up [1,13] acting [1,13,5]
    
    $ ceph pg map 3.83
    osdmap e4588 pg 3.83 (3.83) -> up [13,5] acting [5,13,12]
    

    In this case, we can see that the OSD with id 13 has been added to these two placement groups. PG 3.183 and PG 3.83 will be removed from osd.5 and osd.12 respectively.
    If we have a look at OSD bandwidth, we can see these transfers, osd.1 -> osd.13 and osd.5 -> osd.13:

    OSDs 1 and 5 are primary for pg 3.183 and pg 3.83 (see the acting column) and OSD 13 is writing.
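
    While the backfill runs, progress can be followed live. A minimal sketch (either command works; the refresh interval is arbitrary):

    $ ceph -w
    # or
    $ watch -n 10 'ceph -s'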

    I wait until the cluster has finished. Then,

    $ ceph pg dump > /tmp/pg_dump.3
    

    Let us look at the change.

    # Old map
    $ egrep '^(3.183|3.83)' /tmp/pg_dump.1 | awk '{print $1,$9,$14,$15}'
    3.183 active+clean [1,5] [1,5]
    3.83 active+clean [12,5] [12,5]
    
    # New map
    $ egrep '^(3.183|3.83)' /tmp/pg_dump.3 | awk '{print $1,$9,$14,$15}'
    3.183 active+clean [1,13] [1,13]
    3.83 active+clean [13,5] [13,5]
    

    So, for pg 3.183 and 3.83, osd.5 and osd.12 have been replaced by osd.13.

    Decrease osd weight

    Same as above, but this time we reduce the weight of the OSD that is in the “near full” state.

    $ ceph pg dump > /tmp/pg_dump.4
    
    $ ceph osd tree | grep osd.7
    7   2.65                osd.7   up  1
    
    $ ceph osd crush reweight osd.7 2.6
    reweighted item id 7 name 'osd.7' to 2.6 in crush map
    
    $ ceph health detail
    HEALTH_WARN 2 pgs backfilling; 2 pgs stuck unclean; recovery 17117/9160466 degraded (0.187%)
    pg 3.ca is stuck unclean for 1097.132237, current state active+remapped+backfilling, last acting [4,6,7]
    pg 3.143 is stuck unclean for 1097.456265, current state active+remapped+backfilling, last acting [12,6,7]
    pg 3.143 is active+remapped+backfilling, acting [12,6,7]
    pg 3.ca is active+remapped+backfilling, acting [4,6,7]
    recovery 17117/9160466 degraded (0.187%)
    

    Looking at OSD bandwidth, we can see these transfers, osd.4 -> osd.6 and osd.12 -> osd.6:

    OSDs 4 and 12 are primary for pg 3.ca and pg 3.143 (see the acting column) and OSD 6 is writing.
    OSD 7 will be released from both PGs, which will both be added to OSD 6.
    In my case, osd.7 shows no reads because it is only a replica for both PGs.

    # Before
    $ egrep '^(3.ca|3.143)' /tmp/pg_dump.3 | awk '{print $1,$9,$14,$15}'
    3.143 active+clean [12,7] [12,7]
    3.ca active+clean [4,7] [4,7]
    
    # After
    $ ceph pg dump > /tmp/pg_dump.5
    $ egrep '^(3.ca|3.143)' /tmp/pg_dump.5 | awk '{print $1,$9,$14,$15}'
    3.143 active+clean [12,6] [12,6]
    3.ca active+clean [4,6] [4,6]
    

    Well, obviously, the data do not always end up on the OSD we would like, and that one can end up too full as well.
    I think it will take a little time to get things well balanced.

    Using crushtool

    A good idea could be to test using crushtool with the option --show-utilization.

    First, retrieve the current crushmap:

    $ ceph osd getcrushmap -o crushmap.bin
    

    You can show utilization for a specific pool, using its rule and replica count:

    $ ceph osd dump | grep '^pool 0'
    pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 
    $ crushtool --test -i crushmap.bin --show-utilization --rule 0 --num-rep=2
      device 0: 123
      device 1: 145
      device 2: 125
      device 3: 121
      device 4: 139
      device 5: 133
      device 6: 129
      device 7: 142
      device 8: 146
      device 9: 139
      device 10:    146
      device 11:    143
      device 12:    129
      device 13:    136
      device 14:    152
    

    Make your modifications, then test with the new weights:

    $ crushtool -d crushmap.bin -o crushmap.txt
    # edit crushmap.txt
    $ crushtool -c crushmap.txt -o crushmap-new.bin
    $ crushtool --test -i crushmap-new.bin --show-utilization --rule 0 --num-rep=2
    
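    For reference, the weights to edit in crushmap.txt live in the bucket entries of the decompiled map. A minimal sketch of such an entry (the host name and id below are hypothetical):

    host ceph-node4 {
            id -5
            alg straw
            hash 0  # rjenkins1
            item osd.13 weight 3.050
    }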

    If all goes well, re-import the crushmap:

    $ ceph osd setcrushmap -i crushmap-new.bin
    

    http://ceph.com/docs/master/man/8/crushtool/

    osd reweight-by-utilization

    Also, you can use ceph osd reweight-by-utilization.
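
    A minimal sketch (the threshold argument is optional; per the documentation it defaults to 120, meaning OSDs above 120% of average utilization are reweighted):

    $ ceph osd reweight-by-utilization 120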

    http://ceph.com/docs/master/rados/operations/control/#osd-subsystem

  • December 9, 2013
    Ceph RADOS benchmarks replica impacts

    Some figures from a RADOS bench.

    The test maintained an IO concurrency of 1. Basically, we sent IOs one by one.
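
    For reference, a run like this can be reproduced with rados bench. A minimal sketch, assuming a pool named data and a 60 second write test (pool name and duration are arbitrary):

    $ rados -p data bench 60 write -b 1048576 -t 1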

    Block Size   Concurrency   Replica Count   Bandwidth
    1048576      1             1               72.668
    1048576      …

  • December 9, 2013
    Testing a Ceph crush map

    After modifying a crush map it should be tested to check that all rules can provide the specified number of replicas. If a pool is created to use the metadata rule with seven replicas, could it fail to find enough … Continue reading
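
    A minimal sketch of such a check, reusing crushtool as in the post above (the rule number 1 for the metadata rule is an assumption):

    $ crushtool --test -i crushmap.bin --rule 1 --num-rep=7 --show-bad-mappings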

  • December 7, 2013
    Ceph has a REST API!

    Ceph is a distributed object store and file system designed to
    provide excellent performance, reliability and scalability.
    It’s a technology I’ve been following and working with for the past
    couple of months, especially around deploying it with Puppet, and I
    really have a feeling it is going to revolutionize the world of storage.

    I just realized Ceph has a REST API since the Dumpling (0.67)
    release.
    This API essentially wraps around the command line tools allowing you
    to monitor and manage your cluster.
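
    A minimal sketch of poking at it, assuming ceph-rest-api is started with an admin key and listens on its default port (5000) with the v0.1 prefix:

    $ ceph-rest-api -n client.admin &
    $ curl http://localhost:5000/api/v0.1/health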

    Inktank, the company behind Ceph (a bit like Canonical is behind
    Ubuntu) recently released an enterprise offering that includes a web
    interface to manage your cluster and it is based on that API.
    Calamari, their interface, is unfortunately closed source.

    Open source initiatives are already being worked on (1, 2);
    I can’t wait to see what kind of nice things we can craft!

  • December 6, 2013
    Ceph has a REST API!

    I learned recently that a REST API existed for Ceph. I can’t wait to see what kind of nice things we can craft with it.

  • December 5, 2013
    Ceph + OpenStack :: Part-5

    OpenStack Instance boot from Ceph Volume. For a list of images to choose from to create a bootable volume: [root@rdo /(keystone_admin)]# nova image-list …

  • December 5, 2013
    Ceph + OpenStack :: Part-4

    Testing OpenStack Glance + RBD. To allow Glance to keep images on a Ceph RBD volume, edit /etc/glance/glance-api.conf: default_store = rbd …

  • December 5, 2013
    Ceph + OpenStack :: Part-3

    Testing OpenStack Cinder + RBD. Creating a Cinder volume provided by the Ceph backend: [root@rdo /]# cinder create --display-name cinder-ceph-vol1 --display-description "first cinder volume on ceph backend" 10 …

  • December 5, 2013
    Ceph + OpenStack :: Part-2

    Configuring OpenStack

    Two parts of OpenStack integrate with Ceph’s block devices:

    • Images: OpenStack Glance manages images for VMs.
    • Volumes: Volumes are block devices. OpenStack uses volumes to boot VMs, or to attach volumes to running VMs. OpenStack manages volumes using Cinder services.
      • Create pools for volumes and images:
    ceph osd pool create volumes 128
    ceph osd pool create images 128
    • Configure OpenStack Ceph Client – The nodes running glance-api and cinder-volume act as Ceph clients. Each requires the ceph.conf file:
    [root@ceph-mon1 ceph]# scp ceph.conf openstack:/etc/ceph
    • Install Ceph client packages on the OpenStack node
      • First install Python bindings for librbd
    yum install python-ceph
      • Install ceph
    [root@ceph-mon1 ceph]# ceph-deploy install openstack
    • Set up Ceph client authentication for both pools, along with keyrings
      • Create a new user for Nova/Cinder and Glance.
    ceph auth get-or-create client.volumes mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images'
    ceph auth get-or-create client.images mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=images' 
      • Add these keyrings to glance-api and cinder-volume nodes.
    ceph auth get-or-create client.images | ssh openstack tee /etc/ceph/ceph.client.images.keyring
    ssh openstack chown glance:glance /etc/ceph/ceph.client.images.keyring
    ceph auth get-or-create client.volumes | ssh openstack tee /etc/ceph/ceph.client.volumes.keyring
    ssh openstack chown cinder:cinder /etc/ceph/ceph.client.volumes.keyring
      • Hosts running nova-compute do not need the keyring. Instead, they store the secret key in libvirt. To create the libvirt secret you will need the key from client.volumes, saved to client.volumes.key:
    ceph auth get-key client.volumes | ssh openstack tee client.volumes.key
      • On the compute nodes, add the secret key to libvirt by creating a secret.xml file:
    cat > secret.xml <<EOF
    <secret ephemeral='no' private='no'>
    <usage type='ceph'>
    <name>client.volumes secret</name>
    </usage>
    </secret>
    EOF
      • Generate the secret from the created secret.xml file and make a note of the UUID in the output:
    # virsh secret-define --file secret.xml 
      • Set the libvirt secret value using the key from above:
    # virsh secret-set-value --secret {uuid of secret} --base64 $(cat client.volumes.key) && rm client.volumes.key secret.xml
    • Configure OpenStack-Glance to use CEPH
      • Glance can use multiple back ends to store images. To use Ceph block devices by default, edit /etc/glance/glance-api.conf and add:
    default_store=rbd
    rbd_store_user=images
    rbd_store_pool=images
      • If you want to enable copy-on-write cloning of images into volumes, also add:
    show_image_direct_url=True
    • Configure OpenStack – Cinder to use CEPH 
      • OpenStack requires a driver to interact with Ceph block devices. You must specify the pool name for the block device. On your OpenStack node, edit /etc/cinder/cinder.conf by adding:
    volume_driver=cinder.volume.drivers.rbd.RBDDriver
    rbd_pool=volumes
    glance_api_version=2
    • If you’re using cephx authentication, also configure the user and the UUID of the secret you added to libvirt earlier:
    rbd_user=volumes
    rbd_secret_uuid={uuid of secret}
    • Restart the OpenStack services:
    service glance-api restart
    service nova-compute restart
    service cinder-volume restart
    • Once OpenStack is up and running, you should be able to create a volume with OpenStack on a Ceph block device (a quick check is sketched just after this list).
    • NOTE: Make sure the /etc/ceph/ceph.conf file has sufficient permissions to be read by the cinder and glance users.
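
    As the quick check mentioned above, a minimal sketch (the volume name is arbitrary; the -p and --id values follow the volumes pool and client.volumes user created earlier):

    cinder create --display-name test-ceph-vol 1
    rbd -p volumes --id volumes ls -l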

    Please follow Ceph + OpenStack :: Part-3 for the next step in the installation.


  • December 5, 2013
    Ceph + OpenStack :: Part-1

    Ceph & OpenStack Integration. We can use Ceph Block Devices with OpenStack through libvirt, which configures the QEMU interface to librbd. To use Ceph Block Devices with OpenStack, we must install QEMU, libvirt, and …

  • December 5, 2013
    Ceph Installation :: Part-3

    Creating a Block Device from Ceph. From the monitor node, use ceph-deploy to install Ceph on your ceph-client1 node: [root@ceph-mon1 ~]# ceph-deploy install ceph-client1 …
