The Ceph Blog

Ceph blog stories provide high-level spotlights on our customers all over the world

  • May 5, 2015
    v9.0.0 released

    This is the first development release for the Infernalis cycle, and the first Ceph release to sport a version number from the new numbering scheme. The “9” indicates this is the 9th release cycle; I (for Infernalis) is the 9th letter. The first “0” indicates this is a development release (“1” will mean release candidate and …

  • May 4, 2015
    OpenVZ: Kernel 3.10 With Rbd Module

    The 3.x kernel for OpenVZ is out, and it is compiled with the rbd module:

    root@debian:~# uname -a
    Linux debian 3.10.0-3-pve #1 SMP Thu Jun 12 13:50:49 CEST 2014 x86_64 GNU/Linux

    root@debian:~# modinfo rbd
    filename: /lib/modules/3.10.0-3-pve/kernel/drive…
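
    With the module available, a kernel RBD mapping can be tested roughly like this (a sketch; the pool/image name rbd/testimage is hypothetical, and rbd map normally loads the module on its own):

    root@debian:~# modprobe rbd
    root@debian:~# rbd map rbd/testimage
    root@debian:~# rbd showmapped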

  • May 4, 2015
    Ceph using Monitor key/value store

    Ceph monitors make use of leveldb to store cluster maps, users and keys.
    Since the store is already there, the Ceph developers thought about exposing it through the monitors' interface.
    So monitors have a built-in capability that allows you to store blobs of …
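
    In practice this is exposed through the ceph config-key commands; a minimal sketch (key and value are placeholders):

    # store, read back and list arbitrary blobs in the monitors' store
    ceph config-key put mykey myvalue
    ceph config-key get mykey
    ceph config-key list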

  • April 27, 2015
    v0.87.2 Giant released

    This is the second (and possibly final) point release for Giant. We recommend all v0.87.x Giant users upgrade to this release.

    NOTABLE CHANGES

    • ceph-objectstore-tool: only output unsupported features when incompatible (#11176, David Zafman)
    • common: do not implicitly unlock rwlock on destruction (Federico Simoncelli)
    • common: make wait timeout on empty queue configurable (#10818, Samuel Just)
    • crush: …

  • April 27, 2015
    Ceph: manually repair object

    Debugging scrubbing errors can be tricky and you don’t necessarily know how to proceed.

    Assuming you have a cluster state similar to this o…
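
    For reference, the usual starting point when scrubbing reports an inconsistency looks roughly like this (a sketch, not necessarily the exact commands from the full post; the PG id is made up):

    # find the inconsistent placement group(s)
    ceph health detail | grep inconsistent
    # then ask the primary OSD to repair that PG
    ceph pg repair 17.1c1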

  • April 25, 2015
    Ceph Loves Jumbo Frames

    Who doesn’t love a high-performing Ceph storage cluster? To get one you need to tame it: not only Ceph itself but also the network needs to be tuned. The quickest way to tune your network is to enable Jumbo Frames.

    What are they?

    • They are Ethernet frames with a payload larger than the standard 1500-byte MTU
    • They can significantly improve network performance by making data transmission more efficient
    • They require Gigabit Ethernet
    • Most enterprise network devices support Jumbo Frames
    • Some people also call them ‘Giants’

    Enabling Jumbo Frames

    • Make sure your switch port is configured to accept Jumbo Frames
    • On the server side, set your network interface MTU to 9000:

    # ifconfig eth0 mtu 9000

    • Make the change permanent by updating the network interface file and restarting the network service:

    # echo "MTU=9000" >> /etc/sysconfig/network-scripts/ifcfg-eth0

    • Confirm which MTU is used between two specific devices:

    # ip route get {IP-address}
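
    For an end-to-end check that jumbo frames actually traverse the whole path, a non-fragmenting ping is a common test (a sketch; 8972 bytes = 9000-byte MTU minus 28 bytes of IP and ICMP headers, and it only succeeds if every hop accepts jumbo frames):

    # ping -M do -s 8972 {IP-address}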

    In my production Ceph cluster, I have seen improvements after enabling Jumbo Frames on both the Ceph and the OpenStack nodes.

  • April 17, 2015
    Stretching Ceph networks

    This is a quick note about Ceph networks, so do not expect anything lengthy here :).

    Usually Ceph networks are presented as the cluster public and cluster private networks.
    However, it is never mentioned that you can use a separate network for the monitors.
    This mi…
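
    For context, the two usual networks are declared in ceph.conf roughly like this (a sketch with made-up addresses; the point of the post is that monitors can likewise be bound to a different subnet):

    [global]
        public network  = 192.168.0.0/24    # client and monitor traffic
        cluster network = 192.168.1.0/24    # OSD replication and recovery traffic

    [mon.a]
        mon addr = 10.0.0.1:6789            # a monitor address on a separate subnet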

  • April 15, 2015
    Ceph Pool Migration

    You have probably already been faced with migrating all objects from one pool to another, especially to change parameters that cannot be modified on an existing pool: for example, to migrate from a replicated pool to an EC pool, to change the EC profile, or to reduce the number of PGs…
    There are different methods, depending on the contents of the pool (RBD, objects), size…

    The simple way

    The simplest and safest method is to copy all objects with the “rados cppool” command.
    However, it requires the pool to stay read-only during the copy.

    For example, to migrate to an EC pool:

    pool=testpool
    ceph osd pool create $pool.new 4096 4096 erasure default
    rados cppool $pool $pool.new
    ceph osd pool rename $pool $pool.old
    ceph osd pool rename $pool.new $pool

    But it does not work in all cases. For example, with EC pools: “error copying pool testpool => newpool: (95) Operation not supported”.
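
    Whichever way the copy is done, a quick object-count comparison before dropping the old pool gives some confidence that nothing was missed (a sketch; it is only meaningful while client I/O is stopped):

    # the two counts should match once the copy is complete
    rados -p $pool.old ls | wc -l
    rados -p $pool ls | wc -l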

    Using Cache Tier

    This must be used with caution: run tests before using it on a production cluster. It worked for my needs, but I cannot say that it works in all cases.

    I find this method interesting because it allows a transparent operation, reduces downtime, and avoids duplicating all the data. The principle is simple: use cache tiering, but in reverse order.

    At the beginning, we have two pools: the current “testpool” and the new one, “newpool”.

    Setup cache tier

    Configure the existing pool as a cache pool:

    ceph osd tier add newpool testpool --force-nonempty
    ceph osd tier cache-mode testpool forward

    In the output of ceph osd dump you should see something like this:

    --> pool 58 'testpool' replicated size 3 .... tier_of 80 
    

    Now, all new objects will be created on the new pool.

    We can then force all existing objects to move to the new pool:

    rados -p testpool cache-flush-evict-all
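
    Once the flush/evict pass has completed, you can check whether anything was left behind in the old pool (a sketch; see the in-use object case below for objects that refuse to go):

    rados -p testpool ls
    ceph df | grep testpool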

    Switch all clients to the new pool

    (You can also do this step earlier, for example just after creating the cache tier.)
    Until all the data has been flushed to the new pool, you need to set an overlay so that objects are also looked up in the old pool:

    ceph osd tier set-overlay newpool testpool

    In the output of ceph osd dump you should see something like this:

    --> pool 80 'newpool' replicated size 3 .... tiers 58 read_tier 58 write_tier 58
    

    With the overlay, all operations will be forwarded to the old testpool.

    Now you can switch all the clients to access objects on the new pool.

    Finish

    When all the data has been migrated, you can remove the overlay and the old “cache” pool:

    ceph osd tier remove-overlay newpool
    ceph osd tier remove newpool testpool
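
    If the old pool is no longer needed at all, it can then be deleted; this is not part of the steps above and is irreversible, so double-check first:

    ceph osd pool delete testpool testpool --yes-i-really-really-mean-it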

    In-use object

    During eviction you may see errors like these:

    ....
    rb.0.59189e.2ae8944a.000000000001   
    rb.0.59189e.2ae8944a.000000000023   
    rb.0.59189e.2ae8944a.000000000006   
    testrbd.rbd 
    failed to evict testrbd.rbd: (16) Device or resource busy
    rb.0.59189e.2ae8944a.000000000000   
    rb.0.59189e.2ae8944a.000000000026   
    ...
    

    Listing the watchers on the object can help:

    rados -p testpool listwatchers testrbd.rbd
    watcher=10.20.6.39:0/3318181122 client.5520194 cookie=1
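
    The watcher is typically a client that still has the RBD image mapped or opened. Unmapping the image on that host releases the watch, after which the eviction can be retried (a sketch; the device name is hypothetical):

    # on the client host holding the watch
    rbd unmap /dev/rbd0
    # then retry the eviction
    rados -p testpool cache-flush-evict-all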

    Using Rados Export/Import

    For this, you need to use a temporary local directory.

    rados export --create testpool tmp_dir
    [exported]     rb.0.4975.2ae8944a.000000002391
    [exported]     rb.0.4975.2ae8944a.000000004abc
    [exported]     rb.0.4975.2ae8944a.0000000018ce
    ...

    rados import tmp_dir newpool

    # Stop All IO
    # And redo a sync of modified objects

    rados export --workers 5 testpool tmp_dir
    rados import --workers 5 tmp_dir newpool
    
  • April 13, 2015
    v0.94.1 Hammer released

    This bug fix release fixes a few critical issues with CRUSH. The most important addresses a bug in feature bit enforcement that may prevent pre-hammer clients from communicating with the cluster during an upgrade. This only manifests in some cases (for example, when the ‘rack’ type is in use in the CRUSH map, and possibly …

  • April 13, 2015
    Ceph: analyse journal write pattern

    Simple trick to analyse the write patterns applied to your Ceph journal.

    Assuming your journal device is /dev/sdb1, checking for 10 seconds:

    $ iostat -dmx /dev/sdb1 10 | awk '/[0-9]/ {print $8}'
    16.25

    Now converting sectors to KiB.

    16.25…
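
    The conversion can also be done inline, assuming avgrq-sz (field 8 here) is reported in 512-byte sectors, which is iostat's default unit:

    $ iostat -dmx /dev/sdb1 10 | awk '/sdb1/ {print $8 * 512 / 1024 " KiB"}'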

  • April 10, 2015
    Ceph make check in a ram disk

    When running tests from the Ceph sources, the disk is used intensively, and a ram disk can be used to reduce the latency. The machine must be rebooted to set the ramdisk maximum size to 16GB. For instance on Ubuntu …
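
    If a reboot is not an option, a tmpfs mount gives a similar RAM-backed work area (a sketch; the post itself uses the kernel ram disk, and the mount point and size below are arbitrary):

    sudo mkdir -p /srv/ceph-ram
    sudo mount -t tmpfs -o size=16g tmpfs /srv/ceph-ram
    # copy or clone the Ceph sources there, then configure and run "make check" as usual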

  • April 7, 2015
    v0.94 Hammer released

    This major release is expected to form the basis of the next long-term stable series. It is intended to supersede v0.80.x Firefly. Highlights since Giant include: RADOS Performance: a range of improvements have been made in the OSD and client-side librados code that improve the throughput on flash backends and improve parallelism and scaling on …
