The Ceph Blog

Ceph blog stories provide high-level spotlights on our customers all over the world

  • September 8, 2014
    v0.85 released

    This is the second-to-last development release before Giant that contains new functionality. The big items to land during this cycle are the messenger refactoring from Matt Benjamin that lays some groundwork for RDMA support, a performance improvement series from SanDisk that improves performance on SSDs, lots of improvements to our new standalone civetweb-based RGW frontend, …Read more

  • September 8, 2014
    Analyse Ceph object directory mapping on disk

    This is useful to understand benchmark results and Ceph’s second write penalty (this phenomenon is explained here, in section I.1).

    I. Use an RBD image and locate the objects

    Let’s start with a simple 40 MB RBD image and get some statistics about this image:

    ```bash
    $ sudo rbd info volumes/2578a6ed-2bab-4f71-910d-d42f18c80d11_disk
    rbd image '2578a6ed-2bab-4f71-910d-d42f18c80d11_disk':
    size 40162 kB in 10 objects
    order 22 (4096 kB objects)
    block_name_prefix: rbd_data.97ab74b0dc51
    format: 2
    features: layering
    ```

    Now let's use my script to validate the placement of each object.
    Please note that all the blocks must be allocated; if they are not, simply map the device and run dd over it.
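
    The output below looks like what ceph osd map reports for each object. If you do not have the script at hand, a rough equivalent is sketched here; the pool name, object prefix and object count are the ones from the rbd info output above:

    ```bash
    # Rough equivalent of the placement lookup: "ceph osd map <pool> <object>"
    # prints the PG and the acting OSD set for every object of the image.
    POOL=volumes
    PREFIX=rbd_data.97ab74b0dc51
    OBJECTS=10

    for i in $(seq 0 $((OBJECTS - 1))); do
        # RBD object names are the prefix plus a 16-digit hexadecimal index
        ceph osd map "$POOL" "$PREFIX.$(printf '%016x' "$i")"
    done
    ```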

    ```bash
    $ sudo ./rbd-placement volumes 2578a6ed-2bab-4f71-910d-d42f18c80d11_disk
    osdmap e2518 pool 'volumes' (28) object 'rbd_data.97ab74b0dc51.0000000000000000' -> pg 28.b52329a6 (28.6) -> up ([0,1], p0) acting ([0,1], p0)
    osdmap e2518 pool 'volumes' (28) object 'rbd_data.97ab74b0dc51.0000000000000009' -> pg 28.7ac71fc6 (28.6) -> up ([0,1], p0) acting ([0,1], p0)
    osdmap e2518 pool 'volumes' (28) object 'rbd_data.97ab74b0dc51.0000000000000002' -> pg 28.f9256dc8 (28.8) -> up ([1,0], p1) acting ([1,0], p1)
    osdmap e2518 pool 'volumes' (28) object 'rbd_data.97ab74b0dc51.0000000000000005' -> pg 28.141bf9ca (28.a) -> up ([1,0], p1) acting ([1,0], p1)
    osdmap e2518 pool 'volumes' (28) object 'rbd_data.97ab74b0dc51.0000000000000003' -> pg 28.58c5376b (28.b) -> up ([1,0], p1) acting ([1,0], p1)
    osdmap e2518 pool 'volumes' (28) object 'rbd_data.97ab74b0dc51.0000000000000008' -> pg 28.a310d3d0 (28.10) -> up ([1,0], p1) acting ([1,0], p1)
    osdmap e2518 pool 'volumes' (28) object 'rbd_data.97ab74b0dc51.0000000000000001' -> pg 28.88755b97 (28.17) -> up ([1,0], p1) acting ([1,0], p1)
    osdmap e2518 pool 'volumes' (28) object 'rbd_data.97ab74b0dc51.0000000000000004' -> pg 28.e52ce538 (28.18) -> up ([1,0], p1) acting ([1,0], p1)
    osdmap e2518 pool 'volumes' (28) object 'rbd_data.97ab74b0dc51.0000000000000006' -> pg 28.80a6755a (28.1a) -> up ([0,1], p0) acting ([0,1], p0)
    osdmap e2518 pool 'volumes' (28) object 'rbd_data.97ab74b0dc51.0000000000000007' -> pg 28.9c45d2fa (28.1a) -> up ([0,1], p0) acting ([0,1], p0)
    ```

    This image is stored on OSD 0 and OSD 1.
    I then simply collected all the PGs and the rbd prefix.
    We can see how this placement is reflected in the OSD's directory hierarchy using the tree command:

    ```bash
    $ sudo tree -Ph '*97ab74b0dc51*' /var/lib/ceph/osd/ceph-0/current/{28.6,28.8,28.a,28.b,28.10,28.17,28.18,28.1a}_head/
    /var/lib/ceph/osd/ceph-0/current/28.6_head/
    ├── [4.0M] rbd\udata.97ab74b0dc51.0000000000000000__head_B52329A6__1c
    └── [3.2M] rbd\udata.97ab74b0dc51.0000000000000009__head_7AC71FC6__1c
    /var/lib/ceph/osd/ceph-0/current/28.8_head/
    └── [4.0M] rbd\udata.97ab74b0dc51.0000000000000002__head_F9256DC8__1c
    /var/lib/ceph/osd/ceph-0/current/28.a_head/
    └── [4.0M] rbd\udata.97ab74b0dc51.0000000000000005__head_141BF9CA__1c
    /var/lib/ceph/osd/ceph-0/current/28.b_head/
    └── [4.0M] rbd\udata.97ab74b0dc51.0000000000000003__head_58C5376B__1c
    /var/lib/ceph/osd/ceph-0/current/28.10_head/
    └── [4.0M] rbd\udata.97ab74b0dc51.0000000000000008__head_A310D3D0__1c
    /var/lib/ceph/osd/ceph-0/current/28.17_head/
    └── [4.0M] rbd\udata.97ab74b0dc51.0000000000000001__head_88755B97__1c
    /var/lib/ceph/osd/ceph-0/current/28.18_head/
    └── [4.0M] rbd\udata.97ab74b0dc51.0000000000000004__head_E52CE538__1c
    /var/lib/ceph/osd/ceph-0/current/28.1a_head/
    ├── [4.0M] rbd\udata.97ab74b0dc51.0000000000000006__head_80A6755A__1c
    └── [4.0M] rbd\udata.97ab74b0dc51.0000000000000007__head_9C45D2FA__1c

    0 directories, 10 files
    ```
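
    As a side note, the hexadecimal suffix in each file name is the object's placement hash, and the trailing 1c is the pool id (28) in hexadecimal. Assuming a power-of-two pg_num for this pool (the PG ids above suggest 32), the PG is simply the hash masked onto pg_num, which is easy to verify:

    ```bash
    # Hypothetical check: derive the PG from the hash embedded in the file name.
    # Assumes pg_num = 32 for the "volumes" pool (power of two, so a mask suffices).
    HASH=0xB52329A6   # from rbd_data.97ab74b0dc51.0000000000000000 above
    PG_NUM=32
    printf 'pg = 28.%x\n' $(( HASH & (PG_NUM - 1) ))   # prints "pg = 28.6"
    ```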

    II. Analyse your disk geometry

    For the sake of simplicity, I used a virtual hard drive attached to my virtual machine. The disk is 10 GB.

    ```bash
    root@ceph:~# fdisk -l /dev/sdb1

    Disk /dev/sdb1: 10.5 GB, 10484711424 bytes
    255 heads, 63 sectors/track, 1274 cylinders, total 20477952 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk identifier: 0x00000000

    Disk /dev/sdb1 doesn't contain a valid partition table
    ```

    So I have 20477952 sectors/blocks of 512 bytes in total; that is (20477952 * 512) / 1024 / 1024 / 1024 ≈ 10 GB.
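
    If you want to double-check that arithmetic from the shell, a one-liner is enough:

    ```bash
    # 20477952 sectors of 512 bytes
    echo $(( 20477952 * 512 )) bytes           # 10484711424 bytes
    echo "20477952 * 512 / 1024^3" | bc -l     # ≈ 9.77 GiB, i.e. the ~10 GB disk
    ```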

    III. Print block mapping for each object

    From here on, I assume that the underlying filesystem of the OSD data directories is XFS.
    Otherwise the following will not be possible.
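
    A quick way to confirm this, assuming the default OSD data path, is to ask df for the filesystem type:

    ```bash
    # Print the filesystem type backing the OSD data directory; it should say "xfs"
    df -T /var/lib/ceph/osd/ceph-0 | awk 'NR == 2 {print $2}'
    ```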

    ```bash
    $ sudo for i in $(find /var/lib/ceph/osd/ceph-0/current/{28.6,28.8,28.a,28.b,28.10,28.17,28.18,28.1a}_head/*97ab74b0dc51*) ; do xfs_bmap -v $i ;done
    /var/lib/ceph/osd/ceph-0/current/28.6_head/rbd\udata.97ab74b0dc51.0000000000000000__head_B52329A6__1c:
    EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
    0: [0..2943]: 1992544..1995487 0 (1992544..1995487) 2944
    1: [2944..8191]: 1987296..1992543 0 (1987296..1992543) 5248
    /var/lib/ceph/osd/ceph-0/current/28.6_head/rbd\udata.97ab74b0dc51.0000000000000009__head_7AC71FC6__1c:
    EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
    0: [0..255]: 1987040..1987295 0 (1987040..1987295) 256
    1: [256..1279]: 1986016..1987039 0 (1986016..1987039) 1024
    2: [1280..6599]: 1978848..1984167 0 (1978848..1984167) 5320
    /var/lib/ceph/osd/ceph-0/current/28.8_head/rbd\udata.97ab74b0dc51.0000000000000002__head_F9256DC8__1c:
    EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
    0: [0..8191]: 19057336..19065527 3 (3698872..3707063) 8192
    /var/lib/ceph/osd/ceph-0/current/28.a_head/rbd\udata.97ab74b0dc51.0000000000000005__head_141BF9CA__1c:
    EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
    0: [0..8191]: 13909496..13917687 2 (3670520..3678711) 8192
    /var/lib/ceph/osd/ceph-0/current/28.b_head/rbd\udata.97ab74b0dc51.0000000000000003__head_58C5376B__1c:
    EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
    0: [0..639]: 7303544..7304183 1 (2184056..2184695) 640
    1: [640..8191]: 10090000..10097551 1 (4970512..4978063) 7552
    /var/lib/ceph/osd/ceph-0/current/28.10_head/rbd\udata.97ab74b0dc51.0000000000000008__head_A310D3D0__1c:
    EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
    0: [0..639]: 12289352..12289991 2 (2050376..2051015) 640
    1: [640..8191]: 13934072..13941623 2 (3695096..3702647) 7552
    /var/lib/ceph/osd/ceph-0/current/28.17_head/rbd\udata.97ab74b0dc51.0000000000000001__head_88755B97__1c:
    EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
    0: [0..8191]: 19049144..19057335 3 (3690680..3698871) 8192
    /var/lib/ceph/osd/ceph-0/current/28.18_head/rbd\udata.97ab74b0dc51.0000000000000004__head_E52CE538__1c:
    EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
    0: [0..8191]: 13901304..13909495 2 (3662328..3670519) 8192
    /var/lib/ceph/osd/ceph-0/current/28.1a_head/rbd\udata.97ab74b0dc51.0000000000000006__head_80A6755A__1c:
    EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
    0: [0..6911]: 13917688..13924599 2 (3678712..3685623) 6912
    1: [6912..8191]: 13932792..13934071 2 (3693816..3695095) 1280
    /var/lib/ceph/osd/ceph-0/current/28.1a_head/rbd\udata.97ab74b0dc51.0000000000000007__head_9C45D2FA__1c:
    EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
    0: [0..8191]: 13924600..13932791 2 (3685624..3693815) 8192
    ```

    It seems that I have a bit of fragmentation on my filesystem, since some files map to more than one extent.
    So before going further, I am going to defragment those files.
    Example for one file:

    ```bash
    $ sudo xfs_bmap -v /var/lib/ceph/osd/ceph-0/current/28.6_head/rbd\udata.97ab74b0dc51.0000000000000009__head_7AC71FC6__1c
    /var/lib/ceph/osd/ceph-0/current/28.6_head/rbd\udata.97ab74b0dc51.0000000000000009__head_7AC71FC6__1c:
    EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
    0: [0..255]: 1987040..1987295 0 (1987040..1987295) 256
    1: [256..1279]: 1986016..1987039 0 (1986016..1987039) 1024
    2: [1280..6599]: 1978848..1984167 0 (1978848..1984167) 5320

    $ sudo xfs_fsr /var/lib/ceph/osd/ceph-0/current/28.6_head/rbd\udata.97ab74b0dc51.0000000000000009__head_7AC71FC6__1c

    $ sudo xfs_bmap -v /var/lib/ceph/osd/ceph-0/current/28.6_head/rbd\udata.97ab74b0dc51.0000000000000009__head_7AC71FC6__1c
    /var/lib/ceph/osd/ceph-0/current/28.6_head/rbd\udata.97ab74b0dc51.0000000000000009__head_7AC71FC6__1c:
    EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
    0: [0..6599]: 1860632..1867231 0 (1860632..1867231) 6600
    ```
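
    To defragment every object file of this image in one pass, the same file pattern used earlier works; a small sketch, to be run as root on the OSD host:

    ```bash
    # Defragment all the object files of this RBD image stored on osd.0
    for f in /var/lib/ceph/osd/ceph-0/current/{28.6,28.8,28.a,28.b,28.10,28.17,28.18,28.1a}_head/*97ab74b0dc51*; do
        xfs_fsr -v "$f"
    done
    ```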

    After the operation, we have the following layout:

    ```bash
    $ sudo for i in $(find /var/lib/ceph/osd/ceph-0/current/{28.6,28.8,28.a,28.b,28.10,28.17,28.18,28.1a}_head/*97ab74b0dc51*) ; do xfs_bmap -v $i ;done
    /var/lib/ceph/osd/ceph-0/current/28.6_head/rbd\udata.97ab74b0dc51.0000000000000000__head_B52329A6__1c:
    EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
    0: [0..8191]: 1852440..1860631 0 (1852440..1860631) 8192
    /var/lib/ceph/osd/ceph-0/current/28.6_head/rbd\udata.97ab74b0dc51.0000000000000009__head_7AC71FC6__1c:
    EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
    0: [0..6599]: 1860632..1867231 0 (1860632..1867231) 6600
    /var/lib/ceph/osd/ceph-0/current/28.8_head/rbd\udata.97ab74b0dc51.0000000000000002__head_F9256DC8__1c:
    EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
    0: [0..8191]: 19057336..19065527 3 (3698872..3707063) 8192
    /var/lib/ceph/osd/ceph-0/current/28.a_head/rbd\udata.97ab74b0dc51.0000000000000005__head_141BF9CA__1c:
    EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
    0: [0..8191]: 13909496..13917687 2 (3670520..3678711) 8192
    /var/lib/ceph/osd/ceph-0/current/28.b_head/rbd\udata.97ab74b0dc51.0000000000000003__head_58C5376B__1c:
    EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
    0: [0..8191]: 13932792..13940983 2 (3693816..3702007) 8192
    /var/lib/ceph/osd/ceph-0/current/28.10_head/rbd\udata.97ab74b0dc51.0000000000000008__head_A310D3D0__1c:
    EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
    0: [0..8191]: 14201728..14209919 2 (3962752..3970943) 8192
    /var/lib/ceph/osd/ceph-0/current/28.17_head/rbd\udata.97ab74b0dc51.0000000000000001__head_88755B97__1c:
    EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
    0: [0..8191]: 19049144..19057335 3 (3690680..3698871) 8192
    /var/lib/ceph/osd/ceph-0/current/28.18_head/rbd\udata.97ab74b0dc51.0000000000000004__head_E52CE538__1c:
    EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
    0: [0..8191]: 13901304..13909495 2 (3662328..3670519) 8192
    /var/lib/ceph/osd/ceph-0/current/28.1a_head/rbd\udata.97ab74b0dc51.0000000000000006__head_80A6755A__1c:
    EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
    0: [0..8191]: 14209920..14218111 2 (3970944..3979135) 8192
    /var/lib/ceph/osd/ceph-0/current/28.1a_head/rbd\udata.97ab74b0dc51.0000000000000007__head_9C45D2FA__1c:
    EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
    0: [0..8191]: 13924600..13932791 2 (3685624..3693815) 8192
    ```

    IV. Get an idea of your object mapping

    As mentioned earlier, we have 20477952 blocks of 512 bytes in total, and the objects have the following block ranges:

    • 1852440..1860631, a range of 8192 blocks of 512 bytes, i.e. (8192 * 512) / 1024 / 1024 = 4M
    • 1860632..1867231
    • 19057336..19065527
    • 13909496..13917687
    • 13932792..13940983
    • 14201728..14209919
    • 19049144..19057335
    • 13901304..13909495
    • 14209920..14218111
    • 13924600..13932791

    The average block positions based on these ranges are:

    • 1856535
    • 1863135
    • 13905399
    • 13913591
    • 13928695
    • 13936887
    • 14205823
    • 14214015
    • 19053239
    • 19061431

    We can now calculate the standard deviation of these positions: 6020910.93405966
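
    For reference, here is a small sketch that recomputes these midpoints and their sample standard deviation directly from xfs_bmap; the paths and prefix are the ones used above, and the result should land close to the figure quoted, modulo rounding:

    ```bash
    # Recompute the average block position of each object file and the sample
    # standard deviation of those positions (run as root on the OSD host).
    for f in /var/lib/ceph/osd/ceph-0/current/{28.6,28.8,28.a,28.b,28.10,28.17,28.18,28.1a}_head/*97ab74b0dc51*; do
        # Take the first and last block of the file and print the midpoint
        xfs_bmap -v "$f" | awk 'NR > 2 { split($3, r, "\\.\\."); if (first == "") first = r[1]; last = r[2] }
                                END   { print (first + last) / 2 }'
    done | awk '{ sum += $1; sumsq += $1 * $1; n++ }
                END { mean = sum / n; print "stddev:", sqrt((sumsq - n * mean * mean) / (n - 1)) }'
    ```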

    The purpose of this article was to demonstrate and justify the second write penalty in Ceph.
    The second write is triggered by syncfs, which writes all the objects to their respective PG directories.
    Understanding the PG placement of the objects, in addition to the physical mapping of each object of the filesystem on the block device, can be a great help while debugging performance issues.
    Unfortunately this problem is hard to solve because of concurrent client writes and the distributed nature of Ceph.
    Obviously what was written here remains pure theory (it’s likely true though :p), given that determining the real placement of data on a disk is difficult.
    One more thing about the block placement returned by XFS: it gives us values, but we don’t know what the mapping of these ranges really looks like on the device.

  • September 5, 2014
    OpenStack at the CephDays Paris

    Save the date (September 18, 2014) and join us at the new edition of the Ceph Days in Paris.
    I will be talking about the amazing new things that happened during this (not yet finished) Juno cycle.
    Actually I’ve never seen so many patch sets in one cyc…

  • September 1, 2014
    OpenStack: use ephemeral and persistent root storage for different hypervisors


    Computes with Ceph image backend and computes with local image ba…

  • August 31, 2014
    What cinder volume is missing an RBD object?

    Although it is extremely unlikely to lose an object stored in Ceph, it is not impossible. When it happens to a Cinder volume based on RBD, knowing which volume has an object missing will help with disaster recovery. The list_missing command … Continue reading

  • August 25, 2014
    Ceph: mix SATA and SSD within the same box

    The use case is simple, I want to use both SSD disks and SATA disks within the same machine and ultimately create pools pointing to SSD or SATA disks.
    In order to achieve our goal, we need to modify the CRUSH map.
    My example has 2 SATA disks and 2 SS…

  • August 25, 2014
    Ceph Node.js Bindings for Librados
    
    var cluster = new rados.Rados( "ceph", "client.admin", "/etc/ceph/ceph.conf");
    cluster.connect();
    
    var ioctx = new rados.Ioctx(cluster, "data");
    ioctx.aio_write("testfile2", new Buffer("1234567879ABCD"), 14, 0, function (err) {
      if (err) {
        throw err;
      }
      ...
    

    To my knowledge, there is not yet any Node.js wrapper for librados. (I guess that Alexandre has not found the code he had started again: http://tracker.ceph.com/issues/4230.) So I started to write a draft of a plugin (when I have some time). For now I am not using it, but it allows me to discover Node. If people are interested, it is here:

    https://github.com/ksperis/node-rados

    (suggestions are welcome, especially on the Error Handling, the use of libuv, buffers / strings, and everything else…)

    All is not yet implemented, but the basic functions are present.
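
    If you want to try it, the bindings should build like any native Node addon; a hypothetical set of steps, assuming node-gyp and the librados development headers are installed:

    ```bash
    # Hypothetical build-and-run steps for the node-rados draft
    git clone https://github.com/ksperis/node-rados.git
    cd node-rados
    node-gyp rebuild     # produces build/Release/rados.node
    node example.js      # runs the example shipped in the repository
    ```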

    For example (example.js file in repo):

    var rados = require('./build/Release/rados');
    
    // EXAMPLE FILE
    
    //==================================
    //     Connect to cluster
    //==================================
    var cluster = new rados.Rados( "ceph", "client.admin", "/etc/ceph/ceph.conf");
    
    var err = cluster.connect();
    if (err) {
      // On connection error
      console.log("Error " + err);
      throw err;
    }
    
    // Print cluster FSID, pools
    console.log( "fsid : " + cluster.get_fsid() );
    console.log( "ls pools : " + cluster.pool_list() );
    
    
    //==================================
    //     Create IOCTX
    //==================================
    var ioctx = new rados.Ioctx(cluster, "data");
    
    console.log(" --- RUN Sync Write / Read --- ");
    // Sync write_full
    ioctx.write_full("testfile1", new Buffer("01234567ABCDEF"));
    
    // Sync Read
    console.log( "Read data : " +
      ioctx.read("testfile1", ioctx.stat("testfile1").psize).toString() );
    
    // Remove
    ioctx.remove("testfile1");
    
    console.log(" --- RUN ASync Write / Read --- ");
    // ASync write_full
    ioctx.aio_write("testfile2", new Buffer("1234567879ABCD"), 14, 0, function (err) {
      if (err) {
        throw err;
      }
    
      ioctx.aio_read("testfile2", 14, 0, function (err, data) {
      if (err) {
        throw err;
      }
    
       console.log("[async callback] data = " + data.toString());
    
      });
    
    });
    
    
    //==================================
    //     Read / Write Attributes
    //==================================
    
    console.log(" --- RUN Attributes Write / Read --- ");
    
    ioctx.setxattr("testfile3", "attr1", "first attr");
    ioctx.setxattr("testfile3", "attr2", "second attr");
    ioctx.setxattr("testfile3", "attr3", "last attr value");
    
    var attrs = ioctx.getxattrs("testfile3");
    
    console.log("testfile3 xattr = %j", attrs);
    
    
    // destroy ioctx and close cluster after aio_flush
    ioctx.aio_flush_async(function (err) {
      ioctx.destroy();
      cluster.shutdown();
    });
    
    
    process.exit(0);
    
    // OTHER EXAMPLES
    
    //   Read Sync file in chunks
    var file = "testfile";
    var fileSize = ioctx.stat(file).psize,
        chunkSize = 512,
        bytesRead = 0;
    
    
    while (bytesRead < fileSize) {
        if ((bytesRead + chunkSize) > fileSize) {
            chunkSize = (fileSize - bytesRead);
        }
        var buffer = ioctx.read(file, chunkSize, bytesRead);
        bytesRead += chunkSize;
        process.stdout.write(buffer.toString());
    }
    
    
    //   Read Async file in chunks
    var file = "testfile";
    var fileSize = ioctx.stat(file).psize,
        chunkSize = 512,
        bytesRead = 0;
    
    
    while (bytesRead < fileSize) {
        if ((bytesRead + chunkSize) > fileSize) {
            chunkSize = (fileSize - bytesRead);
        }
        ioctx.aio_read(file, chunkSize, bytesRead, function (err, data) {
          process.stdout.write(data.toString());
        });
        bytesRead += chunkSize;
    }
    
    
    //   Use snapshot
    ioctx.write_full("testfile10", new Buffer("version1"));
    ioctx.snap_create("snaptest1");
    ioctx.write_full("testfile10", new Buffer("version2"));
    ioctx.snap_create("snaptest2");
    ioctx.write_full("testfile10", new Buffer("version3"));
    ioctx.snap_create("snaptest3");
    
    ioctx.snap_rollback("testfile10", "snaptest2");
    console.log(ioctx.read("testfile10").toString());
    
    ioctx.snap_remove("snaptest1");
    ioctx.snap_remove("snaptest2");
    ioctx.snap_remove("snaptest3");
    
  • August 24, 2014
    Update: OpenStack Summit Paris 2014 – CFS

    An update on my talk submission for the OpenStack summit this year in Paris: my speech about Ceph performance analysis was not chosen by the committee for the official agenda. But at least one piece of good news: Marc’s talk will be part of t…

  • August 20, 2014
    Ceph Primary Affinity

    This option addresses a fairly common concern with heterogeneous clusters: not all HDDs have the same performance, or the same performance/size ratio.
    With this option, it is possible to reduce the load on a specific disk without reducing the amount of data it contains.
    Furthermore, the option is easy to modify because it does not result in any data migration. Only the primary/secondary preference is modified and propagated to clients.

    Before playing with cluster options and tuning the CRUSH map, remember to verify that your clients are compatible with those options.

    You must enable ‘mon osd allow primary affinity = true’ on the mons before you can adjust primary-affinity. Note that older clients will no longer be able to communicate with the cluster.

    (For the kernel client, you can have a look at http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client.)

    Check whether the monitor has the primary affinity option enabled:
    # ceph --admin-daemon /var/run/ceph/ceph-mon.*.asok config show | grep 'primary_affinity'
    "mon_osd_allow_primary_affinity": "false",

    Edit ceph.conf and add in the [mon] section:
    mon osd allow primary affinity = true
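
    Alternatively, the flag can usually be injected into the running monitors without a restart; a hedged sketch, verify the exact syntax on your release:

    ```bash
    # Inject the option into the running monitors instead of restarting them
    ceph tell mon.\* injectargs '--mon_osd_allow_primary_affinity=true'
    ```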

    Reload the mons and test.
    Let's look at how many PGs are primary on osd.0, and how many are secondary:

    # ceph pg dump | grep active+clean | egrep "\[0," | wc -l
    100
    # ceph pg dump | grep active+clean | egrep ",0\]" | wc -l
    80
    

    Try changing the primary affinity:

    # ceph osd primary-affinity osd.0 0.5
    set osd.0 primary-affinity to 0.5 (8327682)
    
    # ceph pg dump | grep active+clean | egrep "\[0," | wc -l
    48
    # ceph pg dump | grep active+clean | egrep ",0\]" | wc -l
    132
    
    # ceph osd primary-affinity osd.0 0
    set osd.0 primary-affinity to 0 (802)
    
    # ceph pg dump | grep active+clean | egrep "\[0," | wc -l
    0
    # ceph pg dump | grep active+clean | egrep ",0\]" | wc -l
    180
    

    Now there will be no more reading on this OSD.
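
    To read the configured value back, the osdmap can be dumped; a hedged check, since the exact output format varies between releases and the primary affinity only shows up once it differs from the default:

    ```bash
    # Show the osdmap entry for osd.0, which should list its primary affinity
    ceph osd dump | grep '^osd\.0 '
    ```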

  • August 18, 2014
    v0.84 released

    The next Ceph development release is here! This release contains several meaty items, including some MDS improvements for journaling, the ability to remove the CephFS file system (and name it), several mon cleanups with tiered pools, several OSD performance branches, a new “read forward” RADOS caching mode, a prototype Kinetic OSD backend, and various radosgw …Read more

  • August 18, 2014
    Scalable Thumbnailing Service with Thumbor and Ceph

    An example of using the python-ceph library for a thumbnailing service.

    Thumbor is an open source tool for thumbnail generation developed by Globo.

    The tool allows you to perform a number of operations (crop, resize, filters, …) directly via the URL. …

  • August 14, 2014
    Tell teuthology to use a local ceph-qa-suite directory

    By default teuthology will clone the ceph-qa-suite repository and use the tasks it contains. If tasks have been modified locally, teuthology can be instructed to use a local directory by inserting something like: suite_path: /home/loic/software/ceph/ceph-qa-suite in the teuthology job yaml … Continue reading
