Planet Ceph

Aggregated news from external sources

  • April 10, 2015
    Ceph make check in a ram disk

    When running tests from the Ceph sources, the disk is used intensively and a ram disk can be used to reduce the latency. The machine must be rebooted to set the ramdisk maximum size to 16GB. For instance on Ubuntu … Continue reading
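
    The full post is truncated here, but a rough sketch of the general idea (the exact steps are an assumption, not quoted from the post) could look like this:

    # Assumption: using the brd ram disk driver; rd_size is in KB, so 16 GB = 16777216.
    # The size is a boot-time parameter, hence the reboot mentioned above.
    echo 'GRUB_CMDLINE_LINUX="brd.rd_size=16777216"' >> /etc/default/grub
    update-grub && reboot
    # After reboot, put a filesystem on the ram disk and run make check from there.
    modprobe brd                 # no-op if brd is built into the kernel
    mkfs.xfs /dev/ram0
    mkdir -p /mnt/ramdisk && mount /dev/ram0 /mnt/ramdisk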

  • April 7, 2015
    v0.94 Hammer released

    This major release is expected to form the basis of the next long-term stable series. It is intended to supersede v0.80.x Firefly. Highlights since Giant include: RADOS Performance: a range of improvements have been made in the OSD and client-side librados code that improve the throughput on flash backends and improve parallelism and scaling on …Read more

  • March 30, 2015
    Ceph rolling upgrades with Ansible

    Recently I improved a playbook that I wrote a couple of months ago regarding Ceph rolling upgrades.
    This playbook is part of the Ceph Ansible …

  • March 29, 2015
    OpenStack Summit Vancouver: thanks for your votes

    Bonjour, bonjour !
    Quick post to let you know that my talk submission has been accepted, so I’d like to thank you all for voting.
    As a reminder, our talk (Josh Durgin and I) is scheduled for Tuesday, May 19, from 11:15am to 11:55am.

    Also note that the summit has other Ceph talks!

    See you in Vancouver!

  • March 27, 2015
    Update: OpenStack Summit Vancouver Presentation
    The schedule for the upcoming OpenStack Summit 2015 in Vancouver is finally available. Sage and I submitted a presentation about “Storage security in a critical enterprise OpenStack environment”. The submission was accepted and the talk is scheduled for Monday, May 18th, 15:40 – 16:20.

    There are also some other talks related to Ceph available:
    Check out the links or the schedule for the dates and times of the talks.

    See you in Vancouver!

  • March 27, 2015
    Ceph erasure coding overhead in a nutshell

    Calculating the storage overhead of a replicated pool in Ceph is easy.
    You divide the amount of space you have by the “size” (number of replicas) parameter of your storage pool.

    Let’s work with some rough numbers: 64 OSDs of 4TB each.

    Raw size: 64 * 4  = 256TB
    Size 2  : 256 / 2 = 128TB
    Size 3  : 256 / 3 = 85.33TB
    

    Replicated pools are expensive in terms of overhead: Size 2 provides the same resilience and overhead as RAID-1.
    Size 3 provides more resilience than RAID-1 but at the tradeoff of even more overhead.

    Explaining what Erasure coding is about gets complicated quickly.

    I like to compare replicated pools to RAID-1 and Erasure coded pools to RAID-5 (or RAID-6) in the sense that there are data chunks and recovery/parity/coding chunks.

    What’s appealing about erasure coding is that it can provide the same resiliency as replicated pools (or better) with less storage overhead, at the cost of the extra computation it requires.

    Ceph has had erasure coding support for a good while already and interesting documentation is available:

    The thing with erasure coded pools, though, is that you’ll need a cache tier in front of them to be able to use them in most cases.

    This makes for a perfect synergy of slower/larger/less expensive drives for your erasure coded pool and faster, more expensive drives in front as your cache tier.
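
    As a rough sketch of what that can look like on the command line (the profile name, pool names, and k/m values below are examples, not from this post; k and m are explained next):

    # Create an erasure code profile and an erasure coded pool that uses it.
    ceph osd erasure-code-profile set ecprofile k=4 m=2
    ceph osd pool create ecpool 256 256 erasure ecprofile
    # Put a replicated pool in front of it as a writeback cache tier.
    ceph osd pool create cachepool 256 256
    ceph osd tier add ecpool cachepool
    ceph osd tier cache-mode cachepool writeback
    ceph osd tier set-overlay ecpool cachepool
    ceph osd pool set cachepool hit_set_type bloom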

    To calculate the overhead of an erasure coded pool, you need to know the ‘k’ and ‘m’ values of your erasure code profile.

    chunk

      When the encoding function is called, it returns chunks of the same size: data chunks, which can be concatenated to reconstruct the original object, and coding chunks, which can be used to rebuild a lost chunk.

    K

      The number of data chunks, i.e. the number of chunks into which the original object is divided. For instance, if K = 2, a 10KB object will be divided into 2 chunks of 5KB each.

    M

      The number of coding chunks, i.e. the number of additional chunks computed by the encoding functions. If there are 2 coding chunks, 2 OSDs can fail without losing data.
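
    On a running cluster you can check the k and m values of an existing profile, for instance the stock default profile (a quick look, not part of the original post):

    ceph osd erasure-code-profile ls
    ceph osd erasure-code-profile get default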

    The formula to calculate the overhead is:

    nOSD * k / (k+m) * OSD Size
    

    Finally, let’s look at the usable capacity (in TB) of several erasure coding profile configurations based on 64 OSDs of 4 TB, with m ranging from 1 to 4 (columns) and k from 1 to 10 (rows):

    | k \ m | m=1    | m=2    | m=3    | m=4    |
    |-------|--------|--------|--------|--------|
    | k=1   | 128.00 | 85.33  | 64.00  | 51.20  |
    | k=2   | 170.67 | 128.00 | 102.40 | 85.33  |
    | k=3   | 192.00 | 153.60 | 128.00 | 109.71 |
    | k=4   | 204.80 | 170.67 | 146.29 | 128.00 |
    | k=5   | 213.33 | 182.86 | 160.00 | 142.22 |
    | k=6   | 219.43 | 192.00 | 170.67 | 153.60 |
    | k=7   | 224.00 | 199.11 | 179.20 | 162.91 |
    | k=8   | 227.56 | 204.80 | 186.18 | 170.67 |
    | k=9   | 230.40 | 209.45 | 192.00 | 177.23 |
    | k=10  | 232.73 | 213.33 | 196.92 | 182.86 |
    | Raw   | 256    | 256    | 256    | 256    |
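
    Each cell above can be reproduced from the formula; for example, for k=4 and m=2 (a quick check, any shell with awk will do):

    awk 'BEGIN { nosd=64; size=4; k=4; m=2; printf "%.2f TB usable\n", nosd * size * k / (k + m) }'
    # prints 170.67 TB usable, matching the k=4 / m=2 cell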
    
  • March 13, 2015
    RadosGW: Simple Replication Example

    This is a simple example of a federated gateways configuration to set up asynchronous replication between two Ceph clusters.

    (This configuration is based on the Ceph documentation: http://ceph.com/docs/master/radosgw/federated-config/)

    Here I use only one region (“default”) and two zones (“main” and “fallback”), one for each cluster.

    Note that in this example, I use 3 placement targets (default, hot, cold) that correspond respectively to the pools .main.rgw.buckets, .main.rgw.hot.buckets, and .main.rgw.cold.buckets.
    Be careful to replace the tags {MAIN_USER_ACCESS}, {MAIN_USER_SECRET}, {FALLBACK_USER_ACESS}, {FALLBACK_USER_SECRET} with the corresponding values.

    First I created the region and zone files that will be required on the 2 clusters:

    The region file “region.conf.json”:

    { "name": "default",
      "api_name": "default",
      "is_master": "true",
      "endpoints": [
            "http:\/\/s3.mydomain.com:80\/"],
      "master_zone": "main",
      "zones": [
            { "name": "main",
              "endpoints": [
                    "http:\/\/s3.mydomain.com:80\/"],
              "log_meta": "true",
              "log_data": "true"},
            { "name": "fallback",
              "endpoints": [
                    "http:\/\/s3-fallback.mydomain.com:80\/"],
              "log_meta": "true",
              "log_data": "true"}],
      "placement_targets": [
            { "name": "default-placement",
              "tags": []},
            { "name": "cold-placement",
              "tags": []},
            { "name": "hot-placement",
              "tags": []}],
      "default_placement": "default-placement"}
    

    A zone file “zone-main.conf.json”:

    { "domain_root": ".main.domain.rgw",
      "control_pool": ".main.rgw.control",
      "gc_pool": ".main.rgw.gc",
      "log_pool": ".main.log",
      "intent_log_pool": ".main.intent-log",
      "usage_log_pool": ".main.usage",
      "user_keys_pool": ".main.users",
      "user_email_pool": ".main.users.email",
      "user_swift_pool": ".main.users.swift",
      "user_uid_pool": ".main.users.uid",
      "system_key": {
          "access_key": "{MAIN_USER_ACCESS}",
          "secret_key": "{MAIN_USER_SECRET}"},
      "placement_pools": [
            { "key": "default-placement",
              "val": { "index_pool": ".main.rgw.buckets.index",
                  "data_pool": ".main.rgw.buckets",
                  "data_extra_pool": ".main.rgw.buckets.extra"}},
            { "key": "cold-placement",
              "val": { "index_pool": ".main.rgw.buckets.index",
                  "data_pool": ".main.rgw.cold.buckets",
                  "data_extra_pool": ".main.rgw.buckets.extra"}},
            { "key": "hot-placement",
              "val": { "index_pool": ".main.rgw.buckets.index",
                  "data_pool": ".main.rgw.hot.buckets",
                  "data_extra_pool": ".main.rgw.buckets.extra"}}]}
    

    And a zone file “zone-fallback.conf.json”:

    { "domain_root": ".fallback.domain.rgw",
      "control_pool": ".fallback.rgw.control",
      "gc_pool": ".fallback.rgw.gc",
      "log_pool": ".fallback.log",
      "intent_log_pool": ".fallback.intent-log",
      "usage_log_pool": ".fallback.usage",
      "user_keys_pool": ".fallback.users",
      "user_email_pool": ".fallback.users.email",
      "user_swift_pool": ".fallback.users.swift",
      "user_uid_pool": ".fallback.users.uid",
      "system_key": {
        "access_key": "{FALLBACK_USER_ACESS}",
        "secret_key": "{FALLBACK_USER_SECRET}"
             },
      "placement_pools": [
            { "key": "default-placement",
              "val": { "index_pool": ".fallback.rgw.buckets.index",
                  "data_pool": ".fallback.rgw.buckets",
                  "data_extra_pool": ".fallback.rgw.buckets.extra"}},
            { "key": "cold-placement",
              "val": { "index_pool": ".fallback.rgw.buckets.index",
                  "data_pool": ".fallback.rgw.cold.buckets",
                  "data_extra_pool": ".fallback.rgw.buckets.extra"}},
            { "key": "hot-placement",
              "val": { "index_pool": ".fallback.rgw.buckets.index",
                  "data_pool": ".fallback.rgw.hot.buckets",
                  "data_extra_pool": ".fallback.rgw.buckets.extra"}}]}
    

    On the first cluster (MAIN)

    I created the pools:

    ceph osd pool create .rgw.root 16 16
    ceph osd pool create .main.rgw.root 16 16
    ceph osd pool create .main.domain.rgw 16 16
    ceph osd pool create .main.rgw.control 16 16
    ceph osd pool create .main.rgw.gc 16 16
    ceph osd pool create .main.rgw.buckets 512 512
    ceph osd pool create .main.rgw.hot.buckets 512 512
    ceph osd pool create .main.rgw.cold.buckets 512 512
    ceph osd pool create .main.rgw.buckets.index 32 32
    ceph osd pool create .main.rgw.buckets.extra 16 16
    ceph osd pool create .main.log 16 16
    ceph osd pool create .main.intent-log 16 16
    ceph osd pool create .main.usage 16 16
    ceph osd pool create .main.users 16 16
    ceph osd pool create .main.users.email 16 16
    ceph osd pool create .main.users.swift 16 16
    ceph osd pool create .main.users.uid 16 16
    

    I configured the region and zones, and added the system users:

      radosgw-admin region set --name client.radosgw.main < region.conf.json
      radosgw-admin zone set --rgw-zone=main --name client.radosgw.main < zone-main.conf.json
      radosgw-admin zone set --rgw-zone=fallback --name client.radosgw.main < zone-fallback.conf.json
      radosgw-admin regionmap update --name client.radosgw.main
    
      radosgw-admin user create --uid="main" --display-name="Zone main" --name client.radosgw.main --system --access-key={MAIN_USER_ACCESS} --secret={MAIN_USER_SECRET}
      radosgw-admin user create --uid="fallback" --display-name="Zone fallback" --name client.radosgw.main --system --access-key={FALLBACK_USER_ACESS} --secret={FALLBACK_USER_SECRET}
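
    As an optional sanity check (not in the original post), the region and zone can be read back to confirm what the gateway will use:

      radosgw-admin region get --name client.radosgw.main
      radosgw-admin zone get --rgw-zone=main --name client.radosgw.main
      radosgw-admin regionmap get --name client.radosgw.main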
    

    Set up the RadosGW config in ceph.conf on cluster MAIN:

      [client.radosgw.main]
      host = ceph-main-radosgw-01
      rgw region = default
      rgw region root pool = .rgw.root
      rgw zone = main
      rgw zone root pool = .main.rgw.root
      rgw frontends = "civetweb port=80"
      rgw dns name = s3.mydomain.com
      keyring = /etc/ceph/ceph.client.radosgw.keyring
      rgw_socket_path = /var/run/ceph/radosgw.sock
    

    I needed to create a keyring for [client.radosgw.main] in /etc/ceph/ceph.client.radosgw.keyring; see the documentation.
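
    A minimal sketch of that keyring creation, following the standard Ceph RadosGW documentation (the capabilities shown are the ones the docs use; adjust to your own policy):

    ceph-authtool --create-keyring /etc/ceph/ceph.client.radosgw.keyring
    chmod +r /etc/ceph/ceph.client.radosgw.keyring
    ceph-authtool /etc/ceph/ceph.client.radosgw.keyring -n client.radosgw.main --gen-key
    ceph-authtool -n client.radosgw.main --cap osd 'allow rwx' --cap mon 'allow rwx' /etc/ceph/ceph.client.radosgw.keyring
    ceph auth add client.radosgw.main -i /etc/ceph/ceph.client.radosgw.keyring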

    Then, start/restart radosgw for cluster MAIN.

    On the other Ceph cluster (FALLBACK)

    I created the pools:

    ceph osd pool create .rgw.root 16 16
    ceph osd pool create .fallback.rgw.root 16 16
    ceph osd pool create .fallback.domain.rgw 16 16
    ceph osd pool create .fallback.rgw.control 16 16
    ceph osd pool create .fallback.rgw.gc 16 16
    ceph osd pool create .fallback.rgw.buckets 512 512
    ceph osd pool create .fallback.rgw.hot.buckets 512 512
    ceph osd pool create .fallback.rgw.cold.buckets 512 512
    ceph osd pool create .fallback.rgw.buckets.index 32 32
    ceph osd pool create .fallback.rgw.buckets.extra 16 16
    ceph osd pool create .fallback.log 16 16
    ceph osd pool create .fallback.intent-log 16 16
    ceph osd pool create .fallback.usage 16 16
    ceph osd pool create .fallback.users 16 16
    ceph osd pool create .fallback.users.email 16 16
    ceph osd pool create .fallback.users.swift 16 16
    ceph osd pool create .fallback.users.uid 16 16
    

    I configured the region and zones, and added the system users:

    radosgw-admin region set --name client.radosgw.fallback < region.conf.json
    radosgw-admin zone set --rgw-zone=fallback --name client.radosgw.fallback < zone-fallback.conf.json
    radosgw-admin zone set --rgw-zone=main --name client.radosgw.fallback < zone-main.conf.json
    radosgw-admin regionmap update --name client.radosgw.fallback
    
    radosgw-admin user create --uid="fallback" --display-name="Zone fallback" --name client.radosgw.fallback --system --access-key={FALLBACK_USER_ACESS} --secret={FALLBACK_USER_SECRET}
    radosgw-admin user create --uid="main" --display-name="Zone main" --name client.radosgw.fallback --system --access-key={MAIN_USER_ACCESS} --secret={MAIN_USER_SECRET}
    

    Set up the RadosGW config in ceph.conf on cluster FALLBACK:

    [client.radosgw.fallback]
    host = ceph-fallback-radosgw-01
    rgw region = default
    rgw region root pool = .rgw.root
    rgw zone = fallback
    rgw zone root pool = .fallback.rgw.root
    rgw frontends = "civetweb port=80"
    rgw dns name = s3-fallback.mydomain.com
    keyring = /etc/ceph/ceph.client.radosgw.keyring
    rgw_socket_path = /var/run/ceph/radosgw.sock
    

    Also, I needed to create a keyring for [client.radosgw.fallback] in /etc/ceph/ceph.client.radosgw.keyring (same procedure as on the MAIN cluster) and start radosgw for cluster FALLBACK.

    Finally, set up the RadosGW Agent

    /etc/ceph/radosgw-agent/default.conf:

    src_zone: main
    source: http://s3.mydomain.com:80
    src_access_key: {MAIN_USER_ACCESS}
    src_secret_key: {MAIN_USER_SECRET}
    dest_zone: fallback
    destination: http://s3-fallback.mydomain.com:80
    dest_access_key: {FALLBACK_USER_ACESS}
    dest_secret_key: {FALLBACK_USER_SECRET}
    log_file: /var/log/radosgw/radosgw-sync.log
    
    Then start the agent:

    /etc/init.d/radosgw-agent start
    

    After that, there is still a little suspense …
    Then I create a bucket with data on s3.mydomain.com and verify that it is properly synchronized.
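
    One way to check the result from both sides (a sketch, not from the original post) is to list the buckets each zone's gateway knows about:

    # on the MAIN cluster
    radosgw-admin bucket list --name client.radosgw.main
    # on the FALLBACK cluster, after the agent has run
    radosgw-admin bucket list --name client.radosgw.fallback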

    For debugging, you can enable logs on the RadosGW on each side, and start the agent in verbose mode with radosgw-agent -v -c /etc/ceph/radosgw-agent/default.conf

    These steps work for me. Getting the initial setup right is not always obvious: whenever I set up a sync it rarely works on the first try, but it always ends up running.

  • March 11, 2015
    New release of python-cephclient: 0.1.0.5

    I’ve just drafted a new release of python-cephclient
    on PyPi: v0.1.0.5.

    After learning about the ceph-rest-api I just had
    to do something fun with it.

    In fact, it’s going to become very handy for me as I might start to develop
    with it for things like nagios monitoring scripts.
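
    Since python-cephclient wraps the ceph-rest-api endpoints, such a check can stay very small. A rough sketch, assuming ceph-rest-api is running locally on its default port 5000 (hostname, port, and user are assumptions, not from this post):

    # start the REST API with an existing client name/keyring
    ceph-rest-api -n client.admin &
    # a health probe is then a single HTTP call (default base URL assumed)
    curl -s http://localhost:5000/api/v0.1/health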

    The changelog:

    dmsimard:

    • Add missing dependency on the requests library
    • Some PEP8 and code standardization cleanup
    • Add root “PUT” methods
    • Add mon “PUT” methods
    • Add mds “PUT” methods
    • Add auth “PUT” methods

    Donald Talton:

    • Add osd “PUT” methods

    Please try it out and let me know if you have any feedback!

    Pull requests are welcome 🙂

  • March 10, 2015
    v0.80.9 Firefly released

    This is a bugfix release for firefly. It fixes a performance regression in librbd, an important CRUSH misbehavior (see below), and several RGW bugs. We have also backported support for flock/fcntl locks to ceph-fuse and libcephfs. We recommend that all Firefly users upgrade. For more detailed information, see the complete changelog. ADJUSTING CRUSH MAPS This point release fixes …Read more

  • March 9, 2015
    Provisionning a teuthology target with a given kernel

    When a teuthology target (i.e. machine) is provisioned with teuthology-lock for the purpose of testing Ceph, there is no way to choose the kernel. But it can be installed afterwards using the following: cat > kernel.yaml <<EOF interactive-on-error: true roles: … Continue reading

  • March 9, 2015
    Ceph OSD uuid conversion to OSD id and vice versa

    When handling a Ceph OSD, it is convenient to assign it a symbolic name that can be chosen even before it is created. That’s what the uuid argument for ceph osd create is for. Without a uuid argument, a random … Continue reading
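
    A minimal sketch of the idea (values are examples):

    # choose the uuid up front, then create the OSD with it
    uuid=$(uuidgen)
    ceph osd create $uuid        # prints the numeric OSD id that was assigned
    # the OSD map can later be used to map the uuid back to the id (and vice versa)
    ceph osd dump | grep $uuid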

  • March 5, 2015
    Incomplete PGs — OH MY!

    I recently had the opportunity to work on a Firefly cluster (0.80.8) in which power outages caused a failure of two OSDs. As with lots of things in technology, that’s not the whole story. The manner in which the power outages and OSD failures occurred put the cluster into a state with 5 placement groups …Read more
