Planet Ceph

Aggregated news from external sources

  • August 20, 2013
    Ceph OSD: Where Is My Data?

    The purpose is to find out exactly where my data is stored on the Ceph cluster.

    For this, I have just created a minimal cluster with 3 OSDs:

    $ ceph-deploy osd create ceph-01:/dev/sdb ceph-02:/dev/sdb ceph-03:/dev/sdb

    Where is my OSD directory on ceph-01?

    $ mount | grep ceph
    /dev/sdb1 on /var/lib/ceph/osd/ceph-0 type xfs (rw,noatime,attr2,delaylog,noquota)

    The directory contents:

    $ cd /var/lib/ceph/osd/ceph-0; ls -l
    total 52
    -rw-r--r--   1 root root  487 août  20 12:12 activate.monmap
    -rw-r--r--   1 root root    3 août  20 12:12 active
    -rw-r--r--   1 root root   37 août  20 12:12 ceph_fsid
    drwxr-xr-x 133 root root 8192 août  20 12:18 current
    -rw-r--r--   1 root root   37 août  20 12:12 fsid
    lrwxrwxrwx   1 root root   58 août  20 12:12 journal -> /dev/disk/by-partuuid/37180b7e-fe5d-4b53-8693-12a8c1f52ec9
    -rw-r--r--   1 root root   37 août  20 12:12 journal_uuid
    -rw-------   1 root root   56 août  20 12:12 keyring
    -rw-r--r--   1 root root   21 août  20 12:12 magic
    -rw-r--r--   1 root root    6 août  20 12:12 ready
    -rw-r--r--   1 root root    4 août  20 12:12 store_version
    -rw-r--r--   1 root root    0 août  20 12:12 sysvinit
    -rw-r--r--   1 root root    2 août  20 12:12 whoami
    
    $ du -hs *
    4,0K  activate.monmap → The current monmap
    4,0K  active      → "ok"
    4,0K  ceph_fsid   → cluster fsid (same as returned by 'ceph fsid')
    2,1M  current
    4,0K  fsid        → id for this osd
    0 journal         → symlink to journal partition
    4,0K  journal_uuid
    4,0K  keyring     → the key
    4,0K  magic       → "ceph osd volume v026"
    4,0K  ready       → "ready"
    4,0K  store_version   
    0 sysvinit
    4,0K  whoami      → id of the osd
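
    A quick way to cross-check a couple of these values against the running cluster (a small sketch, run from the OSD directory above):

    $ cat whoami        # should print the id of this OSD (here: 0)
    $ cat ceph_fsid     # should match the cluster fsid...
    $ ceph fsid         # ...reported by the monitors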

    The data are stored in the "current" directory.
    It contains a few files and many *_head directories (one per placement group):

    $ cd current; ls -l | grep -v head
    total 20
    -rw-r--r-- 1 root root     5 août  20 12:18 commit_op_seq
    drwxr-xr-x 2 root root 12288 août  20 12:18 meta
    -rw-r--r-- 1 root root     0 août  20 12:12 nosnap
    drwxr-xr-x 2 root root   111 août  20 12:12 omap

    In the omap directory:

    $ cd omap; ls -l
    -rw-r--r-- 1 root root     150 août  20 12:12 000007.sst
    -rw-r--r-- 1 root root 2031616 août  20 12:18 000010.log 
    -rw-r--r-- 1 root root      16 août  20 12:12 CURRENT
    -rw-r--r-- 1 root root       0 août  20 12:12 LOCK
    -rw-r--r-- 1 root root     172 août  20 12:12 LOG
    -rw-r--r-- 1 root root     309 août  20 12:12 LOG.old
    -rw-r--r-- 1 root root   65536 août  20 12:12 MANIFEST-000009

    In the meta directory:

    $ cd ../meta; ls -l
    total 940
    -rw-r--r-- 1 root root  710 août  20 12:14 inc\uosdmap.10__0_F4E9C003__none
    -rw-r--r-- 1 root root  958 août  20 12:12 inc\uosdmap.1__0_B65F4306__none
    -rw-r--r-- 1 root root  722 août  20 12:14 inc\uosdmap.11__0_F4E9C1D3__none
    -rw-r--r-- 1 root root  152 août  20 12:14 inc\uosdmap.12__0_F4E9C163__none
    -rw-r--r-- 1 root root  153 août  20 12:12 inc\uosdmap.2__0_B65F40D6__none
    -rw-r--r-- 1 root root  574 août  20 12:12 inc\uosdmap.3__0_B65F4066__none
    -rw-r--r-- 1 root root  153 août  20 12:12 inc\uosdmap.4__0_B65F4136__none
    -rw-r--r-- 1 root root  722 août  20 12:12 inc\uosdmap.5__0_B65F46C6__none
    -rw-r--r-- 1 root root  136 août  20 12:14 inc\uosdmap.6__0_B65F4796__none
    -rw-r--r-- 1 root root  642 août  20 12:14 inc\uosdmap.7__0_B65F4726__none
    -rw-r--r-- 1 root root  153 août  20 12:14 inc\uosdmap.8__0_B65F44F6__none
    -rw-r--r-- 1 root root  722 août  20 12:14 inc\uosdmap.9__0_B65F4586__none
    -rw-r--r-- 1 root root    0 août  20 12:12 infos__head_16EF7597__none
    -rw-r--r-- 1 root root 2870 août  20 12:14 osdmap.10__0_6417091C__none
    -rw-r--r-- 1 root root  830 août  20 12:12 osdmap.1__0_FD6E49B1__none
    -rw-r--r-- 1 root root 2870 août  20 12:14 osdmap.11__0_64170EAC__none
    -rw-r--r-- 1 root root 2870 août  20 12:14 osdmap.12__0_64170E7C__none   → current osdmap
    -rw-r--r-- 1 root root 1442 août  20 12:12 osdmap.2__0_FD6E4941__none
    -rw-r--r-- 1 root root 1510 août  20 12:12 osdmap.3__0_FD6E4E11__none
    -rw-r--r-- 1 root root 2122 août  20 12:12 osdmap.4__0_FD6E4FA1__none
    -rw-r--r-- 1 root root 2122 août  20 12:12 osdmap.5__0_FD6E4F71__none
    -rw-r--r-- 1 root root 2122 août  20 12:14 osdmap.6__0_FD6E4C01__none
    -rw-r--r-- 1 root root 2190 août  20 12:14 osdmap.7__0_FD6E4DD1__none
    -rw-r--r-- 1 root root 2802 août  20 12:14 osdmap.8__0_FD6E4D61__none
    -rw-r--r-- 1 root root 2802 août  20 12:14 osdmap.9__0_FD6E4231__none
    -rw-r--r-- 1 root root  354 août  20 12:14 osd\usuperblock__0_23C2FCDE__none
    -rw-r--r-- 1 root root    0 août  20 12:12 pglog\u0.0__0_103B076E__none     → Log for each pg
    -rw-r--r-- 1 root root    0 août  20 12:12 pglog\u0.1__0_103B043E__none
    -rw-r--r-- 1 root root    0 août  20 12:12 pglog\u0.11__0_5172C9DB__none
    -rw-r--r-- 1 root root    0 août  20 12:12 pglog\u0.13__0_5172CE3B__none
    -rw-r--r-- 1 root root    0 août  20 12:13 pglog\u0.15__0_5172CC9B__none
    -rw-r--r-- 1 root root    0 août  20 12:13 pglog\u0.16__0_5172CC2B__none
    ............
    -rw-r--r-- 1 root root    0 août  20 12:12 snapmapper__0_A468EC03__none

    Let's try decompiling the CRUSH map from the osdmap:

    $ ceph osd stat
    e12: 3 osds: 3 up, 3 in
    
    $ osdmaptool osdmap.12__0_64170E7C__none --export-crush /tmp/crushmap.bin
    osdmaptool: osdmap file 'osdmap.12__0_64170E7C__none'
    osdmaptool: exported crush map to /tmp/crushmap.bin
    
    $ crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt
    
    $ cat /tmp/crushmap.txt
    # begin crush map
    
    # devices
    device 0 osd.0
    device 1 osd.1
    device 2 osd.2
    
    # types
    type 0 osd
    type 1 host
    type 2 rack
    type 3 row
    type 4 room
    type 5 datacenter
    type 6 root
    
    # buckets
    host ceph-01 {
      id -2       # do not change unnecessarily
      # weight 0.050
      alg straw
      hash 0  # rjenkins1
      item osd.0 weight 0.050
    }
    host ceph-02 {
      id -3       # do not change unnecessarily
      # weight 0.050
      alg straw
      hash 0  # rjenkins1
      item osd.1 weight 0.050
    }
    host ceph-03 {
      id -4       # do not change unnecessarily
      # weight 0.050
      alg straw
      hash 0  # rjenkins1
      item osd.2 weight 0.050
    }
    root default {
      id -1       # do not change unnecessarily
      # weight 0.150
      alg straw
      hash 0  # rjenkins1
      item ceph-01 weight 0.050
      item ceph-02 weight 0.050
      item ceph-03 weight 0.050
    }
    
    ...
    
    # end crush map

    OK, that's what I expected. 🙂

    The cluster is empty:

    $ find *_head -type f | wc -l
    0

    The directory list corresponds to the output of 'ceph pg dump':

    $ for dir in `ceph pg dump | grep '\[0,' | cut -f1`; do if [ -d ${dir}_head ]; then echo exist; else echo nok; fi; done | sort | uniq -c
    dumped all in format plain
         69 exist

    To get all the stats for a specific pg:

    $ ceph pg 0.1 query
    { "state": "active+clean",
      "epoch": 12,
      "up": [
            0,
            1],
      "acting": [
            0,
            1],
      "info": { "pgid": "0.1",
          "last_update": "0'0",
          "last_complete": "0'0",
          "log_tail": "0'0",
          "last_backfill": "MAX",
          "purged_snaps": "[]",
          "history": { "epoch_created": 1,
              "last_epoch_started": 12,
              "last_epoch_clean": 12,
              "last_epoch_split": 0,
              "same_up_since": 9,
              "same_interval_since": 9,
              "same_primary_since": 5,
              "last_scrub": "0'0",
              "last_scrub_stamp": "2013-08-20 12:12:37.851559",
              "last_deep_scrub": "0'0",
              "last_deep_scrub_stamp": "2013-08-20 12:12:37.851559",
              "last_clean_scrub_stamp": "0.000000"},
          "stats": { "version": "0'0",
              "reported_seq": "12",
              "reported_epoch": "12",
              "state": "active+clean",
              "last_fresh": "2013-08-20 12:16:22.709534",
              "last_change": "2013-08-20 12:16:22.105099",
              "last_active": "2013-08-20 12:16:22.709534",
              "last_clean": "2013-08-20 12:16:22.709534",
              "last_became_active": "0.000000",
              "last_unstale": "2013-08-20 12:16:22.709534",
              "mapping_epoch": 5,
              "log_start": "0'0",
              "ondisk_log_start": "0'0",
              "created": 1,
              "last_epoch_clean": 12,
              "parent": "0.0",
              "parent_split_bits": 0,
              "last_scrub": "0'0",
              "last_scrub_stamp": "2013-08-20 12:12:37.851559",
              "last_deep_scrub": "0'0",
              "last_deep_scrub_stamp": "2013-08-20 12:12:37.851559",
              "last_clean_scrub_stamp": "0.000000",
              "log_size": 0,
              "ondisk_log_size": 0,
              "stats_invalid": "0",
              "stat_sum": { "num_bytes": 0,
                  "num_objects": 0,
                  "num_object_clones": 0,
                  "num_object_copies": 0,
                  "num_objects_missing_on_primary": 0,
                  "num_objects_degraded": 0,
                  "num_objects_unfound": 0,
                  "num_read": 0,
                  "num_read_kb": 0,
                  "num_write": 0,
                  "num_write_kb": 0,
                  "num_scrub_errors": 0,
                  "num_shallow_scrub_errors": 0,
                  "num_deep_scrub_errors": 0,
                  "num_objects_recovered": 0,
                  "num_bytes_recovered": 0,
                  "num_keys_recovered": 0},
              "stat_cat_sum": {},
              "up": [
                    0,
                    1],
              "acting": [
                    0,
                    1]},
          "empty": 1,
          "dne": 0,
          "incomplete": 0,
          "last_epoch_started": 12},
      "recovery_state": [
            { "name": "Started\/Primary\/Active",
              "enter_time": "2013-08-20 12:15:30.102250",
              "might_have_unfound": [],
              "recovery_progress": { "backfill_target": -1,
                  "waiting_on_backfill": 0,
                  "backfill_pos": "0\/\/0\/\/-1",
                  "backfill_info": { "begin": "0\/\/0\/\/-1",
                      "end": "0\/\/0\/\/-1",
                      "objects": []},
                  "peer_backfill_info": { "begin": "0\/\/0\/\/-1",
                      "end": "0\/\/0\/\/-1",
                      "objects": []},
                  "backfills_in_flight": [],
                  "pull_from_peer": [],
                  "pushing": []},
              "scrub": { "scrubber.epoch_start": "0",
                  "scrubber.active": 0,
                  "scrubber.block_writes": 0,
                  "scrubber.finalizing": 0,
                  "scrubber.waiting_on": 0,
                  "scrubber.waiting_on_whom": []}},
            { "name": "Started",
              "enter_time": "2013-08-20 12:14:51.501628"}]}

    Retrieve an object on the cluster

    In this test we create a standard pool (pg_num=8 and replica size 2):

    $ rados mkpool testpool
    $ wget -q http://ceph.com/docs/master/_static/logo.png
    $ md5sum logo.png
    4c7c15e856737efc0d2d71abde3c6b28  logo.png
    
    $ rados put -p testpool logo.png logo.png
    $ ceph osd map testpool logo.png
    osdmap e14 pool 'testpool' (3) object 'logo.png' -> pg 3.9e17671a (3.2) -> up [2,1] acting [2,1]
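
    Note that rados mkpool uses the default pg_num and replication size. If you prefer them explicit, an equivalent way to create and inspect the pool would be something like this (a sketch, not what was run above):

    $ ceph osd pool create testpool 8 8       # 8 pgs / 8 pgps
    $ ceph osd pool get testpool size         # replication size (2 here)
    $ ceph osd pool get testpool pg_num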

    My Ceph logo is in pg 3.2 (primary on osd.2 and replica on osd.1):

    $ ceph osd tree
    # id  weight  type name   up/down reweight
    -1    0.15    root default
    -2    0.04999     host ceph-01
    0 0.04999         osd.0   up  1   
    -3    0.04999     host ceph-02
    1 0.04999         osd.1   up  1   
    -4    0.04999     host ceph-03
    2 0.04999         osd.2   up  1

    And osd.2 is on ceph-03:

    $ cd /var/lib/ceph/osd/ceph-2/current/3.2_head/
    $ ls
    logo.png__head_9E17671A__3
    $ md5sum logo.png__head_9E17671A__3
    4c7c15e856737efc0d2d71abde3c6b28  logo.png__head_9E17671A__3
    

    It's exactly the same 🙂

    Import RBD

    Same thing, but testing as a block device.

    $ rbd import logo.png testpool/logo.png 
    Importing image: 100% complete...done.
    $ rbd info testpool/logo.png
    rbd image 'logo.png':
      size 3898 bytes in 1 objects
      order 22 (4096 KB objects)
      block_name_prefix: rb.0.1048.2ae8944a
      format: 1

    Only one object.

    $ rados ls -p testpool
    logo.png
    rb.0.1048.2ae8944a.000000000000
    rbd_directory
    logo.png.rbd
    $ ceph osd map testpool logo.png.rbd
    osdmap e14 pool 'testpool' (3) object 'logo.png.rbd' -> pg 3.d592352c (3.4) -> up [0,2] acting [0,2]

    Let’s go.

    $ cd /var/lib/ceph/osd/ceph-0/current/3.4_head/
    $ cat logo.png.rbd__head_D592352C__3
    <<< Rados Block Device Image >>>
    rb.0.1048.2ae8944aRBD001.005:

    Here we can retrieve the block_name_prefix of the RBD image, 'rb.0.1048.2ae8944a':

    $ ceph osd map testpool rb.0.1048.2ae8944a.000000000000
    osdmap e14 pool 'testpool' (3) object 'rb.0.1048.2ae8944a.000000000000' -> pg 3.d512078b (3.3) -> up [2,1] acting [2,1]

    On ceph-03:

    $ cd /var/lib/ceph/osd/ceph-2/current/3.3_head
    $ md5sum rb.0.1048.2ae8944a.000000000000__head_D512078B__3
    4c7c15e856737efc0d2d71abde3c6b28  rb.0.1048.2ae8944a.000000000000__head_D512078B__3

    We retrieve the file unchanged because it has not been split into multiple objects 🙂
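
    For a bigger image the data would be split into several 4 MB objects sharing the same block_name_prefix; a quick way to list them (a sketch using the prefix seen above):

    $ rados ls -p testpool | grep '^rb\.0\.1048\.2ae8944a\.' | sort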

    Try RBD snapshot

    $ rbd snap create testpool/logo.png@snap1
    $ rbd snap ls testpool/logo.png
    SNAPID NAME        SIZE 
         2 snap1 3898 bytes
    $ echo "testpool/logo.png" >> /etc/ceph/rbdmap
    $ service rbdmap reload
    [ ok ] Starting RBD Mapping: testpool/logo.png.
    [ ok ] Mounting all filesystems...done.
    
    $ dd if=/dev/zero of=/dev/rbd/testpool/logo.png 
    dd: writing to '/dev/rbd/testpool/logo.png': No space left on device
    8+0 records in
    7+0 records out
    3584 bytes (3.6 kB) copied, 0.285823 s, 12.5 kB/s
    
    $ ceph osd map testpool rb.0.1048.2ae8944a.000000000000
    osdmap e15 pool 'testpool' (3) object 'rb.0.1048.2ae8944a.000000000000' -> pg 3.d512078b (3.3) -> up [2,1] acting [2,1]

    It's the same place on ceph-03:

    $ cd /var/lib/ceph/osd/ceph-2/current/3.3_head
    $ md5sum *
    4c7c15e856737efc0d2d71abde3c6b28  rb.0.1048.2ae8944a.000000000000__2_D512078B__3
    dd99129a16764a6727d3314b501e9c23  rb.0.1048.2ae8944a.000000000000__head_D512078B__3

    We can see that the file tagged with 2 (snap id 2) contains the original data,
    while a new head file has been created for the current data.
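
    We can also check from the client side that the snapshot still holds the original data (a quick sketch; rbd export can write to stdout):

    $ rbd export testpool/logo.png@snap1 - | md5sum   # should match 4c7c15e856737efc0d2d71abde3c6b28
    $ rbd export testpool/logo.png - | md5sum         # current, overwritten data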

    For the next tests, I will try striped files, RBD format 2, and pool snapshots.
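
    Roughly the commands I have in mind for those tests (only a sketch; image, pool and snapshot names are made up):

    $ rbd create testpool/bigimage --size 1024 --image-format 2 --stripe-unit 65536 --stripe-count 8
    $ rados -p testpool mksnap poolsnap1      # pool-level snapshot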

  • August 18, 2013
    Ceph Dumpling

    The Ceph community just finished its latest three-month cycle of development, culminating in a new major release of Ceph called “Dumpling,” or v0.67 for those of a more serious demeanor. Inktank is proud to have contributed two major pieces of functionality to Dumpling. 1. Global Namespace and Region Support – Many service providers and IT […]

  • August 9, 2013
    Samba Shadow_copy and Ceph RBD

    I added a script to create snapshots on RBD for use with Samba shadow_copy2.
    For more details, see https://github.com/ksperis/autosnap-rbd-shadow-copy

    How to use:

    Before starting, you need a running Ceph cluster and Samba installed.

    Verify admin access to the Ceph cluster (this should not return an error):

    $ rbd ls

    Get the script:

    $ mkdir -p /etc/ceph/scripts/
    $ cd /etc/ceph/scripts/
    $ wget https://raw.github.com/ksperis/autosnap-rbd-shadow-copy/master/autosnap.conf
    $ wget https://raw.github.com/ksperis/autosnap-rbd-shadow-copy/master/autosnap.sh
    $ chmod +x autosnap.sh

    Create a block device:

    $ rbd create myshare --size=1024
    $ echo "myshare" >> /etc/ceph/rbdmap
    $ /etc/init.d/rbdmap reload
    [ ok ] Starting RBD Mapping: rbd/myshare.
    [ ok ] Mounting all filesystems...done.

    Format the block device:

    $ mkfs.xfs /dev/rbd/rbd/myshare
    log stripe unit (4194304 bytes) is too large (maximum is 256KiB)
    log stripe unit adjusted to 32KiB
    meta-data=/dev/rbd/rbd/myshare   isize=256    agcount=9, agsize=31744 blks
             =                       sectsz=512   attr=2, projid32bit=0
    data     =                       bsize=4096   blocks=262144, imaxpct=25
             =                       sunit=1024   swidth=1024 blks
    naming   =version 2              bsize=4096   ascii-ci=0
    log      =internal log           bsize=4096   blocks=2560, version=2
             =                       sectsz=512   sunit=8 blks, lazy-count=1
    realtime =none                   extsz=4096   blocks=0, rtextents=0

    Mount the share:

    $ mkdir /myshare
    $ echo "/dev/rbd/rbd/myshare /myshare xfs defaults 0 0" >> /etc/fstab
    $ mount /myshare

    Add this section to your /etc/samba/smb.conf:

    [myshare]
        path = /myshare
        writable = yes
        vfs objects = shadow_copy2
        shadow:snapdir = .snapshots
        shadow:sort = desc

    Reload Samba:

    $ /etc/init.d/samba reload

    Create the snapshot directory and run the script:

    $ mkdir -p /myshare/.snapshots
    $ /etc/ceph/scripts/autosnap.sh
    * Create snapshot for myshare: @GMT-2013.08.09-10.16.10-autosnap
    synced, no cache, snapshot created.
    * Shadow Copy to mount for rbd/myshare :
    GMT-2013.08.09-10.14.44

    Verify that the first snapshot is correctly mounted:

    $ mount | grep myshare
    /dev/rbd1 on /myshare type xfs (rw,relatime,attr2,inode64,sunit=8192,swidth=8192,noquota)
    /dev/rbd2 on /myshare/.snapshots/@GMT-2013.08.09-10.14.44 type xfs (ro,relatime,nouuid,norecovery,attr2,inode64,sunit=8192,swidth=8192,noquota)

    Also, you can add this to crontab to run the script every day:

    $ echo "00 0    * * *   root    /bin/bash /etc/ceph/scripts/autosnap.sh" >> /etc/crontab
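
    To see the snapshots the script has created so far, you can list them directly on the image (here the default rbd pool is assumed, as above):

    $ rbd snap ls rbd/myshare
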
  • August 3, 2013
    Test Ceph Persistent RBD Device

    Create a persistent RBD device

    Create a block device and map it with /etc/ceph/rbdmap:

    $ rbd create rbd/myrbd --size=1024
    $ echo "rbd/myrbd" >> /etc/ceph/rbdmap
    $ service rbdmap reload
    [ ok ] Starting RBD Mapping: rbd/myrbd.
    [ ok ] Mounting all filesystems...done.

    View the mapped RBD devices:

    $ rbd showmapped
    id pool image snap device    
    1  rbd  myrbd -    /dev/rbd1

    Create a filesystem and mount it:

    $ mkfs.xfs /dev/rbd/rbd/myrbd 
    log stripe unit (4194304 bytes) is too large (maximum is 256KiB)
    log stripe unit adjusted to 32KiB
    meta-data=/dev/rbd/rbd/myrbd     isize=256    agcount=9, agsize=31744 blks
             =                       sectsz=512   attr=2, projid32bit=0
    data     =                       bsize=4096   blocks=262144, imaxpct=25
             =                       sunit=1024   swidth=1024 blks
    naming   =version 2              bsize=4096   ascii-ci=0
    log      =internal log           bsize=4096   blocks=2560, version=2
             =                       sectsz=512   sunit=8 blks, lazy-count=1
    realtime =none                   extsz=4096   blocks=0, rtextents=0
    
    $ mkdir -p /mnt/myrbd
    $ blkid | grep rbd1
    /dev/rbd1: UUID="a07e969e-bb1a-4921-9171-82cf7a737a69" TYPE="xfs"
    $ echo "UUID=a07e969e-bb1a-4921-9171-82cf7a737a69 /mnt/myrbd xfs defaults 0 0" >> /etc/fstab
    $ mount -a

    Check:

    $ mount | grep rbd1
    /dev/rbd1 on /mnt/myrbd type xfs (rw,relatime,attr2,inode64,sunit=8192,swidth=8192,noquota)

    Test snapshot

    $ touch /mnt/myrbd/v1

    Make a snapshot:

    $ sync && xfs_freeze -f /mnt/
    $ rbd snap create rbd/myrbd@snap1
    $ xfs_freeze -u /mnt/

    Change a file:

    $ mv /mnt/myrbd/v1 /mnt/myrbd/v2

    Mount the snapshot read-only:

    $ mkdir -p /mnt/myrbd@snap1
    $ rbd map rbd/myrbd@snap1
    $ mount -t xfs -o ro,norecovery,nouuid "/dev/rbd/rbd/myrbd@snap1" "/mnt/myrbd@snap1"
    $ ls "/mnt/myrbd"
    total 0
    v2

    OK.

    $ ls "/mnt/myrbd@snap1"
    total 0

    Nothing? Did something go wrong with the sync?

    Try again:

    $ sync && xfs_freeze -f /mnt/
    $ rbd snap create rbd/myrbd@snap2
    $ xfs_freeze -u /mnt/
    $ mkdir -p /mnt/myrbd@snap2
    $ rbd map rbd/myrbd@snap2
    $ mount -t xfs -o ro,norecovery,nouuid "/dev/rbd/rbd/myrbd@snap2" "/mnt/myrbd@snap2"

    Move the file again:

    $ mv /mnt/myrbd/v2 /mnt/myrbd/v3
    $ ls /mnt/myrbd@snap2
    total 0
    v2
    $ ls /mnt/myrbd
    total 0
    v3

    All right.

    Stop rbdmap (this will unmap all mapped RBD devices):

    $ service rbdmap remove

    Remove the line added to /etc/ceph/rbdmap.
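
    For example (assuming the exact line added earlier):

    $ sed -i '/^rbd\/myrbd$/d' /etc/ceph/rbdmap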

    Remove myrbd:

    $ rbd snap purge rbd/myrbd
    Removing all snapshots: 100% complete...done.
    $ rbd rm rbd/myrbd
    Removing image: 100% complete...done.
  • August 2, 2013
    Don’t Forget Unmap Before Remove Rbd

    $ rbd rm rbd/myrbd
    Removing image: 99% complete…failed.2013-08-02 14:07:17.530470 7f3ba2692760 -1 librbd: error removing header: (16) Device or resource busy
    rbd: error: image still has watchers
    This means the image is still open or the clien…
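
    In short: unmap (and drop any rbdmap entry) before deleting the image. A minimal sketch, with a hypothetical device name taken from rbd showmapped:

    $ rbd showmapped              # find the device mapped to rbd/myrbd
    $ rbd unmap /dev/rbd1         # hypothetical device name
    $ rbd rm rbd/myrbd            # should now complete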

  • July 30, 2013
    Convert RBD to Format V2

    Simple Import / Export

    Don't forget to stop I/O before syncing, and to unmap the RBD before renaming.

    $ rbd export rbd/myrbd - | rbd import --image-format 2 - rbd/myrbd_v2
    $ rbd mv rbd/myrbd rbd/myrbd_old
    $ rbd mv rbd/myrbd_v2 rbd/myrbd

    Check:
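
    A natural check (a sketch) is to confirm the new image reports format 2:

    $ rbd info rbd/myrbd | grep format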

  • July 30, 2013
    Remove Snapshot Before Rbd

    $ rbd rm rbd/myrbd
    2013-07-30 14:10:13.341184 7f9e11922760 -1 librbd: image has snapshots - not removing
    Removing image: 0% complete...failed.
    rbd: image has snapshots - these must be deleted with 'rbd snap purge' before the …
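
    So the sequence is to purge the snapshots first, then remove the image (the same commands used in the persistent RBD post above):

    $ rbd snap purge rbd/myrbd
    $ rbd rm rbd/myrbd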

  • July 30, 2013
    Using Ceph-deploy

    Install the Ceph cluster

    On each node:

    Create a user "ceph" and configure sudo for passwordless access:

    $ useradd -d /home/ceph -m ceph
    $ passwd ceph
    $ echo "ceph ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph
    $ chmod 0440 /e…
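
    After that, a typical ceph-deploy sequence from the admin node looks roughly like this (a sketch; host and disk names are the ones used elsewhere on this page):

    $ ceph-deploy new ceph-01 ceph-02 ceph-03        # generate ceph.conf and the monitor keyring
    $ ceph-deploy install ceph-01 ceph-02 ceph-03    # install the Ceph packages
    $ ceph-deploy mon create ceph-01 ceph-02 ceph-03
    $ ceph-deploy gatherkeys ceph-01
    $ ceph-deploy osd create ceph-01:/dev/sdb ceph-02:/dev/sdb ceph-03:/dev/sdb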

  • July 26, 2013
    Ceph: update Cephx Keys

    It’s not really clear from the command line

    Generate a dummy key for the exercise

    $ ceph auth get-or-create client.dummy mon 'allow r' osd 'allow rwx pool=dummy'

    [client.dummy]
    key = AQAPiu1RCMb4CxAAmP7rrufwZP…
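
    To change the capabilities on an existing key later, the usual way is ceph auth caps (a generic sketch, with a made-up pool name):

    $ ceph auth caps client.dummy mon 'allow r' osd 'allow rwx pool=anotherpool'
    $ ceph auth get client.dummy      # check the updated caps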

  • July 11, 2013
    Inktank Presenting on Ceph at FinTech Demo Day!

    It’s been quite a year for Inktank and the Ceph community. We are super excited to announce another major milestone for Inktank – our participation in the third annual FinTech Innovation Lab in New York City. The goal of the Lab – established in 2010 by Accenture and the Partnership Fund for New York City […]

  • June 23, 2013
    What I think about CephFS in OpenStack

    I recently had some really interesting questions that led to some nice discussions.
    Since I received the same question twice, I thought it might be good to share the matter with the community.

    The question was pretty simple and obviously the context…

  • June 11, 2013
    Ceph RBD Online Resize

    Extend an RBD drive with libvirt and XFS

    First, resize the device on the physical host.

    Get the current size:

    $ qemu-img info -f rbd "rbd:rbd/myrbd"

    Be careful: you must specify a bigger size, since shrinking a volume is destructive for the filesystem.

    $ qemu-img resize -f rbd "rbd:rbd/myrbd" 600G

    List the devices defined for myVM:

    $ virsh domblklist myVM

    Resize the libvirt block device:

    $ virsh blockresize --domain myVM --path vdb --size 600G
    $ rbd info rbd/myrbd

    Extend XFS on the guest:

    $ xfs_growfs /mnt/rbd/myrbd

    Extend RBD with the kernel module

    You need at least kernel 3.10 on the Ceph client to support online resizing.
    For older kernels, look at http://dachary.org/?p=2179

    Get the current size:

    $ rbd info rbd/myrbd

    Just do:

    $ rbd resize rbd/myrbd --size 600000
    $ xfs_growfs /mnt/rbd/myrbd

    Also, since Cuttlefish you can't shrink a block device without specifying an additional option (--allow-shrink).
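
    For example (destructive for the filesystem on it, as noted above):

    $ rbd resize rbd/myrbd --size 500000 --allow-shrink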
