Planet Ceph

Aggregated news from external sources

  • July 11, 2015
    Running your own Ceph integration tests with OpenStack

    Note: this is obsoleted by “Ceph integration tests made simple with OpenStack”. The Ceph lab has hundreds of machines continuously running integration and upgrade tests. For instance, when a pull request modifies the Ceph core, it goes through a run … Continue reading

  • July 8, 2015
    configuring ansible for teuthology

    As of July 8th, 2015, teuthology (the Ceph integration test software) switched from using Chef to using Ansible. To keep it working, two files must be created. The /etc/ansible/hosts/group_vars/all.yml file with: modify_fstab: false The modify_fstab is necessary for OpenStack provisioned … Continue reading
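
    A minimal sketch of the first file mentioned above (the path and value come from the excerpt; the group_vars directory may need to be created first):

    $ sudo mkdir -p /etc/ansible/hosts/group_vars
    $ echo "modify_fstab: false" | sudo tee /etc/ansible/hosts/group_vars/all.yml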

  • July 8, 2015
    See what the Ceph client sees

    The title is probably weird and misleading but I could not find better than this :).
    The idea here is to dive a little bit into what the kernel client sees for each client that has a RBD device mapped.
    In this article, we are focusing on the Kernel R…
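
    For krbd clients, one hedged starting point is the standard libceph debugfs interface (exact file contents vary by kernel version):

    $ sudo ls /sys/kernel/debug/ceph/                  # one directory per client instance
    $ sudo sh -c 'cat /sys/kernel/debug/ceph/*/osdc'   # in-flight OSD requests seen by the kernel client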

  • July 6, 2015
    Ceph: enable the object map feature

    The Hammer release brought the support of a new feature for RBD images called object map.
    The object map tracks which blocks of the image are actually allocated and where.
    This is especially useful for operations on clones like resize, import, export…
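
    As a hedged sketch (the image name is arbitrary; 13 = layering (1) + exclusive-lock (4) + object-map (8), since the object map requires the exclusive lock), the feature can be requested at image creation time:

    $ rbd create bar --size 1024 --image-format 2 --image-features 13
    $ rbd info bar | grep features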

  • July 3, 2015
    First China Ceph Day – Beijing Ceph Day

    Ceph is becoming more and more popular in China. Intel and Red Hat jointly held the Beijing Ceph Day at the Intel RYC office on June 6th, 2015. It attracted ~200 developers and end users from 120+ companies. Ten technical sessions were delivered to share Ceph’s transformative power during the event; it also focused on current problems of …Read more

  • June 28, 2015
    Bring persistent storage for your containers with KRBD on Kubernetes

    Use RBD device to provide persistent storage to your containers.
    This work was initiated by a colleague of mine Huamin Chen.
    I would like to take the opportunity to thank him for the troubleshooting session we had.
    Having the ability to use persistent volumes for your containers is critical; containers can be ephemeral since they are immutable.
    If they die on a machine, they can be bootstrapped on another host without any problem.
    The only thing we need to ensure is that the data that comes with a container will somehow follow it no matter where it goes.
    This is exactly what we want to achieve with this implementation.

    Prerequisites

    This article assumes that your Kubernetes environment is up and running.
    First on your host install Ceph:

    $ sudo yum install -y ceph-common
    

    Important: the version of ceph-common must be >= 0.87.
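
    On a yum-based host you can quickly check which version is installed, for example:

    $ rpm -q ceph-common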

    Set up your Ceph environment:

    $ sudo docker run -d \
    --net=host \
    -v /var/lib/ceph:/var/lib/ceph \
    -v /etc/ceph:/etc/ceph \
    -e MON_IP=192.168.0.1 \
    -e CEPH_NETWORK=192.168.0.0/24 \
    ceph/demo
    

    Several actions are not handled by Kubernetes, such as:

    • RBD volume creation
    • Creating a filesystem on this volume

    So let’s do this first:

    $ sudo rbd create foo -s 1024
    $ sudo rbd map foo
    /dev/rbd0
    $ sudo mkfs.ext4 /dev/rbd0
    $ sudo rbd unmap /dev/rbd0
    

    Configure Kubernetes

    First, we clone Kubernetes repository to get some handy file examples:

    $ git clone https://github.com/GoogleCloudPlatform/kubernetes.git
    $ cd kubernetes/examples/rbd
    

    Get your client.admin key and encode it in base64:

    $ sudo ceph auth get-key client.admin
    AQBAMo1VqE1OMhAAVpERPcyQU5pzU6IOJ22x1w==
    
    $ echo "AQBAMo1VqE1OMhAAVpERPcyQU5pzU6IOJ22x1w==" | base64
    QVFCQU1vMVZxRTFPTWhBQVZwRVJQY3lRVTVwelU2SU9KMjJ4MXc9PQo=
    

    Note: it’s not mandatory to use the client.admin key; you can use whatever key you want as long as it has the appropriate permissions on the given pool.
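
    For instance, a dedicated key limited to the rbd pool could be created like this (the client name client.kube and the capability string are only an example):

    $ sudo ceph auth get-or-create client.kube mon 'allow r' osd 'allow rwx pool=rbd'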

    Edit your ceph-secret.yaml with the base64 key:

    apiVersion: v1
    kind: Secret
    metadata:
      name: ceph-secret
    data:
      key: QVFCQU1vMVZxRTFPTWhBQVZwRVJQY3lRVTVwelU2SU9KMjJ4MXc9PQo=
    

    Add your secret to Kubernetes:

    $ kubectl create -f secret/ceph-secret.yaml
    $ kubectl get secret
    NAME                  TYPE                                  DATA
    ceph-secret           Opaque                                1
    

    Now, we edit our rbd-with-secret.json pod file.
    This file describes the content of your pod:

    {
        "apiVersion": "v1beta3",
        "id": "rbdpd2",
        "kind": "Pod",
        "metadata": {
            "name": "rbd2"
        },
        "spec": {
            "containers": [
                {
                    "name": "rbd-rw",
                    "image": "kubernetes/pause",
                    "volumeMounts": [
                        {
                            "mountPath": "/mnt/rbd",
                            "name": "rbdpd"
                        }
                    ]
                }
            ],
            "volumes": [
                {
                    "name": "rbdpd",
                    "rbd": {
                        "monitors": [
                                                            "192.168.0.1:6789"
                                     ],
                        "pool": "rbd",
                        "image": "foo",
                        "user": "admin",
                        "secretRef": {
                                                      "name": "ceph-secret"
                                             },
                        "fsType": "ext4",
                        "readOnly": true
                    }
                }
            ]
        }
    }
    

    The relevant sections are:

    • mountPath: where to mount the RBD image; this mountpoint must exist
    • monitors: addresses of the monitors (you can have as many as you want)
    • pool: the pool used to store your image
    • image: name of the image
    • secretRef: name of the secret
    • fsType: filesystem type of the image

    Now it’s time to fire up your pod:

    $ kubectl create -f rbd-with-secret.json
    $ kubectl get pods
    NAME      READY     REASON    RESTARTS   AGE
    rbd2      1/1       Running   0          1m
    

    Check the running containers:

    $ docker ps
    CONTAINER ID        IMAGE                                  COMMAND             CREATED             STATUS              PORTS               NAMES
    61e12752d0e9        kubernetes/pause:latest                "/pause"            18 minutes ago      Up 18 minutes                           k8s_rbd-rw.1d89132d_rbd2_default_bd8b2bb0-1c0d-11e5-9dcf-b4b52f63c584_f9954e16
    e7b1c2645e8f        gcr.io/google_containers/pause:0.8.0   "/pause"            18 minutes ago      Up 18 minutes                           k8s_POD.e4cc795_rbd2_default_bd8b2bb0-1c0d-11e5-9dcf-b4b52f63c584_ac64e07c
    e9dfc079809f        ceph/demo:latest                       "/entrypoint.sh"    3 hours ago         Up 3 hours                              mad_ardinghelli
    

    Everything seems to be working well; let’s check the device status on the Kubernetes host:

    $ sudo rbd showmapped
    id pool image snap device
    0  rbd  foo   -    /dev/rbd0
    

    The image got mapped; now let’s check where it got mounted:

    $ mount |grep kube
    /dev/rbd0 on /var/lib/kubelet/plugins/kubernetes.io/rbd/rbd/rbd-image-foo type ext4 (ro,relatime,stripe=1024,data=ordered)
    /dev/rbd0 on /var/lib/kubelet/pods/bd8b2bb0-1c0d-11e5-9dcf-b4b52f63c584/volumes/kubernetes.io~rbd/rbdpd type ext4 (ro,relatime,stripe=1024,data=ordered)
    

    Further work and known issue

    The current implementation is here and it’s good to see that such work got merged.
    It will be easier in the future to follow up on that original work.
    The “v2” will ease operators’ lives, since they won’t need to pre-populate RBD images and filesystems.

    There is currently a bug where the pod creation fails if the mount point does not exist.
    This is fixed in Kubernetes 0.20.

    I hope you will enjoy this as much as I do 🙂

  • June 26, 2015
    Jewel – Ceph Developer Summit

    The next (virtual) Ceph Developer Summit is coming. The agenda has finally been announced for the 1st and 2nd of July 2015. The first day starts at 07:00 PDT (16:00 CEST) and the second day starts at 18:00 PDT on 2 July, or rather 03:00 CEST on 3 July…

  • June 26, 2015
    Map a RBD device inside a Docker container

    People have been having trouble mapping an RBD device inside a container.
    Here is a quick tip on how to map a Rados Block Device into a container:

    Bootstrap a Ceph demo container:

    $ docker run -d \
    --net=host \
    -v /var/lib/ceph:/var/lib/ceph \
    -v /etc/ceph:/etc/ceph \
    -e MON_IP=192.168.0.1 \
    -e CEPH_NETWORK=192.168.0.0/24 \
    ceph/demo
    

    Enable the Kernel module and create the image:

    $ sudo modprobe rbd
    $ sudo rbd create foo -s 1024
    

    Then bootstrap a container, map the image and put a filesystem on top of it:

    $ sudo docker run -ti -v /dev:/dev -v /sys:/sys --net=host --privileged=true -v /etc/ceph:/etc/ceph ceph/base bash
    root@atomic1:/# rbd map foo
    /dev/rbd0
    
    root@atomic1:/# rbd showmapped
    id pool image snap device
    0  rbd  foo   -    /dev/rbd0
    
    root@atomic1:/# mkfs.ext4 /dev/rbd0
    ...
    ...
    
    root@atomic1:/# mount /dev/rbd0 /mnt/
    
    root@atomic1:/# df -h
    Filesystem                 Size  Used Avail Use% Mounted on
    /dev/dm-5                   10G  483M  9.6G   5% /
    shm                       1001M  8.0K 1001M   1% /dev/shm
    tmpfs                     1001M   12K 1001M   1% /run
    tmpfs                     1001M     0 1001M   0% /tmp
    devtmpfs                   986M     0  986M   0% /dev
    tmpfs                     1001M  8.0K 1001M   1% /dev/shm
    /dev/mapper/atomicos-root   11G  1.9G  9.1G  18% /etc/ceph
    tmpfs                     1001M     0 1001M   0% /sys/fs/cgroup
    /dev/rbd0                  976M  1.3M  908M   1% /mnt
    

    Et voilà !

  • June 25, 2015
    Get OMAP Key/value Size

    List the total size of all keys for each object on a pool.

    object size_keys(kB) size_values(kB) total(kB) nr_keys nr_values
    meta.log.44 0 1 1 0 10
    data_log.78 0 56419 5…
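
    As a rough sketch (not the exact script from the post: the pool name is a placeholder, and listomapvals prints a hex dump so the value size is only approximate), such a listing could be built with the rados CLI:

    $ pool=.log
    $ for obj in $(rados -p "$pool" ls); do
        keys=$(rados -p "$pool" listomapkeys "$obj" | wc -l)
        vals=$(rados -p "$pool" listomapvals "$obj" | wc -c)
        echo "$obj nr_keys=$keys approx_values_bytes=$vals"
      done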

  • June 25, 2015
    DOST 2015: Ceph Security Presentation

    After two days the first “Deutsche OpenStack Tage” ended. There have been many interesting presentations and discussions on OpenStack and also Ceph topics. You can find the slides from my talk about “Ceph in a security critical OpenStack Cloud” on slid…

  • June 22, 2015
    Bootstrap your Ceph cluster in Docker

    Almost two years have passed since my first attempt to run Ceph inside Docker.
    Time has gone by and I hadn’t really had the time to resume this work until recently.
    For the last couple of months, I have been devoting a third of my time to contributing to deploying Ceph in Docker.
    Before we start, I would like to highlight that none of this work would have been possible without the help of Seán C. McCord.
    Indeed the current ceph-docker repository is based on Seán’s initial work.
    Let’s see how you can get this running!

    Rationale

    Running Ceph inside Docker is a bit controversial and many people might believe that there is no point doing this.
    While it is not really a problem for Monitors, Metadata Servers and Rados Gateways to be containerized, things get tricky when it comes to the OSDs.
    The Ceph OSD is really tied to the machine it runs on; such a strong relationship with the hardware is not common for software.
    The fact that the OSD cannot work if the disk it relies on dies is a bit of an issue in this container world.

    To be honest at some point, I was thinking this:

    I don’t know why I am doing this. I just know that people out there want it (and yes, they probably don’t know why).
    I can feel it’s important to do it anyway, so let’s do it.

    This does not sound really optimistic, I know, but it’s somehow the truth.
    My vision has slightly changed though, so for what it’s worth let me explain why.
    We will see if you change your mind as well.
    And yes, my explanation will be more than: Docker is fancy, so let’s Dockerize everything.

    People have started investing a lot of engineering effort to run containerized software on their platforms.
    Thus they have been using various tools to build and orchestrate their environments.
    And I won’t be surprised to see Kubernetes becoming the orchestration tool for this matter.
    Some people also love to run bleeding-edge technologies in production as they might find other things boring (right Seán?).
    So with the containerize-everything approach, they will be happy that something is happening around their favorite open source storage solution :).

    Whereas with yum or apt-get it is not easy to roll back, this is different with containers.
    Upgrades and rollbacks are made easier, as you can simply docker stop and docker run a new version of your daemons.
    You can also potentially run different clusters in an isolated fashion on the same machine.
    This makes it ideal for development.
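
    As a rough sketch (the container name and image tag are hypothetical, and the flags mirror the monitor example shown later in this post), a rollback boils down to stopping the container and starting it again from another image version:

    $ sudo docker stop ceph-mon && sudo docker rm ceph-mon
    $ sudo docker run -d --name ceph-mon --net=host \
    -v /etc/ceph:/etc/ceph \
    -v /var/lib/ceph/:/var/lib/ceph \
    -e MON_IP=192.168.0.20 \
    -e CEPH_PUBLIC_NETWORK=192.168.0.0/24 \
    ceph/daemon:previous-tag mon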

    The project

    As mentioned, everything started from Seán C. McCord’s work and we iterated around it together.
    Currently, if you use ceph-docker you will be able to run every single Ceph daemon either on Ubuntu or CentOS.
    We have a lot of images available on Docker Hub.
    We use the Ceph namespace, so our images are prefixed as ceph/<daemon>.
    We use automated builds; as a result, every time we merge a new patch a new build gets triggered and produces a new version of the container image.
    As we are currently in a refactoring process, you will see that a lot of images are available.
    Historically we had (and we still do until we merge this patch) one image per daemon.
    So one container image for monitor, osd, mds and radosgw.
    This is not really ideal and in practice not needed.
    This is why we worked on a single container image called daemon.
    This image contains all the Ceph daemons, and you activate the one you want with a parameter while invoking the docker run command.
    That being said, if you want to start I encourage you to use the ceph/daemon image directly.
    I will show an example of how to run it in the next section.

    Containerize Ceph

    Monitors

    Given that monitors cannot communicate through a NATed network, we need to use --net=host to expose the Docker host’s network stack:

    $ sudo docker run -d --net=host \
    -v /etc/ceph:/etc/ceph \
    -v /var/lib/ceph/:/var/lib/ceph \
    -e MON_IP=192.168.0.20 \
    -e CEPH_PUBLIC_NETWORK=192.168.0.0/24 \
    ceph/daemon mon
    

    List of available options:

    • MON_IP is the IP address of your host running Docker
    • MON_NAME is the name of your monitor (DEFAULT: $(hostname))
    • CEPH_PUBLIC_NETWORK is the CIDR of the host running Docker; it should be in the same network as the MON_IP
    • CEPH_CLUSTER_NETWORK is the CIDR of a secondary interface of the host running Docker. Used for the OSD replication traffic.

    Object Storage Daemon

    The current implementation allows you to run a single OSD process per container.
    Following the microservice mindset, we should not run more than one service inside a container.
    In our case, running multiple OSD processes in a single container breaks this rule and will likely introduce undesirable behaviours.
    This will also increase the setup and maintenance complexity of the solution.

    In this configuration, the use of --privileged=true is strictly required because we need full access to /dev/ and other kernel functions.
    However, we support another configuration based on simply exposing OSD directories, where the operator does the appropriate preparation of the devices.
    Then he/she simply exposes the OSD directory, and populating the OSD (ceph-osd mkfs) is done by the entrypoint.
    The configuration I’m presenting now is easier to start with because you only need to specify a block device and the entrypoint does the rest.

    For those who do not want to use --privileged=true, please fall back on the second example.

    $ sudo docker run -d --net=host \
    --privileged=true \
    -v /etc/ceph:/etc/ceph \
    -v /var/lib/ceph/:/var/lib/ceph \
    -v /dev/:/dev/ \
    -e OSD_DEVICE=/dev/vdd \
    ceph/daemon osd_ceph_disk
    

    If you don’t want to use --privileged=true, you can always prepare the OSDs yourself with the configuration management tool of your choice.

    Here is an example without privileged mode; in this example we assume that you have partitioned the disk, put a filesystem on it and mounted the OSD partition.
    To create your OSDs, simply run the following command:

    $ sudo docker exec <mon-container-id> ceph osd create
    

    Then run your container like so (repeat the -v /osds/<id>:/var/lib/ceph/osd/ceph-<id> mount to expose several OSD directories, e.g. -v /osds/1:/var/lib/ceph/osd/ceph-1 -v /osds/2:/var/lib/ceph/osd/ceph-2):

    $ sudo docker run -d --net=host \
    -v /etc/ceph:/etc/ceph \
    -v /var/lib/ceph/:/var/lib/ceph \
    -v /osds/1:/var/lib/ceph/osd/ceph-1 \
    ceph/daemon osd_disk_directory
    

    List of available options:

    • OSD_DEVICE is the OSD device, e.g. /dev/sdb
    • OSD_JOURNAL is the device that will be used to store the OSD’s journal, e.g. /dev/sdz
    • HOSTNAME is the hostname of the container where the OSD runs (DEFAULT: $(hostname))
    • OSD_FORCE_ZAP will force zapping the content of the given device (DEFAULT: 0 and 1 to force it)
    • OSD_JOURNAL_SIZE is the size of the OSD journal (DEFAULT: 100)
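
    For example (the device paths are assumptions), a disk-based OSD with a dedicated journal device and forced zapping could be started like this:

    $ sudo docker run -d --net=host \
    --privileged=true \
    -v /etc/ceph:/etc/ceph \
    -v /var/lib/ceph/:/var/lib/ceph \
    -v /dev/:/dev/ \
    -e OSD_DEVICE=/dev/vdd \
    -e OSD_JOURNAL=/dev/vde \
    -e OSD_FORCE_ZAP=1 \
    ceph/daemon osd_ceph_disk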

    Metadata Server

    This one is pretty straightforward and easy to bootstrap.
    The only caveat at the moment is that we require the Ceph admin key to be available inside the container.
    This key will be used to create the CephFS pools and the filesystem.

    If you run an old version of Ceph (prior to 0.87) you don’t need this, but you likely will since it is always better to run the latest version!

    $ sudo docker run -d --net=host \
    -v /var/lib/ceph/:/var/lib/ceph \
    -v /etc/ceph:/etc/ceph \
    -e CEPHFS_CREATE=1 \
    ceph/daemon mds
    

    List of available options:

    • MDS_NAME is the name of the Metadata server (DEFAULT: mds-$(hostname))
    • CEPHFS_CREATE will create a filesystem for your Metadata server (DEFAULT: 0 and 1 to enable it)
    • CEPHFS_NAME is the name of the Metadata filesystem (DEFAULT: cephfs)
    • CEPHFS_DATA_POOL is the name of the data pool for the Metadata Server (DEFAULT: cephfs_data)
    • CEPHFS_DATA_POOL_PG is the number of placement groups for the data pool (DEFAULT: 8)
    • CEPHFS_METADATA_POOL is the name of the metadata pool for the Metadata Server (DEFAULT: cephfs_metadata)
    • CEPHFS_METADATA_POOL_PG is the number of placement groups for the metadata pool (DEFAULT: 8)

    Rados Gateway

    For the Rados Gateway, we deploy it with civetweb enabled by default.
    However, it is possible to use a different CGI frontend by simply giving a remote address and port (see the example after the options list below).

    $ sudo docker run -d --net=host \
    -v /var/lib/ceph/:/var/lib/ceph \
    -v /etc/ceph:/etc/ceph \
    ceph/daemon rgw
    

    List of available options:

    • RGW_REMOTE_CGI defines if you use the embedded webserver of Rados Gateway or not (DEFAULT: 0 and 1 to disable it)
    • RGW_REMOTE_CGI_HOST is the remote host running a CGI process
    • RGW_REMOTE_CGI_PORT is the remote port of the host running a CGI process
    • RGW_CIVETWEB_PORT is the listening port of civetweb (DEFAULT: 80)
    • RGW_NAME is the name of the Rados Gateway instance (DEFAULT: $(hostname))
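
    As hinted above, a remote CGI frontend could be configured roughly like this (the host and port values are assumptions):

    $ sudo docker run -d --net=host \
    -v /var/lib/ceph/:/var/lib/ceph \
    -v /etc/ceph:/etc/ceph \
    -e RGW_REMOTE_CGI=1 \
    -e RGW_REMOTE_CGI_HOST=192.168.0.10 \
    -e RGW_REMOTE_CGI_PORT=9000 \
    ceph/daemon rgw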

    Further work

    Configuration store backends

    By default, the ceph.conf and all the Ceph keys are generated during the initial monitor bootstrap.
    This process assumes that to extend your cluster to multiple nodes you have to distribute these configurations across all the nodes.
    This is not really flexible and we want to improve it.
    One thing that I will propose soon is to use Ansible to generate the configuration/keys and to distribute them to all the machines.

    Alternatively, we want to be able to store various configuration files on different key/value store backends such as etcd and consul.

    Orchestrate the deployment

    A very first step is to use ceph-ansible, where the logic is already implemented.
    I just need to push some changes, but most of the work is already present.

    For Kubernetes, a preview of how to bootstrap monitors is already available.

    Extending to Rocket and beyond

    There is not much to do here as you can simply port your Docker images to Rocket and launch them (pun intended).

    Bonus video

    A video demo is available:

    https://www.youtube.com/watch?v=FUSTjTBA8f8

    Once again, I would like to take the opportunity to thank Seán C. McCord who has made it possible.
    Seán is a nice person to work with and I’m looking forward to contributing with him to ceph-docker!

  • June 22, 2015
    The Kernel 4.1 Is Out

    This kernel version supports all the Hammer features, in particular straw v2.

    https://www.kernel.org/

    The main changes in this version:

    rbd: rbd_wq comment is obsolete
    libceph: announce support for straw2 buckets
    crush: straw2 bucket type with an …
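
    As a hedged aside (assuming a Hammer cluster whose clients all run this kernel or a recent enough librados), straw2 buckets become usable once the CRUSH tunables are switched to the hammer profile:

    $ uname -r                        # confirm the client kernel is 4.1 or newer
    $ ceph osd crush tunables hammer  # allow the straw2 bucket type cluster-wide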
