We’ve just tagged the v0.25 release. Most of the work here is in the OSD cluster, a new librbd library (refactoring existing RBD infrastructure), and a librados API refresh.
The librados changes are an attempt to clean up the API warts sooner rather than later. If there are any issues with the new interface, we’d like to hear about them!
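To give a flavor of the refreshed interface, here's a minimal sketch of connecting to a cluster and writing one object with the librados C API. The pool and object names are made up for illustration, and the function names shown follow the current headers, so details may shift as the API settles:

```c
/* Minimal librados sketch: connect and write one object.
 * Compile with: gcc hello_rados.c -lrados
 * The pool "data" and object "greeting" are illustrative. */
#include <rados/librados.h>

int main(void)
{
    rados_t cluster;
    rados_ioctx_t io;
    const char payload[] = "hello rados";

    if (rados_create(&cluster, NULL) < 0)    /* create a cluster handle */
        return 1;
    rados_conf_read_file(cluster, "/etc/ceph/ceph.conf");
    if (rados_connect(cluster) < 0) {        /* connect to the monitors */
        rados_shutdown(cluster);
        return 1;
    }
    if (rados_ioctx_create(cluster, "data", &io) == 0) {   /* open a pool */
        rados_write(io, "greeting", payload, sizeof(payload), 0);
        rados_ioctx_destroy(io);
    }
    rados_shutdown(cluster);
    return 0;
}
```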
The new librbd library sits on top of librados and captures the RBD striping, snapshotting, and other functionality, presenting a simple block device-like interface. The qemu/KVM driver is being rewritten in terms of librbd, which will vastly simplify the upstream qemu code and allow us to fix bugs and add functionality without being tied to a specific version of qemu/KVM.
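librbd follows the same pattern on top of a rados pool handle. Here's a sketch of what the block device-like interface looks like, assuming an ioctx set up as in the previous example (the image name and sizes are again placeholders):

```c
/* librbd sketch: create a 1 GB image and write to it by offset.
 * Assumes "io" is a rados_ioctx_t for the target pool. */
#include <rbd/librbd.h>

int make_image(rados_ioctx_t io)
{
    rbd_image_t image;
    int order = 0;                 /* 0 = use the default object size */
    uint64_t size = 1ULL << 30;    /* 1 GB */
    char buf[512] = "first sector";

    if (rbd_create(io, "myimage", size, &order) < 0)
        return -1;
    if (rbd_open(io, "myimage", &image, NULL) < 0)   /* NULL = no snapshot */
        return -1;
    rbd_write(image, 0, sizeof(buf), buf);  /* offset/length I/O, like a disk */
    rbd_close(image);
    return 0;
}
```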
The focus for v0.26 will remain on stability, primarily with the OSD cluster, RBD, and radosgw. Internally, we’re focusing on building out our QA and performance testing infrastructure.
We’ve released v0.24.3 with more bug fixes, including a fix for a bug that could lose data in certain cases when OSDs restart during recovery. It’s pretty much all OSD stuff, which is where we’re currently focusing our testing efforts.
This is a bugfix release. Changes since v0.24.1 include:
v0.24.1 has been released, with a number of bug fixes from v0.24. These include:
This is also the first time I’ve built Ubuntu packages (for lucid and maverick), as the libcrypto++ dependency resolves to a different library version on Ubuntu than on Debian sid. If anyone has any problems there, please let us know. libcrypto++ is unfortunately also a hassle under Red Hat, as it is not included in RHEL and was only recently added to Fedora. We plan to start building RHEL/CentOS and Fedora packages soon, and will shortly be updating the wiki with information on gathering all the dependencies needed to build from source.
The QEMU-RBD block device has been merged upstream into the QEMU project. QEMU-RBD was originally created by Christian Brunner, and is binary compatible with the native Linux RBD driver. It allows the creation of QEMU block devices that are striped over objects in RADOS, the Ceph distributed object store. As with the corresponding Linux device driver, the QEMU driver gets all the RBD goodies: thin provisioning, reliability, scalability, and snapshots!
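Roughly speaking, usage looks like this; the pool and image names are placeholders, and the exact option syntax may vary with your QEMU version:

```
# create a 10 GB RBD-backed image, then boot a guest from it
qemu-img create -f rbd rbd:mypool/myimage 10G
qemu -drive format=rbd,file=rbd:mypool/myimage
```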
libvirt is a virtualization library that allows controlling virtual machines (such as QEMU-based VMs, but also others) through a single API. Many tools are already built around it (e.g., virsh, virt-manager), and adding the ability to configure RBD devices via the library makes RBD work with those existing tools. With the help of the Sheepdog project (which also recently merged its QEMU block device upstream), we were able to get RBD (along with Sheepdog and nbd) support into libvirt. Basically, a new “network” disk type was added, with three possible protocols for such a disk: nbd, sheepdog, or rbd. For each you can specify one or more hosts; for rbd, the host entries hold the IP address and TCP port of the Ceph cluster monitor(s).
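As an illustration, an RBD-backed guest disk might be described with something like the following snippet; the pool/image name and monitor address are placeholders, so check the libvirt documentation for the authoritative schema:

```xml
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <source protocol='rbd' name='mypool/myimage'>
    <host name='10.0.0.1' port='6789'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
```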
libvirt support for the native Linux kernel rbd driver is also in the works, which will allow rbd to be used with the non-qemu VMs that libvirt supports (e.g., Xen, VirtualBox, VMware).
As we posted before, the native Linux RBD driver was merged into the upcoming kernel release (2.6.37), which will be out in a few weeks. Since the original merge we’ve reworked the RBD sysfs interface so that it conforms better to sysfs conventions. Originally, the RBD driver was based on another Linux block device called osdblk, and it inherited that driver’s sysfs interface, which was monolithic: a single sysfs entry per config option, shared across all devices. This was both wrong and cumbersome, as the device id had to be specified for each operation. The new interface moves the rbd subdir to a better location (/sys/bus/rbd) and creates a subdirectory per device, so all operations for a single device are grouped together and there’s no need to name the device explicitly. Each snapshot also gets its own subdirectory under the device holding all its information, and the one-big-list-for-all entry is gone.
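To make the new layout concrete, here is a rough sketch of mapping an image and poking at it; the monitor address, pool, and image names are placeholders, and the exact fields accepted by the add entry may differ between kernel versions:

```
# map an image: monitor address(es), options, pool name, image name
echo "10.0.0.1:6789 name=admin mypool myimage" > /sys/bus/rbd/add

# each mapped device now has its own directory grouping its attributes
ls /sys/bus/rbd/devices/0/
```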
All in all, it was a relatively big change to introduce this far into the release cycle, but we believe it was worth it.
We’ve released v0.24, just in time for the holidays! Big changes this time around include:
The focus for the next release (v0.25) is on OSD and MDS stability, directory fragmentation recovery, and fsck preliminaries; see the roadmap for more details.
This release includes some bug fixes for v0.23, although fortunately there’s nothing here that many people have been hitting.
v0.24 is still a few weeks away, and will include OSD recovery improvements, background scrubbing, and MDS clustering and performance improvements, among other things.
The radosgw has been around for a while, but it hasn’t been well publicized or documented, so I thought I’d mention it here. The idea is this:
The result is radosgw, a FastCGI-based proxy that exposes Ceph’s object store via a REST (HTTP-based) interface. Radosgw implements a subset of Amazon’s S3 API (some Amazon-specific features, such as parts of ACLs and object versioning, aren’t supported), but the subset it does implement aims to be fully compatible. That means most existing apps designed for S3 can be seamlessly migrated to a Ceph-based object store, provided they allow the hostname to be configured (many hard-code s3.amazonaws.com).
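For example, with the Python boto library, redirecting an app at a radosgw instance is just a matter of overriding the endpoint; the hostname, credentials, and bucket name below are placeholders:

```python
# Point boto at a radosgw endpoint instead of s3.amazonaws.com.
from boto.s3.connection import S3Connection, OrdinaryCallingFormat

conn = S3Connection(
    aws_access_key_id='MY_ACCESS_KEY',
    aws_secret_access_key='MY_SECRET_KEY',
    host='gateway.example.com',              # the radosgw frontend
    is_secure=False,
    calling_format=OrdinaryCallingFormat(),  # path-style bucket URLs
)
bucket = conn.create_bucket('my-bucket')
key = bucket.new_key('hello.txt')
key.set_contents_from_string('hello from radosgw')
```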
It should be noted that this approach has some fundamental limitations:
Check it out!
Another month, and v0.23 is out. The main milestone here is that clustered MDS is pretty stable. Stable enough that, if you’re interested and willing, we’d like you to try it and let us know what problems you have. Notably, clustered recovery is not yet well tested (that’s v0.24), so don’t do this unless you’re feeling adventurous. Directory fragmentation (splitting and merging) is also working, although still off by default. If you’d like to try that too, add ‘mds bal frag = true’ to your [mds] section.
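In ceph.conf terms, that’s simply:

```
[mds]
        mds bal frag = true
```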
Other notable changes this time around:
The general focus for v0.24 will be continuing OSD stability and clustered MDS recovery.