Planet Ceph

Aggregated news from external sources

  • July 8, 2011
    Linus vs FUSE

    I can’t decide whether Linus is amused or annoyed by the extent to which people hang on his every word, or go nuts over his random rants about this or that. People still talk about his pronouncement about O_DIRECT and tripping monkeys (which has now found a home on the open(2) man page). The latest […]

  • November 13, 2010
    S3-compatible object storage with radosgw

    The radosgw has been around for a while, but it hasn’t been well publicized or documented, so I thought I’d mention it here.  The idea is this: Ceph’s architecture is based on a robust, scalable distributed object store called RADOS. Amazon’s S3 has shown that a simple object-based storage interface is a convenient way to […]

  • June 6, 2009
    RADOS snapshots

    Some interesting issues came up when we started considering how to expose the RADOS snapshot functionality to librados users.  The object store exposes a pretty low-level interface to control when objects are cloned (i.e. when an object snapshot is taken via the btrfs copy-on-write ioctls).  The basic design in Ceph is that the client provides […]

  • May 19, 2009
    The RADOS distributed object store

    The Ceph architecture can be pretty neatly broken into two key layers.  The first is RADOS, a reliable autonomic distributed object store, which provides an extremely scalable storage service for variably sized objects.  The Ceph file system is built on top of that underlying abstraction: file data is striped over objects, and the MDS (metadata […]

  • March 11, 2009
    dbench performance

    Yehuda and I did some performance tuning with dbench a couple weeks back and made some significant improvements.  Here are the rough numbers, before I forget.  We were testing on a simple client/server setup to make a reasonable comparison with NFS: single server on a single SATA disk, and a single client. Since we were […]

  • March 6, 2009
    New configuration and startup framework

    Yehuda and I spent last week polishing his configuration framework and reworking the way everything is configured and started up.  I think the end result is pretty slick: There are now two configuration files.  The first, cluster.conf, defines which hosts participate in the cluster, which daemons run on which hosts, and what paths are used […]
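A two-file layout like the one described might look something like this.  This is a hypothetical sketch only — the section names, keys, and paths are illustrative, not the actual syntax:

```ini
; cluster.conf (hypothetical sketch): which hosts participate, which
; daemons run where, and what paths each daemon uses
[mon0]
host = alpha
mon data = /data/mon0

[osd0]
host = beta
osd data = /data/osd0

[osd1]
host = gamma
osd data = /data/osd1
```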

  • January 12, 2009
    Asynchronous metadata operations

The focus for the last few weeks has been on speeding up metadata operations.  The bottleneck has been that the design put reliability and recoverability first and foremost: each metadata operation was performed by the MDS, and it was journaled safely to the OSDs before being applied.  This meant that every metadata operation went […]

  • November 6, 2008
    lockdep for pthreads

Linux has a great tool called lockdep for identifying locking dependency problems.  Instead of waiting until an actual deadlock occurs (which may be extremely difficult to reproduce when it is timing-sensitive), lockdep keeps track of which locks are already held when any new lock is taken, and ensures that there are no cycles in the […]
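The core idea can be sketched in a few lines of C.  This is my own illustration, not the real lockdep: whenever lock B is taken while lock A is held, record the edge A → B; a potential deadlock exists as soon as taking a lock would close a cycle.  `MAXLOCKS`, the adjacency matrix, and the recursive DFS are all simplifications.

```c
#include <assert.h>

#define MAXLOCKS 8

static int edge[MAXLOCKS][MAXLOCKS];    /* edge[a][b]: b taken while a held */

/* DFS over the dependency graph; it stays acyclic because we refuse
 * any acquisition that would introduce a cycle. */
static int reaches(int from, int to)
{
    if (edge[from][to])
        return 1;
    for (int k = 0; k < MAXLOCKS; k++)
        if (edge[from][k] && reaches(k, to))
            return 1;
    return 0;
}

/* Call before taking newlock; returns 1 if this acquisition order
 * conflicts with an order recorded earlier (a cycle would form). */
static int check_and_record(const int *held, int nheld, int newlock)
{
    for (int i = 0; i < nheld; i++)
        if (newlock == held[i] || reaches(newlock, held[i]))
            return 1;                       /* would close a cycle */
    for (int i = 0; i < nheld; i++)
        edge[held[i]][newlock] = 1;         /* record the new ordering */
    return 0;
}
```

The point of the approach is exactly what the post describes: the checker flags the *possibility* of deadlock from the ordering alone, even if the timing never actually lines up to deadlock in a given run.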

  • April 18, 2008
    POSIX file system test suite

    A few weeks back a POSIX file system test suite was announced on linux-fsdevel. Some 1700 tests of return values, error codes, and side effects for things like unlink, chmod, and so forth. The suite turned up a number of minor bugs in the MDS and client (mostly relating to things like legal file modes), […]

  • April 18, 2008
    Delayed capability release

    The Ceph MDS server issues “capabilities” to clients to grant them permission to read or write objects for a particular file. I’ve added a delayed release of capabilities after a file is closed, as many workloads will quickly reopen the same file. In that case, we can re-use our existing capabilities (and be assured by […]
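A minimal sketch of the delayed-release idea, with hypothetical names and sizes (the `cached_cap` structure, `GRACE` window, and fixed-slot cache are illustrative, not the client's actual data structures): on close, park the capability instead of returning it, so a quick reopen can reuse it.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

#define GRACE 5           /* seconds to keep caps after close (illustrative) */
#define CACHE_SLOTS 4

struct cached_cap {
    char   path[64];
    int    caps;          /* e.g. read/write bits granted by the MDS */
    time_t released;      /* when the file was closed; 0 = slot empty */
};
static struct cached_cap cache[CACHE_SLOTS];

/* On close: keep the capability around instead of releasing it. */
static void delayed_release(const char *path, int caps, time_t now)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].released == 0) {
            strncpy(cache[i].path, path, sizeof cache[i].path - 1);
            cache[i].caps = caps;
            cache[i].released = now;
            return;
        }
    /* cache full: a real client would send the release to the MDS here */
}

/* On open: reuse a recently parked capability if one matches. */
static int reuse_caps(const char *path, time_t now)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].released && strcmp(cache[i].path, path) == 0) {
            if (now - cache[i].released <= GRACE) {
                int caps = cache[i].caps;
                cache[i].released = 0;      /* slot back in active use */
                return caps;                /* reopen hits the cache */
            }
            cache[i].released = 0;          /* too stale: drop it */
        }
    return 0;                               /* miss: ask the MDS */
}
```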

  • March 24, 2008
    Client cache coherency

I’ve been shying away from the question of how to manage client metadata cache consistency for ages now, under the assumption that it was going to complicate the client/MDS protocol and the MDS significantly. Zach’s progress on CRFS got me thinking about it again, though, and I had a realization the other night that most […]

  • March 19, 2008
    File system creation and scaling

    I’ve spent the last week or so revamping the whole “mkfs” process and some of the machinery needed to adjust data distribution when the size of the cluster changes by an order of magnitude or more. The basic problem is that data is distributed in a two-step process: objects are first statically mapped into one […]
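The two-step mapping can be sketched as follows.  The FNV-1a hash and the multiplicative second step are stand-ins for illustration — not Ceph's actual hash function or placement algorithm:

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit FNV-1a: a stable hash of the object name. */
static uint32_t hash_name(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (uint8_t)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Step 1: static mapping; depends only on the name and the PG count. */
static uint32_t object_to_pg(const char *name, uint32_t pg_count)
{
    return hash_name(name) % pg_count;
}

/* Step 2: placement-group to device; this is the mapping that is
 * recalculated when the cluster changes (a placeholder hash here). */
static uint32_t pg_to_device(uint32_t pg, uint32_t device_count)
{
    return (pg * 2654435761u) % device_count;
}
```

The design payoff of splitting placement this way: when the cluster grows, only the second mapping needs to change, so objects stay in the same placement group and rebalancing moves whole groups rather than rehashing every object individually.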