The contents of this wiki are no longer actively maintained. The most current documentation is available at http://ceph.com/docs.

Backend filesystem requirements

From Ceph wiki

Revision as of 00:04, 17 February 2012 by Dmick (Talk | contribs)

The following list describes what Ceph requires of a file system in order to run on top of it.

  • support for xattrs

The file system needs to support extended attributes (xattrs).
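
One way to check this requirement is to set and read back a small xattr on a scratch file. The following is a minimal sketch, not Ceph's own check: the function name and the attribute name user.ceph_probe are made up for illustration, and it relies on Python's os.setxattr and os.getxattr, which are only available on Linux.

```python
import os
import tempfile


def supports_user_xattrs(directory):
    """Return True if a file created in `directory` accepts a user.* xattr.

    Hypothetical helper; "user.ceph_probe" is an illustrative name only.
    """
    if not hasattr(os, "setxattr"):
        return False  # os.setxattr / os.getxattr exist only on Linux
    with tempfile.NamedTemporaryFile(dir=directory) as probe:
        try:
            os.setxattr(probe.name, "user.ceph_probe", b"1")
            return os.getxattr(probe.name, "user.ceph_probe") == b"1"
        except OSError:
            # Typically ENOTSUP when the file system (or its mount
            # options) does not allow user xattrs.
            return False
```

A False result only means the probe failed in that directory; some file systems need a mount option such as user_xattr before user xattrs work.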

  • Large xattrs

Some file systems provide xattrs but only support small ones. For example, ext3/ext4 support only 4K of xattrs.

Ceph hits this limit fairly easily with radosgw, and may also hit it in exceptional cases where the OSD cluster is very unhealthy.

XFS does not have an xattr size limit and thus does not have this problem.

btrfs limits individual xattrs; however, we can work around that by chaining multiple xattrs as it doesn't limit the total size of xattrs per file (unlike ext3/ext4).
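
The chaining idea can be sketched as follows: a large value is split into chunks that each fit under the per-xattr limit, and the chunks are stored under numbered names. This is an illustration only. The naming scheme (a ".count" entry plus ".0", ".1", ...) is an assumption and not the layout Ceph actually uses, and a plain dict stands in for the setxattr/getxattr calls so the logic can be run anywhere.

```python
def split_value(value, chunk_size):
    """Split `value` (bytes) into pieces of at most `chunk_size` bytes."""
    return [value[i:i + chunk_size] for i in range(0, len(value), chunk_size)]


def set_chained(store, name, value, chunk_size):
    """Store `value` as several small entries instead of one large one.

    `store` is any mutable mapping of name -> bytes; it is a stand-in for
    per-file xattr storage. The ".count" / ".N" names are hypothetical.
    """
    chunks = split_value(value, chunk_size)
    store[name + ".count"] = str(len(chunks)).encode()
    for index, chunk in enumerate(chunks):
        store["%s.%d" % (name, index)] = chunk


def get_chained(store, name):
    """Reassemble a value written by set_chained()."""
    count = int(store[name + ".count"].decode())
    return b"".join(store["%s.%d" % (name, i)] for i in range(count))
```

For example, a 10000-byte value with chunk_size=4096 is stored as three chunks plus the count entry. A real implementation would also have to remove leftover chunks when a shorter value replaces a longer one, which this sketch does not do.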

(Note: it may be possible to modify radosgw to use fewer xattrs at the cost of reduced functionality, but that is yet to be proven.)

So the order of preference is:

(a) no xattr size limit (total or individual)
(b) limit on individual xattr (e.g., for btrfs it just fits into a btree
    node), but no limit on total xattrs.
(c) large limit on total xattrs.

  • Keeping consistent state

There are a few options here (in order of preference):

(a) the ability to create and remove snapshots programmatically over a
    file system volume. Snapshot creation needs to be fast.
(b) fsync on a just-written file flushes the underlying fs's journal
    such that all previous operations are also committed (such as
    ext3's data=journal)
(c) ability to sync a single mounted file system
(d) a working sync operation (any POSIX-compliant file system should
    provide one).
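
Options (b) through (d) correspond roughly to the fsync, syncfs and sync system calls. The sketch below shows how each might be invoked from Python; the helper names are invented for illustration. Python's standard library has no syncfs wrapper, so the example assumes a Linux libc that exports syncfs() and calls it through ctypes, falling back to os.sync() otherwise. Snapshots, option (a), are file system specific and are not shown.

```python
import ctypes
import ctypes.util
import os


def write_and_fsync(path, data):
    """Option (b): write a file, then fsync it.

    Whether this also commits earlier operations on other files depends
    on the file system (the text names ext3 with data=journal).
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # push Python's buffer to the kernel
        os.fsync(f.fileno())   # ask the kernel to commit it to disk


def sync_filesystem(path):
    """Options (c) and (d): flush the file system that contains `path`.

    Returns "syncfs" if the single-file-system call succeeded, or "sync"
    if the global os.sync() fallback was used.
    """
    libc_name = ctypes.util.find_library("c")
    if libc_name:
        libc = ctypes.CDLL(libc_name, use_errno=True)
        if hasattr(libc, "syncfs"):   # assumption: libc exports syncfs()
            fd = os.open(path, os.O_RDONLY)
            try:
                if libc.syncfs(fd) == 0:
                    return "syncfs"
            finally:
                os.close(fd)
    os.sync()  # option (d): flushes every mounted file system
    return "sync"
```

The fallback order mirrors the preference order in the list: syncing a single file system is cheaper than a global sync on a host with many mounts.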