Backend filesystem requirements
From Ceph wiki
The following list describes what a file system must provide so that Ceph can run on top of it.
- Support for xattrs
The file system must support extended attributes (xattrs).
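One way to check this requirement on a candidate file system is to set a small attribute on a scratch file and read it back. The following is a minimal sketch, not something taken from Ceph itself: the helper name supports_xattrs and the attribute name user.probe are made up for illustration, and it assumes Linux, where Python exposes os.setxattr and os.getxattr.

```python
import os
import tempfile


def supports_xattrs(directory):
    """Return True if a user xattr can be set and read back in `directory`.

    Hypothetical helper for illustration; "user.probe" is an arbitrary name.
    """
    if not hasattr(os, "setxattr"):
        # Python only provides the xattr functions on Linux.
        return False
    with tempfile.NamedTemporaryFile(dir=directory) as probe:
        try:
            os.setxattr(probe.name, "user.probe", b"1")
            return os.getxattr(probe.name, "user.probe") == b"1"
        except OSError:
            # Typically ENOTSUP when the file system or its mount
            # options do not allow user xattrs.
            return False


if __name__ == "__main__":
    print(supports_xattrs(tempfile.gettempdir()))
```

The result only describes the file system that holds the given directory, so the probe should be pointed at the intended data directory rather than at a temporary location.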
- Large xattrs
Some file systems provide xattrs but only support small ones. E.g., ext3/4 support only 4K of xattrs.
We hit that limit pretty easily with radosgw, and may also hit it in exceptional cases where the OSD cluster is very unhealthy.
XFS does not have an xattr size limit and thus does not have this problem.
btrfs limits the size of individual xattrs; however, we can work around that by chaining multiple xattrs, since it does not limit the total size of xattrs per file (unlike ext3/ext4).
(Note: we may be able to modify radosgw to use fewer xattrs at the cost of reduced functionality, but that has yet to be proven.)
So the order of preference:
(a) no xattr size limit (total or individual)
(b) limit on individual xattrs (e.g., on btrfs an xattr just has to fit into a btree node), but no limit on total xattrs
(c) large limit on total xattrs
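The chaining workaround for case (b) can be sketched as follows: one large logical value is split across several attributes, each under the per-attribute limit, and reassembled on read. This is only an illustration under stated assumptions. The "@N" naming scheme and the helper names are invented here and are not claimed to match what Ceph does, and a plain dict stands in for a file's xattrs so the logic can be followed without a particular file system.

```python
def chunk_key(name, index):
    """Attribute name for the index-th chunk (hypothetical naming scheme)."""
    return name if index == 0 else f"{name}@{index}"


def set_chained(store, name, value, max_chunk):
    """Store `value` under `name`, split into chunks of at most `max_chunk` bytes."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    # An empty value still gets one (empty) chunk so that the name exists.
    chunks = [value[i:i + max_chunk]
              for i in range(0, len(value), max_chunk)] or [b""]
    for index, chunk in enumerate(chunks):
        store[chunk_key(name, index)] = chunk
    # Drop leftover chunks from a previous, longer value.
    index = len(chunks)
    while chunk_key(name, index) in store:
        del store[chunk_key(name, index)]
        index += 1


def get_chained(store, name):
    """Reassemble the value stored under `name`; raise KeyError if absent."""
    parts = []
    index = 0
    while chunk_key(name, index) in store:
        parts.append(store[chunk_key(name, index)])
        index += 1
    if not parts:
        raise KeyError(name)
    return b"".join(parts)


if __name__ == "__main__":
    attrs = {}  # stands in for the xattrs of one file
    set_chained(attrs, "user.demo", b"x" * 10, max_chunk=4)
    print(sorted(attrs), get_chained(attrs, "user.demo"))
```

On a real file the dict operations would be replaced by calls such as os.setxattr, os.getxattr and os.removexattr. Note that the update then spans several system calls and is not atomic, and that the scheme only helps when the total xattr size per file is not limited.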
- Keeping consistent state
There are a few options here (in order of preference):
(a) the ability to create and remove snapshots programmatically over a file system volume. Snapshot creation needs to be fast.
(b) fsync on a just-written file flushes the underlying fs's journal such that all previous operations are also committed (such as ext3's data=journal)
(c) the ability to sync a single mounted file system
(d) a working sync operation (should be available on any POSIX-compliant file system)
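Options (b) through (d) can be sketched as a fallback chain. The example below is a rough illustration that assumes Linux with a libc exporting syncfs(2), reached through ctypes because Python's os module has no syncfs wrapper. The function names are hypothetical, and fsync alone only provides guarantee (b) when the file system's journaling mode actually commits earlier operations along with the file.

```python
import ctypes
import os


def write_durably(path, data):
    """Write `data` to `path` and fsync the file before returning (option b)."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to commit the file


def sync_filesystem(path):
    """Flush the file system holding `path`; return the method that was used."""
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            if libc.syncfs(fd) == 0:
                return "syncfs"  # option (c): only this mounted file system
        except (AttributeError, OSError):
            pass  # libc has no syncfs symbol, or it could not be loaded
        os.sync()                # option (d): flush every file system
        return "sync"
    finally:
        os.close(fd)
```

Preferring syncfs over sync keeps the flush limited to the file system that holds the data, which matters on hosts where other busy file systems would otherwise be flushed as well.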