The Ceph Blog

Linus vs FUSE

I can’t decide whether Linus is amused or annoyed by the extent to which people hang on his every word, or go nuts over his random rants about this or that. People still talk about his pronouncement about O_DIRECT and tripping monkeys (which has now found a home on the open(2) man page). The latest hullabaloo is about his decree that all FUSE-based file systems are toys.

Clearly, as many have pointed out, calling all such systems “toys” isn’t completely fair. But then it wouldn’t be fun to say it if it were strictly true. There are real systems (big and fast) built on FUSE, just as there are such systems built with Java, Visual BASIC, Cobol, and every other platform/technology we love to mock.

I haven’t seen PLFS come up yet in the discussion, but I think it’s worth mentioning because it is such a good example of optimizing for the cases that actually matter for your workload. For those not familiar, PLFS (parallel log-structured file system) is a FUSE-based file system built at LANL for its huge, many-thousand-node clusters. It turns all random IO into sequential IO by building a mess of intermediate indices. It sounds like it would be a disaster, but in practice it speeds up their workloads by several orders of magnitude, simply because the underlying parallel file systems on which it is stacked are so bad at those workloads.
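To make the "random IO becomes sequential" trick concrete, here is a minimal sketch of the general log-structured-with-index idea, assuming a single writer and an in-memory log. The class and field names (`LogStructuredFile`, `IndexEntry`) are hypothetical and this is not PLFS's actual on-disk format: every write, wherever it lands logically, is appended to the data log, and an index entry records where it went.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    logical_offset: int   # where the data lives in the file as the app sees it
    length: int
    physical_offset: int  # where the data was appended in the log


class LogStructuredFile:
    """Hypothetical single-writer container: data is only ever appended
    to a log, and an index maps logical ranges to log positions."""

    def __init__(self):
        self.log = bytearray()  # stands in for a sequentially written data file
        self.index = []         # IndexEntry records, kept in write order

    def write(self, offset: int, data: bytes) -> None:
        # A write at any logical offset becomes a sequential append.
        self.index.append(IndexEntry(offset, len(data), len(self.log)))
        self.log.extend(data)

    def read(self, offset: int, length: int) -> bytes:
        out = bytearray(length)  # never-written holes read back as zeros
        # Entries are in write order, so later writes overwrite earlier ones.
        for e in self.index:
            start = max(offset, e.logical_offset)
            end = min(offset + length, e.logical_offset + e.length)
            if start >= end:
                continue  # no overlap with the requested range
            src = e.physical_offset + (start - e.logical_offset)
            out[start - offset:end - offset] = self.log[src:src + (end - start)]
        return bytes(out)


f = LogStructuredFile()
f.write(100, b"world")
f.write(0, b"hello")
f.write(2, b"LL")
print(f.read(0, 5))  # b'heLLo'
```

The cost moves to reads, which must consult (and, with many writers, merge) the index; that is the "mess of intermediate indices", and it pays off when the underlying file system handles small random writes far worse than large sequential ones.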

Anyway, there are just a few points I wanted to make about kernel vs. userspace file systems, having implemented the Ceph client using both. At the risk of stating the obvious:

  • There is nothing you can do in userspace that you can’t also do in the kernel. Sure, development can be harder in the kernel, but you have unparalleled access to the system. The only significant technical disadvantage of a kernel implementation is fault isolation: a buggy FUSE-based file system won’t take down the system with it.
  • Implementation is easier with FUSE, at least for something basic. Some key problems, however, are harder to solve because of limitations in the interface.
  • Memory management is easier in the kernel. AB is right when he says that the memory management and file system need to work together. The problem is that it is difficult to push memory management into userspace when you are not the only tenant on the machine. (I suspect that in most of the big production environments where userspace file systems are used, the fs either is the sole tenant or is given some fixed amount of RAM to work with.) The kernel VM, on the other hand, will apply cache pressure dynamically based on the demands of all users of the system. Trying to do that in userspace is extremely awkward at best.
  • Managing cache coherency is easier in the kernel. Some people don’t care about this (e.g., see NFS, or any of the “toys” Linus was referring to), but we do. This is mainly a result of the limited FUSE interface. You can probably avoid the issue by simply not using the kernel dentry and page caches and reimplementing them in userspace. That’s a simple enough approach, but it is slow, and it fails to leverage years of work invested in the core Linux VFS code.
  • FUSE may be partly to blame. Jeff Darcy has made the point that many of the FUSE shortcomings aren’t inherent to userspace storage, but artifacts of the current interface and kernel politics. Maybe that’s the case, but that is the world we live in. No file system that doesn’t work on Linux (or maybe *BSD) is relevant. And for what it’s worth, most of the people I see complaining about kernel community intransigence haven’t even tried to work upstream; it’s easier than you think, as long as the code you’re pushing isn’t crap.
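The memory-management point is easiest to see in the "fixed amount of RAM" setup. The sketch below is purely illustrative and hypothetical (it is not code from the Ceph userspace client): a userspace cache with a hard byte budget evicts least-recently-used blocks based only on its own budget, with no knowledge of what else on the machine needs memory. The kernel page cache, by contrast, is shrunk by the VM according to pressure from all users of the system.

```python
from collections import OrderedDict


class FixedBudgetCache:
    """Hypothetical userspace block cache with a fixed byte budget."""

    def __init__(self, budget_bytes: int):
        self.budget = budget_bytes
        self.used = 0
        self.blocks = OrderedDict()  # key -> bytes, least recently used first

    def get(self, key):
        data = self.blocks.get(key)
        if data is not None:
            self.blocks.move_to_end(key)  # mark as most recently used
        return data

    def put(self, key, data: bytes) -> None:
        if key in self.blocks:
            self.used -= len(self.blocks.pop(key))
        self.blocks[key] = data
        self.used += len(data)
        # Evict LRU blocks until we are back under our own budget. Nothing
        # here reacts to memory pressure from other processes on the box.
        while self.used > self.budget and self.blocks:
            _, old = self.blocks.popitem(last=False)
            self.used -= len(old)


cache = FixedBudgetCache(8)
cache.put("a", b"1234")
cache.put("b", b"5678")
cache.get("a")           # "a" is now most recently used
cache.put("c", b"90")    # over budget, so "b" is evicted
print(cache.get("b"))    # None
```

Picking the budget is the awkward part: too small wastes RAM that is sitting idle, too large starves other tenants, and the right value changes as the workload on the machine changes.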

Which is better for any given project in the end is probably more of a business decision: technical investment, performance, time to market, ease of deployment. If you’re talking purely about the technical limitations of the environment, however, it’s hard to beat the kernel.

Or, if you can, implement both. It makes these sorts of debates that much more fun.

Comments: Linus vs FUSE

  1. Not completely on topic here, but it’s interesting to note that the FUSE interface for PLFS is primarily a compatibility layer for codes requiring a POSIX interface (including the normal suite of posix utils), file archival, and directory management. Since this is targeted at speeding up distributed scientific shared file codes, the preferred interface is through MPI-IO using an ADIO driver. This interface allows us to relax many assumptions of a classical POSIX interface.

    Eventually PLFS does need to write data to the parallel file system. Reformatting data and writing it to another file system is not the normal job of a standard in-kernel file system. It is, however, common in FUSE file systems (sshfs, httpfs, etc.). We were specifically not interested in reimplementing all of the standard POSIX utilities to be PLFS-aware, and not interested in writing our own parallel file system.

    The point of this was to demonstrate that often it’s just about the right tool for the job.

    Posted by Ben McClelland
    July 11, 2011 at 9:15 pm
  2. Off topic: Java is a well designed modern language and does not deserve to be associated with the likes of Visual BASIC or Cobol.

    “We mock what we don’t understand.” — Austin Millbarge

    Posted by neb
    July 14, 2011 at 6:06 pm
  3. @neb: The point is that these systems are viewed by many with disdain but still widely and successfully deployed in production environments. But you may be right; perhaps including Visual BASIC was stretching things a bit. :)

    Posted by sage
    July 14, 2011 at 8:35 pm
    Off topic: Neb, repeat that in another ten years, when Java is just another legacy business language like COBOL.

    P.S. You can build and run Fortran programs (via gfortran) on most modern mobile phones, even almost fully object-oriented Fortran 2008 code. It’s fairly easy on Android, rather hard on the iPhone.

    Intel claims it makes more money selling Fortran compilers than C/C++ compilers.

    Posted by al
    August 2, 2011 at 6:32 pm


© 2013, Inktank Storage, Inc. All rights reserved.