From February 13-18, we ran our very first Ceph Census. The purpose of this survey was to get a sense of how many Ceph clusters are out there, what they are used for, and which technologies they are used alongside.
The Census was announced on the ceph-devel and ceph-users mailing lists and a link was placed in the topic of the #ceph IRC channel. There were 10 questions in total. The survey was anonymous by default, although people could provide their email address if they chose. In total, we received data from 81 respondents.
Raw responses (without email addresses and other personally-identifiable information) are available at the bottom of this post. Between here and there is my attempt to summarize the most important data. Some questions were optional and some allowed for multiple answers, so the number of responses for each question was often more or less than 81. Enjoy!
It’s close to an even three-way split between those who are assessing Ceph, those with concrete production plans, and those in production.
The community reported 21 production clusters with a combined raw storage of 1,154TB. Apparently the team at DreamHost didn’t participate in the Census; DreamObjects alone is over 3PB!
Pre-production clusters represent a total raw storage of 2,466TB (excluding a reported 20PB cluster).
|Assessment / Investigation||36|
The total amount of storage reported was 5,635TB, and most of it is in clusters with less than 50TB. The average cluster size is just over 72TB.
Since this question allowed for a free-form text response, I converted each response into TB. If a range was specified, I chose the lower number.
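For anyone who wants to repeat that conversion on the raw CSV, here is a minimal sketch of the rule described above: parse every size in a response and keep the lowest. It is an illustration rather than the script actually used for this post. The function name `to_tb`, the accepted units, and the decimal multipliers (1PB = 1,000TB) are all assumptions.

```python
import re

# Assumed decimal multipliers relative to 1 TB; the post does not say
# whether decimal or binary units were used.
_TB_PER_UNIT = {"GB": 0.001, "TB": 1.0, "PB": 1000.0}

# A number, optionally followed by a unit ("50", "100TB", "1.5 PB").
_SIZE = re.compile(r"(\d+(?:\.\d+)?)\s*(GB|TB|PB)?", re.IGNORECASE)


def to_tb(response, default_unit="TB"):
    """Return the lowest size mentioned in a free-form response, in TB.

    Returns None when the response contains no number at all.
    """
    # Drop thousands separators so "1,154 TB" parses as one number.
    matches = _SIZE.findall(response.replace(",", ""))
    if not matches:
        return None

    # A range such as "50-100TB" carries the unit only on the last number,
    # so walk backwards and let unit-less numbers borrow the unit that
    # follows them. Numbers with no unit anywhere use default_unit.
    sizes = []
    unit = default_unit
    for number, found_unit in reversed(matches):
        if found_unit:
            unit = found_unit
        sizes.append(float(number) * _TB_PER_UNIT[unit.upper()])

    # "If a range was specified, I chose the lower number."
    return min(sizes)


print(to_tb("50-100TB"))  # 50.0
print(to_tb("1-2 PB"))    # 1000.0
```

Responses that mix sizes with other numbers (node counts, for instance) would still need a manual look, which is probably unavoidable with free-form answers.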
Of the two largest responses, one was the mysterious 20PB pre-production cluster I mentioned above that provides storage for an OpenStack deployment. The other was a 1PB cluster in pre-production at GRNET SA.
It’s no surprise that half of the reported Ceph clusters are being used to provide storage for cloud deployments. It is interesting, though, that private cloud deployments represent a far larger set of clusters than public ones.
I didn’t anticipate so much interest in Ceph for backup and archival, but I should have – Ceph’s low cost per gig and ease of expansion make it great for that.
I am pleased to see big data as a popular use case as well. Many open source distributed filesystems can be used to replace HDFS, and Ceph is no exception.
|Backup / Archival||29|
This was kind of an odd question because the “client” OS only matters for some use cases. It doesn’t really matter what OS a REST client is running, for example, but it matters a lot for clients of Ceph’s block and file interfaces.
Even so, Ubuntu has a substantial lead. Top among the “Other” responses were Gentoo and SLES.
On the server side, Ubuntu is king. Over half of those polled said they were currently running, or planning to run, their clusters on Ubuntu.
We worked hard early last year to make sure that the Ceph experience on Ubuntu was great, and similar efforts are currently being put into the other major distributions.
Ubuntu and Debian combined (the apt-get cabal!) account for all but two of the production clusters reflected in this Census.
This was kind of a surprise! Actually, two surprises.
First, OpenStack is the dominant cloud stack. Integrations with Apache CloudStack, Proxmox, and others have been generating interest – I wouldn’t be surprised to see a more even distribution in the next Census.
The second surprise is that most respondents use no cloud stack at all…even though the #1 and #3 use cases were cloud deployments!
The top responses under “Other” were Ganeti and VMware vCloud.
So! That concludes our first Ceph Census. I think it was incredibly worthwhile, and I appreciate the participation of all those involved! Full results can be downloaded in CSV here.
We hope to repeat this Census regularly, and we’ll continue to publish results. Until next time!