Aggregated news from external sources
Finally back from Vancouver, back from an interesting week at the OpenStack Summit with a packed schedule of presentations, work sessions and meetings. Together with Sage Weil, I presented on “Storage security in a …
In Ceph, a pool can be configured to use erasure coding instead of replication to save space. When used with Intel processors, the default Jerasure plugin that computes the erasure code can be replaced by the ISA plugin for better write … Continue reading →
The Ceph command line and ceph-disk helper are python scripts for which there are integration tests (ceph-disk.sh and test.sh). It would be useful to add unit tests and pep8 checks. It can be done by creating a python module instead … Continue reading →
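To give a rough idea of the direction, here is a minimal sketch of the kind of unit test that becomes possible once the script is importable as a module; the helper `split_dev_partition` below is a made-up stand-in for illustration, not an actual ceph-disk function:

```python
# Sketch of a unit test that becomes possible once ceph-disk is a Python
# module rather than a standalone script. split_dev_partition is a
# hypothetical helper used only to illustrate the testing pattern.
import unittest


def split_dev_partition(dev):
    """Split '/dev/sdb1' into ('/dev/sdb', '1') -- illustrative only."""
    base = dev.rstrip('0123456789')
    return base, dev[len(base):]


class TestHelpers(unittest.TestCase):
    def test_split_dev_partition(self):
        self.assertEqual(split_dev_partition('/dev/sdb1'), ('/dev/sdb', '1'))
        self.assertEqual(split_dev_partition('/dev/sdc'), ('/dev/sdc', ''))


if __name__ == '__main__':
    unittest.main()
```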
Calculating the storage overhead of a replicated pool in Ceph is easy.
You divide the amount of space you have by the “size” (number of replicas) parameter of your storage pool.
Let’s work with some rough numbers: 64 OSDs of 4TB each.
Raw size: 64 * 4 = 256TB
Size 2: 256 / 2 = 128TB
Size 3: 256 / 3 = 85.33TB
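For reference, the same arithmetic as a quick Python sketch (the 64 × 4TB figures are just the example numbers from above):

```python
# Usable capacity of a replicated pool: raw space divided by the pool's
# "size" (number of replicas). Numbers match the example above.
n_osd, osd_tb = 64, 4
raw_tb = n_osd * osd_tb                         # 256 TB

for size in (2, 3):
    print("size=%d -> %.2f TB usable" % (size, raw_tb / float(size)))
# size=2 -> 128.00 TB usable
# size=3 -> 85.33 TB usable
```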
Replicated pools are expensive in terms of overhead: Size 2 provides the same resilience and overhead as RAID-1.
Size 3 provides more resilience than RAID-1, but at the cost of even more overhead.
Explaining what Erasure coding is about gets complicated quickly.
What’s appealing about erasure coding is that it can provide the same (or better) resiliency as replicated pools but with less storage overhead – at the cost of the computing it requires.
Ceph has had erasure coding support for a good while already and interesting documentation is available.
The thing with erasure coded pools, though, is that in most cases you’ll need a cache tier in front of them to be able to use them.
This makes for a perfect synergy of slower/larger/less expensive drives for your erasure coded pool and faster, more expensive drives in front as your cache tier.
To calculate the overhead of an erasure coded pool, you need to know the ‘k’ and ‘m’ values of your erasure code profile.
When the encoding function is called, it returns chunks of the same size: data chunks, which can be concatenated to reconstruct the original object, and coding chunks, which can be used to rebuild a lost chunk.
k: the number of data chunks, i.e. the number of chunks the original object is divided into. For instance, with k = 2 a 10KB object will be divided into 2 chunks of 5KB each.
m: the number of coding chunks, i.e. the number of additional chunks computed by the encoding functions. If there are 2 coding chunks, it means 2 OSDs can be out without losing data.
The formula to calculate the usable capacity (from which the overhead follows) is:
nOSD * k / (k+m) * OSD Size
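To make the formula concrete, here is a small sketch that computes it; the k=4, m=2 profile below is only an illustration, not a recommendation:

```python
# Usable capacity of an erasure coded pool:
#   usable = nOSD * k / (k + m) * OSD size
def ec_usable_tb(n_osd, osd_tb, k, m):
    return n_osd * osd_tb * k / float(k + m)

# Example: 64 OSDs of 4TB with k=4, m=2 (two OSDs may be lost)
print("%.2f TB" % ec_usable_tb(64, 4, 4, 2))    # 170.67 TB
```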
Finally, let’s look at the usable capacity (in TB) for a range of erasure coding profiles based on 64 OSDs of 4TB each, with m ranging from 1 to 4 and k from 1 to 10:
| k \ m | m=1    | m=2    | m=3    | m=4    |
|-------|--------|--------|--------|--------|
| k=1   | 128.00 | 85.33  | 64.00  | 51.20  |
| k=2   | 170.67 | 128.00 | 102.40 | 85.33  |
| k=3   | 192.00 | 153.60 | 128.00 | 109.71 |
| k=4   | 204.80 | 170.67 | 146.29 | 128.00 |
| k=5   | 213.33 | 182.86 | 160.00 | 142.22 |
| k=6   | 219.43 | 192.00 | 170.67 | 153.60 |
| k=7   | 224.00 | 199.11 | 179.20 | 162.91 |
| k=8   | 227.56 | 204.80 | 186.18 | 170.67 |
| k=9   | 230.40 | 209.45 | 192.00 | 177.23 |
| k=10  | 232.73 | 213.33 | 196.92 | 182.86 |
| Raw   | 256    | 256    | 256    | 256    |
After learning about the ceph-rest-api I just had to do something fun with it.
In fact, it’s going to become very handy for me as I might start to develop with it for things like nagios monitoring scripts.
Please try it out and let me know if you have any feedback!
Pull requests are welcome 🙂
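To give a rough idea of what a nagios-style check on top of ceph-rest-api could look like, here is a minimal sketch; the URL and the /api/v0.1/health endpoint are assumptions about a default ceph-rest-api setup and may need adjusting:

```python
# Minimal sketch of a nagios-style health check against ceph-rest-api.
# Assumes the API listens on localhost:5000 and exposes /api/v0.1/health,
# which may differ in your deployment.
import sys
from urllib.request import urlopen

URL = "http://localhost:5000/api/v0.1/health"   # assumed default, adjust

try:
    body = urlopen(URL, timeout=10).read().decode("utf-8", "replace")
except Exception as exc:
    print("CRITICAL - cannot reach ceph-rest-api: %s" % exc)
    sys.exit(2)

if "HEALTH_OK" in body:
    print("OK - %s" % body.strip())
    sys.exit(0)
if "HEALTH_WARN" in body:
    print("WARNING - %s" % body.strip())
    sys.exit(1)
print("CRITICAL - %s" % body.strip())
sys.exit(2)
```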
When a teuthology target (i.e. machine) is provisioned with teuthology-lock for the purpose of testing Ceph, there is no way to choose the kernel. But it can be installed afterwards using the following: cat > kernel.yaml <<EOF interactive-on-error: true roles: … Continue reading →
The Ceph integration tests may fail because of environmental problems (network not available, packages not built, etc.). If six jobs failed out of seventy, these failed tests can be re-run instead of re-scheduling the whole suite. It can be done … Continue reading →
When a Ceph teuthology integration test fails (for instance a rados job), it will collect core dumps which can be downloaded from the same directory where the logs and config.yaml files can be found, under the remote/mira076/coredump directory. The binary … Continue reading →
The “Ceph Developer Summit” for the Infernalis release is on the way. The summit is planned for March 3rd and 4th. The blueprint submission period started on February 16th and will end on February 27th, 2015. Do you miss something in Ceph or plan to deve…