Planet Ceph

Aggregated news from external sources

October 23, 2017

Using Erasure Coding with RadosGW

This is a quick write-up of erasure coding and how to use it with RadosGW. First, let's look at the default erasure coding profile on Ceph, understand it, and then create our own.

root> ceph osd erasure-code-profile get default
k=2
m=1
plugin=jerasure
crush-failure-domain=host
technique=reed_sol_van

Erasure coding profiles break down using the following formula.

  • n = k + m

k = the number of data chunks into which the original object is divided. For instance, in the default profile where k = 2, a 10KB object is divided into 2 chunks of 5KB each.

m = the number of coding chunks, i.e., additional chunks that provide the reliability level. If there are 2 coding chunks, 2 OSDs can be out without losing data.

n = the total number of chunks created, the sum of k and m.

In our default profile above this means we have 3 total chunks (2 + 1 = 3) and can lose up to m chunks, which is 1 here. Anything more than that and it's Bad News Bears.
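As a rough illustration of the k, m and n relationship, the sketch below reads a profile in the key=value form printed by the get command and reports n and the number of chunks that can be lost. The function name ec_profile_summary is a hypothetical helper made up for this example, not part of Ceph, and the input is the default profile pasted in as sample data.

```shell
# ec_profile_summary: hypothetical helper, not a Ceph command.
# Reads key=value lines on stdin and prints k, m, n = k + m and the
# number of chunks that can be lost (equal to m).
ec_profile_summary() {
    awk -F= '
        $1 == "k" { k = $2 }
        $1 == "m" { m = $2 }
        END {
            if (k == "" || m == "") {
                print "profile is missing k or m" > "/dev/stderr"
                exit 1
            }
            printf "k=%d m=%d n=%d tolerated_failures=%d\n", k, m, k + m, m
        }'
}

# Sample input: the default profile from this post.
ec_profile_summary <<'EOF'
k=2
m=1
plugin=jerasure
crush-failure-domain=host
technique=reed_sol_van
EOF
# prints: k=2 m=1 n=3 tolerated_failures=1
```

On a live cluster the same helper could presumably be fed directly, for example with ceph osd erasure-code-profile get default | ec_profile_summary.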

So what advantage is there to using erasure coding?

The main advantage is that your data footprint is much smaller than when replicating your data by a factor of 3.
As an example, let's use a 100GB file to determine the final raw data footprint under erasure coding, using the following two formulas and our default profile:

  • ratio = k / n – (2/3 ≈ 0.667)
  • total_raw = file_size * (1/ratio) – (150GB = 100GB * (3/2))

Our raw footprint ends up being 150GB, instead of 300GB if replicated 3 times.
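To make the footprint arithmetic easy to repeat for other profiles, here is a minimal sketch. The function names ec_raw_size and replica_raw_size are hypothetical, invented for this example. It works from k and m directly (raw = size * (k + m) / k), so no rounded ratio is involved.

```shell
# ec_raw_size SIZE K M: raw capacity used by SIZE under a k+m EC profile.
# Hypothetical helper; same formula as above, written as size * n / k.
ec_raw_size() {
    awk -v size="$1" -v k="$2" -v m="$3" \
        'BEGIN { printf "%.2f\n", size * (k + m) / k }'
}

# replica_raw_size SIZE COPIES: raw capacity used with plain replication.
replica_raw_size() {
    awk -v size="$1" -v copies="$2" \
        'BEGIN { printf "%.2f\n", size * copies }'
}

echo "EC k=2 m=1:     $(ec_raw_size 100 2 1) GB"     # 150.00 GB
echo "EC k=4 m=2:     $(ec_raw_size 100 4 2) GB"     # 150.00 GB
echo "3x replication: $(replica_raw_size 100 3) GB"  # 300.00 GB
```

Note that k=4, m=2 has the same 1.5x overhead as the default k=2, m=1 while tolerating the loss of two chunks instead of one.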

So what disadvantage is there to using erasure coding?

Mainly speed. Erasure coding takes time to process the chunks, and the more chunks you have, the more resources and time it takes to process them. Most of the time, though not always, erasure coding will be slower than replication. A good balance between size, reliability, and performance is to set k=4 and m=2.

Creating an erasure coding profile

So let's create one for our RGW pool using k=4 and m=2.

ceph osd erasure-code-profile set EC_RGW k=4 m=2 crush-failure-domain=host

Note – crush-failure-domain can be set to osd, host, rack, etc.

Converting a RGW pool to use erasure coding

CAUTION – Take it from me: DO NOT convert any pool other than default.rgw.buckets.data. I converted default.rgw.buckets.index to EC, and it took me 5 hours to trace the resulting problems back to that conversion. See below for examples of errors that occurred because of this.

Sadly you can’t (as far as I know) just switch a pool over to use erasure coding. What we can do is run a mini script that creates a new pool with erasure coding set, copies the old pool to the new pool, renames the old pool, and then renames the new pool to the old pool’s name. Sound confusing? Yeah, I agree, but once you get it, it clicks and makes sense. This is how I like to convert pools, but as always, try this in a test environment before doing anything like this in production.

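# Pool to convert; only the RGW data pool should be moved to EC.
pool=default.rgw.buckets.data
# Create the new pool (128 PGs, 128 PGPs) using the EC_RGW profile.
ceph osd pool create "$pool.new" 128 128 erasure EC_RGW
# Copy every object from the old pool into the new EC pool.
rados cppool "$pool" "$pool.new"
# Swap the names so the EC pool takes over the original pool name.
ceph osd pool rename "$pool" "$pool.old"
ceph osd pool rename "$pool.new" "$pool"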

Create a user, or use an existing one, and try to create a bucket or file. You should be able to create files like normal.

Troubleshooting

More than likely you are going to see these errors if you convert any pool other than default.rgw.buckets.data to EC. This is easy enough to fix by essentially renaming everything back to the way it was before running the conversion script.

This example is from when I converted default.rgw.buckets.index to EC. I was able to read all files just fine, but I could not write or create anything.

2017-10-12 10:54:45.108810 7f380338a700 1 ====== starting new request req=0x7f3803384710 =====
2017-10-12 10:54:45.138562 7f380338a700 0 ERROR: could not get stats for buckets
2017-10-12 10:54:45.138582 7f380338a700 0 WARNING: set_req_state_err err_no=5 resorting to 500
2017-10-12 10:54:45.138655 7f380338a700 1 ====== req done req=0x7f3803384710 op status=-5 http_status=500 ======
And from the radosgw service log on ceph-rgw1:
Oct 12 14:25:36 ceph-rgw1 radosgw[19834]: 2017-10-12 14:25:36.197153 7fe05b2ac9c0 -1 Couldn't init storage provider (RADOS)
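
The "rename everything back" fix mentioned above can be sketched as the reverse of the two renames in the conversion script. This is only a sketch under assumptions: rollback_pool is a hypothetical function name, the CEPH variable is a made-up override that lets you preview the commands with CEPH=echo before running them for real, and it assumes the original pool still exists as POOL.old. It uses bash's local keyword.

```shell
# rollback_pool POOL: hypothetical helper that undoes the rename swap.
# Assumes POOL is currently the EC copy and POOL.old is the original.
# CEPH is an illustrative override; it defaults to the real ceph CLI.
rollback_pool() {
    local pool="$1"
    # Move the EC pool out of the way, then restore the original name.
    "${CEPH:-ceph}" osd pool rename "$pool" "$pool.new" &&
    "${CEPH:-ceph}" osd pool rename "$pool.old" "$pool"
}

# Dry run: print the ceph arguments instead of executing them.
CEPH=echo rollback_pool default.rgw.buckets.index
# prints:
# osd pool rename default.rgw.buckets.index default.rgw.buckets.index.new
# osd pool rename default.rgw.buckets.index.old default.rgw.buckets.index
```

Dropping the CEPH=echo prefix would run the renames against the cluster, so as with the conversion itself, try it in a test environment first.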

Fin

I hope this helps out peeps and makes life a little easier. If this helped out even one admin, then it was well worth it.
Thanks for reading and feel free to contact me at magusnebula@gmail.com!

Source: Stephen McElroy (Using Erasure Coding with RadosGW)
