Ceph erasure code jerasure plugin benchmarks

loic

On an Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz processor (and all SIMD capable Intel processors), the Reed Solomon Vandermonde technique of the jerasure plugin, which is the default in Ceph Firefly, performs better than the Cauchy technique.

Reed Solomon vs Cauchy

The chart shows decoding performance for erasure coded objects. The Y axis is in GB/s and the X axis shows K/M/erasures. For instance, 10/3/2 means K=10, M=3 and 2 erasures: each object is sliced into K=10 equal chunks, M=3 parity chunks are computed, and the jerasure plugin is used to recover from the loss of two chunks (i.e. 2 erasures).
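To make the K/M/erasures notation concrete, here is a minimal sketch that splits a label into its three fields and counts the chunks involved. The function name describe_label is hypothetical and is not part of bench.sh or the jerasure plugin; it only illustrates how to read the labels.

```shell
# Hypothetical helper (not part of bench.sh): read a chart label such as 10/3/2.
describe_label() {
    label=$1
    k=${label%%/*}          # text before the first slash: data chunks
    rest=${label#*/}        # everything after the first slash
    m=${rest%%/*}           # parity chunks
    erasures=${rest#*/}     # lost chunks to recover from
    # With M parity chunks, at most M lost chunks can be recovered.
    if [ "$erasures" -gt "$m" ]; then
        echo "K=$k M=$m erasures=$erasures: unrecoverable"
        return 1
    fi
    echo "K=$k M=$m erasures=$erasures: $((k + m)) chunks stored, $((k + m - erasures)) remaining"
}

describe_label 10/3/2
```

For 10/3/2 this reports 13 chunks stored and 11 remaining, which is still at least the K=10 chunks needed to rebuild the object.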

Benchmark reports

The bench.sh output is rendered in a standalone HTML page with Flot. The commands are run from the root directory of the source tree.

  • SIMD optimized

    TOTAL_SIZE=$((4 * 1024 * 1024 * 1024)) \
    CEPH_ERASURE_CODE_BENCHMARK=src/ceph_erasure_code_benchmark \
    PLUGIN_DIRECTORY=src/.libs \
    qa/workunits/erasure-code/bench.sh fplot jerasure

  • no optimization

    PARAMETERS='--parameter jerasure-variant=generic' \
    TOTAL_SIZE=$((4 * 1024 * 1024 * 1024)) \
    CEPH_ERASURE_CODE_BENCHMARK=src/ceph_erasure_code_benchmark \
    PLUGIN_DIRECTORY=src/.libs \
    qa/workunits/erasure-code/bench.sh fplot jerasure

Results interpretation

The benchmarks are presented in two charts, one for encoding performance and another for decoding performance. The Y axis is the amount of data processed in GB/s: more is better.
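As a rough illustration of the Y axis unit, the sketch below converts a number of bytes processed and an elapsed time into GB/s. The helper throughput_gbs is hypothetical, the 5.0 seconds is a made-up value rather than a measurement, and 1 GB is assumed to be 1024^3 bytes, which may not match the convention used by the charts.

```shell
# Hypothetical helper: bytes processed and elapsed seconds to GB/s.
# Assumption: 1 GB = 1024^3 bytes.
throughput_gbs() {
    awk -v bytes="$1" -v seconds="$2" \
        'BEGIN { printf "%.2f\n", bytes / (1024 * 1024 * 1024) / seconds }'
}

# Same total size as in the commands above; the elapsed time is illustrative only.
TOTAL_SIZE=$((4 * 1024 * 1024 * 1024))
throughput_gbs "$TOTAL_SIZE" 5.0
```

With these illustrative numbers, 4 GB processed in 5 seconds comes out at 0.80 GB/s.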

simd encode 4KB

The X axis has one K/M pair for each point, ordered from the simplest on the left (K=2, M=1, which is also the default in Firefly) to the one requiring the most effort on the right (K=10, M=4).

simd optimized decode 4KB

The X axis of the chart for decoding performances is further divided to show the cost of recovering from an increasing number of erasures. For instance the 4/3/1 point for Reed Solomon shows that an object encoded with K=4, M=3 that has lost one chunk (one erasure) can be decoded at a rate over 0.75 GB/s. The next point, 4/3/2 shows that when there are two erasures, the rate falls under 0.75 GB/s. The points that share the same K/M pair are connected with a line.
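The points that belong to one K/M pair can be enumerated with a short loop. This is only a sketch: decode_labels is a hypothetical name, and it assumes the number of erasures runs from 1 up to M, which the text above suggests but does not state outright.

```shell
# Hypothetical helper: list the decode chart labels for one K/M pair.
# Assumption: the erasure count runs from 1 up to M.
decode_labels() {
    k=$1
    m=$2
    erasures=1
    while [ "$erasures" -le "$m" ]; do
        echo "$k/$m/$erasures"
        erasures=$((erasures + 1))
    done
}

decode_labels 4 3
```

For K=4, M=3 this prints 4/3/1, 4/3/2 and 4/3/3, one per line: the points that would be connected by a single line on the chart.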

SIMD improvements and previous benchmarks

jerasure version 2 can use SIMD to accelerate encoding and decoding. Without SIMD, the Cauchy technique performs better than the Reed Solomon Vandermonde technique with 1MB objects.

generic decode 1MB

With SIMD the Reed Solomon Vandermonde technique is faster.

The previous jerasure benchmarks were run on version 1, and they also show that the Cauchy technique is faster. However, these benchmarks were conducted before the implementation of erasure coded pools. The actual stripe size is 4KB, and the 1MB results are only included for comparison with previous results.