When Ceph was originally designed a decade ago, the concept was that “intelligent” disk drives with some modest processing capability could store objects instead of blocks and take an active role in replicating, migrating, or repairing data within the system. In contrast to conventional disk drives, a smart object-based drive could coordinate with other drives in the system in a peer-to-peer fashion to build a more scalable storage system.
Today an Ethernet-attached hard disk drive from WDLabs is making this architecture a reality. WDLabs has taken over 500 drives from the early production line and assembled them into a 4 PB (3.6 PiB) Ceph cluster running Jewel and the prototype BlueStore storage backend. WDLabs has been working on validating the need for an open source compute environment within the storage device and is now beginning to understand the use cases as thought leaders such as Red Hat work with the early units. This test seeks to demonstrate that the second-generation converged microserver has become a viable solution for distributed storage use cases like Ceph. Building an open platform that can run open source software is a key underpinning of the concept.
The Ceph project would like to congratulate the following students on their acceptance into the 2016 Google Summer of Code program to work on the Ceph project:
End-to-end Performance Visualization
Improve Overall Python Infrastructure
Over-the-wire Encryption Support
Python 3 Support for Ceph
These students represent the best of the almost 70 project submissions that we fielded from students around the world. For those not familiar with the Google Summer of Code program, this means that Google will generously fund these students during their summer work.
Thanks to everyone who applied this year; the selection process was made very challenging by the number of highly qualified applicants. We look forward to mentoring students to a successful summer of coding and Open Source, both this year and in the years to come.
This major release of Ceph will be the foundation for the next long-term stable release. There have been many major changes since the Infernalis (9.2.x) and Hammer (0.94.x) releases, and the upgrade process is non-trivial. Please read these release notes carefully.
Who doesn’t love a high-performing Ceph storage cluster? To get one, you need to tame it: not only does Ceph itself need tuning, the network does too. The quickest way to tune your network is to enable Jumbo Frames.
What are they?
They are Ethernet frames with a payload larger than the standard 1500-byte MTU.
They can significantly improve network performance by making data transmission more efficient.
They require Gigabit Ethernet (or faster).
Most enterprise network devices support Jumbo Frames.
Some people also call them ‘Giants’.
Enabling Jumbo Frames
Make sure your switch port is configured to accept Jumbo Frames.
On the server side, set your network interface MTU to 9000:
# ifconfig eth0 mtu 9000
Make the change permanent by updating the network interface configuration file and restarting the network service.
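As a sketch, assuming a RHEL/CentOS-style system where eth0 faces the Ceph network (the file path, interface name, and restart command will vary by distribution):

echo "MTU=9000" >> /etc/sysconfig/network-scripts/ifcfg-eth0
systemctl restart network

To verify that jumbo frames actually pass end to end, send a non-fragmentable payload of 8972 bytes (9000 minus 28 bytes of IP and ICMP headers) to another node; the ping only succeeds if every hop on the path accepts jumbo frames:

ping -M do -s 8972 <address-of-another-ceph-node>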
This is a quick note about Ceph networks, so do not expect anything lengthy here :).
Usually Ceph networks are presented as the public network and the cluster (private) network.
However, it is rarely mentioned that you can also use a separate network for the monitors.
This might sound obvious to some people, but it is completely possible.
The only requirement, of course, is that this monitor network be accessible from all the Ceph nodes.
We can then easily imagine four VLANs: one for the public network, one for the cluster network, one for the monitors, and one for management, for example.
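For illustration, a minimal ceph.conf sketch with the monitors on their own subnet might look like this (all addresses here are assumptions, not recommendations):

[global]
public network = 192.168.1.0/24
cluster network = 192.168.2.0/24
mon host = 192.168.3.1, 192.168.3.2, 192.168.3.3

As long as the 192.168.3.0/24 network is routable from every Ceph node, the monitors remain reachable even though they do not sit on the public network.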
I know this does not sound like much, but I’ve been asked this question so many times :).
You have probably already faced the need to migrate all objects from one pool to another, especially to change parameters that cannot be modified on an existing pool: for example, to migrate from a replicated pool to an EC pool, to change the EC profile, or to reduce the number of PGs…
There are different methods, depending on the contents of the pool (RBD, objects), size…
The simple way
The simplest and safest method is to copy all objects with the “rados cppool” command.
However, the pool must not receive writes during the copy, so it effectively requires read-only access to the pool for the duration.
For example, to migrate to an EC pool:
ceph osd pool create $pool.new 4096 4096 erasure default
rados cppool $pool $pool.new
ceph osd pool rename $pool $pool.old
ceph osd pool rename $pool.new $pool
But it does not work in all cases. For example, with EC pools: “error copying pool testpool => newpool: (95) Operation not supported”.
Using Cache Tier
This must be used with caution; run tests before using it on a production cluster. It worked for my needs, but I cannot say that it works in all cases.
I find this method interesting because it allows transparent operation, reduces downtime, and avoids duplicating all the data. The principle is simple: use cache tiering, but in reverse order.
At the beginning, we have two pools: the current “testpool” and the new one, “newpool”.
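The old pool is attached as a cache tier of the new pool, in forward mode, so that writes pass straight through to the new pool. A sketch using Ceph's standard tiering commands (recent releases also ask for a --yes-i-really-mean-it flag on the cache-mode change):

ceph osd tier add newpool testpool --force-nonempty
ceph osd tier cache-mode testpool forward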
In ceph osd dump you should see something like this:
--> pool 58 'testpool' replicated size 3 .... tier_of 80
From this point, all new objects will be created in the new pool.
We can then force the existing objects to move to the new pool:
rados -p testpool cache-flush-evict-all
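You can follow the progress by watching the per-pool object counts; the old pool's count should drop toward zero as objects are flushed and evicted. For example:

ceph df detail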
Switch all clients to the new pool
(You can also do this step earlier. For example, just after the cache pool creation.)
Until all the data has been flushed to the new pool, you need to set an overlay so that reads on the new pool still find objects remaining in the old pool:
ceph osd tier set-overlay newpool testpool
In ceph osd dump you should now see the overlay, something like this (the pool IDs follow the earlier example and are illustrative):
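--> pool 80 'newpool' replicated size 3 .... tiers 58 read_tier 58 write_tier 58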
This bug fix release fixes a few critical issues with CRUSH. The most important addresses a bug in feature bit enforcement that may prevent pre-hammer clients from communicating with the cluster during an upgrade. This only manifests in some cases (for example, when the ‘rack’ type is in use in the CRUSH map, and possibly other cases), but for safety we strongly recommend that all users use 0.94.1 instead of 0.94 when upgrading.
There is also a fix in the new straw2 buckets when OSD weights are 0.
We recommend that all v0.94 users upgrade.
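For example, you can confirm the version each OSD daemon is actually running after the upgrade with:

ceph tell osd.* version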
crush: fix divide-by-0 in straw2 (#11357 Sage Weil)
crush: fix has_v4_buckets (#11364 Sage Weil)
osd: fix negative degraded objects during backfilling (#7737 Guang Yang)