<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Ceph &#187; RGW</title>
	<atom:link href="http://ceph.com/category/rgw/feed/" rel="self" type="application/rss+xml" />
	<link>http://ceph.com</link>
	<description></description>
	<lastBuildDate>Mon, 20 May 2013 20:23:57 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Ceph is the new black.  It goes with everything!</title>
		<link>http://ceph.com/community/ceph-is-the-new-black-it-goes-with-everything/</link>
		<comments>http://ceph.com/community/ceph-is-the-new-black-it-goes-with-everything/#comments</comments>
		<pubDate>Wed, 17 Oct 2012 12:18:37 +0000</pubDate>
		<dc:creator>scuttlemonkey</dc:creator>
				<category><![CDATA[Community]]></category>
		<category><![CDATA[RADOS]]></category>
		<category><![CDATA[RBD]]></category>
		<category><![CDATA[RGW]]></category>

		<guid isPermaLink="false">http://ceph.com/?p=1390</guid>
		<description><![CDATA[In my (rather brief) time digging in to Ceph and working with the community, most discussions generally boil down to two questions: “How does Ceph work?” and “What can I do with Ceph?” The first question has garnered a fair amount of attention in our outreach efforts. Ross Turk&#8217;s post “More Than an Object Store” [...]]]></description>
			<content:encoded><![CDATA[<p>In my (rather brief) time digging in to Ceph and working with the community, most discussions generally boil down to two questions: <em>“How does Ceph work?”</em> and <em>“What can I do with Ceph?”</em> The first question has garnered a fair amount of attention in our outreach efforts. Ross Turk&#8217;s post “<a title="More Than an Object Store" href="http://ceph.com/community/more-than-an-object-store/" target="_blank">More Than an Object Store</a>” does a fantastic job summarizing Ceph&#8217;s magic. The second question is what I will address below.</p>
<p>So what <em>can</em> you do with Ceph? For those who like to read the ending first, the answer turns out to be “a blindingly awesome ton.” Thankfully that doesn’t spoil it for the rest of us, because it’s the details that make it fun. In an email discussion of these details, it was Inktank’s chief suit, Bryan Bogensberger, who managed to succinctly summarize many of the available options while still citing examples and supporting data. (How do you like that, a business guy who has a solid handle on the tech. How lucky are we!?) Without immediately overwhelming you with all the supporting details, his list was as follows:</p>
<p><span id="more-1390"></span></p>
<ul>
<li>Enable the Public Cloud</li>
<li>Enable the Private Cloud</li>
<li>Support service providers replacing legacy storage</li>
<li>Act as a replacement for HDFS</li>
<li>Act as a replacement for enterprise storage</li>
<li>Serve as a Lustre replacement</li>
<li>Provide a platform for Application Development</li>
</ul>
<p>I would actually add one more:</p>
<ul>
<li>Act as the basis for loads of academic research, development, and experimentation</li>
</ul>
<p>The cool part about this is that a number of these categories already have early adopters that took one look at Ceph and decided to dive right in, building amazing things on top of it. The combination of Ceph as open source technology and Inktank’s seasoned enterprise veterans has allowed those responsible for Ceph to engage with two disparate communities at a very deep level. Open source enthusiasts have helped with edge cases, testing, patches, and even active development and support. Additionally, many businesses have provided their own external expertise as we build bridges from Ceph to other technologies, along with interesting problems to solve and active development contributions. This combination is starting to allow Ceph as a technology to spread like wildfire, challenging many expensive alternatives just as cloud challenged traditional infrastructure.</p>
<p>&nbsp;</p>
<h3>Enabling the Public Cloud</h3>
<p style="text-align: center;"><a href="http://ceph.com/community/ceph-is-the-new-black-it-goes-with-everything/attachment/cumulus_clouds_in_fair_weather/" rel="attachment wp-att-1398"><img class="size-medium wp-image-1398 aligncenter" title="Credit: http://commons.wikimedia.org/wiki/File:Cumulus_clouds_in_fair_weather.jpeg" src="http://ceph.com/wp-content/uploads/2012/11/Cumulus_clouds_in_fair_weather-293x220.jpg" alt="" width="293" height="220" /></a></p>
<p>Several service providers are already availing themselves of the insanely low comparative cost per gigabyte that Ceph allows, building object storage products to compete with incumbent offerings. Because of Ceph’s extensible nature, powerful integrations, and ease of management, these service providers are finding themselves with a lot of room to grow, sidestepping the inherent limitations of proprietary technologies. Dreamhost is a prime example of this with their new <a href="http://dreamhost.com/cloud/dreamobjects/" target="_blank">DreamObjects</a> offering. DreamObjects is aimed at being an inexpensive, object-based, cloud storage service that allows users to connect via Amazon S3 and OpenStack Swift compatible APIs. Ceph is giving them the ability to bring this new object store to market quickly, cheaply, and in a way that is extremely easy to scale.</p>
<p>&nbsp;</p>
<h3>Enabling the Private Cloud</h3>
<p>Over the last few years many businesses have been moving to virtual infrastructure as a way to save both time and money. Unfortunately this trades the problem of managing the sprawl of physical racks for the complexity of maintaining the digital sprawl of your virtual infrastructure. IT managers are always looking for a good way to get the most bang for their buck, as well as scale resources across applications. Ceph is a platform with many capabilities, and not just a simple object store; many IT professionals are finding that it can solve many problems at once, extending its usefulness and their budget.</p>
<p>Inktank is already supporting a few companies who are developing their own cloud projects with a cost effective alternative for reliable, scalable cloud storage. Ceph’s modularity and auto-rebalancing of the cluster through intelligent object storage daemons (OSDs) are especially important in the private cloud. They allow a business to spin nodes up or down based on need without downtime or effects on performance. This goes a long way towards having a truly dynamic infrastructure that is so important in today’s virtualized data center. Additionally, there are a number of different ways you can access a Ceph cluster which allows a massively flexible, and self-service, approach for your application developers. Whether you are taking advantage of one of the native APIs or simply mounting your cluster with CephFS, once the cluster is available, application developers can interface with it in a way that makes the most sense for their individual project. This allows ops people to focus on ops and application developers to focus on building the best software possible.</p>
<p>A great example of what Ceph can do in a private cloud can be seen in <a href="http://www.pistoncloud.com/" target="_blank">Piston Cloud’s</a> OpenStack product. Piston Cloud has created a tool that auto-configures a whole rack of servers as OpenStack nodes with Ceph providing the block storage. This offering installs in under 10 minutes and allows customers to evaluate all of OpenStack’s services without having to configure several different machines. Piston’s implementation also promises both easy deployment and seamless upgrade from a pilot deployment to a fully supported production environment.</p>
<p>&nbsp;</p>
<h3>Support service providers replacing legacy storage</h3>
<p style="text-align: center;"><a href="http://ceph.com/community/ceph-is-the-new-black-it-goes-with-everything/attachment/backupleftaccent/" rel="attachment wp-att-1399"><img class="size-medium wp-image-1399 aligncenter" title="Credit: http://www.lynxtechnologies.net/Images/BackupLeftAccent.jpg" src="http://ceph.com/wp-content/uploads/2012/11/BackupLeftAccent-153x220.jpg" alt="" width="153" height="220" /></a></p>
<p>Many enterprises are long overdue for upgrading their storage, due in no small part to the cost associated with doing so via existing technologies. Ceph has a leg up on traditional enterprise storage through being both open source and driven primarily by commodity hardware. Several companies have already come to the conclusion that while reliability and scalability can be enhanced, the real driving force is the long-term effect on the bottom line. Ceph is a technology that can grow with a business, through the expert help of someone like Inktank or the internal expertise developed as with any other technology.</p>
<p>&nbsp;</p>
<h3>HDFS / Enterprise Storage / Lustre Replacement</h3>
<p>Ceph actually originated as an alternative to Lustre that would provide better scalability and performance, and has offered a number of other interesting enhancements as well. One of the interesting parts about Ceph is that it has the ability to maintain multiple metadata server daemons, which makes file system access much more efficient. This particular approach is also what makes it an <a href="http://static.usenix.org/publications/login/2010-08/openpdfs/maltzahn.pdf" target="_blank">ideal replacement for the Hadoop Distributed File System</a> (HDFS), especially when <a href="http://www.itworld.com/big-datahadoop/262612/ceph-extends-storage-open-scalability" target="_blank">compared to HDFS’s single name-node architecture</a>. Several folks are already experimenting with blending a Ceph back end with a Hadoop front end and we can&#8217;t wait to see the results.</p>
<p>In addition to the strictly technical advantages, Ceph’s open source nature and ability to run on commodity hardware also make it a very attractive offering for Enterprise data storage, especially as it relates to the bottom line. The ability to deploy Ceph as a strictly software solution in an existing infrastructure allows businesses to remove the ongoing cost and difficulty of maintaining a separate appliance-based solution. This allows for many options when it comes to extensibility and flexibility, which are key to the long-term viability of an enterprise environment. The level of data ownership and control available to an integrated solution also mitigates a lot of risk when it comes to future migrations or data access requirements.</p>
<p>&nbsp;</p>
<h3>Application development</h3>
<p>As mentioned earlier, Ceph is so much <a title="More Than an Object Store" href="http://ceph.com/community/more-than-an-object-store/" target="_blank">more than just an object store</a>: it is a whole platform that can be used in a myriad of different ways. Two of the key ways we see developers engaging with Ceph are through the client library (directly via librados) and via our RADOS gateway (radosgw). Librados gives developers full control over Ceph’s object store via C, C++, Java, Python, Ruby, or PHP and offers the most advanced functionality of the four types of interfaces with RADOS. The massive horizontal scaling and fault tolerance of object stores are ideal for many of the big data operations that businesses are finding problematic, from storing and loading virtual machine images to archiving video surveillance. Regardless of what interesting applications people decide to build, though, librados gives you the best tools to build them.</p>
<p>The RADOS gateway provides a REST interface compatible with applications written for Swift or Amazon’s S3. This allows developers to take advantage of the Ceph platform without having to rewrite their applications. Developers can immediately realize the advantages offered by Ceph like the financial benefits of commodity hardware or the time savings of self-healing storage devices. Amazon has done a tremendous job of building and proving out the cloud model for application development; now anyone can build their own version of S3 and take advantage of, or improve upon, those benefits.</p>
<p>&nbsp;</p>
<h3>Research, development, and experimentation</h3>
<p style="text-align: center;"><a href="http://ceph.com/community/ceph-is-the-new-black-it-goes-with-everything/attachment/research/" rel="attachment wp-att-1400"><img class="size-medium wp-image-1400 aligncenter" title="Credit: http://www.observera.com/images/research.jpg" src="http://ceph.com/wp-content/uploads/2012/11/research-293x220.jpg" alt="" width="293" height="220" /></a></p>
<p>Ceph’s unique characteristics and modular architecture also make it an ideal candidate as the subject of purely academic study. The most obvious research pathway is the “Controlled Replication Under Scalable Hashing” (CRUSH) algorithm. CRUSH can be described as follows:</p>
<blockquote><p>“CRUSH works by describing the storage cluster in a hierarchy that reflects its physical organization. For example, let&#8217;s say each host has three disks, each rack has 30 hosts, and we have some number of racks. The result is a hierarchy of racks, hosts, and devices.”</p></blockquote>
<p>The nature of this CRUSH algorithm allows you to pass three things in (the placement group, latest state of the cluster, and a crush map) and it can always calculate the location of your data object. CRUSH is both repeatable and fluid, since the cluster can change and CRUSH will always know how to adapt to the new layout. CRUSH is also fully configurable allowing you to specify things like how many times your data should be replicated and what kind of weighting should be applied. This allows us to ask a few potentially interesting questions, like “to what other types of problems could the CRUSH algorithm be applied?” or “how can we extend or refine the CRUSH algorithm?”</p>
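<p>To make the “same inputs in, same location out” idea a bit more concrete, here is a minimal sketch. To be clear, this is <em>not</em> the CRUSH algorithm: it swaps in weighted rendezvous hashing over a flat list of devices, with no hierarchy and no cluster state, and every name in it (<code>place</code>, the <code>cluster</code> map, the weights) is made up for illustration. It only shows how a placement can be computed rather than looked up.</p>

```python
import hashlib
import math

def _unit_hash(pg_id, osd_name):
    """Map (placement group, OSD) to a repeatable float strictly inside (0, 1)."""
    digest = hashlib.sha256(f"{pg_id}:{osd_name}".encode()).digest()
    value = int.from_bytes(digest[:8], "big") >> 12   # keep 52 bits
    return (value + 0.5) / 2**52

def place(pg_id, osd_weights, replicas=2):
    """Pick `replicas` distinct OSDs for a placement group.

    Stand-in for CRUSH (assumption): weighted rendezvous hashing.
    `osd_weights` is a toy map of OSD name to weight.
    """
    def score(osd):
        # log of a value in (0, 1) is negative, so the score is positive
        return -osd_weights[osd] / math.log(_unit_hash(pg_id, osd))
    return sorted(osd_weights, key=score, reverse=True)[:replicas]

# Hypothetical four-device cluster; osd.2 has twice the weight
cluster = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 2.0, "osd.3": 1.0}
first = place(7, cluster)
again = place(7, cluster)   # no lookup table: recomputed from the inputs
```

<p>Calling <code>place</code> twice with the same inputs returns the same devices, and removing a device that was not chosen leaves the placement untouched. The real CRUSH does a great deal more, since it walks the rack/host/device hierarchy and applies placement rules.</p>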
<p>In any good academic study, it is important to have very controlled circumstances (to the extent possible). The modular nature of Ceph allows you to limit the number of variables in the equation by only running the parts of the platform that you require. If your particular case doesn’t require a RESTful interface, for instance, you have the ability to turn off an entire part of the platform (in this case, radosgw). This distinct separation of function allows you to scale out the pieces you need without any unnecessary overhead or interdependency. This is helpful for study and also useful in production environments!</p>
<p>Another application of pure study could be uses and extensions to the object store, RADOS (Reliable, Autonomic, Distributed Object Store). RADOS consists of two kinds of components: the monitors (ceph-mon), which keep track of which nodes are in operation at any given time, and the object storage daemons (ceph-osd) themselves. The incredibly cool part about RADOS is that the storage nodes have a certain level of “intelligence” built into them and have the ability to be self-healing, self-managing, and smarter than your average bear. The potential for other applications (or just enhancing this intelligence) is quite vast, especially as it relates to Ceph&#8217;s object model. Objects in Ceph have properties (as objects do), but you can also build extensions that give them methods like makeThumbnail() or MD5encrypt(). Tools like this could provide an enterprising developer hours of enjoyment, and we look forward to helping them experiment!</p>
<p>Other avenues of study could incorporate things like cluster power efficiency, multi data center research, plain old optimization tweakery, or many other things that we haven’t even thought of yet. Ceph has provided a veritable “wild west” of opportunities for research, development, and experimentation, and our community has responded in kind with the best creativity and ingenuity an open source project could hope for.</p>
<p>&nbsp;</p>
<h3>Conclusion</h3>
<p>Now that you have read the details, you can see our skip-to-the-end conclusion of “a blindingly awesome ton” was pretty accurate, even with today’s list. This list grows every day thanks to the creativity of our community. We are all deeply excited to see what fancy new cloud apps, massive data applications, or other incredibly creative new tools might be built on top of Ceph tomorrow! If you have questions, ideas, or requests please feel free to snag us at one of the stops on our rigorous trade show schedule, on irc (irc.oftc.net #ceph), or on Twitter (<a href="http://twitter.com/ceph" target="_blank">@Ceph</a> or <a href="http://twitter.com/inktank" target="_blank">@Inktank</a>). We’d love to hear from you.</p>
]]></content:encoded>
			<wfw:commentRss>http://ceph.com/community/ceph-is-the-new-black-it-goes-with-everything/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Atomicity of RESTful radosgw operations</title>
		<link>http://ceph.com/dev-notes/atomicity-of-restful-radosgw-operations/</link>
		<comments>http://ceph.com/dev-notes/atomicity-of-restful-radosgw-operations/#comments</comments>
		<pubDate>Mon, 07 Nov 2011 21:42:26 +0000</pubDate>
		<dc:creator>yehuda</dc:creator>
				<category><![CDATA[Dev notes]]></category>
		<category><![CDATA[RADOS]]></category>
		<category><![CDATA[RGW]]></category>

		<guid isPermaLink="false">http://ceph.newdream.net/?p=343</guid>
		<description><![CDATA[A while back we worked on radosgw doing atomic reads and writes. The first issue was making sure that two or more concurrent writers that write to the same object don’t end up with an inconsistent object. That is the &#8220;atomic PUT&#8221; issue. We also wanted to be able to make sure that when one [...]]]></description>
			<content:encoded><![CDATA[<p>A while back we worked on radosgw doing atomic reads and writes.</p>
<p>The first issue was making sure that two or more concurrent writers that write to the same object don’t end up with an inconsistent object. That is the &#8220;atomic PUT&#8221; issue.</p>
<p>We also wanted to be able to make sure that when one client reads an object via radosgw while another client writes to the same object, the result is consistent. That is, when reading an object a client should get either the old or the new version of the object, and never a mix of the two. That is the &#8220;atomic GET&#8221; issue.</p>
<p>Radosgw is built directly on top of RADOS and is a prime example of a librados user. The basic issue is that radosgw streams the objects from or to the RADOS objects with a series of relatively small reads or writes. For the atomic PUT and atomic GET we didn&#8217;t want to introduce locking. Locking would solve the issue, but implementing it on top of RADOS would not have been trivial, and would have affected scalability and the relative simplicity of the gateway. The Ceph distributed file system implements locking in the metadata server (as part of its POSIX file locking support), and introducing that in the gateway would require holding state on each object and synchronizing it between the different gateway instances. We didn’t want to reimplement the MDS.</p>
<p><strong>Atomic PUT</strong></p>
<p>When radosgw reads or writes an object it can issue multiple read or write librados requests to the RADOS backend. One RADOS feature is that each single operation is atomic. The problem is that for sufficiently large objects (which need not be very large at all) we issue multiple write operations, and could end up with an interleaved object.</p>
<p>The solution for the atomic PUT is to write the object into a temporary object. Once the temp object is completely written, we issue a single librados clone-range operation that atomically clones the entire temp object to the destination. Once the data is there we remove the temp object. This is equivalent to writing to a temporary file and renaming it over the target when we finish.</p>
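<p>For readers who want to see the file system analogy spelled out, here is a minimal sketch of the “write to a temp file, then rename” pattern. It works on local files and does not touch librados; the <code>atomic_write</code> helper and the 4K chunk size are assumptions made for the example.</p>

```python
import os
import tempfile

def atomic_write(path, data):
    """Write bytes to a temp file, then rename it over `path` in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory keeps the temp file on the same filesystem as the target
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            # Many small writes, like a streamed upload
            for offset in range(0, len(data), 4096):
                tmp.write(data[offset:offset + 4096])
            tmp.flush()
            os.fsync(tmp.fileno())
        # Readers see either the old file or the new one, never a mix
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

# Usage: two writers in sequence; the target always holds one whole version
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "obj")
atomic_write(target, b"a" * 10000)
atomic_write(target, b"b" * 5000)
with open(target, "rb") as handle:
    final = handle.read()
```

<p>The rename plays the role of the single clone operation: all of the slow, multi-step writing happens somewhere nobody is reading, and the switch itself is one atomic step.</p>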
<p>Since the RADOS backend is distributed, we need to make sure that both the temp object and the target object will be located in the same placement group (and on the same OSD). Usually the object location is determined by the object name, but for this purpose we used the &#8220;object locator&#8221; feature, which allows us to provide an alternative string that is fed into the hash function. In this case we use the target object name as the object locator for the temporary object, ensuring that both objects end up in the same placement group on the same node so that the clone operation can work.</p>
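<p>The locator idea is easy to sketch in isolation. The snippet below is only an illustration: the CRC32 hash, the placement group count, and the object names are all assumptions, and the real placement logic in RADOS is different. What it shows is that once the locator replaces the name as the hash input, the temp object has to land wherever the target lands.</p>

```python
import zlib

PG_COUNT = 64  # hypothetical number of placement groups

def placement_group(name, locator=None):
    """Hash the locator if one is given, otherwise the object name."""
    key = locator if locator is not None else name
    return zlib.crc32(key.encode()) % PG_COUNT

target = "photos/cat.jpg"
temp = "photos/cat.jpg.tmp.8f3a"  # hypothetical temp-object name

pg_target = placement_group(target)
pg_temp_by_name = placement_group(temp)                     # may land anywhere
pg_temp_by_locator = placement_group(temp, locator=target)  # follows the target
```

<p>Without the locator the two names hash independently, so they would only share a placement group by luck.</p>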
<p><strong>Atomic GET</strong></p>
<p>With atomic PUT we know that the objects are consistent. However, this doesn’t help with clients reading when an object is being written. Since there can be multiple librados read operations for a single GET, some of the reads may happen before the object is replaced and some may happen after that, leading to an inconsistent &#8220;torn&#8221; result.</p>
<p>In addition to atomic operations, RADOS has a nice feature called compound operations, which allows you to send a few operations that are bundled together and applied atomically. If one of the operations fails, nothing is applied. We use this for atomic PUT in order to set both data and metadata on the target object in a single atomic operation.</p>
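<p>Here is a rough in-memory sketch of the all-or-nothing behavior described above. It is a toy model and not the librados API: <code>ToyObject</code>, <code>apply_compound</code>, and the three operation helpers are hypothetical names invented for the example.</p>

```python
import copy

class ToyObject:
    """Hypothetical stand-in for a RADOS object: data plus xattrs."""
    def __init__(self):
        self.data = b""
        self.xattrs = {}

def apply_compound(obj, ops):
    """Run every op against a scratch copy; commit only if all succeed."""
    scratch = copy.deepcopy(obj)
    for op in ops:
        op(scratch)  # any exception aborts before the commit below
    obj.data, obj.xattrs = scratch.data, scratch.xattrs

def write_full(data):
    def op(target):
        target.data = data
    return op

def set_xattr(key, value):
    def op(target):
        target.xattrs[key] = value
    return op

def assert_xattr(key, expected):
    def op(target):
        if target.xattrs.get(key) != expected:
            raise RuntimeError("guard failed")
    return op

obj = ToyObject()
# Data and metadata are set together
apply_compound(obj, [write_full(b"v1"), set_xattr("tag", "123")])
try:
    # The guard fails, so the write bundled with it is not applied either
    apply_compound(obj, [write_full(b"v2"), assert_xattr("tag", "999")])
except RuntimeError:
    pass
```

<p>After the failed second call the object still holds <code>v1</code> and its original tag, which is the property the gateway relies on.</p>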
<p>For the atomic GET we introduce an object &#8220;tag,&#8221; which is a random value that we generate for each PUT and store as an object attribute (xattr). When radosgw writes to an object it first checks for an existing object and fetches its tag (which it can do atomically). If the object exists it clones it to a new object with the tag as a suffix (taking necessary steps to avoid name collisions) and the original object name as the locator. The compound clone operation looks like:</p>
<ol>
<li>check to see if object &lt;name&gt; tag attribute is &lt;tag&gt;</li>
<li>clone to &lt;name&gt;_&lt;tag&gt;</li>
</ol>
<p>The first operation is a guard to make sure that the object hasn&#8217;t been rewritten since we first read it. (Had it been rewritten, we need to restart the whole operation and reread the tag.) We put the same guard when we write the new object instance, to make sure that there was no racing operation.</p>
<p>A client that reads the object also starts by reading the tag, and puts the same guard before each subsequent read operation. If the guard fails, the client knows that the object has been rewritten. However, it also knows that since it has been rewritten, the object that it started reading can now be found at &lt;name&gt;_&lt;tag&gt;. So, reading an object named foo looks like this:</p>
<ul>
<li>read object foo tag -&gt; 123</li>
<li>verify object foo tag is &#8220;123&#8221;; read object foo (offset = 0, size = 512K) -&gt; ok, read 512K</li>
<li>check object foo tag is &#8220;123&#8221;; read object foo (offset = 512K, size = 512K) -&gt; not ok, object was replaced</li>
<li>read object foo_123 (offset = 512K, size = 512K) -&gt; ok, read 512K</li>
</ul>
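<p>The read sequence above can be played out in a small single-threaded simulation. This is a sketch under heavy assumptions: <code>ToyStore</code> is an in-memory dictionary and not RADOS, the overwrite is triggered by a callback between chunks to imitate a racing writer, and the write-side guard and retry are left out for brevity.</p>

```python
import secrets

class ToyStore:
    """In-memory stand-in for a pool: object name -> (tag, data)."""
    def __init__(self):
        self.objects = {}

    def put(self, name, data):
        old = self.objects.get(name)
        if old is not None:
            # Keep the previous version readable under name_tag
            self.objects[f"{name}_{old[0]}"] = old
        self.objects[name] = (secrets.token_hex(8), data)

    def get_tag(self, name):
        return self.objects[name][0]

    def guarded_read(self, name, tag, offset, size):
        """Return the chunk, or None if the tag guard fails."""
        current_tag, data = self.objects[name]
        if current_tag != tag:
            return None
        return data[offset:offset + size]

def read_object(store, name, chunk, between_chunks=None):
    tag = store.get_tag(name)
    source, out, offset = name, b"", 0
    while True:
        part = store.guarded_read(source, tag, offset, chunk)
        if part is None:
            source = f"{name}_{tag}"  # object was replaced: use the clone
            continue
        if not part:
            return out
        out += part
        offset += len(part)
        if between_chunks:
            between_chunks()

store = ToyStore()
store.put("foo", b"A" * 8)
fired = []

def overwrite_once():
    # Imitates a writer replacing foo after the reader's first chunk
    if not fired:
        fired.append(True)
        store.put("foo", b"B" * 8)

result = read_object(store, "foo", chunk=4, between_chunks=overwrite_once)
```

<p>Even though <code>foo</code> is replaced halfway through, the reader finishes with the complete old version, and a fresh read afterwards returns the new one.</p>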
<p>The final component is an intent log. Since we end up creating multiple instances of the same object under different names, we need to make sure that these objects are cleaned up after some reasonable amount of time. We added a log object in which we record each such object that needs to be removed. After a sufficient amount of time (however long we expect very slow GETs to still succeed), a process iterates over the log and removes old objects.</p>
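<p>The cleanup pass might look something like the sketch below. It is only an illustration of the idea: the <code>IntentLog</code> class, its in-memory list, and the explicit <code>now</code> arguments are assumptions, and the actual log format in radosgw is not shown here.</p>

```python
import time

class IntentLog:
    """Toy intent log: remembers which objects should be deleted later."""
    def __init__(self):
        self.entries = []  # list of (timestamp, object_name)

    def record(self, name, now=None):
        stamp = time.time() if now is None else now
        self.entries.append((stamp, name))

    def sweep(self, objects, max_age, now=None):
        """Delete logged objects at least `max_age` seconds old."""
        now = time.time() if now is None else now
        removed, kept = [], []
        for stamp, name in self.entries:
            if now - stamp >= max_age:
                objects.pop(name, None)
                removed.append(name)
            else:
                kept.append((stamp, name))
        self.entries = kept
        return removed

# Hypothetical pool contents: the live object plus two old clones
objects = {"foo": b"new", "foo_123": b"old", "foo_456": b"older"}
log = IntentLog()
log.record("foo_456", now=0)
log.record("foo_123", now=50)
# At t=100 with a 60 second grace period, only foo_456 is old enough
removed = log.sweep(objects, max_age=60, now=100)
```

<p>The grace period is the knob that matters: it has to be at least as long as the slowest GET we still want to succeed, since a reader may be partway through one of those clones.</p>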
]]></content:encoded>
			<wfw:commentRss>http://ceph.com/dev-notes/atomicity-of-restful-radosgw-operations/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
