Ceph Days Silicon Valley 2025

Bringing Ceph to Silicon Valley, California
A full-day event dedicated to sharing Ceph’s transformative power and fostering the vibrant Ceph community in Silicon Valley, California!
The expert Ceph team, Ceph’s customers and partners, and the wider Ceph community join forces to discuss the status of the Ceph project, recent improvements and the roadmap, and Ceph community news. The day ends with a networking reception to foster further Ceph learning.
Important Dates
- CFP Opens: 2025-01-13
- CFP Closes: 2025-02-21
- Speakers receive confirmation of acceptance: 2025-02-28
- Schedule Announcement: 2025-03-07
- Event Date: 2025-03-25
Schedule
| Time | Session |
| --- | --- |
| 8:00 AM | Check-in and Breakfast |
| 9:00 AM | Welcome |
| 9:10 AM | **Keynote - State of Ceph** A look at the Ceph roadmap, current development priorities, and the latest activity in the Ceph community. |
| 9:30 AM | **Ceph Operations at Scale** In this presentation we will go over DigitalOcean's journey with Ceph as the primary storage backend for Block and Object workloads, and how we automate, monitor, alert on, and operate Ceph day-to-day. |
| 10:00 AM | **MSR (Multi-Step Retry): A generalization of CRUSH allowing multiple OSDs per failure domain** There are use cases where one might, for example, want to spread an 8+6 erasure-coded pool such that no host (or rack) has more than 4 shards. Existing CRUSH rules struggle with this because CHOOSELEAF is the only way to allow an out OSD to be remapped to another failure domain, but CHOOSELEAF does not allow the placement of more than one OSD per failure domain. MSR rules generalize the CRUSH algorithm to allow retrying the full sequence of selections while still respecting placement limitations. This talk will describe the algorithm, implementation, and use cases (a rough configuration sketch for this scenario follows the schedule). |
| 10:30 AM | Tea/Coffee Break |
| 11:00 AM | **9 Years of Ceph at Walmart** This talk covers the humble beginnings of Ceph at Walmart, aimed at providing reliable, flexible, future-proof storage for our on-premises cloud; how it has evolved to support Walmart's triplet cloud model; and the challenges uncovered operating Ceph at large scale across a variety of use cases, ranging from latency-sensitive databases and eCommerce applications to backups and more. |
| 11:30 AM | **Supporting 3-Availability-Zone Stretch Clusters** A Ceph cluster stretched across 3 zones faces a potential scenario where data loss can occur. For example, with 6 replicas spread across 3 datacenters and a min_size of 3, the setup is intended to prevent I/O when only 1 datacenter is available; however, there is an edge case where a placement group (PG) becomes available anyway, because the temporary PG mappings created to ensure data availability lack the necessary safeguards. This poses a risk when the sole surviving datacenter accepts writes, the 2 unavailable datacenters then come back up, and the previously surviving datacenter suddenly goes down, which would mean data loss. To prevent such a scenario, we created a solution that builds on an existing stretch-mode feature to restrict how OSDs are chosen for the acting set of a PG. This talk will take a deep dive into how this feature is implemented in the latest Ceph upstream release, as well as other features that improve the user experience with stretch clusters. |
| 12:00 PM | **Zero Trust Data Lakehouse** This talk examines the integration of Ceph with Apache Polaris, an advanced technical catalog for Apache Iceberg. Polaris introduces credential vending, in which it generates session tokens for engines to use with object stores, scoped according to catalog namespace and table policies. In doing so, table- and namespace-level access controls are enforced at the storage level, instead of requiring the engine itself to be a trusted policy-enforcement point. We will demonstrate the integration and explain in full detail how this functionality works in conjunction with Ceph’s IAM and STS capabilities (see the credential-vending sketch after the schedule). |
| 12:30 PM | Lunch |
| 1:30 PM | **A Solid Case for NVMe** How to prepare for the future by embracing NVMe and when to leave spinning disks in the past. |
| 2:00 PM | **Cephadm at Scale for Detractors: Why It's Time to Reconsider Containerized Ceph Deployments** Let's face it: Cephadm hasn't won over everyone. Many longtime Ceph users still prefer package-based deployments with custom automation, arguing that containers add unnecessary complexity and reduce control. In this talk, we'll take a cold, hard look at Cephadm: why it exists, what problems it solves (and creates), and how it compares to previous deployment tools like Ceph-deploy, Ceph-Ansible, DeepSea, and Rook. We'll look at the main concerns of Cephadm's detractors, from container overhead to troubleshooting, and discuss whether those concerns are still valid today. Most importantly, we'll look at real-world results, starting with the Pawsey 100-node, 4000-OSD scale test, which proved that Cephadm is not just for toy clusters. We'll also cover the latest usability improvements and what's next for Cephadm. |
| 2:30 PM | **CephFS Basics & What's New** We start with a light introduction to the CephFS architecture and modern components, with a focus on newer pieces such as the ceph-mgr’s role in working with a CephFS filesystem. Then we move on to new features: mgr/volumes has grown into a complete solution for scalably handling internal and public cloud filesystems, and has gained new capabilities such as server-side quiescing that enables multi-client, multi-volume consistent snapshots. Filesystem protocol integration with Samba and NFS-Ganesha has dramatically improved. Hear about these and other new features! With any leftover time, we will preview the development roadmap for where we hope to take CephFS in the future (a short volumes/quiesce sketch follows the schedule). |
| 3:00 PM | Snack Break |
| 3:30 PM | **Ceph Telemetry - The Why, What, and How** Whether you are a Ceph user or a developer, you have probably wondered at some point: How many Ceph clusters are out there? What Ceph versions are they running? What does their storage capacity distribution look like? Answers to these questions and more are available thanks to Ceph’s telemetry module. In this session, we will take a deep dive into this module and explore the value it brings to users and developers alike (see the telemetry sketch after the schedule). |
| 4:00 PM | **Optimizing Ceph for Scale: Lessons from Large-Scale Operations and Performance Tuning** Ceph is a powerful, scalable storage solution, but operating it effectively at scale presents unique challenges. In this talk, I’ll share insights from years of experience managing and supporting some of the largest Ceph deployments. We’ll explore best practices for maintaining stability, ensuring high availability, and optimizing performance for demanding workloads. Key topics will include tuning Ceph for large-scale customers, troubleshooting common performance bottlenecks, and strategies for extracting the best possible performance from Ceph. Whether you’re running a growing cluster or supporting mission-critical workloads, these practical lessons will help you get the most out of your Ceph deployment. |
| 4:30 PM | Closing Panel and Remarks |
| 5:00 PM | Networking Reception |
| 6:00 PM | Event Close |
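
For readers who want to experiment with the placement scenario described in the MSR talk, below is a minimal sketch, assuming a recent Ceph release whose erasure-code profiles accept MSR-related placement options; the option names, pool name, and PG counts are illustrative assumptions rather than excerpts from the talk.

```python
# Minimal sketch, assuming a recent Ceph release and an admin node with the
# `ceph` CLI and keyring available. The crush-osds-per-failure-domain and
# crush-num-failure-domains options are assumptions based on the MSR work
# described in the talk; verify them against your release's documentation.
import subprocess

def ceph(*args: str) -> str:
    """Run a `ceph` CLI command and return its stdout."""
    return subprocess.run(["ceph", *args], check=True,
                          capture_output=True, text=True).stdout

# 8 data + 6 coding shards, with no more than 4 shards on any single host.
ceph("osd", "erasure-code-profile", "set", "ec86-msr",
     "k=8", "m=6",
     "crush-failure-domain=host",
     "crush-osds-per-failure-domain=4",   # assumed MSR option name
     "crush-num-failure-domains=4")       # assumed MSR option name

# Create an erasure-coded pool from the profile and inspect the generated rule.
ceph("osd", "pool", "create", "ecpool-msr", "64", "64", "erasure", "ec86-msr")
print(ceph("osd", "crush", "rule", "dump", "ecpool-msr"))  # rule is typically named after the pool
```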
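
The zero trust data lakehouse talk builds on Ceph RGW's IAM and STS capabilities. As a rough sketch of the engine side of credential vending, the snippet below assumes an RGW endpoint with STS enabled and a pre-created role whose policy is scoped to a table's prefix; the endpoint URL, role ARN, bucket, prefix, and credentials are placeholders, and in the Polaris flow it is the catalog that calls AssumeRole and hands the vended session credentials to the engine.

```python
# Sketch of an engine consuming vended, scoped credentials against Ceph RGW.
# Endpoint, keys, role ARN, bucket, and prefix are placeholders.
import boto3

RGW_ENDPOINT = "http://rgw.example.com:8000"   # placeholder RGW endpoint

sts = boto3.client(
    "sts",
    endpoint_url=RGW_ENDPOINT,
    aws_access_key_id="CATALOG_ACCESS_KEY",      # identity allowed to assume the role
    aws_secret_access_key="CATALOG_SECRET_KEY",
)

# Request short-lived credentials; access is limited by the role's table-level policy.
resp = sts.assume_role(
    RoleArn="arn:aws:iam:::role/iceberg-table-reader",  # placeholder role
    RoleSessionName="query-engine-session",
    DurationSeconds=3600,
)
creds = resp["Credentials"]

# The engine reads table data using only the vended, scoped session credentials.
s3 = boto3.client(
    "s3",
    endpoint_url=RGW_ENDPOINT,
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
listing = s3.list_objects_v2(Bucket="lakehouse", Prefix="warehouse/db/table/")
for obj in listing.get("Contents", []):
    print(obj["Key"])
```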
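
As a companion to the CephFS session, here is a small sketch of the mgr/volumes workflow and the server-side quiesce capability it mentions, assuming a cluster that can deploy an MDS; the volume, subvolume, and snapshot names are placeholders, and the quiesce command shape is an assumption based on recent releases, so verify the exact syntax on your version.

```python
# Sketch of the mgr/volumes workflow referenced in the CephFS talk, assuming an
# admin node with the `ceph` CLI. Names are placeholders; the quiesce command
# shape should be verified against your release.
import subprocess

def ceph(*args: str) -> str:
    return subprocess.run(["ceph", *args], check=True,
                          capture_output=True, text=True).stdout

ceph("fs", "volume", "create", "appfs")                     # filesystem plus MDS via the orchestrator
ceph("fs", "subvolume", "create", "appfs", "db-subvol")     # carve out a managed subvolume
print(ceph("fs", "subvolume", "getpath", "appfs", "db-subvol"))

# Pause I/O on the subvolume, take a consistent snapshot, then release the
# quiesce set (command shape is an assumption based on recent releases).
ceph("fs", "quiesce", "appfs", "db-subvol", "--set-id", "snap-batch-1")
ceph("fs", "subvolume", "snapshot", "create", "appfs", "db-subvol", "snap1")
ceph("fs", "quiesce", "appfs", "--set-id", "snap-batch-1", "--release")
```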
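
Finally, for anyone curious about the data discussed in the telemetry session, the sketch below shows one way to preview and enable the module from an admin node, assuming the `ceph` CLI and an admin keyring are available; the license flag follows recent documentation and should be double-checked against your release.

```python
# Quick look at the telemetry module from an admin node; a sketch assuming the
# `ceph` CLI and an admin keyring are available.
import subprocess

def ceph(*args: str) -> str:
    return subprocess.run(["ceph", *args], check=True,
                          capture_output=True, text=True).stdout

# Preview exactly what would be reported before opting in; nothing is transmitted.
print(ceph("telemetry", "preview-all"))

# Opting in requires acknowledging the data-sharing license (flag per recent docs).
ceph("telemetry", "on", "--license", "sharing-1-0")

# Show the report that will be periodically submitted.
print(ceph("telemetry", "show"))
```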