Planet Ceph

Aggregated news from external sources

  • January 5, 2017
    Ceph's CRUSH algorithm: straw

    Many years ago, when Sage wrote the original CRUSH algorithm, he implemented several bucket types, each offering a different pseudo-random selection algorithm. Most of them were modeled on the RUSH algorithms by RJ Honicky, which are documented online. One new feature Sage was quite proud of was the straw algorithm, the one we use most often today, which claims the following properties: items can have arbitrary weights; selecting an item costs O(n); and if an item's weight is adjusted up or down, data only moves to or from the adjusted item, while unadjusted items are unaffected. (On O(n): finding the largest of n values means scanning all n of them, so the cost is O(n); bubble sort, by comparison, is O(n²).) The intuition behind the process is drawing straws: each item gets a random straw length scaled by its weight, and the longest straw wins. This all sounds great, but the third property does not actually hold. The straw length is derived through a fairly involved calculation from the other weights in the bucket, so although each item's PG placement is computed independently, changing one item's weight changes the scaling factors of the other items, which means a change to one item can affect the others. That seems obvious in hindsight, yet for 8 years nobody looked closely at the underlying code or algorithm. The practical effect is that a user makes a small weight change and then watches a surprisingly large amount of data move. Sage had written a good test to verify the third property, but it only covered a few combinations; broader testing would have found the problem. Sage also noticed that many people complained about migrations moving more data than expected, but this was hard to quantify and verify, so it went unexamined for a long time. That is the bad news. The good news is that Sage figured out how to fix the placement calculation so that all three properties really hold. The new algorithm is called 'straw2'. The original straw algorithm is:

        max_x = -1
        max_item = -1
        for each item:
            x = random value from 0..65535
            x *= scaling factor
            if x > max_x:
                max_x = x
                max_item = item
        return max_item

    The problem is that the scaling factor is derived from the weights of all the other items, which means that changing the weight of item A can affect items B and C. The new straw2 algorithm is:

        max_x = -1
        max_item …Read more
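    To make the difference concrete, here is a minimal sketch of the straw2 draw in Python. It is not Ceph's implementation: real CRUSH replaces random() with a deterministic hash of the placement input and the item id so every client computes the same mapping, and it uses fixed-point integer math rather than floats.

        import math
        import random
        from collections import Counter

        def straw2_select(weights):
            """Pick one item from a {name: weight} dict, straw2-style.

            Each item draws r uniformly from (0, 1] and scores ln(r) / weight;
            the highest score wins. A score depends only on that item's own
            weight, so changing one weight never re-scales the others,
            which is exactly the property the original straw bucket lacked.
            """
            max_x, max_item = None, None
            for item, weight in weights.items():
                r = random.randint(1, 65535) / 65536.0   # avoid ln(0)
                x = math.log(r) / weight                 # larger weight => score closer to 0
                if max_x is None or x > max_x:
                    max_x, max_item = x, item
            return max_item

        # Rough check: selection frequency should track the weights.
        picks = Counter(straw2_select({"a": 1.0, "b": 2.0, "c": 3.0}) for _ in range(60000))
        print(picks)  # roughly 10000 / 20000 / 30000

    With this form, item i wins with probability weight_i / sum(weights), and changing one weight only shifts data between that item and the rest.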

  • January 3, 2017
    The relationship between an RBD image's object count and the number of files it can hold

    I. Introduction: I received the following question: a 300 TB RBD has only 78 million objects; if we store small files, that feels insufficient. My original understanding was: the default object size is 4M, so data smaller than 4M occupies a single object smaller than 4M, and data larger than 4M is split into multiple 4M objects. That is actually not rigorous. Through the rados interface, an object is stored at whatever size it was put; no splitting takes place. The splitting is done by the layer above, such as rbd or cephfs. So what is the relationship between the object count an rbd image reports and the number of files it can hold? This post looks at that question and at whether the problem above actually arises. II. Practice. Create an image:

        [root@lab8106 ~]# rbd create --image zpsize --size 100M
        [root@lab8106 ~]# rbd info zpsize
        rbd image 'zpsize':
            size 102400 kB in 25 objects
            order 22 (4096 kB objects)
            block_name_prefix: rbd_data.85c66b8b4567
            format: 2
            features: layering
            flags:

    As you can see, this image is laid out across 25 objects in the cluster, each 4M in size. Suppose we write 1000 small files and see what happens. Map the image locally and format it with an xfs filesystem:

        [root@lab8106 ~]# rbd map zpsize
        /dev/rbd0
        [root@lab8106 ~]# mkfs.xfs -f /dev/rbd0
        meta-data=/dev/rbd0 isize=256 agcount=4, …Read more
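    As a quick sanity check on those numbers, here is a small sketch (mine, not from the post) that computes how many objects an RBD image is striped into, given its size and order:

        def rbd_object_count(image_size_bytes, order=22):
            """Objects an RBD image is striped into.

            RBD splits an image into fixed-size objects of 2**order bytes
            (order 22 = 4 MiB, the default). A raw `rados put`, by contrast,
            stores whatever you give it as a single object of that exact size.
            """
            object_size = 1 << order                  # 2**22 bytes = 4 MiB
            # ceiling division: a partial tail still needs its own object
            return (image_size_bytes + object_size - 1) // object_size

        # 100 MB image with default 4 MiB objects -> 25, matching `rbd info zpsize`
        print(rbd_object_count(100 * 1024 * 1024))    # 25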

  • January 3, 2017
    Storage just in time

    Storage is one of the pillars of IT infrastructure, a domain long dominated by big and costly storage appliances. As software defined storage becomes commonplace, we now have the opportunity to apply just-in-time principles to the storage world. This is what I will go through in this article. History Since the early 2000s big …Read more

  • January 2, 2017
    Ceph, the future of storage, incoming features blog series

    Happy New Year! Bonne Année! Best wishes to my readers :). C'est le turfu ("it's the future"): Ceph is moving fast, really fast, and you won't believe how many awesome features are currently in the pipe. So to start the year off, I'm planning on publishing a set of articles to tease you a …Read more

  • December 26, 2016
    Handling the journal uuid problem for Ceph OSDs

    I. Introduction: An earlier article covered how to handle automatic mounting on CentOS 7 under Jewel when you create the partitions yourself. In that environment the journal was kept as a plain file, so there was no problem of the journal device shifting after a reboot. When ceph-deploy does the partitioning itself, it calls sgdisk to prepare the disk and writes special markers that it can later recognize and act on; when you partition by hand those markers are missing, which can lead to other problems. Let's look at how to take care of the journal uuid properly at deployment time. II. Practice. 2.1 Deploy the OSD the usual way. Prepare the hand-partitioned test disks:

        dd if=/dev/zero of=/dev/sde bs=4M count=100; dd if=/dev/zero of=/dev/sdf bs=4M count=100
        parted /dev/sde mklabel gpt; parted /dev/sdf mklabel gpt
        parted /dev/sde mkpart primary 1 100%; parted /dev/sdf mkpart primary 1 100%

    We use sde1 as the data disk and sdf1 as the journal partition on a separate SSD. First, deploy following the normal steps. Run the osd prepare step:

        [root@lab8106 ceph]# ceph-deploy osd prepare lab8106:/dev/sde1:/dev/sdf1
        ···
        [lab8106][WARNIN] adjust_symlink: Creating symlink /var/lib/ceph/tmp/mnt.7HuS8k/journal -> /dev/sdf1
        ···

    Then run the osd activate step:

        [root@lab8106 ceph]# ceph-deploy osd activate lab8106:/dev/sde1:/dev/sdf
        ···
        [lab8106][WARNIN] ceph_disk.main.Error: …Read more
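    For illustration, here is a hedged sketch of what "writing the markers yourself" could look like: tagging a hand-made partition with the journal partition-type GUID via sgdisk. The GUID below is the one I understand ceph-disk to use for journals; treat it and the whole snippet as an assumption to verify against your ceph-disk version before touching a real disk.

        import subprocess
        import uuid

        # Partition-type GUID used by ceph-disk for journal partitions
        # (assumption: check your ceph-disk source before relying on it).
        CEPH_JOURNAL_TYPE_GUID = "45b0969e-9b03-4f30-b4c6-b4b80ceff106"

        def tag_journal_partition(disk, partnum):
            """Stamp partition <partnum> on <disk> with the journal type GUID
            and a fresh unique partition GUID, so the journal can be found
            again after a reboot regardless of device-name shuffling."""
            part_uuid = str(uuid.uuid4())
            subprocess.check_call([
                "sgdisk",
                "--typecode={}:{}".format(partnum, CEPH_JOURNAL_TYPE_GUID),
                "--partition-guid={}:{}".format(partnum, part_uuid),
                disk,
            ])
            return part_uuid

        # Destructive; example only:
        # journal_uuid = tag_journal_partition("/dev/sdf", 1)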

  • December 23, 2016
    Ceph Rados Gateway and NFS

    I guess you got lucky, or maybe I just felt bad about not posting anything for more than a month, but here it is: the last blog post of the year :). With the latest release of Ceph, Jewel, a new Rados Gateway feature came out. This feature hasn't really been advertised yet, so I thought …Read more

  • December 19, 2016
    Busy busy days!

    Oh wait, is it the end of the year already? Dear readers, no, sebastien-han.fr is not dead (it actually got a bit more color since yesterday) and yes, I know… I haven't posted anything for more than a month. I've been having a hard time keeping up with the pace I've committed to, sorry about …Read more

  • December 16, 2016
    Ceph RGW AWS4 presigned URLs working with the Minio Cloud client

    Some fellows are using the Minio Client (mc) as their primary client-side tool to work with S3 cloud storage and filesystems. As you may know, mc works with the AWS v4 signature API, and it provides a modern, Apache 2.0-licensed alternative to UNIX commands (ls, cat, cp, diff, etc.). In the case …Read more
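    For context, here is a minimal sketch of producing an AWS v4 presigned GET URL against an RGW endpoint with boto3. The endpoint, credentials, bucket, and key are all placeholders of mine, and this is just one way to mint a URL that a v4-aware client such as mc can then consume.

        import boto3
        from botocore.client import Config

        # All connection details below are placeholders, not from the post.
        s3 = boto3.client(
            "s3",
            endpoint_url="http://rgw.example.com:8000",   # hypothetical RGW endpoint
            aws_access_key_id="ACCESS_KEY",
            aws_secret_access_key="SECRET_KEY",
            config=Config(signature_version="s3v4"),      # force AWS4 signing
        )

        # Presign a GET for one object; anyone holding the URL can fetch
        # the object until it expires.
        url = s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": "mybucket", "Key": "myobject"},
            ExpiresIn=3600,                               # valid for one hour
        )
        print(url)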

  • December 13, 2016
    Rapido: A Glorified Wrapper for Dracut and QEMU

    Introduction I’ve blogged a few times about how Dracut and QEMU can be combined to greatly improve Linux kernel dev/test turnaround. My first post covered the basics of building the kernel, running dracut, and booting the resultant image with qemu-kvm. A later post took a closer look at network configuration, and focused on bridging …Read more

  • December 5, 2016
    Ceph ansible is building its community

    This post just relays the initial announcement from the ceph-ansible mailing list. Hello community! ceph-ansible has been growing quite decently for the last couple of years. I’m glad to see that we now have so many users and contributors. We are currently implementing a release process within ceph-ansible, where we will certify stable releases …Read more

  • November 27, 2016
    The Dos and Don'ts for Ceph for OpenStack

    Ceph and OpenStack are an extremely useful and highly popular combination. Still, new Ceph/OpenStack deployments frequently come with easily avoided shortcomings — we’ll help you fix them! Do use show_image_direct_url and the Glance v2 API With Ceph RBD (RADOS Block Device), you have the ability to create clones. You can think of clones as the …Read more
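    For reference, the option named above lives in glance-api.conf; a minimal sketch, per my recollection of Glance's configuration layout (double-check the section and path against your OpenStack release):

        # /etc/glance/glance-api.conf
        [DEFAULT]
        # Expose each image's backend location (e.g. an RBD URL) so that
        # consumers can clone from Ceph instead of downloading the image.
        show_image_direct_url = True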

  • November 25, 2016
    How to increase debug levels and harvest a detailed OSD log

    Your OSD doesn’t start and you want to find out why. Here’s how to increase the debug levels and harvest a detailed OSD log: First, rotate the OSD log, or just do “cd /var/log/ceph ; mv ceph-osd.0.log ceph-osd.0.log-foo” Then, edit /etc/ceph/ceph.conf to add the following lines to the [osd] section: [osd] debug osd = 20 …Read more
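    As a fuller example of the kind of block that goes into the [osd] section, here is a typical set of debug options along the lines of the upstream troubleshooting docs; exactly which subsystems to raise depends on what you are chasing, so treat this as a starting point:

        [osd]
        debug osd = 20
        debug filestore = 20
        debug ms = 1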
