Planet Ceph

Aggregated news from external sources

  • June 14, 2017
    Analyzing the migration difference between adjusting PGs in multiple steps and all at once

    Preface: This question came from one of our developers: when adjusting PGs, is it better to go to the target value in one step, or in several smaller steps? And when adjusting in several steps, can some PG end up being moved back and forth repeatedly, making the total migration larger than a one-step adjustment? My own project also recently needed a PG adjustment; the need typically arises when the PGs were planned and nodes are added later, so the PG count has to be increased. This post uses concrete data to analyze the difference between the two approaches. Since the post is long, here are the conclusions up front.

    Data conclusions

    Stepwise adjustment:

        PG change     PGs migrated  objects migrated
        1200 -> 1440  460           27933
        1440 -> 1680  458           27730
        1680 -> 1920  465           27946
        1920 -> 2160  457           21141
        2160 -> 2400  458           13938
        total         2305          132696

    One-step adjustment:

        PG change     PGs migrated  objects migrated
        1200 -> 2400  2299          115361

    Conclusion: the stepwise adjustment migrated 6 more PGs (0.2% more) and 17335 more objects (15% more) than the one-step adjustment. The number of migrated PGs is essentially the same, but 15% more data moves. This is because in the stepwise case, while the PG count is still small, each migrated PG holds more objects than it will after the later splits, which accounts for the difference in data volume. Overall the two approaches migrate about the same number of PGs, with roughly 15% more data for the stepwise one, but the stepwise approach can be scheduled periodically and spread across different time windows, so each has its advantages.

    Practice

    Environment preparation: this test uses a development environment, which makes it quick to deploy what is needed; here a single machine simulates an environment of 4 machines with 48 4T OSDs.

    Environment setup. Create the cluster:

        ./vstart.sh -n --mon_num 1 --osd_num 48 --mds_num 1 --short -d

    All subsequent commands are run from the src directory of the source tree. Set the pool replica count to 2 …Read more
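    For a concrete sense of what one step of such an adjustment looks like, here is a minimal sketch using the standard ceph CLI (the pool name rbd is an assumption; substitute your own):

        ceph osd pool set rbd pg_num 1440
        ceph osd pool set rbd pgp_num 1440
        ceph -s    # watch the resulting backfill settle before taking the next step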

  • June 11, 2017
    SUSE Enterprise Storage 5 Beta Program

    openATTIC 3.x will be part of the upcoming SUSE Enterprise Storage 5 release, which is currently in beta testing. The release will be based on the upstream Ceph “Luminous” release and will ship with openATTIC 3.x and Salt/DeepSea for orchestration, deployment and management. If you would like to take a look at this release …Read more

  • June 11, 2017
    Update on the State of Ceph Support in openATTIC 3.x (June 2017)

    A bit over a month ago, I posted about a few new Ceph management features that we have been working on in openATTIC 3.x after we finished refactoring the code base. These have since been merged into the trunk, and the team has started working on additional features. In this post, I’d like …Read more

  • June 9, 2017
    Rapido: Quick Kernel Testing From Source (Video)

    I presented a short talk at the 2017 openSUSE Conference on Linux kernel testing using Rapido. There were many other interesting talks during the conference, all of which can be viewed on the oSC 2017 media site. A video of my presentation is available below, and on YouTube. Many thanks to the organisers and sponsors for …Read more

  • June 9, 2017
    Using the graylog logging system to collect Ceph cluster status

    Preface: While reading the cluster configuration file I noticed that Ceph has a graylog output option. From what I have seen it can collect the mon log and the clog; I have not seen individual OSD logs. Elasticsearch has a complete log collection stack that makes it easy to gather all logs in one place; graylog collection uses its own UDP protocol, so configuration-wise it can be set up quickly. Here I only do the most basic practice.

    System practice. The graylog logging system consists of three components:

    MongoDB – stores configuration and some metadata; requires MongoDB >= 2.4
    Elasticsearch – stores the log messages received by the Graylog server; requires Elasticsearch >= 2.x
    Graylog server – parses the logs and provides the built-in web interface

    Set up the base repo files: CentOS-Base.repo, epel.repo

    Install Java (Java >= 8 required):

        yum install java-1.8.0-openjdk

    Install MongoDB:

        yum install mongodb mongodb-server

    Start the service and enable it at boot:

        systemctl restart mongod
        systemctl enable mongod

    After installation, check the listening port:

        [root@lab102 ~]# netstat -tunlp|grep 27017
        tcp  0  0 127.0.0.1:27017  0.0.0.0:*  LISTEN  151840/mongod

    Install Elasticsearch. Import the signing key …Read more
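    On the Ceph side, the relevant switches live in ceph.conf. A minimal sketch with the option names as they appear in Jewel-era Ceph (verify against your release; the Graylog server address 192.168.0.123 is an assumption):

        [global]
        log_to_graylog = true
        err_to_graylog = true
        log_graylog_host = 192.168.0.123
        log_graylog_port = 12201

        [mon]
        mon_cluster_log_to_graylog = true
        mon_cluster_log_to_graylog_host = 192.168.0.123
        mon_cluster_log_to_graylog_port = 12201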

  • June 8, 2017
    VM live migration across the globe with Ceph and Openstack

    Have you ever been in a situation where you had to migrate a VM across the globe with minimum downtime? VM live migration across OpenStack regions. Classic approach: the volume import/export capabilities of OpenStack are limited, therefore the classic approach is to turn off the VM, copy it to the destination, then attach the volume …Read more
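    When the volumes are RBD images, the copy step of that classic approach can be done at the Ceph level. A minimal sketch (the pool and image names are assumptions, and the VM must be stopped first):

        rbd export volumes/volume-1234 - | ssh dest-site rbd import - volumes/volume-1234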

  • June 6, 2017
    Ceph mon showing a 0.0.0.0 address after deployment

    Preface: Twice recently I saw mon address problems reported in the group, both showing a 0.0.0.0:0 address, as below:

        [root@lab8106 ceph]# ceph -s
            cluster 3137d009-e41e-41f0-b8f8-5cb574502572
             health HEALTH_ERR
                    1 mons down, quorum 0,1,2 lab8106,node8107,lab104
             monmap e2: 4 mons at {lab104=192.168.10.4:6789/0,lab8106=192.168.8.106:6789/0,lab8107=0.0.0.0:0/2,node8107=192.168.8.107:6789/0}

    I had occasionally seen this problem mentioned before but never hit it myself, and I wanted to find out what triggers it. With this cepher's permission I logged into his environment and had a look: the problem was caused by the hostname.

    Reproducing the problem: it appears when the hostnames were planned during deployment and the machine's hostname is changed afterwards. For example, lab8107 was the planned hostname for my machine; running hostname node8107 on lab8107 then triggers the problem. This shows up in the ceph-deploy output log:

        [lab8107][WARNIN] ********************************************************************************
        [lab8107][WARNIN] provided hostname must match remote hostname
        [lab8107][WARNIN] provided hostname: lab8107
        [lab8107][WARNIN] remote hostname: node8107
        [lab8107][WARNIN] monitors may not reach quorum and create-keys will not complete
        [lab8107][WARNIN] ********************************************************************************

    You can see provided hostname: …Read more
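    The obvious remedy, assuming the planned name was lab8107, is to restore the hostname and restart the affected monitor (the systemd unit name below is an assumption; match it to your deployment):

        hostnamectl set-hostname lab8107
        systemctl restart ceph-mon@lab8107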

  • June 1, 2017
    Fixing CentOS 7 failing to boot after a kernel upgrade

    Preface: This problem has existed for a while. With the CentOS 7 ISO I built earlier, the following happened after upgrading the kernel:

    With the system disk on an onboard SATA port, the new kernel boots normally and can see the front-panel disks
    With the system disk on a front-panel port, the new kernel cannot boot; debugging showed the system disk could not be found
    With the system disk on the front panel, the default 3.10 kernel boots normally

    The temporary workaround was to keep the system disk on the onboard SATA port, since no real fix was found at the time. After the problem had persisted for a while, a recent round of searching pointed to the drivers inside the initramfs, and that turned out to be the fix.

    Resolution. Check the drivers in the initramfs:

        [root@lab103 lab103]# lsinitrd -k 3.10.0-327.el7.x86_64|grep mpt[23]sas
        drwxr-xr-x   2 root root      0 Apr 17 12:05 usr/lib/modules/3.10.0-327.el7.x86_64/kernel/drivers/scsi/mpt2sas
        -rw-r--r--   1 root root 337793 Nov 20  2015 usr/lib/modules/3.10.0-327.el7.x86_64/kernel/drivers/scsi/mpt2sas/mpt2sas.ko

    So the 3.10 kernel uses the mpt2sas driver. In the 4.x kernels, mpt2sas has been merged into mpt3sas:

        /lib/modules/4.4.46/kernel/drivers/scsi/mpt3sas/mpt3sas.ko

    Check the modules inside the 4.4.46 initramfs:

        lsinitrd -k 4.4.46|grep mpt[23]sas

    There is no output, which means the driver was not packed into the initramfs. There are two ways to fix this.

    Method 1: edit /etc/dracut.conf and add the line

        add_drivers+="mpt3sas"

    then regenerate the initramfs:

        dracut -f /boot/initramfs-4.4.46.img 4.4.46

    Method 2: force the driver in directly:

        dracut --force --add-drivers mpt3sas …Read more
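    Either way, it is worth confirming the module really made it into the new image before rebooting (same kind of check as above, now run against the regenerated image):

        lsinitrd /boot/initramfs-4.4.46.img | grep mpt3sas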

  • May 27, 2017
    A tool to rebalance uneven Ceph pools

    The algorithm to fix uneven CRUSH distributions in Ceph was implemented as the crush optimize subcommand. Given the output of ceph report, crush analyze can show buckets that are over/under filled:

        $ ceph report > ceph_report.json
        $ crush analyze --crushmap ceph_report.json --pool 3
                     ~id~  ~weight~  ~PGs~  ~over/under filled %~
        ~name~
        cloud3-1363    -6    419424   1084                   7.90 …Read more
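    The optimization itself is then a matter of feeding the same report to crush optimize and injecting the result back into the cluster. A sketch (the flag spellings are taken from the crush package documentation of that era; verify with crush optimize --help):

        $ crush optimize --crushmap ceph_report.json --out-path optimized.crush --pool 3
        $ ceph osd setcrushmap -i optimized.crush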

  • May 23, 2017
    Fixing the environment broken by a pkg upgrade on FreeBSD 10.2

    Preface: While installing a new package on a FreeBSD 10.2 system, I was prompted to upgrade pkg to 1.10.1. I accepted, and afterwards the whole pkg environment was unusable.

    Record. The first error after the upgrade:

        FreeBSD: /usr/local/lib/libpkg.so.3: Undefined symbol "utimensat"

    This is because that symbol only exists as of FreeBSD 10.3, while my environment is 10.2. One fix found online:

    Update the repo configuration:

        # cat /usr/local/etc/pkg/repos/FreeBSD.conf
        FreeBSD: { url: "pkg+http://pkg.FreeBSD.org/${ABI}/release_2", enabled: yes}

    Check the current version:

        # pkg --version
        1.10.1

    Update the cache:

        # pkg update

    Force-remove pkg:

        # pkg delete -f pkg

    Reinstall:

        # pkg install -y pkg
        # pkg2ng

    Check the version:

        # pkg --version
        1.5.4

    This did not work in my environment. There is another approach: the pkg-static command is still usable, and /var/cache/pkg holds the cached packages. Run:

        # pkg-static install -f /var/cache/pkg/pkg-1.5.4.txz

    This failed in my environment:

        root@mkiso:/usr/ports/ports-mgmt/pkg # …Read more
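    The truncated prompt above sits in the ports tree, which is the usual last resort when the binary package manager is wedged: rebuild pkg from source (this is the standard ports workflow, not taken from the cut-off text):

        # cd /usr/ports/ports-mgmt/pkg
        # make install clean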

  • May 12, 2017
    An algorithm to fix uneven CRUSH distributions in Ceph

    The current CRUSH implementation in Ceph does not always provide an even distribution. The most common cause of unevenness is when only a few thousand PGs, or fewer, are mapped. That is not enough samples, and the variation can be as high as 25%. For instance, when there are two OSDs with the same weight, …Read more
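    A back-of-the-envelope estimate (my own, not from the post) of why sample size matters: if placement behaved like uniform random sampling, the PG count on each of k equally weighted OSDs would be roughly Binomial(N, 1/k), giving a relative standard deviation of about sqrt((k-1)/N). With N = 1024 PGs over k = 10 OSDs that is sqrt(9/1024) ≈ 9.4%, so individual OSDs landing 20-25% away from the mean are entirely plausible.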

  • May 11, 2017
    Ceph space lost due to overweight CRUSH items

    When a CRUSH bucket contains five Ceph OSDs with the following weights:

               weight
        osd.0  5
        osd.1  1
        osd.2  1
        osd.3  1
        osd.4  1

    20% of the space in osd.0 will never be used by a pool with two replicas. The osd.0 gets 55% of the values for the first replica (i.e. 5 / 9), as …Read more
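    A quick sanity check on why some cap must exist (my own arithmetic, not from the post): with two replicas, each object places at most one copy on osd.0, so osd.0 can never hold more than half of all copies, yet its share of the total weight is 5 / 9 ≈ 55.6%. The weight above that ceiling corresponds to space that can never be filled.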
