Installing Ceph 13.2.10 (Mimic) on CentOS 7.8 (very detailed)

    Technology, 2022-08-25

    1. Installation guide

    # Create the VMs; every machine has 3 disks
    # Cluster plan, written into /etc/hosts on every machine
    192.168.244.120 ceph1   # ceph-deploy, mgr, mon
    192.168.244.110 ceph2
    192.168.244.130 ceph3

    # Switch to the Aliyun yum and EPEL mirrors (run on every machine)
    cd /etc/yum.repos.d
    # Back up the existing config
    mv ./CentOS-Base.repo ./CentOS-Base.repo.bak
    # Download the new repo files
    wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
    wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
    # Rebuild the cache
    yum clean all
    yum makecache

    # Disable SELinux and the firewall
    systemctl stop firewalld.service
    systemctl disable firewalld.service
    firewall-cmd --state
    sed -i '/^SELINUX=.*/c SELINUX=disabled' /etc/selinux/config
    grep --color=auto '^SELINUX' /etc/selinux/config
    setenforce 0
    getenforce

    # Create the Ceph admin user
    sudo useradd -d /home/ceph-admin -m ceph-admin
    sudo passwd ceph-admin   # password: 111
    echo "ceph-admin ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph-admin
    sudo chmod 0440 /etc/sudoers.d/ceph-admin
    su - ceph-admin

    # Install NTP
    sudo yum install -y ntp ntpdate
    sudo vi /etc/ntp.conf
    # NTP server configuration (ceph2, 192.168.244.110); add the following:
    restrict 192.168.244.0 mask 255.255.255.0 nomodify notrap
    server 192.168.244.110 prefer
    restrict 192.168.244.110 nomodify notrap noquery
    # allow 192.168.244.120 and 192.168.244.130 to access this server
    restrict 192.168.244.120
    restrict 192.168.244.130
    server 127.127.1.0 # local clock
    fudge 127.127.1.0 stratum 10
    # NTP client configuration: comment out the 4 "server N.centos.pool.ntp.org iburst" lines, then add:
    server 192.168.244.110
    # add the following lines to allow the upstream time server to adjust this machine's clock
    restrict 192.168.244.110 nomodify notrap noquery
    restrict 192.168.244.110
    # if the external time server is unavailable, serve the local clock instead
    server 127.127.1.0 # local clock
    fudge 127.127.1.0 stratum 10
    # Restart ntpd and enable it at boot
    sudo service ntpd restart
    sudo systemctl enable ntpd

    # Allow password-less SSH login from the admin node; switch to the ceph-admin user on all machines
    # On the admin node:
    ssh-keygen   # press Enter at every prompt, or: ssh-keygen -t rsa
    # Edit the SSH client config
    vi ~/.ssh/config
    Host ceph1
        Hostname ceph1
        User ceph-admin
    Host ceph2
        Hostname ceph2
        User ceph-admin
    Host ceph3
        Hostname ceph3
        User ceph-admin
    # Fix the permissions, then send the key to the other nodes
    chmod 600 ~/.ssh/config
    # ssh-copy-id {username}@{node}
    ssh-copy-id ceph-admin@ceph2
    ssh-copy-id ceph-admin@ceph3
    # or: sudo ssh-copy-id -i ~/.ssh/id_rsa.pub ceph-admin@192.168.244.130

    # Let sudo run without a tty (run on every machine)
    sudo sed -i 's/^Defaults\s\+requiretty/#&/' /etc/sudoers
    # Alternatively: on some distributions (such as CentOS), ceph-deploy fails if the Ceph nodes have
    # requiretty set by default. Run sudo visudo, find the "Defaults requiretty" option and change it to
    # "Defaults:ceph-admin !requiretty", so that ceph-deploy can log in as ceph-admin and use sudo.

    # Preparation is done
    # Set up the Ceph yum repo: switch back to root on the admin node, then run on every machine
    su
    vi /etc/yum.repos.d/ceph.repo
    [ceph]
    name=ceph
    baseurl=http://mirrors.aliyun.com/ceph/rpm-mimic/el7/x86_64/
    gpgcheck=0
    priority=1

    [ceph-noarch]
    name=cephnoarch
    baseurl=http://mirrors.aliyun.com/ceph/rpm-mimic/el7/noarch/
    gpgcheck=0
    priority=1

    [ceph-source]
    name=Ceph source packages
    baseurl=https://mirrors.aliyun.com/ceph/rpm-mimic/el7/SRPMS
    enabled=0
    gpgcheck=1
    type=rpm-md
    gpgkey=https://mirrors.aliyun.com/ceph/keys/release.asc
    priority=1

    # On all nodes
    yum install -y ceph
    ceph -v
    # On the admin node
    yum install -y ceph-deploy
    su - ceph-admin
    mkdir ceph-cluster
    cd ceph-cluster
    # Create the cluster
    ceph-deploy new ceph1
    ls
    vi ceph.conf
    # add to the existing [global] section:
    public network = 192.168.244.0/24
    mon_allow_pool_delete = true
    # Create the initial monitor and gather the keyrings
    ceph-deploy mon create-initial
    ceph-deploy --overwrite-conf mon create-initial
    # on error, push the config to all nodes:
    sudo ceph-deploy --overwrite-conf config push ceph1 ceph2 ceph3
    # Add monitors
    sudo ceph-deploy mon add ceph2
    sudo ceph-deploy mon add ceph3
    sudo ceph-deploy admin ceph1 ceph2 ceph3
    # On every node
    sudo chmod +r /etc/ceph/ceph.client.admin.keyring
    # Create the mgr daemons
    sudo ceph-deploy mgr create ceph1 ceph2 ceph3
    # If you run into trouble somewhere and want to start over, clear the configuration with:
    # ceph-deploy purge {ceph-node} [{ceph-node}]
    # ceph-deploy purgedata {ceph-node} [{ceph-node}]
    # ceph-deploy forgetkeys
    # rm -rf /etc/ceph/*
    # rm -rf /var/lib/ceph/*/*
    # rm -rf /var/log/ceph/*
    # rm -rf /var/run/ceph/*
    # rm -rf ~/ceph-cluster
    # Wipe the disks (zap the same device you later pass to osd create)
    sudo ceph-deploy disk zap ceph1 /dev/sdb
    sudo ceph-deploy disk zap ceph2 /dev/sdb
    sudo ceph-deploy disk zap ceph3 /dev/sdb
    # Create the OSDs
    ceph-deploy osd create --data /dev/sdc ceph1
    ceph-deploy osd create --data /dev/sdb ceph2
    ceph-deploy osd create --data /dev/sdb ceph3
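    The hand-written files in the steps above (the /etc/hosts plan, ~/.ssh/config, /etc/yum.repos.d/ceph.repo and the ceph.conf additions) all repeat the same node names and addresses. Below is a minimal POSIX sh sketch that renders them from one node list, assuming the names and IPs of this cluster plan. The functions render_hosts, render_ssh_config, render_ceph_repo and append_ceph_conf_opts are hypothetical helpers, not part of ceph-deploy. They only print text (or append to a file you name), so you can review the result before writing system files.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): render the preparation config fragments from
# one node list. NODES mirrors the cluster plan ("hostname:ip" pairs);
# edit it for your own cluster.
NODES="ceph1:192.168.244.120 ceph2:192.168.244.110 ceph3:192.168.244.130"
ADMIN_USER="ceph-admin"
MIRROR="http://mirrors.aliyun.com/ceph/rpm-mimic/el7"

# /etc/hosts lines in "<ip> <hostname>" form
render_hosts() {
  for entry in $NODES; do
    printf '%s %s\n' "${entry#*:}" "${entry%%:*}"
  done
}

# ~/.ssh/config stanzas so that ceph-deploy logs in as ADMIN_USER everywhere
render_ssh_config() {
  for entry in $NODES; do
    name="${entry%%:*}"
    printf 'Host %s\n    Hostname %s\n    User %s\n' "$name" "$name" "$ADMIN_USER"
  done
}

# The [ceph] and [ceph-noarch] sections of /etc/yum.repos.d/ceph.repo
render_ceph_repo() {
  cat <<EOF
[ceph]
name=ceph
baseurl=${MIRROR}/x86_64/
gpgcheck=0
priority=1

[ceph-noarch]
name=cephnoarch
baseurl=${MIRROR}/noarch/
gpgcheck=0
priority=1
EOF
}

# Append the two ceph.conf options only if they are missing. Assumes the file
# written by "ceph-deploy new" has a single [global] section, so appending at
# the end places the options inside it. Spaces or underscores both match.
append_ceph_conf_opts() {
  conf="$1"
  grep -Eq '^public[ _]network' "$conf" ||
    echo 'public network = 192.168.244.0/24' >> "$conf"
  grep -Eq '^mon[ _]allow[ _]pool[ _]delete' "$conf" ||
    echo 'mon_allow_pool_delete = true' >> "$conf"
}
```

    Usage, after loading the functions with ". ./render.sh" (the file name is arbitrary): render_hosts | sudo tee -a /etc/hosts on every node; render_ssh_config >> ~/.ssh/config && chmod 600 ~/.ssh/config as ceph-admin on ceph1; render_ceph_repo | sudo tee /etc/yum.repos.d/ceph.repo; and append_ceph_conf_opts ~/ceph-cluster/ceph.conf right after ceph-deploy new ceph1. Running append_ceph_conf_opts twice does not duplicate the options.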
    # Much of what follows is optional and can be skipped
    # List the device classes
    ceph osd crush class ls
    # Remove the existing device classes (this example has OSDs 0-12; adjust the IDs to your cluster)
    for i in {0..12}; do ceph osd crush rm-device-class osd.$i; done
    # Mark OSDs 0-9 as hdd
    for i in {0..9}; do ceph osd crush set-device-class hdd osd.$i; done
    ceph osd tree
    ceph osd crush class ls
    # Mark OSDs 10-12 as ssd
    for i in 10 11 12; do ceph osd crush set-device-class ssd osd.$i; done
    ceph osd tree
    ceph osd crush class ls
    # pg_num guideline:
    #   fewer than 5 OSDs: pg_num = 128
    #   5 to 10 OSDs:      pg_num = 512
    #   10 to 50 OSDs:     pg_num = 1024
    # Create an ssd rule
    # ceph osd crush rule create-replicated <rule-name> <root> <failure-domain> <class>
    ceph osd crush rule create-replicated rule-ssd default host ssd
    ceph osd crush rule create-replicated rule-hdd default host hdd
    ceph osd crush rule ls
    # Create a pool that uses the rule-ssd rule (4 OSDs here, fewer than 5, so the guideline gives 128)
    ceph osd pool create cache 64 64 rule-ssd
    # Create a pool that uses the rule-hdd rule
    ceph osd pool create hddpool 128 128 rule-hdd
    # Attach the cache pool created above in front of hddpool; hddpool is the backing storage pool
    ceph osd tier add hddpool cache
    # Set the cache mode to writeback
    ceph osd tier cache-mode cache writeback
    # Direct all client requests from the base pool to the cache pool
    ceph osd tier set-overlay hddpool cache
    # Use a Bloom filter for the hit set
    ceph osd pool set cache hit_set_type bloom
    # Start flushing and evicting when the cache pool holds 1 TiB of data
    ceph osd pool set cache target_max_bytes 1099511627776
    # Start flushing and evicting when the cache pool holds 1 million objects
    ceph osd pool set cache target_max_objects 1000000
    # Minimum age (seconds) before the cache tier flushes an object to the storage tier or evicts it:
    ceph osd pool set cache cache_min_flush_age 600
    ceph osd pool set cache cache_min_evict_age 600
    # When usage reaches this fraction of the cache pool's target size, the pool is treated as full
    # and unmodified (clean) objects are evicted
    ceph osd pool set cache cache_target_full_ratio 0.8

    # Install the object gateway (RGW)
    sudo ceph-deploy install --rgw ceph1 ceph2 ceph3
    # Create the gateway instances, then check the one on ceph1
    ceph-deploy rgw create ceph1 ceph2 ceph3
    curl http://ceph1:7480
    # Create a user on ceph1 (not created on ceph2/ceph3)
    sudo radosgw-admin user create --uid="testuser" --display-name="testuser"
    # the output contains the keys, for example:
    #   "access_key": "KDAGRZU8SNPYTJULLY9O",
    #   "secret_key": "YnQdNRCM5EkNazhDN2Aj7L15hySWftEK85OgYygP"
    sudo yum install python-boto
    # Install s3cmd
    sudo yum install s3cmd
    s3cmd --configure
    # Enter the access key and secret key when prompted to generate a basic config file,
    # then set host_base/host_bucket to the address and port of an RGW node:
    host_base = 192.168.244.101:7480
    host_bucket = 192.168.244.101:7480/%(bucket)s
    # s3test.py can be run at this point
    # Create a pool
    ceph osd pool create data 128
    ceph osd lspools
    # python-rados.py can be run at this point
    # Limit the pool to at most 100 objects:
    ceph osd pool set-quota test-pool max_objects 100
    # Limit the pool to 10 GiB:
    ceph osd pool set-quota test-pool max_bytes $((10 * 1024 * 1024 * 1024))
    # To remove a quota, set the corresponding value to 0.
    # Rename a pool
    ceph osd pool rename test-pool test-pool-new
    # Install the Python rados bindings for testing
    sudo yum install python-ceph
    # Install rados-java, or simply copy over a prebuilt rados.jar
    sudo yum install jna
    git clone --recursive https://github.com/ceph/rados-java.git
    cd rados-java
    mvn install -Dmaven.test.skip=true
    # Copy the JAR to a shared directory (e.g. /usr/share/java) and make sure it and the JNA JAR are on your JVM's classpath.
    sudo cp target/rados-0.6.0.jar /usr/share/java/rados-0.6.0.jar
    # JRE_EXT must point at the installed JDK; adjust the version in the path
    JRE_EXT=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.242.b08-0.el7_7.x86_64/jre/lib/ext
    sudo ln -s /usr/share/java/jna.jar $JRE_EXT/jna.jar
    sudo ln -s /usr/share/java/rados-0.6.0.jar $JRE_EXT/rados-0.6.0.jar
    # May be needed (these RPM file names are from Ceph 0.94 "Hammer"; use the versions matching your installed Ceph)
    yum install librados2-0.94.5-1.el7.x86_64.rpm librados2-devel-0.94.5-1.el7.x86_64.rpm -y
    yum install librbd1-0.94.5-1.el7.x86_64.rpm librbd1-devel-0.94.5-1.el7.x86_64.rpm -y
    yum install java-devel

    # Mount CephFS
    # Add an MDS
    ceph-deploy mds create ceph1
    ceph df
    # Add the pools
    ceph osd pool create cephfs_data 128
    ceph osd pool create cephfs_metadata 128
    ceph fs new cephfs cephfs_metadata cephfs_data
    ceph df
    # CephFS mount example, on ceph2:
    su - ceph-admin
    sudo yum install -y ceph-fuse
    mkdir ~/cephfs
    sudo ceph-fuse -m ceph2:6789 ~/cephfs/
    df -h ~/cephfs/
    ceph auth get-or-create mgr.docker-node1 mon 'allow profile mgr' osd 'allow *' mds 'allow *'
    # Set up the mgr dashboard
    ceph-mgr -i master
    ceph mgr module enable dashboard
    # replace 10.3.1.11 with the address of your mgr host
    ceph config-key set mgr/dashboard/master/server_addr 10.3.1.11
    # The Luminous dashboard defaulted to port 7000; the Mimic dashboard defaults to 8443 (SSL), or 8080 with SSL disabled
    ceph config-key put mgr/dashboard/server_port 7000
    ceph mgr module disable dashboard
    ceph mgr module enable dashboard
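    The pg_num guideline and the byte values passed to target_max_bytes and set-quota max_bytes above are simple arithmetic. Below is a POSIX sh sketch of both, using hypothetical helper names. suggest_pg_num applies the rule of thumb as written, treating exactly 10 OSDs as 512 because the two quoted ranges overlap at 10. pg_by_formula applies the other rule from the Ceph placement-group documentation of that era: total PGs = OSDs × 100 / replica count, rounded up to the next power of two. Treat both as starting points rather than authoritative sizing.

```shell
#!/bin/sh
# Sketch (hypothetical helpers) for pool sizing arithmetic.

# Rule of thumb: <5 OSDs -> 128, 5-10 -> 512, 10-50 -> 1024.
# Exactly 10 OSDs is treated as 512 because the quoted ranges overlap there.
suggest_pg_num() {
  osds="$1"
  if [ "$osds" -lt 5 ]; then
    echo 128
  elif [ "$osds" -le 10 ]; then
    echo 512
  elif [ "$osds" -le 50 ]; then
    echo 1024
  else
    echo "more than 50 OSDs: use pg_by_formula instead" >&2
    return 1
  fi
}

# Total PGs = OSDs * 100 / replica count, rounded up to a power of two.
pg_by_formula() {
  osds="$1"
  size="$2"
  target=$(( (osds * 100 + size - 1) / size ))  # integer ceiling division
  pg=1
  while [ "$pg" -lt "$target" ]; do
    pg=$(( pg * 2 ))
  done
  echo "$pg"
}

# Byte counts for target_max_bytes and "set-quota ... max_bytes"
gib_to_bytes() { echo $(( $1 * 1024 * 1024 * 1024 )); }
tib_to_bytes() { echo $(( $1 * 1024 * 1024 * 1024 * 1024 )); }
```

    For example, on this three-OSD cluster with 3 replicas and a single pool, pg_by_formula 3 3 gives 128, which agrees with the "fewer than 5 OSDs" rule; ceph osd pool set cache target_max_bytes $(tib_to_bytes 1) reproduces the 1 TiB value used above.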

    Once the installation is complete you can test Ceph; see my blog posts "Testing and using Ceph" and "Ceph storage interfaces explained".

    2. Common installation problems

    2.1 Wiping a disk fails when installing Ceph OSDs

    If ceph-deploy disk zap node3-ceph /dev/sdb fails with an error saying the disk is busy and cannot be wiped, for example:

    [WARNIN] stderr: wipefs: error: /dev/sdb: probing initialization failed: Device or resource busy [ceph3][WARNIN] --> failed to wipefs device, will try again to workaround probable race condition

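    wipe the beginning of the disk manually with dd on that node, then reboot:

        sudo dd if=/dev/zero of=/dev/sdb bs=512K count=1
        sudo reboot

    After the reboot, run ceph-deploy disk zap node3-ceph /dev/sdb again from the ceph-admin host; the disk can now be wiped.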
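    The dd command above overwrites the start of the disk, where partition tables and on-disk signatures (such as LVM labels) are stored. Because dd against the wrong device is destructive, a small guard is worth having. Below is a POSIX sh sketch of that step, assuming you pass the full device path. wipe_head is a hypothetical helper name. It refuses paths that do not exist or that appear in /proc/mounts. It also adds conv=notrunc, so it can be tried on an ordinary image file without changing the file's size. It does not detect LVM or device-mapper holders, so check the device with lsblk first.

```shell
#!/bin/sh
# Sketch (hypothetical helper): zero the first 512 KiB of a block device,
# like "dd if=/dev/zero of=/dev/sdb bs=512K count=1", with basic guards.
wipe_head() {
  target="$1"
  if [ ! -e "$target" ]; then
    echo "no such device or file: $target" >&2
    return 1
  fi
  # Refuse the device if it, or a partition on it, is listed as mounted
  if grep -q "^$target" /proc/mounts 2>/dev/null; then
    echo "$target (or a partition on it) is mounted; unmount it first" >&2
    return 1
  fi
  # 524288 bytes = 512 KiB; conv=notrunc keeps a regular file's size intact
  dd if=/dev/zero of="$target" bs=524288 count=1 conv=notrunc 2>/dev/null
}
```

    On the OSD node, load it with ". ./wipe.sh", run wipe_head /dev/sdb as root, reboot, and then repeat the disk zap from the ceph-admin host as described above.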

    2.2 Error: has no attribute 'needs_ssh'

    Cause: the remoto package needs to be upgraded. Fix, as root:

    1) install the python2-pip package: yum install python2-pip -y
    2) upgrade remoto: pip install --upgrade remoto

    2.3 Objects in a Ceph pool cannot be listed; client code hangs right after connecting to the cluster

    Cause: some Ceph daemons have died; restart the mon service:

    systemctl {start|stop|restart} ceph-mon.target

    https://blog.51cto.com/11495268/2339451

    PGs in a pool are unhealthy

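    This is usually caused by one of two things. Either a node is not powered on or its mon daemon has died, in which case restart the mon service; or there are fewer disks than replicas (with the default CRUSH rule each replica must be on a different host), in which case add more disks.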
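    The second cause comes down to comparing two numbers. With the default replicated CRUSH rule the failure domain is host, so each replica of a PG must sit on a different host. A pool whose size (replica count) exceeds the number of hosts with up OSDs cannot place every replica, and its PGs stay undersized/degraded instead of reaching active+clean. Below is a POSIX sh sketch of that comparison. check_pool_size is a hypothetical helper, and its two inputs are read by hand, for example from ceph osd pool get <pool> size and from the host entries in ceph osd tree.

```shell
#!/bin/sh
# Sketch (hypothetical helper): with a host failure domain, every replica
# needs its own host, so pool size must not exceed the number of OSD hosts.
check_pool_size() {
  pool_size="$1"   # replica count, e.g. from "ceph osd pool get <pool> size"
  osd_hosts="$2"   # hosts that currently have up OSDs, from "ceph osd tree"
  if [ "$pool_size" -gt "$osd_hosts" ]; then
    echo "size=$pool_size needs $pool_size OSD hosts but only $osd_hosts are up: add disks/hosts or lower the pool size"
    return 1
  fi
  echo "ok"
}
```

    For example, check_pool_size 3 2 flags a 3-replica pool when one of three hosts is down. That single check covers both causes above: a powered-off node and too few disks for the replica count.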
