Overview

Skill Level: Intermediate

We will go over the steps to set up an independent Ceph cluster with the block storage service (RBD). We will also look at the steps to install the Ceph client on the worker nodes, which is required to provision block storage from Ceph.

Ingredients

1. ICP 3.1 setup with 3 or more worker nodes. 

2. 3 or more VMs (Ubuntu >= 16.04) with raw disks to be used by Ceph. These can be the same as the ICP worker nodes, provided no other storage provisioner is accessing those raw disks.

3. An SSH server set up on all nodes.

Step-by-step

  1. Topology of ceph nodes

    We will be referring to the topology below for our setup of the Ceph cluster. For this we need 4 VMs, which can be completely separate VMs or nodes that are part of the ICP cluster, provided that at least 3 of them (referred to as node1, node2, node3) have raw disks that are not in use by any other storage provisioner.

    The 4th VM (referred to as admin-node) is used for setting up the cluster and creating the cluster config. You can use one of the ICP worker nodes as the admin node.

    In all the steps below, replace node1, node2, node3 with the hostnames of your VMs (nodes).

    Refer to this link for an explanation of the different types of services in a Ceph cluster.

    (Figure: Ceph topology)
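
    In text form, the layout deployed in the steps below is roughly as follows (based on where this recipe places each daemon):

    admin-node : ceph-deploy (used to drive the installation)
    node1      : monitor (mon), manager (mgr), metadata server (mds), OSDs on /dev/sdb and /dev/sdc
    node2      : OSDs on /dev/sdb and /dev/sdc
    node3      : OSDs on /dev/sdb and /dev/sdc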

  2. Prepare nodes

    In the steps that follow, we install ceph-deploy on the admin node, along with some dependencies required by Ceph on all the nodes.

    The following commands should be run as the root user

    a) Add the release key (only on admin-node)

    wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -

    b) Add the Ceph packages to your repository (only on admin-node)

    echo deb https://download.ceph.com/debian-{ceph-stable-release}/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list

    Replace {ceph-stable-release} with the release you would like to install, e.g. mimic (debian-mimic), as shown below.
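
    For example, for mimic:

    echo deb https://download.ceph.com/debian-mimic/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list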

    c) Install ntp on all nodes
    apt-get install -y ntp

    If there is a local NTP server on your network, update /etc/ntp.conf with your local pool server and restart ntp, as shown below. Monitor nodes are sensitive to clock skew, so it is important to have time synchronized on all nodes.
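
    For example (ntp.example.local is a placeholder for your local NTP server; ntpq -p lets you verify that the node is syncing):

    echo "server ntp.example.local iburst" >> /etc/ntp.conf
    systemctl restart ntp
    ntpq -p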

    d) Install python on all nodes
    apt-get install -y python

    e) Update the package index and install ceph-deploy (only on admin node)
    apt-get update
    apt-get install -y ceph-deploy

    f) Create a ceph-deploy user on all nodes.

    useradd -m -s /bin/bash -c "ceph deploy user" ceph-deploy
    echo "ceph-deploy:Passw0rd!" | sudo -S chpasswd

    g) Add ceph-deploy user to passwordless sudo on all nodes
    echo 'ceph-deploy ALL=(root) NOPASSWD:ALL' | sudo EDITOR='tee -a' visudo
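
    To verify on each node (a quick check; sudo -n fails instead of prompting if a password would still be required):

    su - ceph-deploy -c 'sudo -n true' && echo "passwordless sudo OK"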

     

    h) Log in as the ceph-deploy user on the admin-node and create the following file at ~/.ssh/config. You may need to create both the /home/ceph-deploy/.ssh directory and the config file. Replace node1, node2, node3 with the hostnames of the VMs used as storage nodes.

     

    Host node1
     Hostname node1
     User ceph-deploy
    Host node2
     Hostname node2
     User ceph-deploy
    Host node3
     Hostname node3
     User ceph-deploy

     

    i) Enable passwordless SSH access from the admin node to the other nodes.

    Log in as the ceph-deploy user on the admin node and execute the command below:

    ssh-keygen -t rsa -P ''

    j) Copy the generated public key to the Ceph nodes to enable passwordless login.

    ssh-copy-id -i ~/.ssh/id_rsa ceph-deploy@node1
    ssh-copy-id -i ~/.ssh/id_rsa ceph-deploy@node2
    ssh-copy-id -i ~/.ssh/id_rsa ceph-deploy@node3

    ssh-copy-id -i ~/.ssh/id_rsa ceph-deploy@admin-node

    When asked for a password, enter the password set for ceph-deploy in step f).
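
    To verify, the following commands (run as ceph-deploy on the admin node) should print each node's hostname without prompting for a password:

    ssh node1 hostname
    ssh node2 hostname
    ssh node3 hostname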

  3. Deploy Ceph

    a) Create the cluster. From the ceph-deploy user's home directory on the admin node:
    mkdir mycluster
    cd mycluster
    ceph-deploy new node1       
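
    If your nodes have more than one network interface, you may also need to add the public network setting under the [global] section of the generated ceph.conf in this directory before continuing (the subnet below is a placeholder; use your own):

    public network = 10.1.2.0/24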

    b) Install Ceph on all nodes
    ceph-deploy install node1 node2 node3 

    c) Deploy the initial monitor and gather the keys
    ceph-deploy mon create-initial

    d) Copy the admin config files to all nodes
    ceph-deploy admin node1 node2 node3 

    e) Deploy a manager node
    ceph-deploy mgr create node1

    f) Deploy storage nodes (OSDs)
    The --data argument should be the device name of an unused raw device installed in the host, and the final parameter is the hostname. Execute this command once for every raw device and host in the environment (the commands below assume /dev/sdb and /dev/sdc are the raw disks available on node1 to node3).

    ceph-deploy osd create --data /dev/sdb node1
    ceph-deploy osd create --data /dev/sdc node1

    ceph-deploy osd create --data /dev/sdb node2
    ceph-deploy osd create --data /dev/sdc node2

    ceph-deploy osd create --data /dev/sdb node3
    ceph-deploy osd create --data /dev/sdc node3
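
    Equivalently, you can loop over the nodes and devices from the admin node (a small sketch, assuming the same device names exist on every node):

    for node in node1 node2 node3; do
      for dev in /dev/sdb /dev/sdc; do
        ceph-deploy osd create --data "$dev" "$node"
      done
    done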

     

    g) Install a metadata server
    ceph-deploy mds create node1

    h) Check the status of your cluster

    ssh node1 sudo ceph -s

    The cluster health should show HEALTH_OK.

     

    For a production setup, you should have multiple instances of the monitor and manager running. You can add more instances by following the steps documented here (a brief sketch is shown below).
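
    A brief sketch (run from the admin node as the ceph-deploy user, inside the mycluster directory; see the linked documentation for details):

    ceph-deploy mon add node2
    ceph-deploy mon add node3
    ceph-deploy mgr create node2 node3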

  4. Test your ceph setup

    You can test the setup by following the steps documented in this link (a condensed sketch is shown below).
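
    As a rough sketch based on the linked quick start (the pool and image names below are throwaway placeholders), run the following on node1, which already has the admin keyring:

    sudo ceph osd pool create rbdtest 8 8
    sudo rbd pool init rbdtest
    sudo rbd create rbdtest/test-image --size 1024 --image-feature layering
    sudo rbd map rbdtest/test-image --name client.admin
    sudo mkfs.ext4 -m0 /dev/rbd/rbdtest/test-image
    sudo mkdir -p /mnt/ceph-block-device
    sudo mount /dev/rbd/rbdtest/test-image /mnt/ceph-block-device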

  5. Integrating ICP with Ceph

    Log in to node1 (the initial monitor node) as the ceph-deploy user and perform the next steps.

    a) Create an rbd pool for use with ICP

    sudo ceph osd pool create icp 192 192 

    b) Create a new Ceph user for use with ICP
    sudo ceph auth get-or-create client.icp mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=icp' -o ceph.client.icp.keyring

    c) To deploy images as this user, you will need to create a keyring file for your worker nodes.
    sudo ceph auth get client.icp > ./ceph.client.icp.keyring

    d) Copy the keyring file to all the OSD nodes under /etc/ceph

    sudo cp ./ceph.client.icp.keyring /etc/ceph

    sudo scp ./ceph.client.icp.keyring root@node2:/etc/ceph

    sudo scp ./ceph.client.icp.keyring root@node3:/etc/ceph

     

    e) Retrieve the Ceph admin key as base64
    sudo ceph auth get-key client.admin |base64

    This should return something like: QVFDSGhYZGIrcmc0SUJBQXd0Yy9pRXIxT1E1ZE5sMmdzRHhlZVE9PQ==

    f) Retrieve the Ceph ICP key as base64
    sudo ceph auth get-key client.icp |base64

    This should return something like: QVFERUlYNWJKbzlYR1JBQTRMVnU1N1YvWDhYbXAxc2tseDB6QkE9PQ==

     

    Log in to the ICP client node where you have kubectl configured to connect to ICP.

    g) Follow the steps documented here from point 7 onwards to set up the ICP storage class definition corresponding to the icp pool created above (a rough sketch of the Kubernetes objects involved is shown below).
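
    The exact definitions are covered in the linked documentation; the following is a minimal sketch, assuming the in-tree kubernetes.io/rbd provisioner. All object names (ceph-rbd.yaml, ceph-admin-secret, ceph-user-secret, ceph-rbd) are placeholders, and node1:6789 should be replaced with your monitor address(es). Save it as ceph-rbd.yaml and apply it with kubectl apply -f ceph-rbd.yaml.

apiVersion: v1
kind: Secret
metadata:
  name: ceph-admin-secret            # placeholder name
  namespace: kube-system
type: kubernetes.io/rbd
data:
  key: <base64 admin key from step e)>
---
apiVersion: v1
kind: Secret
metadata:
  name: ceph-user-secret             # placeholder name; must exist in each namespace that creates PVCs
  namespace: default
type: kubernetes.io/rbd
data:
  key: <base64 ICP key from step f)>
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd                     # placeholder name
provisioner: kubernetes.io/rbd
parameters:
  monitors: node1:6789               # comma-separated list of monitor host:port values
  adminId: admin
  adminSecretName: ceph-admin-secret
  adminSecretNamespace: kube-system
  pool: icp
  userId: icp
  userSecretName: ceph-user-secret
  fsType: ext4
  imageFormat: "2"
  imageFeatures: layering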

     

     

  6. Install ceph client on ICP worker nodes

    The Ceph client (rbd) needs to be installed on the worker nodes for PVC requests related to the Ceph storage class to be satisfied.

    Log in to the admin node and run the following commands:

    a) Install the Ceph client on all the worker nodes

    ceph-deploy install <ceph-client-node>

     

    b) Use ceph-deploy to copy the Ceph configuration file and the ceph.client.admin.keyring to the ceph-client node.

    ceph-deploy admin <ceph-client-node>

     

    If you skip the above step, the Pod will stay in Pending state and "kubectl describe pod <podname>" will show the following error messages:

     

    Warning  FailedMount  1m (x3 over 6m)   kubelet, <node-ip>  Unable to mount volumes for pod "re1deb7a1a3-apiconnect-cc-1_default(f64f06f6-ebe0-11e8-a37a-00163e01995e)": timeout expired waiting for volumes to attach or mount for pod "default"/"re1deb7a1a3-apiconnect-cc-1". list of unmounted volumes=[pv-claim]. list of unattached volumes=[pv-claim podinfo tls-secret default-token-7qfbn]

    Warning  FailedMount  1m (x11 over 7m)  kubelet, <node-ip>  MountVolume.WaitForAttach failed for volume "pvc-f64bd63d-ebe0-11e8-a37a-00163e01995e" : fail to check rbd image status with: (executable file not found in $PATH), rbd output: ()
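
    Once both steps have been run, a quick sanity check on each worker node is to confirm the rbd binary is now available (it is installed as part of the Ceph packages above):

    rbd --version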

  7. Common issues faced and fixes.

    1. Ceph Luminous or later requires a kernel version > 4.4

    This was seen on Ubuntu 16.04.

    kubectl describe pod <pod> shows the following error messages:

    Events:

      Type     Reason       Age                From                     Message
      ----     ------       ----               ----                     -------
      Warning  FailedMount  21m (x13 over 1h)  kubelet, 172.16.171.100  MountVolume.WaitForAttach failed for volume "pvc-5e64d7c8-e99b-11e8-a37a-00163e01995e" : rbd: map failed exit status 110, rbd output: rbd: sysfs write failed
    In some cases useful info is found in syslog - try "dmesg | tail".
    rbd: map failed: (110) Connection timed out
      Warning  FailedMount  1m (x28 over 1h)   kubelet, 172.16.171.100  Unable to mount volumes for pod "re1deb7a1a3-apiconnect-cc-0_default(399b6c62-ebc2-11e8-a37a-00163e01995e)": timeout expired waiting for volumes to attach or mount for pod "default"/"re1deb7a1a3-apiconnect-cc-0". list of unmounted volumes=[pv-claim]. list of unattached volumes=[pv-claim podinfo tls-secret default-token-7qfbn]

    dmesg shows:

    libceph: mon0 172.16.170.215:6789 feature set mismatch, my 106b84a842a42 < server's 40106b84a842a42, missing 400000000000000

    Fix:

    Run the following command on the monitor node (node1):

      ceph osd crush tunables legacy  

    Details here: https://bugs.launchpad.net/charm-ceph-mon/+bug/1716735
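
    To confirm the change, you can inspect the active CRUSH tunables on the monitor node:

    sudo ceph osd crush show-tunables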

     

    2. Volume attachment fails because the rbd client is missing on the worker nodes:

     

      Warning  FailedScheduling        8m (x7 over 8m)   default-scheduler        pod has unbound PersistentVolumeClaims (repeated 6 times)
      Normal   Scheduled               8m                default-scheduler        Successfully assigned default/re1deb7a1a3-apiconnect-cc-1 to 172.16.167.125
      Normal   SuccessfulAttachVolume  8m                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-f64bd63d-ebe0-11e8-a37a-00163e01995e"
      Warning  FailedMount             1m (x3 over 6m)   kubelet, 172.16.167.125  Unable to mount volumes for pod "re1deb7a1a3-apiconnect-cc-1_default(f64f06f6-ebe0-11e8-a37a-00163e01995e)": timeout expired waiting for volumes to attach or mount for pod "default"/"re1deb7a1a3-apiconnect-cc-1". list of unmounted volumes=[pv-claim]. list of unattached volumes=[pv-claim podinfo tls-secret default-token-7qfbn]
      Warning  FailedMount             1m (x11 over 7m)  kubelet, 172.16.167.125  MountVolume.WaitForAttach failed for volume "pvc-f64bd63d-ebe0-11e8-a37a-00163e01995e" : fail to check rbd image status with: (executable file not found in $PATH), rbd output: ()

     

    Fix:

    Follow step 6) above to install the Ceph client on the worker nodes.

    Details here: http://docs.ceph.com/docs/mimic/start/quick-rbd/

     

     

    3. "application not enabled on 1 pool(s)" warning after creating the 'icp' pool.

    sudo ceph -s returns HEALTH_WARN:

      cluster:

        id:     14d528b2-5caa-47b5-9c96-c77b63ff8b9b

        health: HEALTH_WARN

                application not enabled on 1 pool(s)

     

    Fix:

    Run the following command on the monitor node (node1) as the ceph-deploy user:

    sudo ceph osd pool application enable icp rbd
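
    The warning should clear shortly afterwards; you can re-check with:

    sudo ceph health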

     

    Details here: https://ceph.com/community/new-luminous-pool-tags/
