Deploy a multi-node Cassandra cluster on IBM Cloud Private and Kubernetes
Take a deep-dive into installing a scalable Cassandra solution on IBM Cloud Private and Kubernetes
IBM Cloud Private is an application platform for developing and managing on-premises, containerized applications. The IBM Cloud Private environment comprises the container orchestrator Kubernetes, a private image repository, a management console, and monitoring frameworks. While IBM Cloud Private can provide the benefits of the public cloud from the safety of your firewall-protected datacenter, it can also be installed on your local machine, giving developers a quick way to get started with a Kubernetes cluster. Kubernetes, the most popular container orchestration platform, eliminates many of the manual processes involved in deploying and scaling containerized applications.
In this how-to, we will get up and running with an installation of IBM Cloud Private and then install Cassandra on top of it.
This is a beginner-level how-to and requires an environment capable of installing VirtualBox and Vagrant, plus the kubectl command line tool.
Time to completion will largely depend on bandwidth speed. All things being equal, it should take approximately 30 minutes to complete this how-to.
Install IBM Cloud Private
Getting a Kubernetes cluster up and running is no small task. By definition, a cluster means a bunch, and getting a bunch of containers running and talking to each other correctly can present quite a challenge. This is where IBM Cloud Private (from here on called ICP) can help. ICP is available on various cloud platforms, but the most convenient way to try it out is on your local machine. We will be using Vagrant and a pre-existing Vagrantfile to take care of the heavy lifting, providing a fully functional cluster of 8 nodes, complete with networking and functional services such as DNS and an API server.
Before we get started, you will need an installation of VirtualBox and Vagrant. Once those are installed, clone the deploy-ibm-cloud-private repository from GitHub, change into the new directory, and run vagrant up.
git clone https://github.com/IBM/deploy-ibm-cloud-private.git
cd deploy-ibm-cloud-private
vagrant up
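Before running vagrant up, it can save time to confirm that the prerequisites are actually on your PATH. The following is a minimal sketch, not part of the repository: missing_tools is a hypothetical helper name, and VBoxManage is assumed to be the VirtualBox command line front end installed on your system.

```shell
# Print the name of every required tool that is not found on PATH.
# Prints nothing when all tools are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Running missing_tools git vagrant VBoxManage lists whichever of the three still needs to be installed. If nothing is printed, all of them were found and vagrant up can proceed.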
This will download and install everything needed to run a local instance of ICP. Once complete, you have a functioning Kubernetes cluster, and you can log in to the web interface that ICP provides. The last bit of the install script will give you the web address along with a username and password combination for access. Log in and you will be greeted with the dashboard.
The web interface lets you interact with your cluster, both for monitoring (the dashboard shows metrics such as CPU utilization and memory usage) and for administration (you can manipulate your workloads and infrastructure from the menu in the upper left). Another means of interacting with your cluster is the kubectl command line tool. Download and install the CLI from the Kubernetes Docs. Once installed, you will need a few connection parameters from ICP to be able to use kubectl. From the dashboard, click on the user dropdown in the upper right corner (which will read admin, unless you have already created other users) and select Configure Client. Copy the commands shown there, paste them into your terminal, and you are ready to proceed.
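After pasting the Configure Client commands, a quick sanity check can confirm that kubectl is really talking to the new cluster. This is a small sketch under the assumption that the pasted commands set a current context; verify_cluster_access is a hypothetical helper name, while the kubectl subcommands it calls are standard.

```shell
# Confirm that kubectl has a current context and can reach the cluster.
verify_cluster_access() {
  # Fails early if no context has been configured yet.
  context=$(kubectl config current-context) || return 1
  echo "Using context: $context"
  # Lists the cluster nodes; an error here usually points to a wrong
  # address or an expired token in the pasted configuration.
  kubectl get nodes
}
```

Calling verify_cluster_access should print the context name followed by the list of nodes. If it fails, repeating the Configure Client step is usually the first thing to try.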
Now that we have a fully provisioned cluster, we are ready to add a workload to it. As a side note, another advantage of using ICP to run your cluster is the number of applications immediately available to you from the Applications menu. Click the icon of three lines in the upper left-hand corner to open the navigation menu, and select App Center. As an example, you could select nginx, click install, and you'll have a web server in minutes. However, in this how-to, we are going to set up a Cassandra cluster, which would serve well as a backend for any kind of distributed application.
Setting up Cassandra
Initially developed at Facebook, Cassandra is essentially a cross between a key-value and a column-oriented database management system. As one of the most active Apache projects, Cassandra is used in production by a multitude of organizations. We could have chosen any database system to implement as a part of this tutorial, but the decentralized and fault-tolerant nature of Cassandra makes it a natural fit.
To speed up our demonstration, we will be using some of the configuration files included in the Cassandra on Kubernetes repository from the IBM org on github.com. Go ahead and clone that repo as well to get started:
git clone https://github.com/IBM/Scalable-Cassandra-deployment-on-Kubernetes
cd Scalable-Cassandra-deployment-on-Kubernetes
Create the Cassandra Headless Service
To allow simple discovery of the Cassandra seed node (which we will deploy shortly), we can create a "headless" service. We do this by specifying None for the clusterIP in cassandra-service.yaml. This headless service allows the Pods to use KubeDNS to discover the IP address of the Cassandra seed.
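To make the idea concrete, here is a sketch of what a headless service manifest generally looks like. It is illustrative only and may differ from the file shipped in the repository: the app: cassandra label, the port, and the output file name are assumptions, and write_headless_service is a hypothetical helper.

```shell
# Write an illustrative headless Service manifest to the path given as $1.
write_headless_service() {
  cat > "$1" <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: cassandra
  labels:
    app: cassandra        # assumed label
spec:
  clusterIP: None         # "None" is what makes the service headless
  ports:
    - port: 9042          # default CQL native transport port
  selector:
    app: cassandra        # assumed Pod label
EOF
}
```

For example, write_headless_service headless-service-sketch.yaml produces a file you can compare against the one in the repository. The important line is clusterIP: None, which tells Kubernetes not to allocate a virtual IP and to publish the Pod addresses through DNS instead.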
First, we will create the headless service with the command kubectl create -f cassandra-service.yaml. You should receive a response along these lines:
$ kubectl create -f cassandra-service.yaml
service "cassandra" created
You can confirm the service has been created through the web UI by selecting Workloads -> Services from the navigation menu:
You will notice that the CLUSTER-IP listed for the Cassandra service is None, which means we have created a “headless” service. This allows for the use of KubeDNS to discover the IP address of the Cassandra seed that we will be deploying momentarily.
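With a headless service in place, each Pod of a StatefulSet governed by that service gets a stable DNS name of the form pod.service.namespace.svc.cluster-domain. The sketch below builds such a name. It assumes the default cluster domain cluster.local and that the StatefulSet names the cassandra service as its governing service; pod_dns_name is a hypothetical helper.

```shell
# Build the stable DNS name of a StatefulSet Pod.
# $1 = StatefulSet name, $2 = Pod ordinal, $3 = governing service,
# $4 = namespace (optional, defaults to "default")
pod_dns_name() {
  printf '%s-%s.%s.%s.svc.cluster.local\n' "$1" "$2" "$3" "${4:-default}"
}
```

For instance, pod_dns_name cassandra 0 cassandra prints cassandra-0.cassandra.default.svc.cluster.local, which is the kind of address other Cassandra nodes can use to find the seed.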
Create Local Volumes
Next, we will create some storage space for our Cassandra nodes. In our example, we are creating three nodes, so we will need three persistent volumes. We can take care of this with the command kubectl create -f local-volumes.yaml, from which you should see the following response:
$ kubectl create -f local-volumes.yaml
persistentvolume "cassandra-data-1" created
persistentvolume "cassandra-data-2" created
persistentvolume "cassandra-data-3" created
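If you later need more volumes than the three defined in local-volumes.yaml, manifests of the same general shape can be generated rather than written by hand. The following is a sketch under stated assumptions: the 1Gi capacity and the hostPath location are placeholders that may not match the repository file, and print_local_volumes is a hypothetical helper. Only the cassandra-data-N naming comes from the output above.

```shell
# Print N illustrative PersistentVolume manifests to standard output.
# $1 = number of volumes to generate
print_local_volumes() {
  i=1
  while [ "$i" -le "$1" ]; do
    cat <<EOF
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: cassandra-data-$i
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/data/cassandra-data-$i
EOF
    i=$((i + 1))
  done
}
```

The output can be inspected first and then piped to kubectl create -f - once it looks right. Note that hostPath volumes are tied to a single node, which is acceptable for a local trial cluster but not for production.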
To confirm, select Infrastructure -> Storage from the navigation menu:
Create a StatefulSet
Our last step is to create the Pods themselves. Once again, we will be using a preconfigured YAML file, which in this case defines a StatefulSet. A StatefulSet provides ordered deployment, ordered termination, and unique network names for our nodes. Run kubectl create -f cassandra-statefulset.yaml and you should see the response:
$ kubectl create -f cassandra-statefulset.yaml
statefulset "cassandra" created
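Because a StatefulSet starts its Pods one at a time, it can take a while before all of them are running. As a sketch, the helper below reads the tabular output of kubectl get pods from standard input and succeeds only when every listed Pod is Running with all of its containers ready. all_pods_ready is a hypothetical name, and the parsing assumes the default column layout (NAME, READY, STATUS, ...).

```shell
# Exit 0 only if every Pod row on stdin shows READY n/n and STATUS Running.
# Exits 1 if any Pod is not ready or if no Pods are listed at all.
all_pods_ready() {
  awk 'NR > 1 {
         split($2, ready, "/")
         if (ready[1] != ready[2] || $3 != "Running") bad = 1
         seen = 1
       }
       END { exit !(seen && !bad) }'
}
```

A typical use is kubectl get pods | all_pods_ready && echo "Cassandra Pods are ready". If other workloads share the namespace, a label selector on kubectl get pods can narrow the list to the Cassandra Pods.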
Back in the UI, select Workloads -> Applications -> StatefulSet to confirm creation:
Click on the name of our StatefulSet to get more information on it, including Pod status:
Back on the main StatefulSet page, you can scale your StatefulSet by clicking on the cog icon under ACTION all the way to the right and selecting Scale:
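The same scaling operation is also available from the command line through kubectl scale. The sketch below wraps it with a basic argument check. scale_cassandra is a hypothetical helper, and the StatefulSet name cassandra is taken from the creation output earlier.

```shell
# Scale the "cassandra" StatefulSet to the replica count given as $1.
scale_cassandra() {
  # Reject anything that is not a non-negative integer.
  case "$1" in
    ''|*[!0-9]*)
      echo "usage: scale_cassandra <replicas>" >&2
      return 1
      ;;
  esac
  kubectl scale statefulset cassandra --replicas="$1"
}
```

For example, scale_cassandra 4 requests a fourth node. Assuming the StatefulSet claims one persistent volume per Pod, which is typical, a new Pod will likely stay pending until a matching volume exists, so create the extra volume first.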
And there you have it! From here, you can use CQL to interact with your Cassandra cluster and put it to use however you like.
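One low-friction way to try CQL is to run cqlsh inside one of the Pods. This is a sketch that assumes the container image includes cqlsh, which ships with standard Cassandra distributions but is not confirmed here for this image. run_cql is a hypothetical helper.

```shell
# Run a single CQL statement through cqlsh inside the cassandra-0 Pod.
# $1 = the statement to execute, for example "DESCRIBE KEYSPACES;"
run_cql() {
  kubectl exec cassandra-0 -- cqlsh -e "$1"
}
```

Calling run_cql "DESCRIBE KEYSPACES;" lists the keyspaces that currently exist, which is a harmless first query. Statements that create keyspaces or tables can be passed the same way once the nodes have finished starting up.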
You can run nodetool status to check whether the other Cassandra nodes have joined and formed a Cassandra cluster.
Note: It can take around 5 minutes for the Cassandra database to finish its setup.
$ kubectl exec -ti cassandra-0 -- nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.xxx.xxx.xxx  103.25 KiB  256     68.7%             633ae787-3080-40e8-83cc-d31b62f53582  Rack1
UN  10.xxx.xxx.xxx  108.62 KiB  256     63.5%             e95fc385-826e-47f5-a46b-f375532607a3  Rack1
UN  10.xxx.xxx.xxx  177.38 KiB  256     67.8%             66bd8253-3c58-4be4-83ad-3e1c3b334dfd  Rack1
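In the output above, the UN prefix on each row means the node is Up and in the Normal state. If you want to check this from a script instead of by eye, the rows can be counted. The helper below is a small sketch that reads nodetool status output from standard input; count_up_normal is a hypothetical name.

```shell
# Count the nodes reported as UN (Up/Normal) in nodetool status output.
count_up_normal() {
  awk '$1 == "UN" { n++ } END { print n + 0 }'
}
```

For example, kubectl exec cassandra-0 -- nodetool status | count_up_normal should print 3 once all three nodes have joined. The -ti flags are left out here because the command is used in a pipe rather than interactively.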
IBM Cloud Private allows us to get up and running with a Kubernetes cluster quite easily. Here, we have shown how to use ICP to start a Cassandra cluster.
- VirtualBox: A free and open-source hypervisor
- Vagrant: An open-source product for building and maintaining virtual deployment environments
- Deploy IBM Cloud Private: Instructions for deploying IBM Cloud Private in several scenarios
- kubectl: A command line interface for running commands against Kubernetes clusters
- Cassandra: A free and open-source distributed NoSQL database management system
- Cassandra Deployment on Kubernetes: An IBM Code Pattern with more generic and in-depth instructions for installing Cassandra on Kubernetes