Apache Cassandra is a proven fault-tolerant and scalable decentralized NoSQL database for today’s applications. You can deploy Cassandra on Docker containers or manage Cassandra through Kubernetes.

In this tutorial, you will learn how to use Kubernetes to set up a Cassandra cluster that spans multiple, geographically separated data centers. The data centers could be in different countries or regions. Some reasons to use this setup are:

  • Performing live backups, where data written in one data center is asynchronously copied over to the other data center.
  • Users in one location (for example, the United States) connect to the data center in or near that location, and users in another location (for example, India) connect to the data center in or near their location to ensure faster performance.
  • If one data center is down, you can still serve Cassandra data from the other data center.
  • If a few nodes in one data center are down, Cassandra data is still available without interruption.

Docker containers managed with the open source Kubernetes platform make this possible.

Setup

To complete the steps in this tutorial, you will use the Kubernetes concepts of pod, StatefulSet, headless service, and PersistentVolume. Here’s what you need:

  • A Kubernetes cluster (V1.8.x or higher) with nodes in at least two separate data centers
  • At least three nodes in each data center where Kubernetes can deploy pods

Figure 1 shows the setup with five nodes in each data center. Each Kubernetes node deploys one Cassandra pod representing a Cassandra node. Application pods are deployed on each data center, and they access the Cassandra nodes local to the data center using headless service. Data written to any node in one data center is asynchronously copied over to the nodes in other data centers.

Figure 1. One Cassandra cluster spread across two data centers with five nodes in each data center
The figure shows the cassandra-a service and App1 pods in data center 1, with asynchronous replication to the cassandra-b service and App1 pods in data center 2.

The idea

Create two StatefulSets, one for each site. A StatefulSet manages the deployment and scaling of a set of pods and provides guarantees about the ordering and uniqueness of these pods. Each StatefulSet has node affinity defined so that its pods are deployed on nodes in one data center only. This is achieved by setting labels on each node: label all nodes in data center 1 with dc=DC1 and all nodes in data center 2 with dc=DC2. There will be one Cassandra seed node from each data center, but if you have five nodes per data center, two seed nodes are recommended. Add the appropriate labels to the Kubernetes nodes.

Label the nodes by data center

kubectl label nodes nodea1 dc=DC1
kubectl label nodes nodea2 dc=DC1
kubectl label nodes nodea3 dc=DC1
kubectl label nodes nodeb1 dc=DC2
kubectl label nodes nodeb2 dc=DC2
kubectl label nodes nodeb3 dc=DC2

The action

All code used in this tutorial is available on GitHub. Clone the repository or copy the YAML files from there.

1. Create the namespace

First, create a new namespace where you will do all the work.

kubectl create namespace c7a

2. Create the headless services

kubectl -n c7a create -f service.yaml

Headless services allow application pods to connect to Cassandra pods by service name. There are two headless services: one serves the Cassandra pods in one data center, and the other serves the Cassandra pods in the other data center. The application pods deployed in each data center can use environment variables to select the Cassandra service local to their data center, as shown in the sketch below.
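The actual definitions live in service.yaml in the repository. As a minimal sketch, a headless service is simply a Service with clusterIP set to None; the service names below match the StatefulSets created later, and the selector labels are assumptions, not the repository's exact values:

# Sketch of the two headless services, one per data center.
# The selector labels are illustrative; service.yaml in the repository is authoritative.
apiVersion: v1
kind: Service
metadata:
  name: cassandra-a
  labels:
    app: cassandra
spec:
  clusterIP: None        # headless: DNS resolves directly to the pod IPs
  ports:
  - port: 9042
    name: cql
  selector:
    app: cassandra
    dc: DC1              # assumed pod label for data center 1
---
apiVersion: v1
kind: Service
metadata:
  name: cassandra-b
  labels:
    app: cassandra
spec:
  clusterIP: None
  ports:
  - port: 9042
    name: cql
  selector:
    app: cassandra
    dc: DC2              # assumed pod label for data center 2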

3. Create persistent volumes

Cassandra nodes store data in persistent volumes. Because Cassandra prefers local storage for each of its nodes, you can provision the storage in advance. There are a total of six nodes. First, create a directory, for example /data/cass/, on each node to store Cassandra data, and make sure 10 GB of storage is available in that directory. Create this directory on each of the three nodes at each site.

Then create all six PersistentVolumes using the YAML file provided in GitHub. You can adjust the capacity and location of the directory in this file as needed.

kubectl -n c7a create -f local_pvs.yaml
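For reference, each entry in local_pvs.yaml is a PersistentVolume backed by the directory created above. A minimal sketch of one such volume, assuming a hostPath-backed volume at /data/cass/ (the volume name and volume type are illustrative; the file in the repository is authoritative):

# Sketch of one of the six PersistentVolumes; adjust capacity and path as needed.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: cassandra-data-vol-1   # illustrative name
  labels:
    app: cassandra
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /data/cass/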

4. Create the StatefulSets

The persistent volume claims, which are defined in the StatefulSet, consume the persistent volumes. Create the two StatefulSets, one for each site. Remember, each node in the first data center is labeled dc=DC1 and each node in the second data center is labeled dc=DC2. The node affinity spec in the StatefulSet named cassandra-a ensures that its pods are scheduled on the DC1 data center nodes only. Similarly, the cassandra-b StatefulSet has affinity set to DC2, so all of its pods are deployed to the DC2 data center only.

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: dc
            operator: In
            values:
            - DC1

StatefulSet is a powerful construct in Kubernetes. To understand how the pod deployment and networking works, you need to understand some of its essential conventions.

Pod name

Pods in a StatefulSet are created sequentially, with ordinal indexes starting at zero. The pod names follow the syntax <statefulset name>-<ordinal index>. In this tutorial, pods in DC1 are named cassandra-a-0, cassandra-a-1, and cassandra-a-2; pods in DC2 are named cassandra-b-0, and so on.

Network address

A StatefulSet can use a headless service to control the domain of its pods. The domain managed by this service takes the form $(service name).$(namespace).svc.cluster.local, where cluster.local is the cluster domain. As each pod is created, it gets a matching DNS subdomain, taking the form $(podname).$(service name).$(namespace).svc.cluster.local. In this tutorial, the first pod in DC1 has the name cassandra-a-0.cassandra-a.c7a.svc.cluster.local, and other pods follow the same convention.
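Application pods normally do not address individual Cassandra pods by these names; they connect through the headless service local to their data center. As an illustrative sketch only (the Deployment name, image, and environment variable are assumptions, not part of the tutorial's repository), an application in DC1 could receive its local contact point like this:

# Sketch of an application Deployment pinned to DC1 that receives the local
# Cassandra service name through an environment variable (names are illustrative).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app1-dc1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app1-dc1
  template:
    metadata:
      labels:
        app: app1-dc1
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: dc
                operator: In
                values:
                - DC1
      containers:
      - name: app1
        image: app1-image          # placeholder application image
        env:
        - name: CASSANDRA_CONTACT_POINT
          value: cassandra-a.c7a.svc.cluster.local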

Volume claims

The volume claim template is specified in the StatefulSet with the name cassandra-data. The persistent volume claims generated by the StatefulSet are named with the format $(volumeClaimTemplate name)-$(pod name), so deploying the StatefulSets creates volume claims such as cassandra-data-cassandra-a-0 and cassandra-data-cassandra-b-0. Each claim matches up with its corresponding pod. Because static volume provisioning is used, each claim binds to any available persistent volume that satisfies it.
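For reference, the claim template section of each StatefulSet looks roughly like this (a sketch; the repository YAML is authoritative):

# Sketch of the volume claim template; each pod gets its own 10Gi claim,
# named cassandra-data-<pod name>.
volumeClaimTemplates:
- metadata:
    name: cassandra-data
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi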

Cassandra configuration

The StatefulSet contains further Cassandra configuration, done through environment variables exposed by the specific Cassandra Docker image. One important aspect is the specification of Cassandra seeds: it is recommended to specify at least one seed node from each data center, so the very first pod in each data center is specified as a seed node, using its fully qualified node name. Another notable difference between the two StatefulSets is the data center specification: StatefulSet cassandra-a is in DC1, and StatefulSet cassandra-b is in the DC2 data center.
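The exact variable names depend on the Cassandra image used. As an illustration only, an image that follows the common convention might be configured like this in the cassandra-a StatefulSet (the variable names and values are assumptions; check the image documentation and the repository YAML):

# Illustrative environment section for the cassandra-a StatefulSet's Cassandra
# container; the actual variable names depend on the Cassandra image used.
env:
- name: CASSANDRA_SEEDS
  value: "cassandra-a-0.cassandra-a.c7a.svc.cluster.local,cassandra-b-0.cassandra-b.c7a.svc.cluster.local"
- name: CASSANDRA_CLUSTER_NAME
  value: "MultiDC"
- name: CASSANDRA_DC
  value: "DC1"
- name: CASSANDRA_RACK
  value: "Rack1"
- name: CASSANDRA_ENDPOINT_SNITCH
  value: "GossipingPropertyFileSnitch"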

Now it is time to create the two StatefulSets and make sure they are created successfully:

kubectl -n c7a create -f statefulset-a.yaml
kubectl -n c7a create -f statefulset-b.yaml
kubectl -n c7a get statefulsets

5. Verify Cassandra seed nodes

In this tutorial, one replica is specified in each StatefulSet definition, so initially only one pod is created in each data center. These pods are named cassandra-a-0 and cassandra-b-0. Just list the pods to confirm:

kubectl -n c7a get pods -o wide

NAME            READY     STATUS    RESTARTS   AGE       IP           NODE
cassandra-a-0   1/1       Running   0          4m        10.244.0.6   iops15
cassandra-b-0   1/1       Running   2          3m        10.244.1.2   kube-vm1

If the pods started successfully and are in the Running state, check Cassandra's nodetool status. You should see one node from each data center:

kubectl -n c7a exec -ti cassandra-a-0 -- nodetool status
Data center: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens       Owns (effective)  Host ID                               Rack
UN  10.244.2.22  127.33 KiB  256          100%              59b4e526-3a3c-4252-84b7-2c6c5de05e13  Rack1
Data center: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens       Owns (effective)  Host ID                               Rack
UN  10.244.5.6   108.63 KiB  256          100%              cb86a5e6-2666-445e-ba96-88640b0e7848  Rack1

If either of the two seed node containers is in an error state, check the logs to find out what is wrong. You can obtain the cassandra-a-0 logs with kubectl -n c7a logs cassandra-a-0.

6. Scale using StatefulSet

So far, only one Cassandra node has been created in each data center. Now it is time to scale the pods using StatefulSet. Set the replica count to three for each StatefulSet:

kubectl -n c7a scale --replicas=3 statefulset/cassandra-a
kubectl -n c7a scale --replicas=3 statefulset/cassandra-b

Each StatefulSet is now scaled to three replicas. You can have more replicas, but you need to make sure you have provisioned persistent volumes for them in advance.

7. Verify other Cassandra nodes

Within a few minutes, the remaining pods are up and running and have joined the Cassandra cluster. Verify the status of all the Cassandra nodes with the nodetool utility:

kubectl -n c7a exec -ti cassandra-a-0 -- nodetool status
Data center: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens       Owns (effective)  Host ID                               Rack
UN  10.244.2.22  127.33 KiB  256          29.7%             59b4e526-3a3c-4252-84b7-2c6c5de05e13  Rack1
UN  10.244.1.24  108.62 KiB  256          32.9%             7749be9d-4b66-4c9f-8afc-55d484d7404f  Rack1
UN  10.244.1.25  218.27 KiB  256          32.6%             bfd26111-21e3-42a9-bdf6-b2068c1bd1c5  Rack1
Data center: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens       Owns (effective)  Host ID                               Rack
UN  10.244.5.6   108.63 KiB  256          35.7%             cb86a5e6-2666-445e-ba96-88640b0e7848  Rack1
UN  10.244.5.7   196.67 KiB  256          33.8%             1a0b6ba5-a9fd-4d67-bb5f-9cdc97b5433e  Rack1
UN  10.244.5.8   127.42 KiB  256          35.4%             09fa301d-d057-4b2d-a44f-7ab57d7e5197  Rack1

The output above shows that the six nodes, three in each data center, are up and running.

8. Create a Cassandra keyspace with replication configuration

Now create a keyspace and specify how many replicas each data center needs. Having multiple replicas in each data center helps with local node failures because the data can still be served from a copy if a node fails.

kubectl -n c7a exec -ti cassandra-a-0 -- cqlsh
Connected to Cassandra at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.1 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> create keyspace hr_keyspace with replication ={'class' : 'NetworkTopologyStrategy', 'DC1':2, 'DC2':2};

The code above shows that a Cassandra shell session (cqlsh) has been started on the cassandra-a-0 node and that a keyspace named hr_keyspace is created. When you create a keyspace, you must specify a replication strategy. NetworkTopologyStrategy is typically used when multiple data centers are involved, and it lets you specify how many replicas are needed in each data center. In the example above, two replicas are specified for data center DC1 and two replicas for data center DC2.

Next, create a table and add some data:

cqlsh> use hr_keyspace;
cqlsh> CREATE TABLE employee( emp_id int PRIMARY KEY, emp_name text, emp_city text, emp_sal varint, emp_phone varint);
# For asynchronous writes to the other data center, set the consistency level to LOCAL_QUORUM
cqlsh:hr_keyspace> consistency LOCAL_QUORUM
Consistency level set to LOCAL_QUORUM.
cqlsh:hr_keyspace> INSERT INTO employee (emp_id, emp_name, emp_city,emp_sal,emp_phone) VALUES(1,'David', 'San Francisco', 50000, 983210987);
cqlsh:hr_keyspace> INSERT INTO employee (emp_id, emp_name, emp_city,emp_sal,emp_phone) VALUES(2,'Robin', 'San Jose', 55000, 9848022339); 
cqlsh:hr_keyspace> INSERT INTO employee (emp_id, emp_name, emp_city,emp_sal,emp_phone) VALUES(3,'Bob', 'Austin', 45000, 9848022330);
cqlsh:hr_keyspace> INSERT INTO employee (emp_id, emp_name, emp_city,emp_sal,emp_phone) VALUES(4, 'Monica','San Jose', 55000, 9458022330);

Note the consistency level of LOCAL_QUORUM. It ensures that writes are acknowledged once a quorum of replicas in the local data center have written the data, while Cassandra copies the data asynchronously to the remote data center nodes.

Now try to retrieve that data from any node.

cqlsh:hr_keyspace> select * from employee;

 emp_id | emp_city      | emp_name | emp_phone  | emp_sal
--------+---------------+----------+------------+---------
      1 | San Francisco |    David |  983210987 |   50000
      2 |      San Jose |    Robin | 9848022339 |   55000
      4 |      San Jose |   Monica | 9458022330 |   55000
      3 |        Austin |      Bob | 9848022330 |   45000

cqlsh:hr_keyspace> quit;

9. Simulate a site failure

You can simulate a site failure by making all the Cassandra pods in one data center unavailable. To do this, delete one of the StatefulSets: its three pods will be gone, simulating a site failure. You can then still retrieve data from the other site.

kubectl -n c7a delete statefulset cassandra-a
statefulset "cassandra-a" deleted

kubectl -n c7a get pods -o wide

This deletes StatefulSet cassandra-a and all of its pods. Now, there are only three Cassandra pods remaining, all from data center DC2. Connect to any of those and try to retrieve data:

kubectl -n c7a exec -ti cassandra-b-1 -- cqlsh
Connected to Cassandra at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.1 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> use hr_keyspace;
cqlsh:hr_keyspace> select * from employee;

 emp_id | emp_city      | emp_name | emp_phone  | emp_sal
--------+---------------+----------+------------+---------
      1 | San Francisco |    David |  983210987 |   50000
      2 |      San Jose |    Robin | 9848022339 |   55000
      4 |      San Jose |   Monica | 9458022330 |   55000
      3 |        Austin |      Bob | 9848022330 |   45000

(4 rows)
cqlsh:hr_keyspace> quit;

This shows that the data is still available from the other data center's nodes. Note that the example above connects to a Cassandra node directly, but in practice, applications using Cassandra connect to it through the headless service created for it, as shown in Figure 1.

10. Cleanup

To delete everything on the Kubernetes cluster in the c7a namespace, use this command:

kubectl -n c7a delete statefulset,pvc,pv,svc -l app=cassandra

Conclusion

In this tutorial, you saw how to set up a multi-data center Cassandra cluster on a Kubernetes platform. This setup is useful for performing live backups and for protecting against site or data center failures. Location-aware access to the Cassandra cluster also reduces read and write latency.