Deploy a scalable Apache Cassandra database on Kubernetes

Summary

Today’s businesses are gathering, storing, and analyzing immense amounts of data. Apache Cassandra is a massively scalable open source NoSQL database that is well suited to managing large amounts of structured, semi-structured, and unstructured data across multiple data centers, commodity servers, and the cloud. Kubernetes is a widely used container orchestration system and one of the most active projects on GitHub. In this pattern, you’ll learn how to combine these two systems by deploying a cloud-native Cassandra implementation on Kubernetes.

Description

This pattern shows how to deploy Apache Cassandra, a widely used NoSQL database, on top of Kubernetes, a widely used container orchestration platform. You’ll find a full deployment roadmap for a multi-node, scalable Cassandra cluster running on IBM Cloud Container Service Kubernetes clusters. Each Cassandra component runs in a separate container or group of containers.

With Apache Cassandra’s distributed system, you can deploy large numbers of nodes across multiple data centers. Cassandra’s distributed architecture is specifically tailored for multi-data-center deployment, redundancy, failover, and disaster recovery. These features make it a good fit for a container orchestration platform, which in turn handles automation, operation, scaling, and monitoring of the cluster.

Flow

  1. The developer creates a headless service. A Kubernetes service is an abstraction that defines a logical set of pods and a policy for accessing them. The headless Cassandra service is used for Cassandra cluster formation and “seed” discovery.
  2. The developer creates a Kubernetes ReplicationController, which creates and scales non-persistent Cassandra pods. After verifying that a single Cassandra node is running, the developer can scale the Cassandra cluster by increasing the number of replicas in the ReplicationController.
  3. To create persistent Cassandra nodes, the developer statically provisions persistent volumes from the provided files, creating one PersistentVolume for each Cassandra node.
  4. The developer uses a Kubernetes StatefulSet to create and scale persistent Cassandra pods. The StatefulSet provides ordered deployment, ordered termination, and unique network names.
  5. The developer uses Cassandra Query Language (CQL) to create and update an Employee table in a Cassandra keyspace.
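
To make steps 1, 3, and 4 more concrete, the objects they describe can be sketched in Python as plain dictionaries and serialized to JSON, which `kubectl apply -f` accepts as well as YAML. This is a minimal sketch and not the set of manifests shipped with the pattern. The object names (`cassandra`, `cassandra-data`), the `app: cassandra` label, the `cassandra:3.11` image tag, the `default` namespace, the 1Gi volume size, and the `hostPath` location are all assumptions chosen for illustration.

```python
import json

# Assumed label shared by every object; the pattern's own files may differ.
APP_LABEL = {"app": "cassandra"}


def headless_service(name="cassandra"):
    """Step 1: a headless Service (clusterIP "None") used for seed discovery."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": APP_LABEL},
        "spec": {
            "clusterIP": "None",  # headless: DNS resolves to the pods themselves
            "selector": APP_LABEL,
            "ports": [{"name": "cql", "port": 9042}],
        },
    }


def persistent_volumes(count, size="1Gi"):
    """Step 3: statically provision one PersistentVolume per Cassandra node."""
    return [
        {
            "apiVersion": "v1",
            "kind": "PersistentVolume",
            "metadata": {"name": f"cassandra-data-{i}", "labels": APP_LABEL},
            "spec": {
                "capacity": {"storage": size},
                "accessModes": ["ReadWriteOnce"],
                # Assumed backing store, suitable only for experiments.
                "hostPath": {"path": f"/data/cassandra-data-{i}"},
            },
        }
        for i in range(count)
    ]


def stateful_set(replicas, service_name="cassandra", size="1Gi"):
    """Step 4: a StatefulSet whose pods are named cassandra-0, cassandra-1, ..."""
    # The first pod acts as the seed; "default" is an assumed namespace.
    seed = f"{service_name}-0.{service_name}.default.svc.cluster.local"
    container = {
        "name": "cassandra",
        "image": "cassandra:3.11",  # assumed image tag
        "ports": [{"name": "cql", "containerPort": 9042}],
        "env": [{"name": "CASSANDRA_SEEDS", "value": seed}],
        "volumeMounts": [
            {"name": "cassandra-data", "mountPath": "/var/lib/cassandra"}
        ],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": service_name},
        "spec": {
            "serviceName": service_name,  # ties pod DNS names to the headless Service
            "replicas": replicas,
            "selector": {"matchLabels": APP_LABEL},
            "template": {
                "metadata": {"labels": APP_LABEL},
                "spec": {"containers": [container]},
            },
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": "cassandra-data"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": size}},
                    },
                }
            ],
        },
    }


def build_manifests(nodes):
    """Bundle everything into a single List object, one volume per node."""
    items = [headless_service()] + persistent_volumes(nodes) + [stateful_set(nodes)]
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    print(json.dumps(build_manifests(3), indent=2))
```

Saving the script under a hypothetical name such as `manifests.py` and piping its output into `kubectl apply -f -` would submit all of the objects at once. Depending on the cluster’s default StorageClass, the claims may also need an explicit `storageClassName` so that they bind to the statically provisioned volumes rather than triggering dynamic provisioning.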

Instructions

Ready to put this code pattern to use? Complete details on how to get started are in the README.