Deploy a scalable Apache Cassandra database on Kubernetes  

Last updated | By Animesh Singh, Anthony Amanse, Ishan Gulhane

Description

Today’s businesses are gathering, storing, and analyzing immense amounts of data. Apache Cassandra is a massively scalable, open source NoSQL database that is well suited to managing large amounts of structured, semi-structured, and unstructured data across multiple datacenters, commodity servers, and the cloud. Kubernetes is the world’s most popular container orchestration system and one of the most active projects on GitHub. In this journey, you’ll learn how to combine these two systems by deploying a cloud-native Cassandra implementation on Kubernetes.

Overview

This journey showcases what Kubernetes clusters can do by showing you how to deploy Apache Cassandra, one of the most popular NoSQL databases, on Kubernetes, the most popular container orchestration platform. You’ll find a full deployment roadmap for a multi-node, scalable Cassandra cluster on IBM Cloud Container Service Kubernetes clusters. Each Cassandra node runs in its own container or group of containers.

Apache Cassandra’s distributed architecture lets you deploy large numbers of nodes across multiple datacenters, and it is designed specifically for multi-datacenter deployment, redundancy, failover, and disaster recovery. These properties make Cassandra a good fit for a container orchestration platform, which adds automated operations, scaling, and monitoring.
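As a rough illustration of how multi-datacenter redundancy is expressed in Cassandra itself, the Python sketch below builds a CQL CREATE KEYSPACE statement that uses NetworkTopologyStrategy, which takes a replica count for each datacenter. The keyspace name employees and the datacenter names DC1 and DC2 are hypothetical; in a real cluster the names must match the datacenters reported by Cassandra’s snitch, and you would run the resulting statement in cqlsh.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement that keeps copies of the data in every listed datacenter."""
    # NetworkTopologyStrategy maps each datacenter name to its replication factor.
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# Hypothetical keyspace and datacenter names, three replicas in each datacenter.
print(create_keyspace_cql("employees", {"DC1": 3, "DC2": 3}))
```

With three replicas in each of two datacenters, losing an entire datacenter still leaves three copies of every row available in the other.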

Flow

  1. The developer creates a headless service. A Kubernetes Service is an abstraction that defines a logical set of pods and a policy for accessing them. A headless service has no cluster IP, so each pod gets its own DNS record; the headless Cassandra service is used for Cassandra cluster formation and “seed” discovery.
  2. The developer creates a Kubernetes ReplicationController, which creates and scales non-persistent Cassandra pods. Once the developer verifies that a single Cassandra node is running, she can scale the Cassandra cluster by increasing the number of replicas in the ReplicationController.
  3. To create persistent Cassandra nodes, the developer provisions PersistentVolumes statically from the provided files, creating one PersistentVolume for each Cassandra node.
  4. The developer uses a Kubernetes StatefulSet to create and scale persistent Cassandra pods. The StatefulSet guarantees ordered deployment, ordered termination, and a stable, unique network name for each pod.
  5. The developer uses Cassandra Query Language (CQL) to create and update an Employee table in a Cassandra keyspace.
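Steps 1, 3, and 4 can be sketched as Kubernetes manifests. The Python script below is a minimal sketch, not the journey’s provided files: it emits a headless Service, one statically provisioned PersistentVolume per node, and a StatefulSet, bundled as a single JSON List that kubectl apply -f accepts. The resource names, the cassandra:3.11 image, the hostPath volumes, the default namespace, and the CASSANDRA_SEEDS seed address are all assumptions chosen for a small demo cluster.

```python
import json

# All names and sizes below are hypothetical values for a small demo cluster.
APP = "cassandra"
REPLICAS = 3
IMAGE = "cassandra:3.11"  # assumption: the official Cassandra image from Docker Hub
STORAGE = "1Gi"
SEED = f"{APP}-0.{APP}.default.svc.cluster.local"  # assumes the "default" namespace


def headless_service():
    """Step 1: clusterIP None makes the Service headless, so each pod gets a DNS record."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": APP, "labels": {"app": APP}},
        "spec": {
            "clusterIP": "None",
            # Publish DNS records before pods are ready so the first node can resolve the seed.
            "publishNotReadyAddresses": True,
            "selector": {"app": APP},
            "ports": [{"name": "cql", "port": 9042}],
        },
    }


def persistent_volumes(count):
    """Step 3: statically provision one PersistentVolume per Cassandra node."""
    return [
        {
            "apiVersion": "v1",
            "kind": "PersistentVolume",
            "metadata": {"name": f"{APP}-pv-{i}", "labels": {"app": APP}},
            "spec": {
                "capacity": {"storage": STORAGE},
                "accessModes": ["ReadWriteOnce"],
                # assumption: hostPath volumes are only suitable for demo clusters
                "hostPath": {"path": f"/tmp/{APP}-pv-{i}"},
            },
        }
        for i in range(count)
    ]


def stateful_set(replicas):
    """Step 4: pods get stable names (cassandra-0, cassandra-1, ...) and one claim each."""
    labels = {"app": APP}
    container = {
        "name": APP,
        "image": IMAGE,
        "ports": [{"name": "cql", "containerPort": 9042},
                  {"name": "intra-node", "containerPort": 7000}],
        "env": [{"name": "CASSANDRA_SEEDS", "value": SEED}],
        "volumeMounts": [{"name": "data", "mountPath": "/var/lib/cassandra"}],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": APP},
        "spec": {
            "serviceName": APP,  # must name the headless Service
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {"metadata": {"labels": labels},
                         "spec": {"containers": [container]}},
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "storageClassName": "",  # bind only to pre-created, class-less PVs
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": STORAGE}},
                },
            }],
        },
    }


def manifest_list(replicas=REPLICAS):
    """Bundle everything into a v1 List, which kubectl apply -f accepts as JSON."""
    items = [headless_service(), *persistent_volumes(replicas), stateful_set(replicas)]
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    print(json.dumps(manifest_list(), indent=2))
```

For example, saving the script as cassandra_manifests.py (a hypothetical file name) and running python cassandra_manifests.py > cassandra.json followed by kubectl apply -f cassandra.json would create the resources. The claim template sets storageClassName to an empty string so the claims bind to the pre-created volumes instead of a default dynamic provisioner. Because the volumes are provisioned statically, scaling with kubectl scale statefulset cassandra --replicas=4 also requires a fourth PersistentVolume, matching step 3’s one-volume-per-node rule.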
