Deploy a scalable Apache Cassandra database on Kubernetes  

Last updated | By Animesh Singh, Anthony Amanse, Ishan Gulhane

Description

Today’s businesses are gathering, storing, and analyzing immense amounts of data. Apache Cassandra is a massively scalable open source NoSQL database and is perfect for managing large amounts of structured, semi-structured, and unstructured data across multiple datacenters, commodity servers and the cloud. Kubernetes is the world’s most popular container orchestration system, ranked as one of the most active projects on GitHub. In this journey, you’ll learn how to combine these two powerhouse systems, deploying a cloud-native Cassandra implementation on Kubernetes.

Overview

This journey shows how to deploy Apache Cassandra, a widely used NoSQL database, on top of Kubernetes, the world’s most popular container orchestration platform. You’ll find a full deployment roadmap for a scalable, multi-node Cassandra cluster running on IBM Bluemix Container Service Kubernetes clusters. Each Cassandra component runs in a separate container or group of containers.

With Apache Cassandra’s distributed system, you can deploy large numbers of nodes across multiple data centers. Cassandra’s distributed architecture is specifically tailored for multi-data-center deployment, redundancy, failover, and disaster recovery. These features make it a good fit for a container orchestration platform, which in turn handles the automation, operation, scaling, and monitoring of the cluster.

Flow

  1. The developer creates a headless service. A Kubernetes service is an abstraction that defines a logical set of pods and a policy by which to access them. The headless Cassandra service is used for Cassandra cluster formation and “seed” discovery.
  2. The developer creates a Kubernetes ReplicationController responsible for creating and scaling non-persistent Cassandra node pods. Once the developer verifies that a single Cassandra node has been created, she can scale the Cassandra cluster by increasing the replica count of the ReplicationController.
  3. To create persistent Cassandra nodes, the developer provisions persistent volumes using static provisioning, creating the volumes from the provided files. The developer creates the same number of PersistentVolumes as there are Cassandra nodes.
  4. The developer uses Kubernetes StatefulSets to create and scale persistent Cassandra node pods. The StatefulSet is responsible for ordered deployment, ordered termination, and stable, unique network names.
  5. The developer uses Cassandra Query Language (CQL) to create and update an Employee table in a Cassandra keyspace.
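
To make steps 1 and 4 more concrete, here is a minimal Python sketch that builds the two manifests as plain dictionaries and prints them as JSON, which `kubectl apply -f` accepts alongside YAML. It is not the journey's own set of files: the `app: cassandra` label, the resource names, the `cassandra:latest` image, the 1Gi volume size, and the `/var/lib/cassandra` mount path are all assumptions chosen for illustration, and the pod DNS names assume the default `cluster.local` cluster domain.

```python
import json

APP_LABEL = "cassandra"  # hypothetical label, not taken from the journey's files
CQL_PORT = 9042          # Cassandra's default CQL native transport port


def headless_service(name="cassandra"):
    """Step 1: clusterIP "None" makes the Service headless, so DNS
    resolves to the individual pod IPs instead of one virtual IP."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": {"app": APP_LABEL}},
        "spec": {
            "clusterIP": "None",
            "ports": [{"port": CQL_PORT}],
            "selector": {"app": APP_LABEL},
        },
    }


def stateful_set(name, service_name, replicas, image):
    """Step 4: a StatefulSet whose pods each claim their own volume.
    With static provisioning, one matching PersistentVolume must
    already exist for every replica (step 3)."""
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": service_name,  # ties pod DNS names to the headless service
            "replicas": replicas,
            "selector": {"matchLabels": {"app": APP_LABEL}},
            "template": {
                "metadata": {"labels": {"app": APP_LABEL}},
                "spec": {
                    "containers": [{
                        "name": "cassandra",
                        "image": image,
                        "ports": [{"containerPort": CQL_PORT, "name": "cql"}],
                        # Assumed data directory; adjust to the image in use.
                        "volumeMounts": [{"name": "cassandra-data",
                                          "mountPath": "/var/lib/cassandra"}],
                    }],
                },
            },
            "volumeClaimTemplates": [{
                "metadata": {"name": "cassandra-data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": "1Gi"}},  # assumed size
                },
            }],
        },
    }


def pod_dns_names(set_name, service_name, replicas, namespace="default"):
    """Stable per-pod DNS names a StatefulSet gets behind a headless
    service; the first of these is a natural Cassandra seed address."""
    return [
        f"{set_name}-{i}.{service_name}.{namespace}.svc.cluster.local"
        for i in range(replicas)
    ]


if __name__ == "__main__":
    print(json.dumps(headless_service(), indent=2))
    print(json.dumps(stateful_set("cassandra", "cassandra", 3, "cassandra:latest"), indent=2))
    print(pod_dns_names("cassandra", "cassandra", 3))
```

Scaling the persistent cluster then amounts to raising `replicas` (for example with `kubectl scale statefulset cassandra --replicas=4`), provided a free PersistentVolume exists for each new pod.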
