Some in the developer community have expressed concern that MongoDB and container technologies such as Docker and Kubernetes, in spite of their obvious advantages individually, do not work especially well together. The Kubernetes developers have responded to these concerns by introducing StatefulSets. In this tutorial, I show you how to deploy a MongoDB replica set in the Kubernetes-based IBM Cloud Kubernetes Service. Along the way, I outline MongoDB’s cluster requirements and show how to deploy Kubernetes StatefulSets. Finally, I conclude with some advice on using these tools in a production environment.
What you will need
This article assumes at least beginner-level knowledge of the following:
- MongoDB
- Docker
- Kubernetes
- Creating a Kubernetes cluster using the IBM Cloud Kubernetes Service
You also need an IBM Cloud account. At a minimum, this should be a “pay-as-you-go” account; a Lite account cannot create persistent volumes using the IBM Cloud File Storage offering.
If you have more in-depth knowledge of these technologies and simply want to skip ahead to the deployment instructions, go to “Create the MongoDB replica set on the IBM Cloud Kubernetes Service.”
Introduction: Two considerations
In his blog post “Running MongoDB as a Microservice with Docker and Kubernetes,” Andrew Morgan outlines the considerations for running MongoDB with containers and orchestration technology, such as Kubernetes. Of particular note are the following points:
- MongoDB database nodes are stateful.
- MongoDB database nodes within a replica set must communicate with each other, including after rescheduling (that is, when new nodes are created).
Those with at least a beginner-level knowledge of Docker might recognize that the first consideration is the antithesis of one of the basic principles of Docker containers, whose guidelines state that “containers should be as ephemeral as possible,” not stateful. Moreover, attempting to add statefulness to a Docker container can be challenging. If not done correctly, it could break another design principle of Docker containers—that of “fast, consistent delivery of your applications.”
Morgan’s second consideration relates to container networking, particularly with respect to hostnames because containers’ instances (“pods”) are automatically destroyed and recreated with new hostnames after automatic rescheduling. As such, the features of Kubernetes that make it attractive as a system for automating deployment, scaling, and management of containerized applications can potentially break inter-container communication.
Given these facts, many architects do not consider Docker containers and Kubernetes a suitable means of running MongoDB replica sets, despite the inherent benefits of these technologies. To address these concerns, the Kubernetes community has introduced StatefulSets.
Kubernetes StatefulSets
StatefulSets were introduced in Kubernetes as a beta resource in Version 1.5 and became stable and generally available in Version 1.9. They are specifically designed to manage stateful applications, such as MongoDB.
The Kubernetes documentation states:
Like a Deployment, a StatefulSet manages Pods that are based on an identical container spec. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of their Pods. These pods are created from the same spec, but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.
Critically, StatefulSets provide two (out of five) important features that relate to the considerations for running MongoDB as a microservice with Docker and Kubernetes:
- Stable, persistent storage
- Stable, unique network identifiers
In short, this means that by using a single MongoDB Docker container image, you can define a StatefulSet configuration within Kubernetes that automatically creates and attaches persistent storage for each MongoDB replica set node. Each node is addressable by a unique, known network name, no matter how many nodes are created.
StatefulSets with the IBM Cloud Kubernetes Service
Given that the primary purpose of MongoDB is to provide access to stored data, it’s critical that the characteristics of the underlying persistent storage are able to meet various non-functional requirements of the application or the business.
For example, many applications require predictable IOPS. Business or government requirements may also stipulate that application storage needs to be encrypted at rest or replicated for redundancy (despite MongoDB’s inherent replication feature).
By default, the IBM Cloud Kubernetes Service is configured to use IBM Cloud’s NFS-based file storage offerings, namely Endurance storage and Performance storage. The service provides eight pre-defined storage classes so that the cluster administrator does not have to create any such classes.
Using StatefulSets, the configuration simply selects one of the existing storage classes according to the application or business requirements; nothing more is required.
The IBM Cloud File Storage offering provides numerous benefits, including:
- Flash-backed storage
- Encryption for data at rest
- Snapshots and replication
- Volume duplication
- Expandable volumes
- Adjustable IOPS
Note: IBM Cloud Kubernetes Service worker nodes use encrypted SSDs, by default.
Step 1. Create the MongoDB replica set on the IBM Cloud Kubernetes Service
There is almost no difference between the configuration used for IBM Cloud Kubernetes Service and that used by other Kubernetes providers. That is one of the many reasons to invest in providers offering Kubernetes. The only real difference is the storageClassName value.
Note: It is assumed that you’ve already created a Kubernetes Service cluster. For details on how to create an IBM Cloud Kubernetes Service cluster, see “Setting up clusters” in the IBM Cloud Kubernetes Service documentation.
Note: Ensure that your cluster is running at least Kubernetes Version 1.9.2. You can do this by passing --kube-version 1.9.2 to the cluster creation command or by updating an existing cluster and its worker nodes (see the IBM Cloud documentation).
After completing this section, you will have created a three-node MongoDB replica set within your Kubernetes cluster running in the IBM Cloud Kubernetes Service. This leverages the IBM Cloud File Storage offering for persistent volumes.
The figure below illustrates the three main components to be created, and where they would be logically placed in a full application deployment, including:
- The headless service
- The StatefulSet, including the MongoDB containers and associated persistent volume claims
- The persistent volumes (IBM Cloud File Storage)
Figure 1. The main components of a three-node MongoDB replica set
Note: The Edge Services, Security, LoadBalancer/Ingress Service, and Application components are not addressed in this article. See “Planning external networking” in the documentation for details on how to deploy apps in a cluster, including network options for exposing the app.
Step 2. Create the headless service configuration file
A Kubernetes headless service controls the domain of pods, without requiring a load-balancer service or exposing an IP address, as would other Kubernetes service types. This ensures that the pods that match the service’s selector are grouped together.
For the headless service, create the following configuration file:
apiVersion: v1
kind: Service
metadata:
  name: mongo
  labels:
    app: mongo
spec:
  ports:
  - name: mongo
    port: 27017
    targetPort: 27017
  clusterIP: None
  selector:
    app: mongo
A headless service is created by setting the clusterIP value to None (line 12).
Ensure that you use the appropriate port number for the targetPort (line 11). It should match the value used in the StatefulSet configuration (line 37) in Step 3 below, which is the port used by the mongod service. By default, the port is 27017. The value used for the port parameter is the port on which MongoDB clients connect to the service.
Step 3. Create the StatefulSet configuration file
For the StatefulSet, create the following configuration file:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  selector:
    matchLabels:
      app: mongo
  serviceName: "mongo"
  replicas: 3
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app: mongo
    spec:
      terminationGracePeriodSeconds: 10
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: "app"
                operator: In
                values:
                - mongo
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: mongo
        image: mongo
        command:
        - mongod
        - "--bind_ip_all"
        - "--replSet"
        - rs0
        ports:
        - containerPort: 27017
        volumeMounts:
        - name: mongo-data
          mountPath: /data/db
  volumeClaimTemplates:
  - metadata:
      name: mongo-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: ibmc-file-bronze
      resources:
        requests:
          storage: 20Gi
Note: The affinity section (lines 18-27) ensures that a MongoDB replica set pod is not scheduled to run on a cluster worker node that is already running another MongoDB replica set pod. Without this rule, all three pods could end up on a single worker node, which would become a single point of failure for the whole replica set.
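One practical consequence of a required anti-affinity rule is worth sketching. The Python snippet below is a toy model, not the Kubernetes scheduler: it treats the rule as "at most one mongo pod per worker node" and shows what happens when there are fewer worker nodes than replicas. The function name and the node names are hypothetical.

```python
def place_pods(pod_names, node_names):
    """Toy model of required pod anti-affinity on kubernetes.io/hostname.

    Each node accepts at most one pod. Returns (placements, pending), where
    placements maps pod name -> node name and pending lists the pods that
    could not be placed on any node.
    """
    free_nodes = list(node_names)
    placements = {}
    pending = []
    for pod in pod_names:
        if free_nodes:
            placements[pod] = free_nodes.pop(0)
        else:
            pending.append(pod)
    return placements, pending


pods = ["mongo-0", "mongo-1", "mongo-2"]

# Three worker nodes: every pod lands on its own node.
print(place_pods(pods, ["node-a", "node-b", "node-c"]))

# Two worker nodes: the third pod has no eligible node left.
print(place_pods(pods, ["node-a", "node-b"]))
```

With this configuration, then, the cluster needs at least three schedulable worker nodes; otherwise the remaining pods stay in the Pending state.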
With the configurations created, it’s time to deploy the resources.
Step 4. Deploy the Service and StatefulSet
To deploy the headless Service and StatefulSet, use the IBM Cloud CLI (ibmcloud or bx) to log in and select the cluster, and then use the Kubernetes CLI (kubectl) to apply the configuration files.
- First, log in to the IBM Cloud CLI:
$ bx login
- Then, set the context for the cluster in the CLI:
$ $(bx cs cluster-config sandbox-cluster | grep KUBECONFIG)
Next, execute the following commands (lines 1 and 4). Their expected output is shown.
$ kubectl apply -f mongo-headless-service.yaml
service "mongo" created

$ kubectl apply -f mongo-statefulset.yaml
statefulset "mongo" created
Note: The configuration files for the headless service and StatefulSet can be merged into a single file, if desired, with the configuration for each separated by a line containing “---”.
To confirm, get each object using the following commands (lines 1 and 5). The expected output is shown for each.
$ kubectl get service mongo
NAME    TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE
mongo   ClusterIP   None         <none>        27017/TCP   5s

$ kubectl get statefulset mongo
NAME    DESIRED   CURRENT   AGE
mongo   3         3         1m
With the StatefulSet deployed, Kubernetes takes care of the pod creation, naming the pods in accordance with the spec (for example, mongo-0, mongo-1, mongo-2). To list the automatically created pods, get the pod objects using the pod label app=mongo, as follows:
$ kubectl get pod -l app=mongo
NAME      READY   STATUS    RESTARTS   AGE
mongo-0   1/1     Running   0          30s
mongo-1   1/1     Running   0          30s
mongo-2   1/1     Running   0          30s
Additionally, Kubernetes takes care of creating the persistent volume claims (pvc) and binding the persistent volumes to the pods. To list the persistent volume claims, get the pvc objects, as follows:
$ kubectl get pvc
NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
mongo-data-mongo-0   Bound    pvc-3ed73cf3-0940-11e8-ac45-925e6fdab1e7   20Gi       RWO            ibmc-file-bronze   1m
mongo-data-mongo-1   Bound    pvc-3ed82f17-0940-11e8-ac45-925e6fdab1e7   20Gi       RWO            ibmc-file-bronze   1m
mongo-data-mongo-2   Bound    pvc-3ed90d7e-0940-11e8-ac45-925e6fdab1e7   20Gi       RWO            ibmc-file-bronze   1m
Each MongoDB replica set node has now been created as an (internally) DNS-addressable Kubernetes pod, bound to its own persistent storage. Specifically, the internal Kubernetes DNS hostname for each pod consistently uses the StatefulSet name with the pod number appended, that is:
$(statefulset name)-$(ordinal)
Additionally, using the headless service, the domain managed by the service takes this form:
$(servicename).$(namespace).svc.cluster.local
Finally, the pods can be referred to using this subdomain pattern:
$(podname).$(servicename)
For example, using the configuration provided, each MongoDB node can be referenced using the following DNS names:
mongo-0.mongo, mongo-1.mongo, mongo-2.mongo
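The naming patterns above are regular enough to compute. As a minimal sketch, the following Python function derives the DNS name of every pod in a StatefulSet from the StatefulSet name, the headless service name, and the replica count. The function name is hypothetical, and the "default" namespace in the second call is an assumption; substitute the namespace you deployed to.

```python
def pod_dns_names(statefulset, service, replicas, namespace=None):
    """Return the stable DNS name of each pod in a StatefulSet.

    Without a namespace, the short form $(podname).$(servicename) is
    returned, which resolves from within the same namespace. With a
    namespace, the fully qualified cluster-local name is returned.
    """
    names = []
    for ordinal in range(replicas):
        pod = f"{statefulset}-{ordinal}"      # $(statefulset name)-$(ordinal)
        host = f"{pod}.{service}"             # $(podname).$(servicename)
        if namespace is not None:
            host = f"{host}.{namespace}.svc.cluster.local"
        names.append(host)
    return names


# Short names, as used in the rest of this article.
print(pod_dns_names("mongo", "mongo", 3))

# Fully qualified names, assuming the "default" namespace.
print(pod_dns_names("mongo", "mongo", 3, namespace="default"))
```

Because the names depend only on the StatefulSet name, the service name, and the ordinal, they can be written into application configuration before the pods exist.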
The persistent storage is configured to be accessible on the /data/db mount path within each pod, by default using the IBM Cloud File Storage offering. If a pod is destroyed, any future incarnations of the pod with the same name will automatically mount to the same persistent storage (unless the persistent storage claim was manually destroyed).
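The reason a re-created pod finds its old data is that the claim name is derived from the pod name, as the kubectl get pvc output shows (the volume claim template name followed by the pod name). The Python sketch below illustrates that lookup with a plain dictionary; it is a toy model under that assumption, and the "volume-N" values are hypothetical placeholders for the real volume names.

```python
def claim_name(template_name, pod_name):
    """Return the PVC name a StatefulSet pod uses for one volume claim template."""
    return f"{template_name}-{pod_name}"


# One bound claim per pod, keyed by the derived claim name.
bound_claims = {
    claim_name("mongo-data", f"mongo-{i}"): f"volume-{i}" for i in range(3)
}

# A re-created pod keeps its name, so it resolves to the same claim
# and therefore to the same underlying volume.
recreated_pod = "mongo-1"
print(claim_name("mongo-data", recreated_pod))
print(bound_claims[claim_name("mongo-data", recreated_pod)])
```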
All that remains is to initialize the replica set using each node.
Step 5. Initiate and configure the MongoDB replica set
With each MongoDB replica set node created, the replica set itself must be initiated and configured. To do so, access the first node and run a series of mongo commands.
Note: These commands can be scripted as part of a CI/CD process, as can the addition or removal of nodes.
- First, execute the MongoDB shell on the first replica set node:
$ kubectl exec -it mongo-0 -- mongo
Note: The StatefulSet in Step 3 uses the unpinned mongo image. Images for MongoDB 6.0 and later ship the mongosh shell in place of the legacy mongo shell, so substitute mongosh if the mongo command is not found.
- Now, initiate the MongoDB replica set:
> rs.initiate()
- Reconfigure the first member of the replica set with the correct DNS name:
> var cfg = rs.conf(); cfg.members[0].host = "mongo-0.mongo:27017"; rs.reconfig(cfg)
- Add the remaining replica set nodes:
> rs.add("mongo-1.mongo:27017")
> rs.add("mongo-2.mongo:27017")
- Finally, confirm the replica set by reviewing the new configuration:
> rs.status()
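The new configuration should show three members, each with an “_id” value and a “name” value that correspond to its hostname (for example, an _id of 0 and a name of mongo-0.mongo:27017 for the first member).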
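For scripted setups, the initiate, reconfigure, and add steps can be collapsed into one call, because rs.initiate() also accepts a configuration document that lists every member. The Python sketch below only builds that document and prints the shell command; it does not connect to MongoDB. The function name is hypothetical, and the host list assumes the names and port used in this article.

```python
import json


def replica_set_config(name, hosts):
    """Build a replica set configuration document for rs.initiate().

    Members receive sequential _id values in the order the hosts are given.
    """
    return {
        "_id": name,
        "members": [{"_id": i, "host": host} for i, host in enumerate(hosts)],
    }


hosts = [f"mongo-{i}.mongo:27017" for i in range(3)]
cfg = replica_set_config("rs0", hosts)

# The printed line is a mongo shell command; how it is delivered to the
# first pod (for example, through the shell's --eval option) is left open.
print(f"rs.initiate({json.dumps(cfg)})")
```

Generating the document from the replica count keeps the member list consistent with the StatefulSet when the number of replicas changes.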
The MongoDB replica set is now initiated and configured. At this point, the MongoDB replica set is essentially ready to be accessed by a client application that is running within the same Kubernetes namespace.
Note: To access the MongoDB replica set from outside the Kubernetes cluster, a Kubernetes “NodePort” or “LoadBalancer” type of Service is required. See the IBM Cloud Kubernetes Service documentation for more details on how to plan external networking.
Using the configuration provided in this article, the MongoDB replica set URI connection string is:
mongodb://mongo-0.mongo:27017,mongo-1.mongo:27017,mongo-2.mongo:27017/myproject?replicaSet=rs0
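A client application can assemble this connection string from the same pieces of information rather than hard-coding it. The following Python sketch shows one way to do that, with an explicit port on every host; the function name is hypothetical, and myproject is simply the example database name used above.

```python
def mongo_uri(hosts, database, replica_set, port=27017):
    """Build a MongoDB replica set connection string from a list of hostnames."""
    host_list = ",".join(f"{host}:{port}" for host in hosts)
    return f"mongodb://{host_list}/{database}?replicaSet={replica_set}"


hosts = [f"mongo-{i}.mongo" for i in range(3)]
print(mongo_uri(hosts, "myproject", "rs0"))
```

Listing all three members lets the driver discover the replica set even when one of the listed hosts is temporarily unavailable.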
Production considerations
This article should be considered a getting started guide. There are more factors to consider when running MongoDB in production using Kubernetes, most of which have already been addressed in other articles and blogs. Future articles will provide instructions specific to the IBM Cloud Kubernetes Service offering.
For example, it is not recommended that you use the provided configuration in production without considering at least the following (see “Related topics” below for more information on these subjects):
- Authentication (MongoDB)
- Not running as root
- Using MongoDB Enterprise Edition (MongoDB)
- Sharding clusters (MongoDB)
- Addressing the maximum number of members (MongoDB)
- Managing compute resources (Kubernetes)
- Safely storing and accessing private Docker images in a highly available and scalable architecture using the IBM Container Registry (IBM Cloud)
Conclusion
This article has demonstrated how to deploy a MongoDB replica set in the IBM Cloud Kubernetes Service.
I have shown that by using Kubernetes StatefulSets, you can address many of the concerns of running MongoDB using container technology, thereby realizing the benefits provided by Kubernetes.
Additionally, I have touched on the benefits of using the IBM Cloud Kubernetes Service, specifically highlighting the IBM Cloud File Storage offering that underpins the service.
Finally, by following the instructions provided in this article, you can create a MongoDB replica set on the IBM Cloud Kubernetes Service within minutes, forming the basis for a development and testing environment. With further considerations, a production environment is also possible.