By Peter Tuton | Updated May 28, 2018 - Published March 15, 2018
Some in the developer community have expressed concern that Docker containers and Kubernetes, in spite of their obvious advantages individually, do not work especially well together for stateful applications such as MongoDB. In this tutorial, I explain how the Kubernetes developers have responded to these concerns by introducing StatefulSets, and I show you how to deploy a MongoDB replica set in the Kubernetes-based IBM Cloud Kubernetes Service. Along the way, I outline MongoDB’s cluster requirements and show how to deploy Kubernetes StatefulSets. Finally, I conclude with some advice on using these tools in a production environment.
This article assumes at least beginner-level knowledge of the following:
If you have more in-depth knowledge of these technologies and simply want to skip ahead to the deployment instructions, go to “Create the MongoDB replica set on the IBM Cloud Kubernetes Service.”
In Andrew Morgan’s blog posting, “Running MongoDB as a Microservice with Docker and Kubernetes,” he outlines the considerations for running MongoDB with containers and orchestration technology, such as Kubernetes. Of particular note are the following points:
Those with at least a beginner-level knowledge of Docker might recognize that the first consideration is the antithesis of one of the basic principles of Docker containers, whose guidelines state that “containers should be as ephemeral as possible,” not stateful. Moreover, attempting to add statefulness to a Docker container can be challenging. If not done correctly, it could break another design principle of Docker containers—that of “fast, consistent delivery of your applications.”
Morgan’s second consideration relates to container networking, particularly with respect to hostnames, because container instances (“pods”) are automatically destroyed and recreated with new hostnames when they are rescheduled. As such, the same features that make Kubernetes attractive as a system for automating deployment, scaling, and management of containerized applications can potentially break inter-container communication.
Given these facts, many architects do not consider Docker containers and Kubernetes to be a viable means of hosting MongoDB replica sets, despite the inherent benefits of these technologies. To address these concerns, the Kubernetes community has introduced StatefulSets.
StatefulSets were introduced in Kubernetes as a beta resource in Version 1.5 and became stable and generally available in Version 1.9. They are specifically designed to manage stateful applications, such as MongoDB.
The Kubernetes documentation states:
Like a Deployment, a StatefulSet manages Pods that are based on an identical container spec. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of their Pods. These pods are created from the same spec, but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.
Critically, StatefulSets provide two (out of five) important features that relate to the considerations for running MongoDB as a microservice with Docker and Kubernetes:
In short, this means that by using a single MongoDB container specification, you can define a StatefulSet configuration within Kubernetes that automatically creates and attaches persistent storage for each MongoDB replica set node. Each node is addressable using a unique, known network name, for any number of newly created nodes.
Given that the primary purpose of MongoDB is to provide access to stored data, it’s critical that the characteristics of the underlying persistent storage are able to meet various non-functional requirements of the application or the business.
For example, many applications require predictable IOPS. Business or government requirements may also stipulate that application storage needs to be encrypted at rest or replicated for redundancy (despite MongoDB’s inherent replication feature).
By default, the IBM Cloud Kubernetes Service is configured to use IBM Cloud’s NFS-based file storage offerings, namely Endurance storage and Performance storage. The service provides eight pre-defined storage classes so that the cluster administrator does not have to create any such classes.
Using StatefulSets, the configuration simply selects one of the existing storage classes according to the application or business requirements; nothing more is required.
The IBM Cloud File Storage offering provides numerous benefits, including:
Note: IBM Cloud Kubernetes Service worker nodes use encrypted SSDs, by default.
There is almost no difference between the configuration used for IBM Cloud Kubernetes Service and that used by other Kubernetes providers. That is one of the many reasons to invest in providers offering Kubernetes. The only real difference is the storageClassName value.
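To illustrate how small that difference is, here is a minimal sketch that rewrites the storage class in a stand-in volume claim fragment. The file names are hypothetical, and `standard` is only a placeholder for whatever class another provider offers; the `ibmc-file-bronze` class and 20Gi size match the persistent volume claim output shown later in this article.

```shell
# A stand-in for the volume claim portion of a StatefulSet manifest
# (file name is hypothetical).
cat > claim-fragment.yaml <<'EOF'
volumeClaimTemplates:
- metadata:
    name: mongo-data
  spec:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: ibmc-file-bronze
    resources:
      requests:
        storage: 20Gi
EOF

# Swap only the storage class; "standard" is a placeholder class name.
sed 's/^\( *storageClassName:\).*/\1 standard/' claim-fragment.yaml \
  > claim-fragment-other.yaml

cat claim-fragment-other.yaml
```

Everything else in the fragment is left untouched, which is the point: the rest of the configuration is portable across Kubernetes providers.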
Note: It is assumed that you’ve already created a Kubernetes Service cluster. For details on how to create an IBM Cloud Kubernetes Service cluster, see “Setting up clusters” in the IBM Cloud Kubernetes Service documentation.
Note: Ensure that your cluster is running at least Kubernetes Version 1.9.2. You can do this by passing --kube-version 1.9.2 to the cluster creation command or by updating an existing cluster and its worker nodes (see the IBM Cloud documentation).
After completing this section, you will have created a three-node MongoDB replica set within your Kubernetes cluster running in the IBM Cloud Kubernetes Service. This leverages the IBM Cloud File Storage offering for persistent volumes.
The figure below illustrates the three main components to be created, and where they would be logically placed in a full application deployment, including:
The Persistent Volumes (IBM Cloud File Storage)
Note: The Edge Services, Security, LoadBalancer/Ingress Service, and Application components are not addressed in this article. See “Planning external networking” in the documentation for details on how to deploy apps in a cluster, including network options for exposing the app.
A Kubernetes headless service controls the network domain of the pods without requiring a load-balancer service or exposing an IP address, as other Kubernetes service types would. It groups together the pods that match the service’s selector.
For the headless service, create the following configuration file:
- name: mongo
A headless service is created with the clusterIP value being set to None (line 12).
Ensure that you use the appropriate port number for the targetPort (line 11). It should match the value used in the StatefulSet configuration (line 37) in Step 3 below, which is the port used by the mongod service. By default, the port is 27017. The value used for the port parameter is the port on which MongoDB clients connect to the service.
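The following is a minimal sketch of a headless service manifest that is consistent with the description above: counting from the `apiVersion` line, targetPort falls on line 11 and clusterIP on line 12. The `app: mongo` label and selector are assumptions inferred from the pod label used later in this article (app=mongo). The heredoc simply writes the file name that the kubectl apply command expects.

```shell
# Write the headless service manifest to the current directory.
cat > mongo-headless-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: mongo
  labels:
    app: mongo
spec:
  ports:
  - name: mongo
    port: 27017
    targetPort: 27017
  clusterIP: None
  selector:
    app: mongo
EOF

# Show the result with line numbers to compare against the line references.
cat -n mongo-headless-service.yaml
```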
For the StatefulSet, create the following configuration file:
- key: "app"
- name: mongo
- containerPort: 27017
- name: mongo-data
accessModes: [ "ReadWriteOnce" ]
Note: The affinity section (lines 18-27) ensures that a MongoDB replica set pod is not scheduled to run on a cluster worker node that is already running another MongoDB replica set pod. Without it, all three pods could end up on a single worker node, which would become a single point of failure for the replica set if that node fails.
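A minimal sketch of a StatefulSet manifest that matches the line references in this section: counting from the `apiVersion` line, the affinity section spans lines 18-27 and containerPort sits on line 37. Several values are assumptions rather than confirmed settings: the `apps/v1` API version, `podManagementPolicy: Parallel`, the termination grace period, the `mongo:3.6` image tag (chosen so that the legacy `mongo` shell used later is available), the mongod arguments, and the replica set name `rs0`. The storage class and size follow the persistent volume claim output shown later.

```shell
# Write the StatefulSet manifest to the current directory.
cat > mongo-statefulset.yaml <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  selector:
    matchLabels:
      app: mongo
  serviceName: "mongo"
  replicas: 3
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app: mongo
    spec:
      terminationGracePeriodSeconds: 10
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: "app"
                operator: In
                values:
                - mongo
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: mongo
        image: mongo:3.6
        command:
        - mongod
        - "--bind_ip_all"
        - "--replSet"
        - rs0
        ports:
        - containerPort: 27017
        volumeMounts:
        - name: mongo-data
          mountPath: /data/db
  volumeClaimTemplates:
  - metadata:
      name: mongo-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: ibmc-file-bronze
      resources:
        requests:
          storage: 20Gi
EOF

# Show the result with line numbers to compare against the line references.
cat -n mongo-statefulset.yaml
```

The `serviceName` value ties the StatefulSet to the headless service, and the `volumeClaimTemplates` entry is what lets Kubernetes create one persistent volume claim per pod.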
With the configurations created, it’s time to deploy the resources.
To deploy the headless Service and StatefulSet, log in with the IBM Cloud CLI (bluemix, or bx for short), set the cluster context, and then execute commands using the Kubernetes CLI, kubectl.
$ bx login
$ $(bx cs cluster-config sandbox-cluster | grep KUBECONFIG)
Next, execute the following commands (the lines that begin with $). Their expected output is shown.
$ kubectl apply -f mongo-headless-service.yaml
service "mongo" created
$ kubectl apply -f mongo-statefulset.yaml
statefulset "mongo" created
Note: The configuration files for the headless service and StatefulSet can be merged into a single file, if desired, with the configuration for each separated with a line containing “---.”
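As a sketch of that merge, the helper below joins any number of manifest files with a `---` separator line. The function name and the output file name `mongo.yaml` are hypothetical, and the helper assumes each input file ends with a newline.

```shell
# Join manifests into one file, separating documents with a "---" line.
# Usage: merge_manifests OUTPUT INPUT [INPUT...]
merge_manifests() {
  out="$1"; shift
  : > "$out"
  first=1
  for f in "$@"; do
    [ "$first" -eq 1 ] || echo '---' >> "$out"
    cat "$f" >> "$out"
    first=0
  done
}

# Merge the two manifests from this article if both are present.
if [ -f mongo-headless-service.yaml ] && [ -f mongo-statefulset.yaml ]; then
  merge_manifests mongo.yaml mongo-headless-service.yaml mongo-statefulset.yaml
fi
# The merged file could then be applied in one step:
#   kubectl apply -f mongo.yaml
```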
To confirm, get each object using the following commands (the lines that begin with $). The expected output is shown for each.
$ kubectl get service mongo
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
mongo ClusterIP None <none> 27017/TCP 5s
$ kubectl get statefulset mongo
NAME DESIRED CURRENT AGE
mongo 3 3 1m
With the StatefulSet deployed, Kubernetes takes care of the pod creation, naming the pods in accordance with the spec (for example, mongo-0, mongo-1, mongo-2). To list the automatically created pods, get the pod objects using the pod label app=mongo, as follows:
$ kubectl get pod -l app=mongo
NAME READY STATUS RESTARTS AGE
mongo-0 1/1 Running 0 30s
mongo-1 1/1 Running 0 30s
mongo-2 1/1 Running 0 30s
Additionally, Kubernetes takes care of creating the persistent volume claims (pvc) and binding the persistent volumes to the pods. To list the persistent volume claims, get the pvc objects, as follows:
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
mongo-data-mongo-0 Bound pvc-3ed73cf3-0940-11e8-ac45-925e6fdab1e7 20Gi RWO ibmc-file-bronze 1m
mongo-data-mongo-1 Bound pvc-3ed82f17-0940-11e8-ac45-925e6fdab1e7 20Gi RWO ibmc-file-bronze 1m
mongo-data-mongo-2 Bound pvc-3ed90d7e-0940-11e8-ac45-925e6fdab1e7 20Gi RWO ibmc-file-bronze 1m
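Each pod is also addressable within the namespace by a stable DNS name that combines the pod name and the headless service name: mongo-0.mongo, mongo-1.mongo, mongo-2.mongo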
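The naming rule behind these hostnames can be sketched as a small helper: each pod is named after the StatefulSet plus an ordinal, and its DNS name appends the headless service name. The function name is hypothetical, and the fully qualified form assumes the default cluster domain `cluster.local`.

```shell
# Print the stable DNS name of each pod in a StatefulSet.
# Usage: pod_dns_names STATEFULSET SERVICE REPLICAS [NAMESPACE]
# With a namespace, the fully qualified form is printed
# (assumes the default cluster domain, cluster.local).
pod_dns_names() {
  name="$1"; service="$2"; replicas="$3"; namespace="${4:-}"
  i=0
  while [ "$i" -lt "$replicas" ]; do
    if [ -n "$namespace" ]; then
      echo "${name}-${i}.${service}.${namespace}.svc.cluster.local"
    else
      echo "${name}-${i}.${service}"
    fi
    i=$((i + 1))
  done
}

# Short names, resolvable from pods in the same namespace.
pod_dns_names mongo mongo 3
```

The short form is enough for clients in the same namespace; the fully qualified form is what a client in a different namespace would need.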
The persistent storage is configured to be accessible on the /data/db mount path within each pod, by default using the IBM Cloud File Storage offering. If a pod is destroyed, any future incarnations of the pod with the same name will automatically mount to the same persistent storage (unless the persistent storage claim was manually destroyed).
All that remains is to initiate the replica set and add each node to it.
With each MongoDB replica set node created, the replica set itself must be initiated and configured. To do so, access the first node and run a series of mongo shell commands.
Note: These commands can be scripted as part of a CI/CD process, as can the addition or removal of new nodes.
$ kubectl exec -it mongo-0 -- mongo
> rs.initiate()
> var cfg = rs.conf();cfg.members[0].host="mongo-0.mongo:27017";rs.reconfig(cfg)
> rs.add("mongo-1.mongo:27017");rs.add("mongo-2.mongo:27017")
The new configuration should show three members, each with an “_id” value corresponding to the ordinal in its hostname (for example, _id 0 for mongo-0).
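As the note above mentions, these commands can be scripted. The sketch below generates the mongo shell commands for any number of replicas; the function name is hypothetical, and the command sequence (rs.initiate, a reconfig of the first member's host, then rs.add for the rest) is an assumption about the intended steps. In practice, a short wait after rs.initiate() may be needed before the reconfig succeeds, because the first node must become primary.

```shell
# Emit the mongo shell commands that initiate a replica set of N members.
# Usage: replset_init_script STATEFULSET SERVICE REPLICAS [PORT]
replset_init_script() {
  name="$1"; service="$2"; replicas="$3"; port="${4:-27017}"
  echo 'rs.initiate();'
  echo "var cfg = rs.conf(); cfg.members[0].host = \"${name}-0.${service}:${port}\"; rs.reconfig(cfg);"
  i=1
  while [ "$i" -lt "$replicas" ]; do
    echo "rs.add(\"${name}-${i}.${service}:${port}\");"
    i=$((i + 1))
  done
  echo 'printjson(rs.conf());'
}

# Print the generated commands for the three-node set in this article.
replset_init_script mongo mongo 3

# To run them against the first pod (requires kubectl access to the cluster):
#   replset_init_script mongo mongo 3 | kubectl exec -i mongo-0 -- mongo
```

Generating the commands separately from running them keeps the script easy to review in a CI/CD pipeline before it touches the cluster.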
The MongoDB replica set is now initiated and configured. At this point, the MongoDB replica set is essentially ready to be accessed by a client application that is running within the same Kubernetes namespace.
Note: To access the MongoDB replica set from outside the Kubernetes cluster, a Kubernetes “NodePort” or “LoadBalancer” type of Service is required. See the IBM Cloud Kubernetes Service documentation for more details on how to plan external networking.
Using the configuration provided in this article, the MongoDB replica set URI connection string is:
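A connection string of this kind follows the standard MongoDB URI format: a comma-separated list of host:port pairs, a database name, and the replicaSet option. The sketch below assembles one from the pod DNS names; the function name, the database name `mydb`, and the replica set name `rs0` are assumptions for illustration only.

```shell
# Build a MongoDB replica set connection URI from StatefulSet pod names.
# Usage: mongo_uri STATEFULSET SERVICE REPLICAS DATABASE REPLSET [PORT]
mongo_uri() {
  name="$1"; service="$2"; replicas="$3"; db="$4"; rs="$5"; port="${6:-27017}"
  hosts=""
  i=0
  while [ "$i" -lt "$replicas" ]; do
    # Append "host:port", inserting a comma only between entries.
    hosts="${hosts:+$hosts,}${name}-${i}.${service}:${port}"
    i=$((i + 1))
  done
  echo "mongodb://${hosts}/${db}?replicaSet=${rs}"
}

mongo_uri mongo mongo 3 mydb rs0
# prints mongodb://mongo-0.mongo:27017,mongo-1.mongo:27017,mongo-2.mongo:27017/mydb?replicaSet=rs0
```

Listing every member lets the client driver discover the current primary even if one of the pods is unavailable when it connects.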
This article should be considered a getting started guide. There are more factors to consider when running MongoDB in production using Kubernetes, most of which have already been addressed in other articles and blogs. Future articles will provide instructions specific to the IBM Cloud Kubernetes Service offering.
For example, it is not recommended that you use the provided configuration in production without considering at least the following (see “Related topics” below for more information on these subjects):
This article has demonstrated how to deploy a MongoDB replica set in the IBM Cloud Kubernetes Service, which is based on Kubernetes.
I have shown that by using Kubernetes StatefulSets, you can address many of the concerns of running MongoDB using container technology, therefore realizing the benefits provided by Kubernetes.
Additionally, I have touched on the benefits of using the IBM Cloud Kubernetes Service, specifically highlighting the IBM Cloud File Storage offering that underpins the service.
Finally, by following the instructions provided in this article, you can create a MongoDB replica set on the IBM Cloud Kubernetes Service within minutes, forming the basis for a development and testing environment. With further considerations, a production environment is also possible.