Some in the developer community have expressed concern that Docker containers and Kubernetes, in spite of their obvious advantages individually, do not work especially well together. In this article, I explain how the Kubernetes community has responded to these concerns by introducing StatefulSets, and I show you how to deploy a MongoDB replica set in the Kubernetes-based IBM Cloud Kubernetes Service. Along the way, I outline MongoDB’s cluster requirements and show how to deploy Kubernetes StatefulSets. Finally, I conclude with some advice on using these tools in a production environment.

What you will need

This article assumes at least beginner-level knowledge of the following:

  • Docker containers
  • Kubernetes
  • MongoDB replica sets

If you have more in-depth knowledge of these technologies and simply want to skip ahead to the deployment instructions, go to Step 1, “Create the MongoDB replica set on the IBM Cloud Kubernetes Service.”

Introduction: Two considerations

In his blog post “Running MongoDB as a Microservice with Docker and Kubernetes,” Andrew Morgan outlines the considerations for running MongoDB with containers and orchestration technology such as Kubernetes. Of particular note are the following points:

  • MongoDB database nodes are stateful.
  • MongoDB database nodes within a replica set must communicate with each other, including after rescheduling (that is, when new nodes are created).

Those with at least a beginner-level knowledge of Docker might recognize that the first consideration is the antithesis of one of the basic principles of Docker containers, whose guidelines state that “containers should be as ephemeral as possible,” not stateful. Moreover, attempting to add statefulness to a Docker container can be challenging. If not done correctly, it could break another design principle of Docker containers—that of “fast, consistent delivery of your applications.”

Morgan’s second consideration relates to container networking, particularly with respect to hostnames, because container instances (pods) are automatically destroyed and recreated with new hostnames after automatic rescheduling. As such, the very features that make Kubernetes attractive as a system for automating deployment, scaling, and management of containerized applications can potentially break inter-container communication.

Given these facts, many architects do not consider Docker containers and Kubernetes a viable way to deploy MongoDB replica sets, despite the inherent benefits of these technologies. To address these concerns, the Kubernetes community has introduced StatefulSets.

Kubernetes StatefulSets

StatefulSets were introduced in Kubernetes as a beta resource in Version 1.5 and became stable and generally available in Version 1.9. They are specifically designed to manage stateful applications, such as MongoDB.

The Kubernetes documentation states:

Like a Deployment, a StatefulSet manages Pods that are based on an identical container spec. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of their Pods. These pods are created from the same spec, but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.


Critically, StatefulSets provide two (out of five) important features that relate to the considerations for running MongoDB as a microservice with Docker and Kubernetes:

  • Stable, persistent storage
  • Stable, unique network identifiers

In short, this means that by using a single MongoDB Docker image, you can define a StatefulSet configuration within Kubernetes that automatically creates and attaches persistent storage to each MongoDB replica set node. Each node is addressable using a unique, known network name, no matter how many new nodes are created.

StatefulSets with the IBM Cloud Kubernetes Service

Given that the primary purpose of MongoDB is to provide access to stored data, it’s critical that the characteristics of the underlying persistent storage are able to meet various non-functional requirements of the application or the business.

For example, many applications require predictable IOPS. Business or government requirements may also stipulate that application storage needs to be encrypted at rest or replicated for redundancy (despite MongoDB’s inherent replication feature).

By default, the IBM Cloud Kubernetes Service is configured to use IBM Cloud’s NFS-based file storage offerings, namely Endurance storage and Performance storage. The service provides eight pre-defined storage classes so that the cluster administrator does not have to create any such classes.

Using StatefulSets, the configuration simply selects one of the existing storage classes according to the application or business requirements; nothing more is required.
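For example, you can list the pre-defined storage classes with kubectl. This is only a quick sanity check; the exact class names and count may vary with the service version:

    $ kubectl get storageclasses
    # Expect entries such as ibmc-file-bronze (used later in this
    # article); other names and the total count may vary.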

The IBM Cloud File Storage offering provides numerous benefits, including:

  • Flash-backed storage
  • Encryption for data at rest
  • Snapshots and replication
  • Volume duplication
  • Expandable volumes
  • Adjustable IOPS

Note: IBM Cloud Kubernetes Service worker nodes use encrypted SSDs by default.

Step 1. Create the MongoDB replica set on the IBM Cloud Kubernetes Service

There is almost no difference between the configuration used for the IBM Cloud Kubernetes Service and that used by other Kubernetes providers; that portability is one of the many reasons to invest in providers offering Kubernetes. The only real difference is the storageClassName value.

Note: It is assumed that you’ve already created a Kubernetes Service cluster. For details on how to create an IBM Cloud Kubernetes Service cluster, see “Setting up clusters” in the IBM Cloud Kubernetes Service documentation.

Note: Ensure that your cluster is running at least Kubernetes Version 1.9.2. You can do this by passing --kube-version 1.9.2 to the cluster creation command or by updating an existing cluster and its worker nodes (see the IBM Cloud documentation).
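For illustration, the following is a sketch of creating such a cluster with the CLI. The location, machine type, and worker count shown here are placeholder assumptions; adjust them for your own account and region:

    $ bx cs cluster-create \
        --name sandbox-cluster \
        --kube-version 1.9.2 \
        --location dal10 \
        --machine-type u2c.2x4 \
        --workers 3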

After completing this section, you will have created a three-node MongoDB replica set within your Kubernetes cluster running in the IBM Cloud Kubernetes Service. This leverages the IBM Cloud File Storage offering for persistent volumes.

The figure below illustrates the three main components to be created and where they would logically be placed in a full application deployment:

  • The headless service
  • The StatefulSet, including the MongoDB containers and associated persistent volume claims
  • The Persistent Volumes (IBM Cloud File Storage)

    Figure 1. The main components of a three-node MongoDB replica set

Note: The Edge Services, Security, LoadBalancer/Ingress Service, and Application components are not addressed in this article. See “Planning external networking” in the documentation for details on how to deploy apps in a cluster, including network options for exposing the app.

Step 2. Create the headless service configuration file

A Kubernetes headless service controls the network domain of its pods without a load-balancer service and without exposing a cluster IP address, as other Kubernetes service types would. It simply ensures that the pods matching the service’s selector are grouped together under that domain.

For the headless service, create the following configuration file:

apiVersion: v1
kind: Service
metadata:
  name: mongo
  labels:
    app: mongo
spec:
  ports:
  - name: mongo
    port: 27017
    targetPort: 27017
  clusterIP: None
  selector:
    app: mongo

A headless service is created by setting the clusterIP value to None.

Ensure that you use the appropriate port number for targetPort. It should match the containerPort value in the StatefulSet configuration in Step 3 below, which is the port the mongod process listens on; by default, that port is 27017. The value of the port parameter is the port on which MongoDB clients connect to the service.

Step 3. Create the StatefulSet configuration file

For the StatefulSet, create the following configuration file:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  selector:
    matchLabels:
      app: mongo
  serviceName: "mongo"
  replicas: 3
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app: mongo
    spec:
      terminationGracePeriodSeconds: 10
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: "app"
                operator: In
                values:
                - mongo
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: mongo
        image: mongo
        command: 
        - mongod 
        - "--bind_ip_all"
        - "--replSet"
        - rs0
        ports:
        - containerPort: 27017
        volumeMounts:
        - name: mongo-data
          mountPath: /data/db
  volumeClaimTemplates:
  - metadata:
      name: mongo-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: ibmc-file-bronze
      resources:
        requests:
          storage: 20Gi

Note: The affinity section ensures that a MongoDB replica set pod is not scheduled onto a cluster worker node that is already running another replica set pod. Without this rule, all three pods could end up on a single worker node, making that node a single point of failure.
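Once the StatefulSet is deployed (Step 4 below), a quick way to confirm that the anti-affinity rule spread the pods across workers is the wide pod listing; the NODE column should show a different worker node for each pod:

    $ kubectl get pods -l app=mongo -o wide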

With the configurations created, it’s time to deploy the resources.

Step 4. Deploy the Service and StatefulSet

To deploy the headless service and StatefulSet, execute kubectl commands using the IBM Cloud CLI (bluemix, or bx for short) and the Kubernetes CLI (kubectl).

  1. First, log in to the IBM Cloud CLI: $ bx login
  2. Then, set the context for the cluster in the CLI: $ $(bx cs cluster-config sandbox-cluster | grep KUBECONFIG)
  3. Next, execute the following commands. The expected output is shown after each.

    $ kubectl apply -f mongo-headless-service.yaml
    service "mongo" created
    
    $ kubectl apply -f mongo-statefulset.yaml
    statefulset "mongo" created
    

    Note: The configuration files for the headless service and StatefulSet can be merged into a single file, if desired, with the configuration for each separated by a line containing "---".

  4. To confirm, get each object using the following commands. The expected output is shown for each.

    $ kubectl get service mongo
    NAME    TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE
    mongo   ClusterIP   None         <none>        27017/TCP   5s
    
    $ kubectl get statefulset mongo
    NAME    DESIRED   CURRENT   AGE
    mongo   3         3         1m
    
  5. With the StatefulSet deployed, Kubernetes takes care of the pod creation, naming the pods in accordance with the spec (for example, mongo-0, mongo-1, mongo-2). To list the automatically created pods, get the pod objects using the pod label app=mongo, as follows:

    $ kubectl get pod -l app=mongo
    NAME      READY     STATUS    RESTARTS   AGE
    mongo-0   1/1       Running   0          30s
    mongo-1   1/1       Running   0          30s
    mongo-2   1/1       Running   0          30s
    
  6. Additionally, Kubernetes takes care of creating the persistent volume claims (pvc) and binding the persistent volumes to the pods. To list the persistent volume claims, get the pvc objects, as follows:

    
        $ kubectl get pvc
        NAME                 STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
        mongo-data-mongo-0   Bound     pvc-3ed73cf3-0940-11e8-ac45-925e6fdab1e7   20Gi       RWO            ibmc-file-bronze   1m
        mongo-data-mongo-1   Bound     pvc-3ed82f17-0940-11e8-ac45-925e6fdab1e7   20Gi       RWO            ibmc-file-bronze   1m
        mongo-data-mongo-2   Bound     pvc-3ed90d7e-0940-11e8-ac45-925e6fdab1e7   20Gi       RWO            ibmc-file-bronze   1m
    Each MongoDB replica set node has now been created as an (internally) DNS-addressable Kubernetes pod, bound to its own persistent storage. Specifically, the internal Kubernetes DNS hostname for each pod combines the StatefulSet name with the pod's ordinal, that is:
    $(statefulset name)-$(ordinal)
    Additionally, the headless service manages a domain of this form:
    $(servicename).$(namespace).svc.cluster.local

  7. Finally, the pods can be referenced using this subdomain pattern:
    $(podname).$(servicename)
    For example, using the configuration provided, each MongoDB node can be referenced using the following DNS names:
    mongo-0.mongo, mongo-1.mongo, mongo-2.mongo

The persistent storage, backed here by the IBM Cloud File Storage offering, is mounted at the /data/db path within each pod (the mongod default). If a pod is destroyed, any future incarnation of the pod with the same name will automatically mount the same persistent storage (unless the persistent volume claim was manually deleted).
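To check that the stable names resolve and that the members can reach one another, you can ping one node from another. This sketch assumes the default mongo image, which ships with the mongo shell:

    # From mongo-0, contact mongo-1 by its stable DNS name.
    $ kubectl exec mongo-0 -- mongo --host mongo-1.mongo --eval 'db.runCommand({ ping: 1 })'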

All that remains is to initialize the replica set using each node.

Step 5. Initiate and configure the MongoDB replica set

With each MongoDB replica set node created, the replica set itself must be initiated and configured. To do so, access the first node and run a series of mongo shell commands.

Note: These commands can be scripted as part of a CI/CD process, as can the addition or removal of nodes; see the scripted sketch after these steps.

  1. First, execute the MongoDB shell on the first replica set node:
    
        $ kubectl exec -it mongo-0 -- mongo
  2. Now, initiate the MongoDB replica set:
    
        > rs.initiate()
  3. Reconfigure the first member of the replica set with the correct DNS name:
    
        > var cfg = rs.conf(); cfg.members[0].host = "mongo-0.mongo:27017"; rs.reconfig(cfg)
  4. Add the remaining replica set nodes:
    
    rs.add("mongo-1.mongo:27017")
    rs.add("mongo-2.mongo:27017")
    
  5. Finally, confirm the replica set by reviewing the new configuration:
    > rs.status()

The new configuration should show three members, each with a “name” value corresponding to its internal DNS name.
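As noted above, these steps can be scripted. The following is a minimal sketch that drives the same mongo shell commands non-interactively through kubectl exec; the pod names and port are those used in this article, and the sleep is a crude stand-in for properly polling until a primary is elected:

    #!/bin/sh
    # Initiate the replica set from the first pod.
    kubectl exec mongo-0 -- mongo --eval 'rs.initiate()'

    # Crude wait for the primary election to settle.
    sleep 15

    # Fix the first member's host name, then add the remaining members.
    kubectl exec mongo-0 -- mongo --eval \
      'var cfg = rs.conf(); cfg.members[0].host = "mongo-0.mongo:27017"; rs.reconfig(cfg);'
    kubectl exec mongo-0 -- mongo --eval \
      'rs.add("mongo-1.mongo:27017"); rs.add("mongo-2.mongo:27017");'

    # Confirm the resulting configuration.
    kubectl exec mongo-0 -- mongo --eval 'rs.status()'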

The MongoDB replica set is now initiated and configured. At this point, the MongoDB replica set is essentially ready to be accessed by a client application that is running within the same Kubernetes namespace.

Note: To access the MongoDB replica set from outside the Kubernetes cluster, a Kubernetes “NodePort” or “LoadBalancer” type of Service is required. See the IBM Cloud Kubernetes Service documentation for more details on how to plan external networking.
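For a quick external test only, a NodePort service could look like the following sketch. The service name and nodePort value here are hypothetical, and a replica-set-aware client still needs resolvable member hostnames, so this suits ad-hoc testing against a single member rather than production use:

# Sketch only: expose the mongo pods on a node port (hypothetical
# name and port; consider the security implications first).
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: mongo-external
spec:
  type: NodePort
  selector:
    app: mongo
  ports:
  - port: 27017
    targetPort: 27017
    nodePort: 30017
EOF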

Using the configuration provided in this article, the MongoDB replica set URI connection string is:


mongodb://mongo-0.mongo:27017,mongo-1.mongo:27017,mongo-2.mongo:27017/myproject?replicaSet=rs0
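As a quick smoke test from inside the cluster, you can launch a throwaway client pod and connect using that URI. This is a sketch that assumes the public mongo image; the pod is removed when you exit the shell:

    # Launch a temporary client pod and connect to the replica set.
    $ kubectl run mongo-client --rm -it --restart=Never --image=mongo -- \
        mongo "mongodb://mongo-0.mongo:27017,mongo-1.mongo:27017,mongo-2.mongo:27017/myproject?replicaSet=rs0"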

Production considerations

This article should be considered a getting started guide. There are more factors to consider when running MongoDB in production using Kubernetes, most of which have already been addressed in other articles and blogs. Future articles will provide instructions specific to the IBM Cloud Kubernetes Service offering.

For example, it is not recommended that you use the provided configuration in production without considering at least the following (see “Related topics” below for more information on these subjects):

  • Authentication (MongoDB)
  • Not running as root
  • Using MongoDB Enterprise Edition (MongoDB)
  • Sharded clusters (MongoDB)
  • Addressing the maximum number of members (MongoDB)
  • Managing compute resources (Kubernetes)
  • Safely storing and accessing private Docker images in a highly available and scalable architecture using the IBM Container Registry (IBM Cloud)

Conclusion

This article has demonstrated what is possible: deploying a MongoDB replica set in the IBM Cloud Kubernetes Service, which is based on Kubernetes.

I have shown that by using Kubernetes StatefulSets, you can address many of the concerns about running MongoDB with container technology, thereby realizing the benefits provided by Kubernetes.

Additionally, I have touched on the benefits of using the IBM Cloud Kubernetes Service, specifically highlighting the IBM Cloud File Storage offering that underpins the service.

Finally, by following the instructions provided in this article, you can create a MongoDB replica set on the IBM Cloud Kubernetes Service within minutes, forming the basis for a development and testing environment. With further considerations, a production environment is also possible.