You’ve heard of Kubernetes, but what is it, really? Can you explain it to your boss? Or your coworkers? Or your dog?
Kubernetes is an open source container orchestration tool developed by Google (source code on GitHub), drawing on about 15 years of Google’s experience running containerized workloads. But what does that mean? And why should you care?
Let me start by outlining the problems with running applications in container clusters. Then I’ll show you what Kubernetes is NOT. And finally, I’ll show you how Kubernetes solves the aforementioned problems.
When you’re finished you should be able to explain Kubernetes so well they’ll all be eating out of your hand.
In this section, we’ll look at a few of the problems you’ll face when running container-based applications in a clustered environment. Any solution needs to address all of these (spoiler alert: Kubernetes does!).
You’ve got this great container-based application? Awesome! Now you need to make sure it runs when and where it’s supposed to. It’s important for your application to be running on the right machines in your cluster, but not all machines in the cluster are necessarily alike.
Your application is up and running. Great! Now you need to make sure that the client load is spread evenly among the nodes in your cluster. It is important that your application is making optimal use of the resources on each host to handle the client load. You don’t want some containers working at full throttle while others sit idle.
You have your containers running and the client load is balancing nicely between them. Super! Now you need to be able to bring containers online as demand grows (including sudden spikes in client requests), and kill them off when they are no longer needed.
Cluster management and monitoring
Now that you have your application running efficiently on that giant cluster, you have to manage it. You need to define, launch, scale, load balance, and monitor the health of the containers that are running. Not an easy task.
What Kubernetes is NOT
Platform as a Service (PaaS)
Kubernetes does a number of things that a PaaS offering would, such as storage management and cluster logging and monitoring, but it is not really a PaaS offering because it does not provide components like the operating system, or supporting tools like Java and Docker. Kubernetes does, however, integrate nicely with PaaS offerings like Bluemix and OpenShift.
A data processing framework
Kubernetes is a framework that is definitely suitable for running Big Data applications, but does not perform – or provide services which perform – the same function as data processing frameworks like Apache Spark and Hadoop Map/Reduce. However, Kubernetes integrates nicely with both Spark and Hadoop (just to name two).
Unlike Jenkins and other CI tools, Kubernetes doesn’t build your application’s containers, but (surprise!) it does work well with CI to help manage updates to your application as it evolves through its lifecycle.
Kubernetes addresses each of the problems listed above (you’re not shocked, are you?). In the sections below I talk through how, and I’ll introduce Kubernetes terminology along the way.
A Kubernetes Pod is a group of containers that work together to perform an application function (or set of functions), and is the unit of scheduling in Kubernetes.
When a pod is created, the scheduler finds the most suitable Node (host machine in the cluster) on which it should run. This is handled by the kube-scheduler component, which selects candidate nodes in the cluster and makes sure that the resources each candidate provides match those required by the containers in the pod.
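To make the scheduling idea concrete, here is a minimal Python sketch of a filter-then-pick decision. It is not the kube-scheduler's actual algorithm: the `Node` and `PodRequest` classes, the resource fields, and the "most CPU left over" scoring rule are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical view of a node: only its free capacity matters here.
    name: str
    free_cpu_millicores: int
    free_memory_mib: int


@dataclass
class PodRequest:
    # Hypothetical summary of what the pod's containers ask for in total.
    name: str
    cpu_millicores: int
    memory_mib: int


def feasible_nodes(pod, nodes):
    """Filter step: keep only nodes with enough free CPU and memory."""
    return [
        node for node in nodes
        if node.free_cpu_millicores >= pod.cpu_millicores
        and node.free_memory_mib >= pod.memory_mib
    ]


def pick_node(pod, nodes):
    """Pick step: prefer the feasible node with the most CPU left over.

    Returns None when no node can host the pod.
    """
    candidates = feasible_nodes(pod, nodes)
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda node: node.free_cpu_millicores - pod.cpu_millicores,
    )


nodes = [
    Node("node-a", free_cpu_millicores=500, free_memory_mib=1024),
    Node("node-b", free_cpu_millicores=2000, free_memory_mib=4096),
    Node("node-c", free_cpu_millicores=4000, free_memory_mib=256),
]
pod = PodRequest("web", cpu_millicores=1000, memory_mib=512)

chosen = pick_node(pod, nodes)
print(chosen.name if chosen else "unschedulable")  # prints "node-b"
```

Here node-a lacks CPU and node-c lacks memory, so node-b is the only machine that fits. Not all machines in the cluster are alike, and this is the kind of check that keeps a pod off the wrong one.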
In Kubernetes, load balancing is handled by services by default. For each service you can provide a label selector, used to identify the pod’s replicas. Since the physical location of the replicas is immaterial, the clients that require their functionality neither know nor care where they actually run. The service uses the label selector to find the pods that can handle a request, and spreads the client load across them.
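If label selectors feel abstract, the following Python sketch may help. It is a toy model, not Kubernetes code: the `Service` class, the pod names, and the round-robin rotation are assumptions chosen for illustration (a real cluster may distribute requests differently).

```python
from itertools import cycle


def matches(selector, labels):
    """A pod matches when it carries every key/value pair in the selector."""
    return all(labels.get(key) == value for key, value in selector.items())


class Service:
    """Toy stand-in for a service: select pods by label, then rotate requests."""

    def __init__(self, selector, pods):
        # pods maps a (hypothetical) pod name to its labels.
        self.endpoints = [
            name for name, labels in pods.items() if matches(selector, labels)
        ]
        self._rotation = cycle(self.endpoints)

    def route(self):
        """Return the pod that should handle the next request."""
        if not self.endpoints:
            raise LookupError("no pods match the selector")
        return next(self._rotation)


pods = {
    "web-1": {"app": "web", "tier": "frontend"},
    "web-2": {"app": "web", "tier": "frontend"},
    "db-1": {"app": "db"},
}
service = Service({"app": "web"}, pods)

print([service.route() for _ in range(4)])
# prints ['web-1', 'web-2', 'web-1', 'web-2']
```

The client only ever talks to `service`; which replica answers, and where it runs, stays hidden behind the selector.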
In certain supported Cloud environments, such as IBM Bluemix Container Service, Google Compute Engine (GCE) and Amazon Web Services (AWS), you can configure a service to use the Cloud provider’s load balancer by specifying the service type as LoadBalancer.
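As a rough sketch of what "specifying the service type as LoadBalancer" looks like, the Python below builds a service definition and prints it as JSON. The helper function, the service name `web`, the `app: web` selector, and the port numbers are all hypothetical; only the overall shape (a `v1` `Service` whose `spec.type` is `LoadBalancer`) is the part being illustrated.

```python
import json


def load_balancer_service(name, selector, port, target_port):
    """Build a service definition that asks the cloud provider for a load balancer."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",  # the setting described above
            "selector": selector,    # label selector identifying the pods
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


# Hypothetical values, for illustration only.
manifest = load_balancer_service("web", {"app": "web"}, port=80, target_port=8080)
print(json.dumps(manifest, indent=2))
```

Saved to a file, output like this could be handed to kubectl in the same way as any other definition file, assuming your cluster runs in an environment that supports the LoadBalancer type.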
A Kubernetes Replication Controller makes sure that the specified number of pod replicas are running in the cluster at all times.
The Replication Controller handles scaling the app by ensuring that the number of replicas you want to be running is in fact always running. If there are too few (maybe one or more died for some reason), the replication controller starts more until the target is reached. If too many are running (in the case of auto-scaling) it kills some off.
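The "start more or kill some off" behavior boils down to comparing what you asked for with what is actually running. Here is a minimal Python sketch of that comparison. The `reconcile` function and the pod names are hypothetical, and the choice of which surplus pods to stop (simply the first ones in the list) is an assumption for illustration, not how Kubernetes decides.

```python
def reconcile(desired, running):
    """Compare the target replica count with the pods actually running.

    Returns a tuple (number_to_start, pods_to_stop).
    """
    surplus = len(running) - desired
    if surplus > 0:
        # Too many replicas: pick some to stop.
        return 0, running[:surplus]
    # Too few (or exactly enough): start the missing ones, stop nothing.
    return -surplus, []


print(reconcile(3, ["web-1"]))                    # prints (2, [])
print(reconcile(2, ["web-1", "web-2", "web-3"]))  # prints (0, ['web-1'])
print(reconcile(2, ["web-1", "web-2"]))           # prints (0, [])
```

Run something like this in a loop and you get the self-healing behavior described above: if a replica dies, the next pass notices the shortfall and starts a replacement.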
Cluster management and monitoring
The Kubernetes Dashboard is a web-based UI for monitoring that includes screens to manage the pods that are running, and view metrics like CPU and memory usage. It is not deployed by default, but with the kubectl command, you can deploy the dashboard and begin using it:
kubectl create -f https://rawgit.com/kubernetes/dashboard/master/src/deploy/kubernetes-dashboard.yaml
You should have a better idea of the problems Kubernetes solves, and how.
Now march into the next staff meeting and wow your boss and coworkers with your (high-level) understanding of Kubernetes. If they’re not eating out of your hand when you’re done, maybe you need a new job. But at least your dog will still love you.
References and other Kubernetes resources
I’ve sprinkled links throughout this document to help you learn more about Kubernetes, but I thought I would include a few here that are more overview-level. Enjoy!
- The source: https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/
- IBM Bluemix Container Service – A highly secure, native Kubernetes experience for rapidly building cognitive apps
- Scaling containers: The essential guide to container cluster
- Kubernetes and IBM Bluemix: How to deploy, manage, and secure your container-based workloads
- Compare other container cluster management tools
- K8s scheduler