Kubernetes is an open-source container orchestration platform that allows users to automate deployment, scaling, and management of their containerized applications with ease. Autoscaling is a key feature in Kubernetes clusters that allows users to scale their applications automatically based on demand.
This guide will step you through the process of enabling autoscaling in your Kubernetes cluster and configuring an application to automatically scale up or down based on its CPU utilization.
Prerequisites:
- A provisioned multi-node Kubernetes cluster (version 1.7 or later).
It should take about one hour to complete this how-to.
1. Set up Heapster to collect pod metrics
Heapster monitoring needs to be deployed on the Kubernetes cluster for the Autoscaler to collect metrics such as CPU and memory utilization of the pods. In this guide, we will set up Heapster with an InfluxDB backend and a Grafana interface.
First, download the following yaml files:
curl https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/grafana.yaml > grafana.yaml
curl https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/heapster.yaml > heapster.yaml
curl https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/influxdb.yaml > influxdb.yaml
curl https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/rbac/heapster-rbac.yaml > heapster-rbac.yaml
Create service instances of Grafana, Heapster, and InfluxDB, with the corresponding Kubernetes service account and role binding:
$ kubectl create -f grafana.yaml
deployment "monitoring-grafana" created
service "monitoring-grafana" created
$ kubectl create -f heapster.yaml
serviceaccount "heapster" created
deployment "heapster" created
service "heapster" created
$ kubectl create -f influxdb.yaml
deployment "monitoring-influxdb" created
service "monitoring-influxdb" created
$ kubectl create -f heapster-rbac.yaml
clusterrolebinding "heapster" created
2. Create a deployment
For demonstration purposes, we create a test deployment using the ubuntu image running the sleep command. You can replace this with your own application. Note that the --requests=cpu flag has to be set: the autoscaler measures CPU utilization as a percentage of the requested CPU, so without a CPU request it cannot scale based on CPU utilization.
$ kubectl run autoscale-test --image=ubuntu:16.04 --requests=cpu=1000m --command sleep 1800
deployment "autoscale-test" created
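To make the role of the CPU request concrete, here is a minimal Python sketch of how a utilization percentage can be derived from per-pod usage and requests. It is an illustration, not the autoscaler's actual code: the function name `average_cpu_utilization`, the usage figure of 1990 millicores, and the use of integer division for rounding are all assumptions made for this example.

```python
def average_cpu_utilization(pods):
    """Return CPU utilization in percent for a list of (usage, request) pairs.

    Both values are in millicores. Utilization is total usage divided by
    total requested CPU, so every pod must carry a CPU request.
    """
    if any(not request for _, request in pods):
        # Without a request there is no baseline to compute a percentage from.
        raise ValueError("every pod needs a CPU request")
    total_usage = sum(usage for usage, _ in pods)
    total_request = sum(request for _, request in pods)
    # Integer division is an assumption about how the value is rounded.
    return (total_usage * 100) // total_request


# Hypothetical numbers: one pod requesting 1000m and using about two cores,
# first alone, then alongside idle replicas with the same request.
busy = (1990, 1000)
idle = (0, 1000)
print(average_cpu_utilization([busy]))             # 199
print(average_cpu_utilization([busy] + [idle] * 3))  # 49
print(average_cpu_utilization([busy] + [idle] * 4))  # 39
```

Because the percentage is relative to the request, a single pod can report well over 100% when it uses more CPU than it requested, and the average falls as idle replicas are added.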
3. Set up a Horizontal Pod Autoscaler
Now, set up a Horizontal Pod Autoscaler to monitor and autoscale the deployment created in the previous step. You have to specify the target CPU utilization percentage, along with the minimum and maximum number of pods to be maintained. The autoscaler adds pods to the deployment (up to the maximum) if the average CPU utilization of all existing pods exceeds the specified target. Similarly, it removes pods from the deployment (down to the minimum) if the average CPU utilization drops below the target.
$ kubectl autoscale deployment autoscale-test --cpu-percent=25 --min=1 --max=5
deployment "autoscale-test" autoscaled
Check the current status of the autoscaler:
$ kubectl get hpa
NAME             REFERENCE                   TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
autoscale-test   Deployment/autoscale-test   0% / 25%   1         5         1          1m
You can repeat this step to enable autoscaling on other deployments in the Kubernetes cluster.
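The scaling rule described in this step can be sketched in a few lines of Python. This follows the formula given in the Kubernetes documentation, desired = ceil(current replicas x current utilization / target utilization), clamped to the configured minimum and maximum. The function name `desired_replicas` is made up for this example, and the 10% tolerance is the documented default rather than something this guide configures. The real controller also applies rate limits and delays that this sketch leaves out, so a live cluster may reach the result in several steps.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.1):
    """Return the replica count the scaling formula asks for.

    Utilization values are percentages of the requested CPU.
    """
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go below the minimum or above the maximum.
    return max(min_replicas, min(max_replicas, desired))


# Values from this guide: target 25%, between 1 and 5 pods.
print(desired_replicas(1, 199, 25, 1, 5))  # 5 (formula gives 8, capped at max)
print(desired_replicas(5, 0, 25, 1, 5))    # 1 (formula gives 0, raised to min)
print(desired_replicas(2, 26, 25, 1, 5))   # 2 (within tolerance, no change)
```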
4. Validate autoscaler operation
To validate that autoscaling is functioning properly, we use the “stress” utility to put some artificial load on the pod.
1. Get the pod name
$ kubectl get pod
NAME                              READY     STATUS    RESTARTS   AGE
autoscale-test-59d66dcbf7-9fqr8   1/1       Running   0          9m
2. Install the “stress” utility on the pod
kubectl exec autoscale-test-59d66dcbf7-9fqr8 -- apt-get update
kubectl exec autoscale-test-59d66dcbf7-9fqr8 -- apt-get install -y stress
3. Run a CPU workload on the pod for 10 minutes
$ kubectl exec autoscale-test-59d66dcbf7-9fqr8 -- stress --cpu 2 --timeout 600s &
stress: info:  dispatching hogs: 2 cpu, 0 io, 0 vm, 0 hdd
4. Check the status of the autoscaler
If you check the status of the autoscaler (hpa) repeatedly, you should see the number of pods increase (up to the maximum) while the stress is running and decrease (down to the minimum) after the stress is done.
$ kubectl get hpa
NAME             REFERENCE                   TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
autoscale-test   Deployment/autoscale-test   199% / 25%   1         5         1          13m
$ kubectl get hpa
NAME             REFERENCE                   TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
autoscale-test   Deployment/autoscale-test   49% / 25%    1         5         4          16m
$ kubectl get hpa
NAME             REFERENCE                   TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
autoscale-test   Deployment/autoscale-test   39% / 25%    1         5         5          20m
$ kubectl get hpa
NAME             REFERENCE                   TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
autoscale-test   Deployment/autoscale-test   0% / 25%     1         5         1          25m
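If you save the output of several status checks, a small script can condense it into a replica timeline. The Python sketch below parses rows in the column layout shown above. The names `parse_hpa_row` and `replica_history` are hypothetical, and the regular expression assumes the TARGETS column has the form "current% / target%"; rows that do not match (the header, or a row whose current value is not yet known) are skipped.

```python
import re

# One data row of `kubectl get hpa`, in the column layout shown in this guide.
HPA_ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<reference>\S+)\s+"
    r"(?P<current>\d+)%\s*/\s*(?P<target>\d+)%\s+"
    r"(?P<minpods>\d+)\s+(?P<maxpods>\d+)\s+(?P<replicas>\d+)\s+(?P<age>\S+)$"
)


def parse_hpa_row(line):
    """Return a dict of fields for one data row, or None if it does not match."""
    match = HPA_ROW.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    for key in ("current", "target", "minpods", "maxpods", "replicas"):
        row[key] = int(row[key])
    return row


def replica_history(output):
    """Return (age, current utilization, replicas) for every data row."""
    rows = (parse_hpa_row(line) for line in output.splitlines())
    return [(r["age"], r["current"], r["replicas"]) for r in rows if r]


sample = """\
NAME             REFERENCE                   TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
autoscale-test   Deployment/autoscale-test   199% / 25%   1         5         1          13m
autoscale-test   Deployment/autoscale-test   49% / 25%    1         5         4          16m
"""
print(replica_history(sample))  # [('13m', 199, 1), ('16m', 49, 4)]
```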
Delete the deployment created for testing and its corresponding Horizontal Pod Autoscaler:
$ kubectl delete hpa autoscale-test
horizontalpodautoscaler "autoscale-test" deleted
$ kubectl delete deploy autoscale-test
deployment "autoscale-test" deleted
If you don’t intend to continue using autoscaling on your Kubernetes cluster, delete the Heapster, Grafana, and InfluxDB services:
$ kubectl delete -f heapster-rbac.yaml
clusterrolebinding "heapster" deleted
$ kubectl delete -f grafana.yaml
deployment "monitoring-grafana" deleted
service "monitoring-grafana" deleted
$ kubectl delete -f heapster.yaml
serviceaccount "heapster" deleted
deployment "heapster" deleted
service "heapster" deleted
$ kubectl delete -f influxdb.yaml
deployment "monitoring-influxdb" deleted
service "monitoring-influxdb" deleted
This guide has illustrated how to set up monitoring using Heapster, using InfluxDB as a backend and Grafana as a user interface, to monitor the resource metrics on Kubernetes. It also described the steps to configure Horizontal Pod Autoscaler to automatically scale the number of pods for a deployment on Kubernetes, based on CPU utilization.