A HorizontalPodAutoscaler (HPA) allows you to dynamically scale the replica count of your Deployment based on basic CPU/memory resource metrics from the metrics-server. If you want scaling based on more advanced scenarios and you are already using the Prometheus stack, the prometheus-adapter provides this enhancement.
The prometheus-adapter takes basic Prometheus metrics, and then synthesizes custom API metrics which can be used as a HorizontalPodAutoscaler trigger.
As an example, in addition to the basic Prometheus metrics (deployment counts, CPU, memory), this would allow you to take any metric you expose via Prometheus (e.g. incoming queue size, database row count, process/thread count, average blocking time, Java GC) and use it to trigger a HorizontalPodAutoscaler.
Prerequisite installation and validation
Before configuring the prometheus-adapter, you need a Kubernetes cluster:
- Running the metrics-server
- With a deployed kube-prometheus stack (Prometheus Operator, Grafana, Prometheus)
metrics-server installation
Per the official docs, you can install using the latest manifest.
# check for installation of metrics-server
kubectl get deployment,service,serviceaccount metrics-server -n kube-system

# if it does not exist, install using manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
metrics-server validation
# check status of metrics-server
kubectl rollout status deployment -n kube-system metrics-server --timeout=90s
kubectl get deployment/metrics-server -n kube-system

# API group 'metrics.k8s.io/v1beta1' should have an entry if metrics-server healthy
kubectl api-versions | grep "^metrics.k8s.io"

# basic metrics should be available
sudo apt install -y jq
kubectl get --raw /apis/metrics.k8s.io/v1beta1
kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods | jq
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes | jq

# 'kubectl top' should report back metrics
kubectl top pods
kubectl top nodes
If these basic calls fail, then there is an issue with your cluster or metrics-server configuration.
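If they do fail, a few commands worth running first (a sketch; adjust namespace and names to match your install):

```shell
# describe the APIService registration; the 'Available' condition should be True
kubectl describe apiservice v1beta1.metrics.k8s.io

# check the metrics-server logs for scrape or TLS errors
kubectl logs -n kube-system deployment/metrics-server

# confirm the metrics-server pods are actually running
kubectl get pods -n kube-system -l k8s-app=metrics-server
```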
If you need customization of the metrics-server, the container args can be modified for setting preferred address types and allowing insecure TLS.
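For example, a common tweak on dev/test clusters with self-signed kubelet certificates is adding the two flags below. This is a sketch using a JSON patch against the Deployment; the flag names come from the metrics-server docs, and allowing insecure TLS should never be done in production:

```shell
# append args to the metrics-server container (dev/test clusters only)
kubectl patch deployment metrics-server -n kube-system --type=json -p='[
  {"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"},
  {"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname"}
]'
```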
kube-prometheus stack installation
In its simplest form (as described in the official docs), you install the monitoring stack using helm.
# set variables
prom_release_name=prom-stack
prom_release_ns=prom
prom_service_name=prom-stack-kube-prometheus-prometheus

# add helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
# validate helm repo was added
helm repo list
helm repo update prometheus-community

# create the namespace
kubectl create ns $prom_release_ns

# install monitoring stack
helm install --namespace $prom_release_ns $prom_release_name prometheus-community/kube-prometheus-stack

# check status of prometheus stack release
kubectl --namespace prom get pods -l "release=$prom_release_name"

# check status of helm install
helm status $prom_release_name -n $prom_release_ns
# check values used during helm installation
helm get values $prom_release_name -n $prom_release_ns
If you need to customize the installation, see the chart's values.yaml, which can then be passed to 'helm install/upgrade' using the '-f' flag.
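As a sketch of that workflow, you can dump the chart's default values to a local file, edit it, then feed it back (assuming the 'prometheus-community' repo alias and variables set above):

```shell
# dump default chart values to a local file for editing
helm show values prometheus-community/kube-prometheus-stack > my-values.yaml

# after editing, apply the customizations with -f (works for install or upgrade)
helm upgrade --install $prom_release_name prometheus-community/kube-prometheus-stack \
  -n $prom_release_ns -f my-values.yaml
```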
kube-prometheus stack validation
Validate the presence of the basic components below.
$ kubectl get deployments,ds,statefulset -n $prom_release_ns
NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prom-stack-kube-prometheus-operator   1/1     1            1           2d18h
deployment.apps/prom-stack-kube-state-metrics         1/1     1            1           2d18h
deployment.apps/prom-stack-grafana                    1/1     1            1           2d18h

NAME                                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/prom-stack-prometheus-node-exporter   3         3         3       3            3           kubernetes.io/os=linux   2d18h

NAME                                                                    READY   AGE
statefulset.apps/alertmanager-prom-stack-kube-prometheus-alertmanager   1/1     2d18h
statefulset.apps/prometheus-prom-stack-kube-prometheus-prometheus       1/1     2d18h

# this service IP:port will be used later by the prometheus-adapter
$ kubectl get service $prom_service_name -n $prom_release_ns
NAME                                    TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
prom-stack-kube-prometheus-prometheus   ClusterIP   10.43.235.50   <none>        9090/TCP,8080/TCP   2d18h
Installing the prometheus-adapter
With validation of the metrics-server returning values via the API and the basic Prometheus monitoring stack done, we can now focus on the prometheus-adapter. This is the piece responsible for consuming Prometheus metrics, and synthesizing these into custom metrics exposed via the API.
Take note, the prometheus-adapter metrics are NOT going to be stored in Prometheus or queried via PromQL. They can only be pulled from the kube API, using a client such as kubectl. The HorizontalPodAutoscaler is able to make its evaluation based on custom metrics available via the API.
Installing prometheus-adapter
# add helm repo, will say "already exists" if Prometheus installed with same repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
# validate helm repo was added
helm repo list

# set variables for prometheus-adapter
adapter_release_name=adapter-release
adapter_release_ns=prom
adapter_deployment_name=adapter-release-prometheus-adapter

# set variables for Prometheus service
prom_release_ns=prom
prom_service_name=prom-stack-kube-prometheus-prometheus

# get Prometheus service connection values
prom_service_IP=$(kubectl get service $prom_service_name -n $prom_release_ns -o=jsonpath='{.spec.clusterIP}')
prom_service_port=$(kubectl get service $prom_service_name -n $prom_release_ns -o=jsonpath='{.spec.ports[?(@.name=="http-web")].port}')
echo "connect to $prom_service_name service at $prom_service_IP:$prom_service_port"

# install helm chart with basic set of values
helm install $adapter_release_name prometheus-community/prometheus-adapter --namespace=$adapter_release_ns --set prometheus.url=http://$prom_service_name.$prom_release_ns.svc --set prometheus.port=$prom_service_port

# view helm chart installation status
helm history $adapter_release_name -n $adapter_release_ns --max=1
# view values used in latest installation
helm get values $adapter_release_name -n $adapter_release_ns

# check status of deployment
kubectl rollout status deployment -n $adapter_release_ns $adapter_deployment_name --timeout=90s
kubectl get deployment -n $adapter_release_ns $adapter_deployment_name
Register API
You will not be able to use kubectl to query "custom.metrics.k8s.io/v1beta1" until you register the prometheus-adapter as a custom metrics APIService with the API aggregation layer. To do this, apply the following manifest.
# make sure jq utility is installed for json parsing
sudo apt install -y jq

# this will fail until API registration is done
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta2 | jq

# register custom API service
cat <<EOF | kubectl apply -f -
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta2.custom.metrics.k8s.io
spec:
  group: custom.metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: adapter-release-prometheus-adapter
    namespace: prom
  version: v1beta2
  versionPriority: 100
EOF

# validate registration
kubectl api-versions | grep "^custom.metrics.k8s.io/v1beta2"
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta2 | jq
This APIService can also be applied using ‘kubectl apply -f https://raw.githubusercontent.com/fabianlee/k3s-cluster-kvm/main/roles/prometheus-adapter/templates/api-service.yaml’
Create custom rule for scraping Prometheus metrics
The prometheus-adapter has a basic rule set for taking Prometheus metrics and exposing them as custom API metrics, but if we want more control over which of our custom Prometheus metrics gets synthesized, we need to add a custom rule(s).
This can be done by passing a custom values file to helm. Let's create a custom values file that looks for any raw Prometheus metric ending with '_promtotal' (a counter), and creates a custom API metric giving its rate of change over 2 minutes, suffixed with "_per_min".
Using our concrete example in the following section, a deployment might have 3 replicas of a web server each providing a raw Prometheus metric named “request_count_promtotal” indicating how many HTTP requests had been processed. The custom rule below would take that absolute counter and calculate the rate of change over a 2 minute period, then take the sum of all replicas.
This value would be exposed via the API as the custom metric “request_count_per_min”, and be responsible for scaling up the replica count of a deployment during high load.
cat << 'EOF' > helm-values.yaml
rules:
  custom:
  - seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
    resources:
      template: "<<.Resource>>"
    name:
      matches: "^(.*)_promtotal"
      as: "${1}_per_min"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
EOF
This file can also be fetched using ‘wget https://raw.githubusercontent.com/fabianlee/k3s-cluster-kvm/main/roles/prometheus-adapter/templates/helm-values.yaml’
Update prometheus-adapter with new custom rule
Pass this file in as a custom values file to ‘helm upgrade’ in order to update.
# show values file containing custom rule
cat helm-values.yaml

# update helm chart with rules from values file
helm upgrade $adapter_release_name prometheus-community/prometheus-adapter --namespace=$adapter_release_ns --set prometheus.url=http://$prom_service_name.$prom_release_ns.svc --set prometheus.port=$prom_service_port --values=./helm-values.yaml

# view values used in latest update, including custom rule for '_promtotal'
helm get values $adapter_release_name -n $adapter_release_ns

# view out-of-the-box rules as well as our newly added one
kubectl get configmap -n $adapter_release_ns $adapter_deployment_name -o yaml

# check status of deployment
kubectl get deployment -n $adapter_release_ns $adapter_deployment_name

# restart deployment to make sure custom rule added
kubectl rollout restart deployment -n $adapter_release_ns $adapter_deployment_name
kubectl rollout status deployment -n $adapter_release_ns $adapter_deployment_name --timeout=120s

# problem if 'unable to update' found in logs, probably bad connection to Prometheus service
# debug using container args, --v=8
kubectl logs deployment/$adapter_deployment_name -n $adapter_release_ns | grep 'unable to update'

# API group 'custom.metrics.k8s.io' should have an entry
kubectl api-versions | grep "^custom.metrics.k8s.io"

# validate custom pod metrics can be pulled via API
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta2 | jq | grep 'pods/'
HorizontalPodAutoscaler using custom API metrics
The HorizontalPodAutoscaler is capable of scaling based on custom API metrics. So we will use the custom API metrics (which are synthesized from the raw Prometheus metrics) to drive the scaling decisions of the HPA.
Example Deployment/Service that exposes Prometheus metrics
Apply into your Kubernetes cluster the sample golang-hello-world-web Service and Deployment. This is a tiny containerized web server I wrote in GoLang (source).
# apply into Kubernetes cluster
$ kubectl apply -f https://raw.githubusercontent.com/fabianlee/alpine-apache-benchmark/main/kubernetes-hpa/golang-hello-world-web.yaml
service/golang-hello-world-web-service created
deployment.apps/golang-hello-world-web created

# wait for deployment to be ready
$ kubectl rollout status deployment golang-hello-world-web -n default --timeout=90s
deployment "golang-hello-world-web" successfully rolled out

# Deployment has '1' replica
$ kubectl get deployment golang-hello-world-web
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
golang-hello-world-web   1/1     1            1           66s

# and exposed via Service
$ kubectl get service golang-hello-world-web-service
NAME                             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
golang-hello-world-web-service   ClusterIP   10.43.227.130   <none>        8080/TCP   46s

# set hello service variables for use later
hello_ns=default
hello_deployment_name=golang-hello-world-web
hello_service_name=golang-hello-world-web-service
hello_service_IP=$(kubectl get service $hello_service_name -n $hello_ns -o=jsonpath='{.spec.clusterIP}')
hello_service_port=$(kubectl get service $hello_service_name -n $hello_ns -o=jsonpath='{.spec.ports[?(@.name=="http")].port}')
echo "connect to $hello_service_name service at $hello_service_IP:$hello_service_port"
Validate that raw Prometheus metrics are exposed
Each pod in the ‘golang-hello-world-web’ deployment exposes Prometheus formatted metrics at the standard “/metrics” endpoint. One of these metrics is “request_count_promtotal”, which is an absolute counter of how many HTTP requests have been served.
# smoke test of simple web server pod, run multiple times to drive traffic
for i in $(seq 1 10); do kubectl exec -it deployment/$hello_deployment_name -- wget http://$hello_service_IP:$hello_service_port/myhello/ -O - ; done

# show prometheus /metrics endpoint, look for 'request_count_promtotal'
(kubectl exec -it deployment/$hello_deployment_name -- wget http://$hello_service_IP:$hello_service_port/metrics -O -) | grep request_count_promtotal
Validate that /metrics values are persisted to Prometheus
Prove that the metric exposed by the container at /metrics is being ingested and persisted by Prometheus by pulling the metric ‘request_count_promtotal’ directly from the Prometheus API using its /api/v1/query endpoint.
# curl to Prometheus /api/v1/query endpoint to validate 'request_count_promtotal'
$ (kubectl run -i --rm load-generator --image=ghcr.io/fabianlee/alpine-apache-benchmark:1.0.2 --restart=Never --command curl -- -fs http://$prom_service_IP:$prom_service_port/api/v1/query --data-urlencode "query=request_count_promtotal{pod=~'golang-hello-world-web-.*'}") | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "request_count_promtotal",
          "endpoint": "http",
          "instance": "10.42.2.11:8080",
          "job": "golang-hello-world-web-service",
          "namespace": "default",
          "pod": "golang-hello-world-web-7d468d488c-6kzcz",
          "service": "golang-hello-world-web-service"
        },
        "value": [
          1694304971.051,
          "18"
        ]
      }
    ]
  }
}
Validate that Prometheus metrics are synthesized into custom API metrics by prometheus-adapter
We need to validate that our custom rule is capturing this 'request_count_promtotal' key and has exposed it as the custom API metric 'request_count_per_min'.
$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta2 | jq | grep 'pods/' | grep request_count
      "name": "pods/request_count_per_min",
      "name": "pods/request_count_promtotal",
This proves our pod level Prometheus key ‘request_count_promtotal’ is being processed by the custom rule, and its rate can be found as the custom API metric ‘request_count_per_min’.
We should also be able to query down to the pod level with a selector filter and get the exact value of the custom metric.
# use jq utility to parse out values of each pod 'request_count_per_min' kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta2/namespaces/default/pods/*/request_count_per_min?selector=app%3D$hello_deployment_name" | jq '.items[] | select (.metric.name=="request_count_per_min").value'
It can take 60-120 seconds for the values to be updated, since Prometheus scrapes /metrics and the prometheus-adapter queries Prometheus only at set intervals. If the value being returned is "0", then place some load on the deployment, wait 60 seconds, and try again.
# place simple load on the deployment
for i in $(seq 1 40); do kubectl exec -it deployment/$hello_deployment_name -- wget http://$hello_service_IP:$hello_service_port/myhello/ -O - ; done

# wait 30 seconds, try pulling value again
sleep 30
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta2/namespaces/default/pods/*/request_count_per_min?selector=app%3D$hello_deployment_name" | jq '.items[] | select (.metric.name=="request_count_per_min").value'
Apply HorizontalPodAutoscaler that triggers based on custom API metrics
Now create an HPA that triggers scaling when the custom API pod metric 'request_count_per_min' exceeds a rate of 1 request every 2 seconds (500m).
# apply
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: golang-hello-world-web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: golang-hello-world-web
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: request_count_per_min
      target:
        type: AverageValue
        averageValue: 500m # 500 milli-requests/second = 1 request every 2 seconds
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 20 # seconds to wait before adjusting, avoids flapping
      policies:
      - type: Pods
        value: 1 # number of pods to scale down at one time
        periodSeconds: 20 # seconds before each scale down
      selectPolicy: Max
EOF

# validate creation of HPA
kubectl get hpa golang-hello-world-web-hpa
This HPA can also be applied using: ‘kubectl apply -f https://raw.githubusercontent.com/fabianlee/k3s-cluster-kvm/main/roles/prometheus-adapter/files/hpa.yaml’
Load Testing deployment to show scaling based on custom API metrics
In order to see the HPA triggered by the 'request_count_per_min' metric, we need to place a higher load on the pods in the service. The easiest way to do this is to use a container that has the Apache Benchmark utility; since we can go straight to the internal cluster IP of the service, we do not have to consider the various Ingress options.
# run load test that fetches the service 200 times, simulating 5 simultaneous users
kubectl run -i --rm --tty load-generator --image=ghcr.io/fabianlee/alpine-apache-benchmark:1.0.2 --restart=Never --command ab -- -n200 -c5 http://$hello_service_IP:$hello_service_port/myhello/

# show prometheus /metrics endpoint, look for 'request_count_promtotal'
(kubectl exec -it deployment/golang-hello-world-web -- wget http://localhost:8080/metrics -O -) | grep request_count_promtotal

# wait 30 seconds
sleep 30

# at prometheus-adapter level, view 'request_count_per_min', > 500m will scale HPA
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta2/namespaces/default/pods/*/request_count_per_min?selector=app%3D$hello_deployment_name" | jq '.items[] | select (.metric.name=="request_count_per_min").value'

# the target column will report the rate it sees, if it goes over 500m, then scaling occurs
kubectl get hpa golang-hello-world-web-hpa

# and the replica count will increase, but not more than maxReplicas (5)
kubectl get deployment golang-hello-world-web

# within 60 seconds the rate reported as 'request_count_per_min' will start decreasing
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta2/namespaces/default/pods/*/request_count_per_min?selector=app%3D$hello_deployment_name" | jq '.items[] | select (.metric.name=="request_count_per_min").value'

# and that will cause scaling down of the deployment every 20 seconds by 1 pod, until it reaches minReplicas=1
while [ 1 -eq 1 ]; do kubectl get deployment golang-hello-world-web; sleep 5; done
REFERENCES
kubernetes.io, types of metric apis and explanation
kubernetes.io, scaling on external metric
kubernetes.io, resource metrics pipeline
stackoverflow, getting custom metrics with kubectl raw
blog.px.dev, custom metrics server explanation
github pixie-io, custom metrics server source
github kubernetes-sigs, Prometheus Adapter for custom metrics
github kubernetes-sigs, Prometheus Adapter walkthrough
Cezar Romaniuc, Kubernetes HPA with custom metrics from Prometheus
github luxas, custom metrics server
github issue, troubleshooting setup of prometheus-adapter
Sudip Sengupta, autoscale with prometheus-adapter and custom metrics