KEDA is an open-source event-driven autoscaler that greatly enhances the abilities of the standard HorizontalPodAutoscaler. It can scale based on internal metrics as well as external Scaler sources.
In this article, I will illustrate how to install KEDA on a GKE cluster that has Workload Identity enabled, and then how to configure KEDA scaling events based on pod cpu utilization as well as messages from an external Google PubSub subscription.
GKE Cluster validation
First, verify that the GKE cluster and its nodepool have workload identity enabled.
# list clusters available
gcloud container clusters list

# set based on your output above
cluster_name=cluster-1
location_flag=--zone=us-central1

# non-empty value indicates workload identity at cluster level
gcloud container clusters describe $cluster_name $location_flag --format="value(workloadIdentityConfig.workloadPool)"

# name of first found node pool
nodepool_name=$(gcloud container node-pools list --cluster=$cluster_name $location_flag --format="value(name)" | head -n1)

# non-empty value indicates workload identity set at node pool level
gcloud container node-pools describe $nodepool_name --cluster=$cluster_name $location_flag --format="value(config.workloadMetadataConfig.mode)"
Then validate that the GKE cluster and its nodepool have the required oauth2 scope for monitoring, "/auth/monitoring". Most likely, you will have the list of scopes shown below, which are the 'gke-default' scopes.
# oauth2 scopes below are from 'gke-default'
$ gcloud container clusters describe $cluster_name $location_flag | grep oauthScopes -A10 --color
- https://www.googleapis.com/auth/devstorage.read_only
- https://www.googleapis.com/auth/logging.write
- https://www.googleapis.com/auth/monitoring
- https://www.googleapis.com/auth/servicecontrol
- https://www.googleapis.com/auth/service.management.readonly
- https://www.googleapis.com/auth/trace.append
If instead you have 'https://www.googleapis.com/auth/cloud-platform', this is a broad oauth2 scope that encompasses all cloud services and is also valid for our monitoring needs.
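As a quick sanity check, the one-liner below (a sketch, assuming the node scopes are reported under nodeConfig in the cluster describe output) prints the oauth2 scopes one per line and filters for either scope that satisfies our monitoring requirement:

# print node oauth2 scopes one per line, look for monitoring or cloud-platform
gcloud container clusters describe $cluster_name $location_flag \
  --format="value(nodeConfig.oauthScopes)" | tr ';' '\n' | grep -E "auth/(monitoring|cloud-platform)$"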
Preparing Google Service Account (GSA) for KEDA
For a GKE cluster that has Workload Identity enabled, you will need to grant the KEDA operator enough permissions that it can list/read/view the scaling events coming from any of the Scalers that are being used.
In our upcoming examples, we use Scalers that read internal metrics/logs as well as a Google PubSub subscription, so we will create a Google Service Account (GSA) that has these roles: monitoring.viewer, logging.viewer, and pubsub.viewer.
# create GSA
GSA_PROJECT=$(gcloud config get project)
GSA_NAME=keda-sa
gcloud iam service-accounts create $GSA_NAME --project=$GSA_PROJECT

# add roles to GSA
ROLE_NAMES="roles/monitoring.viewer roles/logging.viewer roles/pubsub.viewer"
for ROLE_NAME in $ROLE_NAMES; do
  gcloud projects add-iam-policy-binding $GSA_PROJECT --member "serviceAccount:${GSA_NAME}@${GSA_PROJECT}.iam.gserviceaccount.com" --role "$ROLE_NAME"
done

# bind GSA to Kubernetes Service Account (KSA)
NAMESPACE=keda
KSA_NAME=keda-operator
gcloud iam service-accounts add-iam-policy-binding ${GSA_NAME}@${GSA_PROJECT}.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:${GSA_PROJECT}.svc.id.goog[$NAMESPACE/$KSA_NAME]"
This will allow the Kubernetes Service Account named ‘keda-operator’ in the namespace ‘keda’ to read metrics, logs, and external Google PubSub topics/subscriptions.
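If you want to verify the Workload Identity binding before moving on, you can dump the IAM policy on the GSA and confirm the workloadIdentityUser role is bound to the KSA member (an optional check, not required for the install):

# expect roles/iam.workloadIdentityUser bound to
# member serviceAccount:<project>.svc.id.goog[keda/keda-operator]
gcloud iam service-accounts get-iam-policy ${GSA_NAME}@${GSA_PROJECT}.iam.gserviceaccount.com --format=yaml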
Installing KEDA on GKE using Helm
Assuming you have kubectl and helm installed on your system, you can now deploy KEDA to your GKE cluster.
# kubeconfig credentials
gcloud container clusters get-credentials $cluster_name $location_flag
kubectl get pods

# credentials for Helm
gcloud auth application-default login

# add KEDA repo
helm repo add kedacore https://kedacore.github.io/charts
helm repo update

# deploy KEDA with annotation for KSA
# annotation ties KSA with GSA roles
helm install keda kedacore/keda --namespace keda --create-namespace \
  --set serviceAccount.operator.annotations."iam\.gke\.io/gcp-service-account"="${GSA_NAME}@${GSA_PROJECT}.iam.gserviceaccount.com"
Validate KEDA Installation
# shows KEDA version installed
helm list -n keda

# shows custom values used during installation, namely the KSA annotation
helm get values keda -n keda

# waits for deployments to be rolled out
kubectl rollout status deployment keda-operator -n keda --timeout=90s
kubectl rollout status deployment keda-operator-metrics-apiserver -n keda --timeout=90s

# check for success log message, try again if not yet found
kubectl logs -n keda deployment/keda-operator-metrics-apiserver | grep "has been successfully established"
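You can also confirm that the Helm chart applied the Workload Identity annotation to the keda-operator KSA (a quick sanity check using jsonpath, assuming the annotation key we set during the install):

# expect the GSA email, e.g. keda-sa@<project>.iam.gserviceaccount.com
kubectl get serviceaccount keda-operator -n keda \
  -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}{"\n"}'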
KEDA scaling event based on CPU utilization
Let’s test KEDA by deploying a simple web application and inducing load which will cause the cpu usage to spike and the deployment to scale up.
Deploy simple web app
# grab manifest for simple web app listening on port 8080
wget 'https://gitlab.com/gitlab-pipeline7091038/google-hello-app-logging-multiarch/-/raw/main/golang-hello-world-web-logging.yaml?ref_type=heads&inline=false' -O golang-hello-world-web-logging.yaml

# deploy
kubectl apply -f golang-hello-world-web-logging.yaml
kubectl get deployment golang-hello-world-web-logging -n default
Scale web deployment based on cpu utilization
# grab manifest for KEDA ScaledObject based on cpu metric of deployment
wget 'https://gitlab.com/gitlab-pipeline7091038/google-hello-app-logging-multiarch/-/raw/main/keda-scaledobject-cpu.yaml?ref_type=heads&inline=false' -O keda-scaledobject-cpu.yaml

# deploy
kubectl apply -f keda-scaledobject-cpu.yaml

# show ScaledObject created
kubectl describe -n default ScaledObject cpu-scaledobject

# show the backing HPA that KEDA auto-creates, prefixed with 'keda-hpa'
kubectl get -n default hpa keda-hpa-cpu-scaledobject
KEDA ScaledObject details
Let's look at the ScaledObject we just deployed. It targets the simple web app we deployed earlier and has a cpu trigger with a Utilization value of 50, so if cpu utilization of the pod goes over 50% of its request, a scaling event will be triggered. The minimum replica count is 1, but it is allowed to grow to a maximum of 5 replicas when load is heavy.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cpu-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: golang-hello-world-web-logging
  minReplicaCount: 1
  maxReplicaCount: 5
  pollingInterval: 15 # seconds
  cooldownPeriod: 15 # seconds
  triggers:
  - type: cpu
    metricType: Utilization
    metadata:
      value: "50"
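One prerequisite worth calling out: the 'Utilization' metric type is calculated against the CPU resource requests of the target pods, so the deployment must declare them (the manifest we downloaded should already do this). If you point the ScaledObject at your own deployment instead, a quick check like the one below (a sketch, assuming a standard container spec) confirms a CPU request is set:

# non-empty output (e.g. 50m) means CPU requests are declared
kubectl get deployment golang-hello-world-web-logging -n default \
  -o jsonpath='{.spec.template.spec.containers[*].resources.requests.cpu}{"\n"}'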
Apply load to simple web app
# grab manifest for load testing utility
wget https://github.com/fabianlee/docker-apache-workbench-tools/raw/refs/heads/main/apache-workbench-tools.yaml

# deploy
kubectl apply -f apache-workbench-tools.yaml

# monitor replica count of web deployment (starts at 1)
watch kubectl get deployment -n default golang-hello-world-web-logging
Now using another console, put load on the web deployment using the load testing pod.
# throw a lot of traffic, watch it scale up within 60 seconds
kubectl exec -it -n default deployment/apache-workbench-tools -- ab -n 100000 -c 100 -f TLS1.2 http://golang-hello-world-web-logging-service:8080/
The replica count will jump to 5 within ~30 seconds based on the load. Once the load stops, it will scale itself back down over roughly 5-7 minutes, driven largely by the HPA's default 5-minute downscale stabilization window.
The KEDA events that scaled this deployment can be seen directly on the HPA as well as the global events view.
# show events on HorizontalPodAutoscaler created by KEDA
kubectl describe -n default hpa keda-hpa-cpu-scaledobject

# show same events coming from HPA
kubectl get events | grep keda-hpa-cpu-scaledobject
KEDA scaling event based on PubSub
As an example of scaling based on an external event, let’s create a Google PubSub Topic and trigger a scale up based on the number of messages in the Subscription.
This can illustrate a scenario where a high number of user events may need to be processed, and therefore you need the replica count to scale up. During periods of no activity, we will allow the replica count to go to 0.
A standard HPA is unable to scale to 0 (HPAScaleToZero is still behind a feature gate), so this is one of KEDA's enriching features.
Create a GCP PubSub Topic
TOPIC_ID=my-topic
SUBSCRIBE_ID=my-sub

# enable pub/sub managed service
gcloud services enable pubsub.googleapis.com

# create topic and subscription
gcloud pubsub topics create $TOPIC_ID
gcloud pubsub subscriptions create $SUBSCRIBE_ID --topic $TOPIC_ID
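Optionally, a quick describe of the subscription (just a sanity check) shows that it is attached to the topic we just created:

# expect the fully qualified topic path ending in 'my-topic'
gcloud pubsub subscriptions describe $SUBSCRIBE_ID --format="value(topic)"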
Deploy simple web app
# get manifest for simple web app
wget https://github.com/fabianlee/blogcode/raw/refs/heads/master/k8s/keda/golang-hello-world-web-scaled-pubsub.yaml

# deploy
kubectl apply -f golang-hello-world-web-scaled-pubsub.yaml
kubectl get deployment golang-hello-world-web-scaled-pubsub -n default
Scale web deployment based on PubSub Subscription
# get manifest for KEDA ScaledObject based on Subscription message count
wget https://github.com/fabianlee/blogcode/raw/refs/heads/master/k8s/keda/keda-scaledobject-pubsub.yaml

# deploy
kubectl apply -f keda-scaledobject-pubsub.yaml

# show ScaledObject just created, look for any failed event errors
kubectl describe scaledobject pubsub-scaledobject

# show the backing HPA that KEDA auto-creates, prefixed with 'keda-hpa'
kubectl get -n default hpa keda-hpa-pubsub-scaledobject

# monitor replica count (starts at 0)
watch kubectl get deployment -n default golang-hello-world-web-scaled-pubsub
KEDA ScaledObject details
Let's look at the ScaledObject we just deployed. It targets the simple web app we deployed earlier and has a trigger of type gcp-pubsub with a minimum replica count of 0.
This means KEDA handles two phases for this deployment: activation and scaling. The 'activationValue' controls how the KEDA operator takes the deployment into and out of the 0-replica state, while the 'value' controls the replica count during the non-zero scaling phase.
Our deployment will be activated when there are more than 10 messages in the subscription within the 1-minute time horizon we have set up. Once activated, the replica count is driven by the target 'value' of 5, so that, for example, a backlog of roughly 40 unacknowledged messages yields a desired count of ceil(40/5) = 8, which the HPA then caps at the maxReplicaCount of 5.
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-workload-identity-auth
spec:
  podIdentity:
    provider: gcp
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: pubsub-scaledobject
spec:
  pollingInterval: 10 # seconds
  cooldownPeriod: 10 # seconds
  maxReplicaCount: 5
  minReplicaCount: 0
  scaleTargetRef:
    name: golang-hello-world-web-scaled-pubsub
  triggers:
  - type: gcp-pubsub
    authenticationRef:
      name: keda-workload-identity-auth
    metadata:
      subscriptionName: my-sub
      mode: "SubscriptionSize"
      aggregation: "sum"
      value: "5"
      valueIfNull: '1.0'
      activationValue: "10"
      timeHorizon: "1m"
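It is worth noting how the scale-to-zero piece actually works under the covers: the HPA that KEDA creates still has a minReplicas of 1 (an HPA by itself cannot go to 0), and it is the KEDA operator that patches the deployment between 0 and 1 replicas around the activationValue; only above that does the HPA drive the count toward the 'value' target. A quick way to see this on the backing HPA (assuming the default 'keda-hpa' naming shown earlier):

# expect '1' even though the ScaledObject has minReplicaCount: 0
kubectl get hpa keda-hpa-pubsub-scaledobject -n default -o jsonpath='{.spec.minReplicas}{"\n"}'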
Push Messages into Topic to simulate load
for i in $(seq 1 40); do
  gcloud pubsub topics publish $TOPIC_ID --message="Hello World $i" --project $GSA_PROJECT
done
The replica count will activate within ~2 minutes based on the load, and it will scale itself back down to 0 once the subscription backlog drops below the activation threshold again (see the NOTES section for pulling the messages off the subscription) and the cooldownPeriod passes.
The KEDA events that scaled this deployment can be seen directly on the HPA as well as the global events view.
# show scaling events for HorizontalPodAutoscaler created by KEDA
kubectl describe -n default hpa keda-hpa-pubsub-scaledobject

# show activation events on ScaledObject
kubectl describe -n default scaledobject pubsub-scaledobject

# show activation and scaling events
kubectl get events | grep -E "keda-hpa-pubsub-scaledobject|pubsub-scaledobject"
REFERENCES
KEDA docs, GCP Workload Identity
debricked.com, what is GKE Workload Identity
KEDA, Scaler for google pubsub
KEDA, activation and scaling phase
KEDA, list of emitted events (scale, activation, etc)
Example of KEDA Scaler for RabbitMQ
NOTES
Clearing non-fatal errors in keda-operator-metrics-apiserver
# see if a failed connection was logged because of timing at startup (not fatal, but let's clear it)
kubectl logs -n keda deployment/keda-operator-metrics-apiserver | grep "failed to connect to"

# if so, try another restart in 90 seconds
sleep 90 && kubectl rollout restart deployment keda-operator-metrics-apiserver -n keda
kubectl rollout status deployment keda-operator-metrics-apiserver -n keda --timeout=90s

# log should be clean now, no failures returned
kubectl logs -n keda deployment/keda-operator-metrics-apiserver | grep "failed to connect to"

# check for success log message
kubectl logs -n keda deployment/keda-operator-metrics-apiserver | grep "has been successfully established"
Pulling messages off subscription
# pull and acknowledge messages so they are removed from the subscription backlog
for i in $(seq 1 40); do
  gcloud pubsub subscriptions pull $SUBSCRIBE_ID --auto-ack --project $GSA_PROJECT
done