GCP: Installing KEDA on a GKE cluster with workload identity and testing Scalers

KEDA is an open-source, event-driven autoscaler that greatly extends the capabilities of the standard HorizontalPodAutoscaler. It can scale based on internal metrics as well as external Scaler sources.

In this article, I will illustrate how to install KEDA on a GKE cluster that has Workload Identity enabled, and then how to configure KEDA scaling events based on pod cpu utilization as well as messages from an external Google PubSub subscription.

GKE Cluster validation

First, verify that the GKE cluster and its nodepool have workload identity enabled.

# list clusters available
gcloud container clusters list

# set based on your output above
cluster_name=cluster-1
location_flag=--zone=us-central1

# non-empty value indicates workload identity at cluster level
gcloud container clusters describe $cluster_name $location_flag --format="value(workloadIdentityConfig.workloadPool)"

# name of first found node pool
nodepool_name=$(gcloud container node-pools list --cluster=$cluster_name $location_flag --format="value(name)" | head -n1)
# non-empty value indicates workload identity set at node pool level
gcloud container node-pools describe $nodepool_name --cluster=$cluster_name $location_flag --format="value(config.workloadMetadataConfig.mode)"

Then validate that the GKE cluster and its node pool have the required oauth2 scope for monitoring, “/auth/monitoring”. Most likely, you will have the list of scopes shown below, which are the ‘gke-default’ scopes.

# oauth2 scopes below are from 'gke-default'
$ gcloud container clusters describe $cluster_name $location_flag | grep oauthScopes -A10 --color
    - https://www.googleapis.com/auth/devstorage.read_only
    - https://www.googleapis.com/auth/logging.write
    - https://www.googleapis.com/auth/monitoring
    - https://www.googleapis.com/auth/servicecontrol
    - https://www.googleapis.com/auth/service.management.readonly
    - https://www.googleapis.com/auth/trace.append

If you instead have ‘https://www.googleapis.com/auth/cloud-platform’, that is a broad oauth2 scope encompassing all Google Cloud services, and it also satisfies our monitoring needs.
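
If you prefer a quick check over eyeballing the list, something like the command below should work; it is a sketch that splits the node config scopes and greps for either the monitoring or cloud-platform scope.

# non-empty output means a scope usable for monitoring reads is present
gcloud container clusters describe $cluster_name $location_flag \
  --format="value(nodeConfig.oauthScopes)" | tr ';' '\n' | grep -E "auth/(monitoring|cloud-platform)$"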

Preparing Google Service Account (GSA) for KEDA

For a GKE cluster that has Workload Identity enabled, you will need to grant the KEDA operator enough permissions that it can list/read/view the scaling events coming from any of the Scalers that are being used.

In our upcoming examples, we use Scalers that read from internal metrics/logs as well as an external Google PubSub subscription, so we will create a Google Service Account (GSA) that has the roles needed for each (monitoring.viewer, logging.viewer, and pubsub.viewer).

# create GSA
GSA_PROJECT=$(gcloud config get project)
GSA_NAME=keda-sa
gcloud iam service-accounts create $GSA_NAME --project=$GSA_PROJECT

# add roles to GSA
ROLE_NAMES="roles/monitoring.viewer roles/logging.viewer roles/pubsub.viewer"
for ROLE_NAME in $ROLE_NAMES; do
  gcloud projects add-iam-policy-binding $GSA_PROJECT --member "serviceAccount:${GSA_NAME}@${GSA_PROJECT}.iam.gserviceaccount.com" --role "$ROLE_NAME"
done

# bind GSA to Kubernetes Service Account (KSA)
NAMESPACE=keda
KSA_NAME=keda-operator
gcloud iam service-accounts add-iam-policy-binding ${GSA_NAME}@${GSA_PROJECT}.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:${GSA_PROJECT}.svc.id.goog[$NAMESPACE/$KSA_NAME]"

This allows the Kubernetes Service Account named ‘keda-operator’ in the namespace ‘keda’ to impersonate the GSA, and therefore to read metrics, logs, and external Google PubSub subscriptions.
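
If you want to double-check the roles that ended up on the GSA, a query similar to the one below should work; it flattens the project IAM policy and filters on the service account member.

# list project-level roles granted to the KEDA GSA
gcloud projects get-iam-policy $GSA_PROJECT \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:${GSA_NAME}@${GSA_PROJECT}.iam.gserviceaccount.com" \
  --format="value(bindings.role)"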

Installing KEDA on GKE using Helm

Assuming you have kubectl and helm installed on your system, you can now deploy KEDA to your GKE cluster.
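
A quick sanity check of the client tooling first; any reasonably recent versions should be fine.

# verify client tools are available
kubectl version --client
helm version --short
gcloud version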

# kubeconfig credentials
gcloud container clusters get-credentials $cluster_name $location_flag
kubectl get pods

# credentials for Helm
gcloud auth application-default login

# add KEDA repo
helm repo add kedacore https://kedacore.github.io/charts
helm repo update

# deploy KEDA with annotation for KSA
# annotation ties KSA with GSA roles
helm install keda kedacore/keda --namespace keda --create-namespace --set serviceAccount.operator.annotations."iam\.gke\.io/gcp-service-account"="${GSA_NAME}@${GSA_PROJECT}.iam.gserviceaccount.com"

Validate KEDA Installation

# shows KEDA version installed
helm list -n keda

# shows custom values used during installation, namely the KSA annotation
helm get values keda -n keda

# waits for deployments to be rolled out
kubectl rollout status deployment keda-operator -n keda --timeout=90s
kubectl rollout status deployment keda-operator-metrics-apiserver -n keda --timeout=90s

# check for success log message, try again if not yet found
kubectl logs -n keda deployment/keda-operator-metrics-apiserver | grep "has been successfully established"
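
You can also confirm the workload identity annotation landed on the Kubernetes Service Account itself; a simple grep is enough for a sanity check.

# the KSA should carry the iam.gke.io/gcp-service-account annotation pointing at the GSA
kubectl get serviceaccount keda-operator -n keda -o yaml | grep iam.gke.io/gcp-service-account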

KEDA scaling event based on CPU utilization

Let’s test KEDA by deploying a simple web application and inducing load which will cause the cpu usage to spike and the deployment to scale up.

Deploy simple web app

# grab manifest for simple web app listening on port 8080
wget 'https://gitlab.com/gitlab-pipeline7091038/google-hello-app-logging-multiarch/-/raw/main/golang-hello-world-web-logging.yaml?ref_type=heads&inline=false' -O golang-hello-world-web-logging.yaml
# deploy
kubectl apply -f golang-hello-world-web-logging.yaml
kubectl get deployment golang-hello-world-web-logging -n default

Scale web deployment based on cpu utilization

# grab manifest for KEDA ScaledObject based on cpu metric of deployment
wget 'https://gitlab.com/gitlab-pipeline7091038/google-hello-app-logging-multiarch/-/raw/main/keda-scaledobject-cpu.yaml?ref_type=heads&inline=false' -O keda-scaledobject-cpu.yaml
# deploy
kubectl apply -f keda-scaledobject-cpu.yaml
# show ScaledObject just created
kubectl describe -n default ScaledObject cpu-scaledobject
# show the backing HPA that KEDA auto-creates, prefixed with 'keda-hpa'
kubectl get -n default hpa keda-hpa-cpu-scaledobject

KEDA ScaledObject details

Let’s look at the ScaledObject we just deployed. It targets the simple web app we deployed earlier and has a trigger of type cpu with a metric type of Utilization and a value of 50, so if the average cpu utilization of the pods goes over 50%, a scaling event is triggered. The minimum replica count is 1, but it is allowed to grow to a maximum of 5 replicas when load is heavy.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cpu-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: golang-hello-world-web-logging
  minReplicaCount: 1
  maxReplicaCount: 5
  pollingInterval: 15 # seconds
  cooldownPeriod: 15 # seconds
  triggers:
  - type: cpu
    metricType: Utilization
    metadata:
      value: "50"
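
One prerequisite worth noting: the Utilization metric type is calculated against the cpu requests of the target containers, so the Deployment needs resources.requests.cpu declared. The sample manifest should already include this, but it is easy to verify.

# a non-empty value confirms the target container declares a cpu request
kubectl get deployment golang-hello-world-web-logging -n default \
  -o jsonpath='{.spec.template.spec.containers[0].resources.requests.cpu}'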

Apply load to simple web app

# grab manifest for load testing utility
wget https://github.com/fabianlee/docker-apache-workbench-tools/raw/refs/heads/main/apache-workbench-tools.yaml
# deploy
kubectl apply -f apache-workbench-tools.yaml 

# monitor replica count of web deployment (starts at 1)
watch kubectl get deployment -n default golang-hello-world-web-logging

Now using another console, put load on the web deployment using the load testing pod.

# throw a lot of traffic, watch it scale up within 60 seconds
kubectl exec -it -n default deployment/apache-workbench-tools -- ab -n 100000 -c 100 -f TLS1.2 http://golang-hello-world-web-logging-service:8080/

The replica count will jump to 5 within ~30 seconds based on the load.  It will scale itself back down in 5-7 minutes.

The KEDA events that scaled this deployment can be seen directly on the HPA as well as the global events view.

# show events on HorizontalPodAutoscaler created by KEDA
kubectl describe -n default hpa keda-hpa-cpu-scaledobject
# show same events coming from HPA
kubectl get events | grep keda-hpa-cpu-scaledobject

KEDA scaling event based on PubSub

As an example of scaling based on an external event, let’s create a Google PubSub Topic and trigger a scale up based on the number of messages in the Subscription.

This can illustrate a scenario where a high number of user events may need to be processed, and therefore you need the replica count to scale up.  During periods of no activity, we will allow the replica count to go to 0.

A standard HPA is unable to scale to 0 (HPAScaleToZero is still behind a feature gate), so this is one of KEDA’s enriching features.

Create a GCP PubSub Topic

TOPIC_ID=my-topic
SUBSCRIBE_ID=my-sub

# enable pub/sub managed service
gcloud services enable pubsub.googleapis.com

# create topic and subscription
gcloud pubsub topics create $TOPIC_ID
gcloud pubsub subscriptions create $SUBSCRIBE_ID --topic $TOPIC_ID
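
A quick sanity check that the subscription exists and is attached to the topic:

# should print the fully qualified name of the topic
gcloud pubsub subscriptions describe $SUBSCRIBE_ID --format="value(topic)"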

Deploy simple web app

# get manifest for simple web app
wget https://github.com/fabianlee/blogcode/raw/refs/heads/master/k8s/keda/golang-hello-world-web-scaled-pubsub.yaml
# deploy
kubectl apply -f golang-hello-world-web-scaled-pubsub.yaml
kubectl get deployment golang-hello-world-web-scaled-pubsub -n default

Scale web deployment based on PubSub Subscription

# get manifest for KEDA ScaledObject based on Subscription message count
wget https://github.com/fabianlee/blogcode/raw/refs/heads/master/k8s/keda/keda-scaledobject-pubsub.yaml
# deploy
kubectl apply -f keda-scaledobject-pubsub.yaml

# show ScaledObject just created, look for any failed event errors
kubectl describe scaledobject pubsub-scaledobject
# show the backing HPA that KEDA auto-creates, prefixed with 'keda-hpa'
kubectl get -n default hpa keda-hpa-pubsub-scaledobject

# monitor replica count (starts at 0)
watch kubectl get deployment -n default golang-hello-world-web-scaled-pubsub

KEDA ScaledObject details

Let’s look at the ScaledObject we just deployed. It targets the simple web app we deployed earlier and has a trigger of type gcp-pubsub, with a minimum replica count of 0.

This means KEDA will be handling two phases for this deployment: activation and scaling. The ‘activationValue’ controls how the KEDA operator takes the deployment into and out of the state where there are 0 replicas, while the ‘value’ controls the replica count in the non-zero scaling phase.

Our deployment will be activated when there are more than 10 messages in the subscription within the 1 minute time horizon we have set up. Once activated, the replica count is driven by the ‘value’ of “5” specified (a rough sketch of how these settings interact follows the manifest below).

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-workload-identity-auth
spec:
  podIdentity:
    provider: gcp
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: pubsub-scaledobject
spec:
  pollingInterval: 10 # seconds
  cooldownPeriod:  10 # seconds
  maxReplicaCount: 5
  minReplicaCount: 0
  scaleTargetRef:
    name: golang-hello-world-web-scaled-pubsub
  triggers:
    - type: gcp-pubsub
      authenticationRef:
        name: keda-workload-identity-auth
      metadata:
        subscriptionName: my-sub
        mode: "SubscriptionSize"
        aggregation: "sum"
        value: "5"
        valueIfNull: '1.0'
        activationValue: "10"
        timeHorizon: "1m"
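
As a rough sketch of how these settings interact (not an exact formula, since the HPA applies its own tolerances and stabilization windows):

# activation: the deployment stays at 0 replicas until the subscription backlog
#   exceeds activationValue (10 messages), and returns to 0 when it drops back below
# scaling:    once active, desired replicas ~= ceil(backlog / value),
#   e.g. 40 messages / 5 = 8, which is then capped at maxReplicaCount (5)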

Push Messages into Topic to simulate load

for i in $(seq 1 40); do
  gcloud pubsub topics publish $TOPIC_ID --message="Hello World $i" --project $GSA_PROJECT
done

The replica count will activate within ~2 minutes based on the load, and will scale itself back down to 0 when the time horizon passes.

The KEDA events that scaled this deployment can be seen directly on the HPA as well as the global events view.

# show scaling events for HorizontalPodAutoscaler created by KEDA
kubectl describe -n default hpa keda-hpa-pubsub-scaledobject
# show activation events on ScaledObject
kubectl describe -n default scaledobject pubsub-scaledobject
# show activation and scaling events
kubectl get events | grep -E "keda-hpa-pubsub-scaledobject|pubsub-scaledobject"

 


NOTES

Clearing non-fatal errors in keda-operator-metrics-apiserver

# see if failed connection logged because timing of startup (not fatal, but let's clear)
kubectl logs -n keda deployment/keda-operator-metrics-apiserver | grep "failed to connect to"
# if so, try another restart in 90 seconds
sleep 90 && kubectl rollout restart deployment keda-operator-metrics-apiserver -n keda
kubectl rollout status deployment keda-operator-metrics-apiserver -n keda --timeout=90s

# log should be clean now, no failures returned
kubectl logs -n keda deployment/keda-operator-metrics-apiserver | grep "failed to connect to"
# check for success log message
kubectl logs -n keda deployment/keda-operator-metrics-apiserver | grep "has been successfully established"

Pulling messages off subscription

for i in $(seq 1 40); do
  # --auto-ack acknowledges each pulled message so it is not redelivered
  gcloud pubsub subscriptions pull $SUBSCRIBE_ID --auto-ack --project $GSA_PROJECT
done
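
As an alternative to pulling messages one at a time, seeking the subscription to the current time acknowledges everything already published, which is a quicker way to drain it during testing.

# acks all messages published before 'now', effectively purging the backlog
gcloud pubsub subscriptions seek $SUBSCRIBE_ID --time=$(date -u +%Y-%m-%dT%H:%M:%SZ) --project $GSA_PROJECT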