Hyperscaler

Kubernetes: targeting workloads to a node pool/group using taints and tolerations

If you have specific intentions for a Kubernetes node pool/group (workload isolation, CPU type, etc.), then you can assign labels to attract workloads, in conjunction with taints to repel workloads that do not have explicit tolerations applied. And although the general-purpose kubectl utility can assign labels and taints to specific nodes, the assignment of labels …
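As a minimal sketch of the idea, assuming a GKE Standard cluster and a hypothetical pool named 'isolated-pool', both the label (to attract) and the taint (to repel) can be set at pool creation:

# hypothetical names; label and taint applied at the node-pool level
gcloud container node-pools create isolated-pool \
  --cluster=my-cluster --region=us-east1 \
  --node-labels=dedicated=isolated \
  --node-taints=dedicated=isolated:NoSchedule

A pod then needs both a matching toleration and a nodeSelector on 'dedicated=isolated' to land on that pool.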

GCP: publishing and reading from Google PubSub Topic using Python client libraries

Google Pub/Sub is a managed messaging platform providing a scalable, asynchronous, loosely coupled solution for communication between application entities. It centers around the concept of a Topic (queue). A Publisher can put messages on a Topic, and a Subscriber can read messages from a Subscription on that Topic. In this article, I will first use the …
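While the article uses the Python client libraries, the same round trip can be sketched with the gcloud CLI (topic and subscription names here are hypothetical):

gcloud pubsub topics create my-topic
gcloud pubsub subscriptions create my-sub --topic=my-topic
gcloud pubsub topics publish my-topic --message="hello"
gcloud pubsub subscriptions pull my-sub --auto-ack --limit=1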

GCP: Installing KEDA on a GKE cluster with workload identity and testing Scalers

KEDA is an open-source event-driven autoscaler that greatly enhances the abilities of the standard HorizontalPodAutoscaler. It can scale based on internal metrics as well as external Scaler sources. In this article, I will illustrate how to install KEDA on a GKE cluster that has Workload Identity enabled, and then how to configure KEDA scaling events …
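A minimal install sketch via Helm, using the chart location published in the KEDA documentation (release name and namespace are my choices):

helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace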

GCP: historical log of GKE cluster and nodepool upgrades and scaling

Although the simple 'gcloud container operations list' command is the easiest way to find recent upgrade events on your GKE cluster or nodepool, it returns only recent events and does not provide a historical record. If you need to look at historical events, you can use the Logs Explorer web UI or use the 'gcloud …
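A hedged sketch of pulling such events from Cloud Logging at the CLI; the exact filter used in the article may differ:

gcloud logging read \
  'resource.type="gke_cluster" AND protoPayload.methodName:"UpdateCluster"' \
  --project=my-project --freshness=30d --limit=20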

GCP: determining whether ASM is installed via asmcli or gcloud fleet

Anthos Service Mesh for GKE can be installed in the following modes:
- In-cluster ASM using the asmcli utility
- Managed ASM using the asmcli utility
- Managed ASM using the 'gcloud container fleet' command
- Managed ASM using the Terraform asm submodule
If you need to determine the installation mode used on your GKE cluster, you can examine …
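Two quick places to look, as a sketch (assumes the cluster is fleet-registered; the project name is hypothetical):

gcloud container fleet mesh describe --project=my-project   # managed ASM feature state
kubectl get deploy -n istio-system                          # an in-cluster istiod suggests an asmcli in-cluster install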

GCP: determining whether GKE cluster mode is Standard or Autopilot

If you need to determine at the CLI whether a GKE cluster is managed using Standard or Autopilot mode, this is available by using gcloud to describe the cluster.

# identify cluster and location
gcloud container clusters list
cluster_name=<clusterName>
location_flag="--region=<region>"   # OR --zone=<zone>

# returns 'True' if GKE Autopilot cluster
# returns empty if Standard
…
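The describe call that the comments above lead into, as a sketch; the format path is my assumption:

gcloud container clusters describe $cluster_name $location_flag \
  --format="value(autopilot.enabled)"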

GKE: terraform lifecycle ‘ignore_changes’ to manage external changes to GKE cluster

As much as Terraform pushes to be the absolute system of record for resources it creates, valid external processes often assist in managing those same resources. Here are some examples of legitimate external changes:
- Other company-approved Terraform scripts applying labels to resources in order to track ownership and costs
- Security teams modifying IAM roles
…

GCP: Cloud Run with build trigger coming from remote GitHub repository

GCP build triggers can easily handle Continuous Deployment (CD) when the source code is homed in a Google Cloud Source repository. But even if the system of record for your source is a remote GitHub repository, these same types of push and tag events can be consumed if you configure a connection and repository link.
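A sketch of creating such a trigger once the GitHub connection exists; owner, repo, and config path here are hypothetical:

gcloud builds triggers create github \
  --name=deploy-on-push --repo-owner=my-org --repo-name=my-app \
  --branch-pattern="^main$" --build-config=cloudbuild.yaml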

GCP: Cloud Run/Function to handle requests to GKE cluster during maintenance

At some point, there will be a system change significant enough that a maintenance window needs to be scheduled with customers. But that doesn't mean the end-user traffic or client integrations will stop requesting the services. What we need to present to end-users is a maintenance page during this outage to indicate the overall solution …
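One building block for this pattern, sketched with hypothetical names: a serverless NEG that lets the external load balancer route to a Cloud Run maintenance service:

gcloud compute network-endpoint-groups create maintenance-neg \
  --region=us-east1 --network-endpoint-type=serverless \
  --cloud-run-service=maintenance-page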

GKE: show pod distribution across nodes and zones

Whether you are working on scaling, performance, or high availability, it can be useful to see exactly which Kubernetes worker node pods are being scheduled onto.

Pods as distributed across worker nodes:
ns=default
kubectl get pods -n $ns -o=custom-columns=NAME:.metadata.name,NODE:.spec.nodeName

Pods as distributed across zones (GKE specific): if you wanted to take it one step further …
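For the zone view, one simple starting point is the well-known zone label GKE places on each node, shown here with kubectl's label-column flag:

kubectl get nodes -L topology.kubernetes.io/zone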

GKE: upgrade Anthos Config Management for GKE cluster

If you are managing GKE clusters using Anthos Config Management (ACM) and need to take advantage of newer features or enhancements in ConfigSync or PolicyController, upgrading these components can be done using the gcloud utility.

# check current version of ACM on GKE clusters
gcloud beta container fleet config-management version

# select membership to upgrade
…
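A hedged sketch of one upgrade path, re-applying the feature spec at a newer version (membership name, spec file, and version are all hypothetical):

gcloud beta container fleet config-management apply \
  --membership=my-cluster-membership --config=apply-spec.yaml --version=1.14.1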

Terraform: migrate state from local to remote Google Cloud Storage bucket and back

In this article I will demonstrate how to take a Terraform configuration that is using a local state file and migrate its persistent state to a remote Google Cloud Storage bucket (GCS). We will then perform the migration again, but this time to bring the remote state back to a local file. We will illustrate …
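The core of the forward migration, as a sketch: add a backend "gcs" block pointing at the bucket, then re-initialize so Terraform copies the local state up:

# after adding the gcs backend block to the configuration
terraform init -migrate-state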

GKE: Determine Anthos on-prem GKE master node and IP address

If you are using Anthos GKE on-premise and need to determine which node of your Admin Cluster is the master, query for the master role. The label is 'node-role.kubernetes.io/master'.

$ kubectl get nodes -l node-role.kubernetes.io/master
NAME                     STATUS   ROLES                  AGE   VERSION
gke-admin-master-adfwa   Ready    control-plane,master   7d    v1.24.9-gke.100

# using wide will also show External and Internal IP
…
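The wide variant that the last comment refers to, as a sketch:

kubectl get nodes -l node-role.kubernetes.io/master -o wide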

GCP: Google Cloud Storage bucket with permissions for user or service account

Creating a Google Cloud Storage bucket is simple, but the IAM permissions required to perform operations in the bucket can be difficult to understand, especially when you want something as simple as providing upload/download access to the person who created the bucket and perhaps a service account. Below are the commands for creating a …
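A sketch of the general shape using the gcloud storage surface (the article may use gsutil instead; bucket and principal names are hypothetical):

gcloud storage buckets create gs://my-bucket --location=us-east1
gcloud storage buckets add-iam-policy-binding gs://my-bucket \
  --member=user:me@example.com --role=roles/storage.objectAdmin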

GCP: list of available GKE cluster versions in region and channel

If you are going to create a GKE cluster in a region, you may need to be explicit with the version of the master control plane and worker nodes. Below is how you would list the available versions.

# specify your region
region=us-east1
gcloud container get-server-config --region=$region
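To narrow the output to a single release channel, one hedged variation (the --flatten/--filter/--format expressions are my assumptions, not from the article):

gcloud container get-server-config --region=$region \
  --flatten=channels --filter="channels.channel=REGULAR" \
  --format="yaml(channels.validVersions)"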

GCP: fix kubectl auth plugin deprecation warning by installing new auth plugin

Starting with Kubernetes client 1.22, you may start seeing warning messages about your authentication mechanism when running commands. Here is an example when using gcloud for GKE cluster credentials.

WARNING: the gcp auth plugin is deprecated in v1.22+, unavailable in v1.26+; use gcloud instead.

This is because the authentication provider-specific login code will be removed …
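The documented remedy is Google's external auth plugin; the cluster name and location below are hypothetical:

gcloud components install gke-gcloud-auth-plugin
export USE_GKE_GCLOUD_AUTH_PLUGIN=True
gcloud container clusters get-credentials my-cluster --region=us-east1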

GCP: LDAP authentication for Anthos VMware clusters using Anthos Identity Service

Anthos Identity Service allows an organization to tie into their existing Identity Provider to authenticate and authorize users into their Anthos clusters. In this article, I will show how the authentication for an Anthos on VMware cluster can be integrated into an existing Active Directory deployment, and further how a user's AD group membership can …

GCP: listing IAM roles for user, group, and service account in project and organization

When GCP operations fail due to permissions issues, checking the IAM roles assigned to a user, group, or service account becomes a necessity. When hierarchical projects and organizations are involved, it becomes even more complex. This article will show you how to use gcloud at the project and organization level to pull IAM policies for …
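A sketch of filtering a project-level policy down to a single principal (project and user are hypothetical):

gcloud projects get-iam-policy my-project \
  --flatten="bindings[].members" \
  --filter="bindings.members:user:me@example.com" \
  --format="table(bindings.role)"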

GCP: Enable HttpLoadBalancing feature on Cluster to avoid errors when applying BackEndConfig

If you are configuring Istio/ASM ingress gateways with a BackendConfig for specifying health checks, timeouts, or Cloud Armor policies, then you need to ensure that your GKE cluster has the HttpLoadBalancing feature enabled. If this feature is not enabled, you will see an error message like the one below when attempting to apply the BackendConfig manifest: unable …
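A sketch of checking for, and then enabling, the addon (cluster name and location hypothetical):

gcloud container clusters describe my-cluster --region=us-east1 \
  --format="value(addonsConfig.httpLoadBalancing)"
gcloud container clusters update my-cluster --region=us-east1 \
  --update-addons=HttpLoadBalancing=ENABLED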

GCP: running a container on a GKE cluster using Workload Identity

With Workload Identity enabled on a GKE cluster, your container can access Google Cloud API services (Compute Engine, Storage, etc.) using a Kubernetes Service Account (KSA). This is done by having the container run as the KSA, where the KSA has been bound to the Google Service Account (GSA). This is the recommended way of …
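The two bindings at the heart of that KSA-to-GSA link, sketched with hypothetical names:

# allow the KSA to impersonate the GSA
gcloud iam service-accounts add-iam-policy-binding my-gsa@my-project.iam.gserviceaccount.com \
  --role=roles/iam.workloadIdentityUser \
  --member="serviceAccount:my-project.svc.id.goog[my-ns/my-ksa]"
# annotate the KSA so GKE knows which GSA it maps to
kubectl annotate serviceaccount my-ksa -n my-ns \
  iam.gke.io/gcp-service-account=my-gsa@my-project.iam.gserviceaccount.com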

GCP: Enabling autoUpgrade for node-pools to reduce manual maintenance

GKE cluster upgrades do not need to be a manual process. GKE clusters can be auto-upgraded by subscribing the cluster to an appropriate release channel and assigning a sensible maintenance window. As long as adequate pod disruption budgets, replicas, and ingress are configured, these upgrades can happen without interrupting availability. To check the current …
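A sketch of the two knobs involved (cluster, pool, and channel values hypothetical):

gcloud container clusters update my-cluster --region=us-east1 --release-channel=regular
gcloud container node-pools update default-pool --cluster=my-cluster \
  --region=us-east1 --enable-autoupgrade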

Kubernetes: Anthos GKE on-prem 1.11 on nested VMware environment

Anthos GKE on-prem is a managed platform that brings GKE clusters to on-premise datacenters. This product offering brings best-practice security measures, tested paths for upgrades, basic monitoring, platform logging, and full enterprise support. Setting up a platform this extensive requires many steps, as officially documented here. However, if you want to practice in a …

GCP: Moving a VM instance to a different region using snapshots

The 'gcloud compute instances move' command is convenient for moving VM instances from one region to another, but only works within a narrow scope of OS image types and disks. For example, only older non-UEFI OS images can be moved with this command. Trying to move even the simplest Ubuntu bionic/focal or Debian bullseye/buster VM …
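The general shape of the snapshot-based alternative, as a sketch (all names and zones hypothetical):

gcloud compute disks snapshot my-vm --zone=us-east1-b --snapshot-names=my-vm-snap
gcloud compute disks create my-vm-disk2 --source-snapshot=my-vm-snap --zone=us-west1-a
gcloud compute instances create my-vm2 --zone=us-west1-a \
  --disk=name=my-vm-disk2,boot=yes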

GCP: Enable Policy Controller on a GKE cluster

Anthos Policy Controller enables enforcement of compliance, security, and organizational policies on GKE clusters. These might be best-practice policies coming from internal architectural standards, technical policies used to define/constrain resources, or audit requirements stemming from legal regulation. Anthos Policy Controller is built upon the open-source Open Policy Agent (OPA) Gatekeeper, which uses a Kubernetes …
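One possible enablement path, hedged: re-apply the ACM feature spec with Policy Controller turned on (membership and spec file are hypothetical; apply-spec.yaml would carry policyController.enabled: true):

gcloud beta container fleet config-management apply \
  --membership=my-cluster-membership --config=apply-spec.yaml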

GCP: Cloud Function to handle requests to HTTPS LB during maintenance

At some point you may need to schedule a maintenance window for your solution. But that doesn't mean the end-user traffic or client integrations will stop requesting the services from the GCP external HTTPS LB that fronts all client requests. The VM instances and GKE clusters that normally respond to requests may not be able …
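A sketch of the wiring that lets the LB fall back to a Cloud Function: a serverless NEG referencing the function (names hypothetical):

gcloud compute network-endpoint-groups create maintenance-neg \
  --region=us-east1 --network-endpoint-type=serverless \
  --cloud-function-name=maintenance-page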

GCP: Deploying a 2nd gen Python Cloud Function and exposing from an HTTPS LB

GCP Cloud Functions have taken a step forward with the 2nd generation release. One of the biggest architectural differences is that multiple requests can now run concurrently on a single instance, enabling larger traffic loads. In this article, I will show you how to deploy a simple Python Flask web server as a 2nd gen …
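A deploy sketch for a 2nd gen HTTP function (runtime, entry point, and region are hypothetical):

gcloud functions deploy hello-http --gen2 --runtime=python310 \
  --region=us-east1 --source=. --entry-point=hello \
  --trigger-http --allow-unauthenticated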

GCP: global external HTTPS LB for securely exposing insecure VM services

If you have unmanaged GCP VM instances running services on insecure ports (e.g. Apache HTTP on port 80), one way to secure the public external traffic is to create an external GCP HTTPS load balancer. Conceptually, we want to expose a secure front to otherwise insecure services. While the preferred method would be to secure …
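Two of the key steps expressed in LB terms, as a sketch that omits the instance group, health check, and url-map steps (all names hypothetical): HTTPS terminates at the proxy while the backend still speaks plain HTTP.

gcloud compute backend-services create insecure-svc-backend \
  --global --protocol=HTTP --port-name=http --health-checks=http-basic-check
gcloud compute target-https-proxies create web-https-proxy \
  --url-map=web-map --ssl-certificates=web-cert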

GCP: internal HTTPS LB for securely exposing insecure VM services

If you have unmanaged GCP VM instances running services on insecure ports (e.g. Apache HTTP on port 80), one way to secure the internal communication coming from other internal pods/apps is to create an internal GCP HTTPS load balancer. Conceptually, we want to expose a secure front to otherwise insecure services. While the preferred method …
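The internal flavor differs mainly in load-balancing scheme and regional scope; a hedged sketch of the backend service (names and region hypothetical):

gcloud compute backend-services create insecure-svc-backend-int \
  --load-balancing-scheme=INTERNAL_MANAGED --protocol=HTTP \
  --region=us-east1 --health-checks=http-basic-check \
  --health-checks-region=us-east1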

GCP: Private GKE cluster in Autopilot mode using Terraform

GKE Autopilot reduces the operational costs of managing GKE clusters by freeing you from node-level maintenance, letting you focus just on pod workloads. Costs are accrued based on pod resource consumption and not on node resource sizes or node count, which are managed by Google. Since you no longer own the node level, there are …
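For orientation only, a gcloud equivalent of a private Autopilot cluster (the article itself uses Terraform; flags and CIDR here are my assumptions):

gcloud container clusters create-auto my-autopilot --region=us-east1 \
  --enable-private-nodes --master-ipv4-cidr=172.16.0.0/28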