Hyperscaler

Kubernetes: targeting workloads to a node pool/group using taints and tolerations

If you have specific intentions for a Kubernetes node pool/group (workload isolation, CPU type, etc.), then you can assign labels to attract workloads, in conjunction with taints to repel workloads that do not have explicit tolerations applied. And although the general-purpose kubectl utility can assign labels and taints to specific nodes, the assignment of labels …
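As a minimal sketch of the idea, assuming a GKE Standard cluster and a hypothetical pool named 'isolated-pool', both the label (to attract) and the taint (to repel) can be set at pool creation:

# hypothetical names; label and taint applied at the node-pool level
gcloud container node-pools create isolated-pool \
  --cluster=my-cluster --region=us-east1 \
  --node-labels=dedicated=isolated \
  --node-taints=dedicated=isolated:NoSchedule

A pod then needs both a matching toleration and a nodeSelector on 'dedicated=isolated' to land on that pool.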

GCP: publishing and reading from Google PubSub Topic using Python client libraries

Google Pub/Sub is a managed messaging platform providing a scalable, asynchronous, loosely coupled solution for communication between application entities. It centers around the concept of a Topic (queue). A Publisher can put messages on a Topic, and a Subscriber can read messages from a Subscription on that Topic. In this article, I will first use the …
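While the article uses the Python client libraries, the same round trip can be sketched with the gcloud CLI (topic and subscription names here are hypothetical):

gcloud pubsub topics create my-topic
gcloud pubsub subscriptions create my-sub --topic=my-topic
gcloud pubsub topics publish my-topic --message="hello"
gcloud pubsub subscriptions pull my-sub --auto-ack --limit=1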

GCP: Installing KEDA on a GKE cluster with workload identity and testing Scalers

KEDA is an open-source event-driven autoscaler that greatly enhances the abilities of the standard HorizontalPodAutoscaler. It can scale based on internal metrics as well as external Scaler sources. In this article, I will illustrate how to install KEDA on a GKE cluster that has Workload Identity enabled, and then how to configure KEDA scaling events …
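A minimal install sketch via Helm, using the chart location published in the KEDA documentation (release name and namespace are my choices):

helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace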

GCP: historical log of GKE cluster and nodepool upgrades and scaling

Although the simple 'gcloud container operations list' command is the easiest way to find recent upgrade events on your GKE cluster or nodepool, it returns only recent events and does not provide a historical record. If you need to look at historical events, you can use the Logs Explorer web UI or use the 'gcloud …
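A hedged sketch of pulling such events from Cloud Logging at the CLI; the exact filter used in the article may differ:

gcloud logging read \
  'resource.type="gke_cluster" AND protoPayload.methodName:"UpdateCluster"' \
  --project=my-project --freshness=30d --limit=20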

GCP: determining whether ASM is installed via asmcli or gcloud fleet

Anthos Service Mesh for GKE can be installed in the following modes:
- In-cluster ASM using the asmcli utility
- Managed ASM using the asmcli utility
- Managed ASM using the 'gcloud container fleet' command
- Managed ASM using the Terraform asm submodule
If you need to determine the installation mode used on your GKE cluster, you can examine …
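Two quick places to look, as a sketch (assumes the cluster is fleet-registered; the project name is hypothetical):

gcloud container fleet mesh describe --project=my-project   # managed ASM feature state
kubectl get deploy -n istio-system                          # an in-cluster istiod suggests an asmcli in-cluster install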

GCP: determining whether GKE cluster mode is Standard or Autopilot

If you need to determine at the CLI whether a GKE cluster is managed using Standard or Autopilot mode, this is available by using gcloud to describe the cluster.

# identify cluster and location
gcloud container clusters list
cluster_name=<clusterName>
location_flag="--region=<region>"   # OR --zone=<zone>

# returns 'True' if GKE Autopilot cluster
# returns empty if Standard
…
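The describe call that the comments above lead into, as a sketch; the format path is my assumption:

gcloud container clusters describe $cluster_name $location_flag \
  --format="value(autopilot.enabled)"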

GKE: terraform lifecycle ‘ignore_changes’ to manage external changes to GKE cluster

As much as Terraform pushes to be the absolute system of record for resources it creates, valid external processes often assist in managing those same resources. Here are some examples of legitimate external changes:
- Other company-approved Terraform scripts applying labels to resources in order to track ownership and costs
- Security teams modifying IAM roles
…

GCP: Cloud Run with build trigger coming from remote GitHub repository

GCP build triggers can easily handle Continuous Deployment (CD) when the source code is homed in a Google Cloud Source repository. But even if the system of record for your source is a remote GitHub repository, these same types of push and tag events can be consumed if you configure a connection and repository link.
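A sketch of creating such a trigger once the GitHub connection exists; owner, repo, and config path here are hypothetical:

gcloud builds triggers create github \
  --name=deploy-on-push --repo-owner=my-org --repo-name=my-app \
  --branch-pattern="^main$" --build-config=cloudbuild.yaml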

GCP: Cloud Run/Function to handle requests to GKE cluster during maintenance

At some point, there will be a system change significant enough that a maintenance window needs to be scheduled with customers. But that doesn't mean the end-user traffic or client integrations will stop requesting the services. What we need to present to end-users is a maintenance page during this outage to indicate the overall solution …
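One building block for this pattern, sketched with hypothetical names: a serverless NEG that lets the external load balancer route to a Cloud Run maintenance service:

gcloud compute network-endpoint-groups create maintenance-neg \
  --region=us-east1 --network-endpoint-type=serverless \
  --cloud-run-service=maintenance-page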

GKE: show pod distribution across nodes and zones

Whether you are working on scaling, performance, or high availability, it can be useful to see exactly which Kubernetes worker node pods are being scheduled onto.

Pods as distributed across worker nodes:
ns=default
kubectl get pods -n $ns -o=custom-columns=NAME:.metadata.name,NODE:.spec.nodeName

Pods as distributed across zones (GKE specific): if you wanted to take it one step further …
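For the zone view, one simple starting point is the well-known zone label GKE places on each node, shown here with kubectl's label-column flag:

kubectl get nodes -L topology.kubernetes.io/zone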

GKE: upgrade Anthos Config Management for GKE cluster

If you are managing GKE clusters using Anthos Config Management (ACM) and need to take advantage of newer features or enhancements in ConfigSync or PolicyController, upgrading these components can be done using the gcloud utility.

# check current version of ACM on GKE clusters
gcloud beta container fleet config-management version

# select membership to upgrade
…
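A hedged sketch of one upgrade path, re-applying the feature spec at a newer version (membership name, spec file, and version are all hypothetical):

gcloud beta container fleet config-management apply \
  --membership=my-cluster-membership --config=apply-spec.yaml --version=1.14.1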

Terraform: migrate state from local to remote Google Cloud Storage bucket and back

In this article I will demonstrate how to take a Terraform configuration that is using a local state file and migrate its persistent state to a remote Google Cloud Storage bucket (GCS). We will then perform the migration again, but this time to bring the remote state back to a local file. We will illustrate …
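The core of the forward migration, as a sketch: add a backend "gcs" block pointing at the bucket, then re-initialize so Terraform copies the local state up:

# after adding the gcs backend block to the configuration
terraform init -migrate-state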

GKE: Determine Anthos on-prem GKE master node and IP address

If you are using Anthos GKE on-premise and need to determine which node of your Admin Cluster is the master, query for the master role. The label is 'node-role.kubernetes.io/master'.

$ kubectl get nodes -l node-role.kubernetes.io/master
NAME                     STATUS   ROLES                  AGE   VERSION
gke-admin-master-adfwa   Ready    control-plane,master   7d    v1.24.9-gke.100

# using wide will also show External and Internal IP
…
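The wide variant that the last comment refers to, as a sketch:

kubectl get nodes -l node-role.kubernetes.io/master -o wide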

GCP: Google Cloud Storage bucket with permissions for user or service account

Creating a Google Cloud Storage bucket is simple, but the IAM permissions required to perform operations in the bucket can be difficult to understand, especially when you want something as simple as providing upload/download access to the person who created the bucket and perhaps a service account. Below are the commands for creating a …
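A sketch of the general shape using the gcloud storage surface (the article may use gsutil instead; bucket and principal names are hypothetical):

gcloud storage buckets create gs://my-bucket --location=us-east1
gcloud storage buckets add-iam-policy-binding gs://my-bucket \
  --member=user:me@example.com --role=roles/storage.objectAdmin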

GCP: list of available GKE cluster versions in region and channel

If you are going to create a GKE cluster in a region, you may need to be explicit with the version of the master control plane and worker nodes. Below is how you would list the available versions.

# specify your region
region=us-east1
gcloud container get-server-config --region=$region
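To narrow the output to a single release channel, one hedged variation (the --flatten/--filter/--format expressions are my assumptions, not from the article):

gcloud container get-server-config --region=$region \
  --flatten=channels --filter="channels.channel=REGULAR" \
  --format="yaml(channels.validVersions)"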

GCP: fix kubectl auth plugin deprecation warning by installing new auth plugin

Starting with Kubernetes client 1.22, you may start seeing warning messages about your authentication mechanism when running commands. Here is an example when using gcloud for GKE cluster credentials.

WARNING: the gcp auth plugin is deprecated in v1.22+, unavailable in v1.26+; use gcloud instead.

This is because the authentication provider-specific login code will be removed …
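The documented remedy is Google's external auth plugin; the cluster name and location below are hypothetical:

gcloud components install gke-gcloud-auth-plugin
export USE_GKE_GCLOUD_AUTH_PLUGIN=True
gcloud container clusters get-credentials my-cluster --region=us-east1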

GCP: LDAP authentication for Anthos VMware clusters using Anthos Identity Service

Anthos Identity Service allows an organization to tie into their existing Identity Provider to authenticate and authorize users into their Anthos clusters. In this article, I will show how the authentication for an Anthos on VMware cluster can be integrated into an existing Active Directory deployment, and further how a user's AD group membership can …

GCP: listing IAM roles for user, group, and service account in project and organization

When GCP operations fail due to permissions issues, checking the IAM roles assigned to a user, group, or service account becomes a necessity. When hierarchical projects and organizations are involved, it becomes even more complex. This article will show you how to use gcloud at the project and organization level to pull IAM policies for …
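A sketch of filtering a project-level policy down to a single principal (project and user are hypothetical):

gcloud projects get-iam-policy my-project \
  --flatten="bindings[].members" \
  --filter="bindings.members:user:me@example.com" \
  --format="table(bindings.role)"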

GCP: Enable HttpLoadBalancing feature on Cluster to avoid errors when applying BackEndConfig

If you are configuring Istio/ASM ingress gateways with a BackendConfig for specifying health checks, timeouts, or Cloud Armor policies, then you need to ensure that your GKE cluster has the HttpLoadBalancing feature enabled. If this feature is not enabled, you will see an error message like the one below when attempting to apply the BackendConfig manifest: unable …
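A sketch of checking for, and then enabling, the addon (cluster name and location hypothetical):

gcloud container clusters describe my-cluster --region=us-east1 \
  --format="value(addonsConfig.httpLoadBalancing)"
gcloud container clusters update my-cluster --region=us-east1 \
  --update-addons=HttpLoadBalancing=ENABLED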

GCP: running a container on a GKE cluster using Workload Identity

With Workload Identity enabled on a GKE cluster, your container can access Google Cloud API services (Compute Engine, Storage, etc.) using a Kubernetes Service Account (KSA). This is done by having the container run as the KSA, where the KSA has been bound to the Google Service Account (GSA). This is the recommended way of …
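The two bindings at the heart of that KSA-to-GSA link, sketched with hypothetical names:

# allow the KSA to impersonate the GSA
gcloud iam service-accounts add-iam-policy-binding my-gsa@my-project.iam.gserviceaccount.com \
  --role=roles/iam.workloadIdentityUser \
  --member="serviceAccount:my-project.svc.id.goog[my-ns/my-ksa]"
# annotate the KSA so GKE knows which GSA it maps to
kubectl annotate serviceaccount my-ksa -n my-ns \
  iam.gke.io/gcp-service-account=my-gsa@my-project.iam.gserviceaccount.com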

GCP: Enabling autoUpgrade for node-pools to reduce manual maintenance

GKE cluster upgrades do not need to be a manual process. GKE clusters can be auto-upgraded by subscribing the cluster to an appropriate release channel and assigning a sensible maintenance window. As long as adequate pod disruption budgets, replicas, and ingress are configured, these upgrades can happen without interrupting availability. To check the current …
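A sketch of the two knobs involved (cluster, pool, and channel values hypothetical):

gcloud container clusters update my-cluster --region=us-east1 --release-channel=regular
gcloud container node-pools update default-pool --cluster=my-cluster \
  --region=us-east1 --enable-autoupgrade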

Kubernetes: Anthos GKE on-prem 1.11 on nested VMware environment

Anthos GKE on-prem is a managed platform that brings GKE clusters to on-premise datacenters. This product offering brings best-practice security measures, tested paths for upgrades, basic monitoring, platform logging, and full enterprise support. Setting up a platform this extensive requires many steps, as officially documented here. However, if you want to practice in a …

GCP: Moving a VM instance to a different region using snapshots

The 'gcloud compute instances move' command is convenient for moving VM instances from one region to another, but only works within a narrow scope of OS image types and disks. For example, only older non-UEFI OS images can be moved with this command. Trying to move even the simplest Ubuntu bionic/focal or Debian bullseye/buster VM …
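The general shape of the snapshot-based alternative, as a sketch (all names and zones hypothetical):

gcloud compute disks snapshot my-vm --zone=us-east1-b --snapshot-names=my-vm-snap
gcloud compute disks create my-vm-disk2 --source-snapshot=my-vm-snap --zone=us-west1-a
gcloud compute instances create my-vm2 --zone=us-west1-a \
  --disk=name=my-vm-disk2,boot=yes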

GCP: Enable Policy Controller on a GKE cluster

Anthos Policy Controller enables enforcement of compliance, security, and organizational policies on GKE clusters. These might be best-practice policies coming from internal architectural standards, technical policies used to define/constrain resources, or audit requirements stemming from legal regulation. Anthos Policy Controller is built upon the open-source Open Policy Agent (OPA) Gatekeeper, which uses a Kubernetes …
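One possible enablement path, hedged: re-apply the ACM feature spec with Policy Controller turned on (membership and spec file are hypothetical; apply-spec.yaml would carry policyController.enabled: true):

gcloud beta container fleet config-management apply \
  --membership=my-cluster-membership --config=apply-spec.yaml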

GCP: Cloud Function to handle requests to HTTPS LB during maintenance

At some point you may need to schedule a maintenance window for your solution. But that doesn't mean the end-user traffic or client integrations will stop requesting the services from the GCP external HTTPS LB that fronts all client requests. The VM instances and GKE clusters that normally respond to requests may not be able …
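A sketch of the wiring that lets the LB fall back to a Cloud Function: a serverless NEG referencing the function (names hypothetical):

gcloud compute network-endpoint-groups create maintenance-neg \
  --region=us-east1 --network-endpoint-type=serverless \
  --cloud-function-name=maintenance-page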

GCP: Deploying a 2nd gen Python Cloud Function and exposing from an HTTPS LB

GCP Cloud Functions have taken a step forward with the 2nd generation release. One of the biggest architectural differences is that multiple requests can now run concurrently on a single instance, enabling larger traffic loads. In this article, I will show you how to deploy a simple Python Flask web server as a 2nd gen …
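A deploy sketch for a 2nd gen HTTP function (runtime, entry point, and region are hypothetical):

gcloud functions deploy hello-http --gen2 --runtime=python310 \
  --region=us-east1 --source=. --entry-point=hello \
  --trigger-http --allow-unauthenticated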

GCP: global external HTTPS LB for securely exposing insecure VM services

If you have unmanaged GCP VM instances running services on insecure ports (e.g. Apache HTTP on port 80), one way to secure the public external traffic is to create an external GCP HTTPS load balancer. Conceptually, we want to expose a secure front to otherwise insecure services. While the preferred method would be to secure …
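Two of the key steps expressed in LB terms, as a sketch that omits the instance group, health check, and url-map steps (all names hypothetical): HTTPS terminates at the proxy while the backend still speaks plain HTTP.

gcloud compute backend-services create insecure-svc-backend \
  --global --protocol=HTTP --port-name=http --health-checks=http-basic-check
gcloud compute target-https-proxies create web-https-proxy \
  --url-map=web-map --ssl-certificates=web-cert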

GCP: internal HTTPS LB for securely exposing insecure VM services

If you have unmanaged GCP VM instances running services on insecure ports (e.g. Apache HTTP on port 80), one way to secure the internal communication coming from other internal pods/apps is to create an internal GCP HTTPS load balancer. Conceptually, we want to expose a secure front to otherwise insecure services. While the preferred method …
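The internal flavor differs mainly in load-balancing scheme and regional scope; a hedged sketch of the backend service (names and region hypothetical):

gcloud compute backend-services create insecure-svc-backend-int \
  --load-balancing-scheme=INTERNAL_MANAGED --protocol=HTTP \
  --region=us-east1 --health-checks=http-basic-check \
  --health-checks-region=us-east1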

GCP: Private GKE cluster in Autopilot mode using Terraform

GKE Autopilot reduces the operational costs of managing GKE clusters by freeing you from node-level maintenance, letting you focus just on pod workloads. Costs are accrued based on pod resource consumption and not on node resource sizes or node count, which are managed by Google. Since you no longer own the node level, there are …
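For orientation only, a gcloud equivalent of a private Autopilot cluster (the article itself uses Terraform; flags and CIDR here are my assumptions):

gcloud container clusters create-auto my-autopilot --region=us-east1 \
  --enable-private-nodes --master-ipv4-cidr=172.16.0.0/28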