Kubernetes: implementing and testing a HorizontalPodAutoscaler

A HorizontalPodAutoscaler (HPA) allows you to dynamically scale the replica count of your Deployment based on criteria such as memory or CPU utilization, which makes it a great way to manage spikes in utilization while still keeping your cluster size and infrastructure costs under control. In order for HPA to evaluate CPU and memory utilization and take …

Kubernetes: fixing x509 certificate errors from metric-server on K3s cluster

K3s is deployed by default with a metrics-server, but if you have a multi-node cluster it will fail unless you add the names of all the nodes to the kube-apiserver certificate. Symptoms of this problem include: the metrics-server deployment throwing x509 errors in its log; an error when you try to run “kubectl top pods”; No …

Bash: decoding a JWT from the command line with jq

Although jwt.io has become a common online destination for decoding JWT, this can also be done locally using jq.

    # populate JWT variable
    JWT=…
    # decode with jq utility
    echo "$JWT" | jq -R 'split(".") | .[0],.[1] | @base64d | fromjson'

Attribution of credit goes to this gist.
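JWT segments are base64url-encoded with the padding stripped, which not every decoder accepts as-is. As a minimal sketch, assuming a `base64` utility that supports `-d` (as in GNU coreutils) and a hypothetical helper name `b64url_decode` (neither comes from the referenced gist), a single segment can be normalized before decoding:

```shell
# hypothetical helper: decode one base64url segment (the encoding used by JWTs)
b64url_decode() {
  local seg="$1"
  # map the URL-safe alphabet back to the standard base64 alphabet
  seg=$(printf '%s' "$seg" | tr '_-' '/+')
  # restore the '=' padding that JWTs strip off
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do
    seg="${seg}="
  done
  printf '%s' "$seg" | base64 -d
}

# sample header segment for illustration only (not a real credential)
JWT_HEADER="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
b64url_decode "$JWT_HEADER"
```

The output of the helper is plain JSON text, so it can still be piped into jq for pretty-printing if jq is available.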

Terraform: error removing module containing legacy provider block, ‘Provider configuration not present’

If you have just removed a module declaration from your Terraform configuration and now get a ‘Provider configuration not present’ error when running apply:

    Error: Provider configuration not present
    To work with module.mymodule_legacysyntax.null_resource.test_rs (orphan) its original provider configuration at module.mymodule_legacysyntax.provider["registry.terraform.io/hashicorp/null"] is required, but it has been removed.

This occurs when a provider configuration is removed …

Ansible: resolving ‘could not initialize the preferred locale: unsupported locale setting’

If you are getting the following error when invoking ‘ansible’, ‘ansible-playbook’, ‘ansible-galaxy’ or any of the Ansible-related utilities:

    ERROR: Ansible could not initialize the preferred locale: unsupported locale setting

This means Ansible cannot find a locale ending in “.UTF-8”. Check the currently installed locales:

    $ locale -a

Then export the LC_ALL variable to one …
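The check-then-export step can be scripted. This is a minimal sketch, not the article's own commands: `pick_utf8_locale` is a hypothetical helper that reads `locale -a` style output on stdin, and the match pattern is an assumption that accepts both the `.UTF-8` and `.utf8` spellings that `locale -a` may print.

```shell
# hypothetical helper: print the first locale name that ends in a UTF-8 suffix
pick_utf8_locale() {
  # assumption: match ".UTF-8", ".utf8", etc. case-insensitively
  grep -i -E '\.utf-?8$' | head -n 1
}

# choose an installed UTF-8 locale, if any, and export it for Ansible
utf8_locale=$(locale -a 2>/dev/null | pick_utf8_locale || true)
if [ -n "$utf8_locale" ]; then
  export LC_ALL="$utf8_locale"
fi
```

If no UTF-8 locale is installed, the sketch leaves LC_ALL untouched; one would have to be generated on the host first.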

Kubernetes: evaluating full readiness of deployment, daemonset, or pod

Deployments and DaemonSets typically have more than one replica or desired replica count, and although kubectl’s default formatting returns columns summarizing how many are desired and how many are currently ready, an automated script needs to parse these values in order to determine full health. Similarly, pod status as well as the readiness …

Terraform: terraform_remote_state to pass values to other configurations

It would be uncommon to have one monolithic Terraform configuration for all the infrastructure in your organization. More than likely, there are multiple groups and each has responsibility and ownership of certain components (e.g. networking, storage, authorization, Kubernetes). As an example, let’s say your responsibility is the Kubernetes cluster build. You may need the following …

Kubernetes: creating TLS secrets with kustomize using embedded or external content

There are multiple options for creating a TLS secret using kustomize. One is to embed the certificate content as a base64 string directly in the data, the other is to use an external file. Below is an example kustomization.yaml file that serves as an entry point for both methods.

    ---
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    resources: …

Terraform: fixing error “querying Cloud Storage failed: storage: bucket doesn’t exist”

If you are attempting to run “terraform init” with a Google Cloud Storage backend and get the following error:

    Error: Failed to get existing workspaces: querying Cloud Storage failed: storage: bucket doesn’t exist

The first check should be that the Google Cloud Storage bucket indeed exists, using gsutil.

    project_id=myproject-123
    gsutil ls -p $project_id

If the …

GitLab: generating URL that can be used for Merge Request from fork to upstream

The forked workflow was popularized by the Open Source community: you make your contributions in your own personal fork of a repository, then submit a GitLab Merge Request to the central repository. A GitLab Merge Request can be submitted from the web UI by clicking on “Merge requests” and manually selecting the …

GCP: determining whether ASM is installed via asmcli or gcloud fleet

Anthos Service Mesh for GKE can be installed in the following modes: in-cluster ASM using the asmcli utility; managed ASM using the asmcli utility; managed ASM using the ‘gcloud container fleet’ command; managed ASM using the Terraform asm submodule. If you need to determine the installation mode used on your GKE cluster, you can examine …

Bash: testing if a file exists, has content, and is recently modified

If you need to test for a file’s existence, content size, and whether it was recently modified, the ‘find’ utility can provide this functionality in a single call. One scenario for this usage might be the cached results from a remote service call (database, REST service, etc). If fetching these results was a relatively costly …
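The article's exact invocation is not shown in this excerpt, so the following is only a sketch of the idea under stated assumptions: `is_fresh_file` and the cache path are hypothetical names, and `-maxdepth` and `-mmin` are extensions found in GNU and BSD find rather than POSIX.

```shell
# hypothetical helper: succeed only if the path is a regular file,
# is non-empty, and was modified within the last N minutes
is_fresh_file() {
  local path="$1" max_age_min="$2"
  # find prints the path only when every test passes, so a non-empty
  # result means the file exists, has content, and is recent enough
  [ -n "$(find "$path" -maxdepth 0 -type f -size +0 -mmin "-$max_age_min" 2>/dev/null)" ]
}

cache_file="/tmp/remote_results.json"  # hypothetical cache location
if is_fresh_file "$cache_file" 60; then
  echo "using cached results"
else
  echo "cache missing, empty, or stale; refetch needed"
fi
```

Folding all three checks into one find call means a missing file, an empty file, and a stale file all take the same "refetch" branch.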

GCP: determining whether GKE cluster mode is Standard or Autopilot

If you need to determine at the CLI whether a GKE cluster is managed using Standard or Autopilot mode, this is available by using gcloud to describe the cluster.

    # identify cluster and location
    gcloud container clusters list
    cluster_name=<clusterName>
    location_flag="--region=<region>" # OR --zone=<zone>
    # returns 'True' if GKE Autopilot cluster
    # returns empty if standard …

GKE: terraform lifecycle ‘ignore_changes’ to manage external changes to GKE cluster

As much as Terraform pushes to be the absolute system of record for resources it creates, often valid external processes are assisting in managing those same resources. Here are some examples of legitimate external changes: other company-approved Terraform scripts applying labeling to resources in order to track ownership and costs; security teams modifying IAM roles …

GCP: Cloud Run with build trigger coming from remote GitHub repository

GCP build triggers can easily handle Continuous Deployment (CD) when the source code is homed in a Google Cloud Source repository. But even if the system of record for your source is a remote GitHub repository, these same types of push and tag events can be consumed if you configure a connection and repository link.

GCP: Cloud Run/Function to handle requests to GKE cluster during maintenance

At some point, there will be a system change significant enough that a maintenance window needs to be scheduled with customers. But that doesn’t mean the end-user traffic or client integrations will stop requesting the services. What we need to present to end-users is a maintenance page during this outage to indicate the overall solution …

Ansible: adding custom apt repository with ‘signed-by’ gpg key

The centralized system keyring for apt was deprecated starting in Ubuntu 21, and is being replaced with an explicit path to the local gpg key in the ‘signed-by’ attribute. I have written more extensive articles on this subject [here, here], but from an Ansible perspective, this means ensuring the gpg key is downloaded to ‘/usr/share/keyrings’ with …

Ansible: generating templates with deep directory structure using with_filetree

If you have a simple directory containing multiple template files that should be generated on a target host, the ‘with_fileglob’ lookup plugin provides an easy way to render them. Below is an example rendering all the files from the ‘templates’ directory of a role.

    - name: create file out of every file in template directory …

GKE: show pod distribution across nodes and zones

Whether you are working on scaling, performance, or high availability, it can be useful to see exactly which Kubernetes worker nodes your pods are being scheduled onto.

Pods as distributed across worker nodes

    ns=default
    kubectl get pods -n $ns -o=custom-columns=NAME:.metadata.name,NODE:.spec.nodeName

Pods as distributed across zones (GKE specific)

If you wanted to take it one step further …

GKE: upgrade Anthos Config Management for GKE cluster

If you are managing GKE clusters using Anthos Config Management (ACM) and need to take advantage of newer features or enhancements in ConfigSync or PolicyController, upgrading these components can be done using the gcloud utility.

    # check current version of ACM on GKE clusters
    gcloud beta container fleet config-management version
    # select membership to upgrade …

Python: fixing ‘CryptographyDeprecationWarning: Blowfish has been deprecated’

If you are getting a warning similar to below when running a Python3 application:

    /usr/lib/python3/dist-packages/paramiko/transport.py:219: CryptographyDeprecationWarning: Blowfish has been deprecated

This can be resolved by upgrading to the latest paramiko module.

    # check current version then upgrade
    pip3 show paramiko
    pip3 install paramiko --upgrade
    # check upgraded version
    pip3 show paramiko

In my case, this …

Terraform: migrate state from local to remote Google Cloud Storage bucket and back

In this article I will demonstrate how to take a Terraform configuration that is using a local state file and migrate its persistent state to a remote Google Cloud Storage bucket (GCS). We will then perform the migration again, but this time to bring the remote state back to a local file. We will illustrate …

GKE: Determine Anthos on-prem GKE master node and IP address

If you are using Anthos GKE on-premise and need to determine which node of your Admin Cluster is the master, query for the master role. The label is ‘node-role.kubernetes.io/master’.

    $ kubectl get nodes -l node-role.kubernetes.io/master
    NAME                     STATUS   ROLES                  AGE   VERSION
    gke-admin-master-adfwa   Ready    control-plane,master   7d    v1.24.9-gke.100
    # using wide will also show External and Internal IP …

Kubernetes: list all pods in deployment

Listing all the pods belonging to a deployment can be done by querying its selectors, but using the deployment’s synthesized replicaset identifier allows for easier automation.

    # deployment name and namespace
    deployment_name=mydeployment
    deployment_ns=mynamespace
    # get replica set identifier for deployment
    dep_rs=$(kubectl describe deployment $deployment_name -n $deployment_ns | grep ^NewReplicaSet | awk '{print $2}')
    # get …

OpenWrt: installing dig from opkg

For troubleshooting DNS issues, running the dig utility directly from OpenWrt can be essential. This is easily done by installing the ‘bind-dig’ package as shown below.

    opkg update
    opkg install bind-dig

Ubuntu: ‘Connection to the Snap Store failed’ during upgrade from Ubuntu 20 to 22

If you are upgrading from Ubuntu 20 to Ubuntu 22 using ‘do-release-upgrade’ and get a fatal error ‘Connection to the Snap Store failed’, this may be resolved by removing the ‘lxd’ package, which is a lightweight container supervisor.

    sudo /etc/init.d/lxd stop
    sudo rm -fr /var/lib/lxd
    sudo dpkg --force-depends -P lxd; sudo dpkg --force …

GCP: Google Cloud Storage bucket with permissions for user or service account

Creating a Google Cloud Storage bucket is simple, but the IAM permissions required to perform operations in the bucket can be difficult to understand, especially when you want something as simple as providing upload/download access to the person who created the bucket and perhaps a service account. Below are the commands for creating a …

Linux: using nmap to check the secure protocols and ciphers of a site

While enabling HTTPS is an important step in securing your web application, it is critical that you take steps to disable legacy protocols and low-strength ciphers that can circumvent the very security you are attempting to implement. The Qualys SSL test is popular for grading the overall security of a public site, but you …

OpenWrt: bridge VLAN filtering for OpenWrt 21.x with DSA, isolated guest Wi-Fi

There were significant changes made to VLAN configuration between OpenWrt 19.x and 21.x. Also, many of the target chipsets were migrated from swconfig to DSA (Distributed Switch Architecture), which introduced differences in bridging. In this article, I will create a set of VLANs for my OpenWrt 21.x DSA-enabled router with isolated guest Wi-Fi networks. I …

Kubernetes: restart a simple pod

A pod belonging to a deployment can be manually deleted, scaled down, or restarted to get a fresh pod. However, if all you have is a simple pod definition, these actions are not available. One way of restarting the pod is to output its full yaml definition and use ‘kubectl replace’ with the force option.

Kubernetes: patch every array element using kubectl and jq

Below is an example using ‘kubectl patch’ to update the securityContext of a single, specific container named ‘my-init-container1’ of the ‘initContainers’ list.

    kubectl patch deployment my-deployment -n default --patch='{ "spec": { "template": { "spec": { "initContainers": [ { "name": "my-init-container1", "securityContext": { "runAsUser": 999 } } ] } } } }'

But ‘initContainers’ is an …

Ubuntu: fixing apt NO_PUBKEY errors by converting deprecated keyring to signed-by attribute

If apt update throws warnings about invalid signature verification and NO_PUBKEY, you may need to migrate from using the deprecated system keyring to using a ‘signed-by’ attribute in your apt repo definition file. Here are examples of errors you might see when doing an ‘apt update’.

    W: An error occurred during the signature verification. The …

GCP: list of available GKE cluster versions in region and channel

If you are going to create a GKE cluster in a region, you may need to be explicit with the version of the master control plane and worker nodes. Below is how you would list the available versions.

    # specify your region
    region=us-east1
    gcloud container get-server-config --region=$region