As much as Terraform pushes to be the absolute system of record for the resources it creates, valid external processes often help manage those same resources.
Here are some examples of legitimate external changes:
- Other company-approved Terraform scripts applying labeling to resources in order to track ownership and costs
- Security teams modifying IAM roles and memberships based on principles of least-privilege
- Hyperscaler vendor auto-upgrades of components such as Kubernetes node pools
These scenarios can be addressed by using the lifecycle meta-argument ignore_changes and explicitly providing a list of attributes to ignore.
For this article, let’s dive into the specifics of managing a Google GKE cluster and how ignore_changes can be used.
Potential issues if ignore_changes is not used
Let’s assume that you created a GKE cluster using the “google_container_cluster” resource. Here is my full main.tf as an example.
If you did not define any ignore_changes attributes, the following issues could occur during the months and years of ongoing maintenance of this cluster:
- Your organization starts a mandatory labeling initiative to track resource ownership, support, and chargebacks. These labels keep getting removed by your Terraform scripts, which are unaware of their purpose.
- You introduce Anthos Service Mesh, which adds a cluster “mesh_id” label that enables the metrics dashboard. This label keeps getting removed by your Terraform script, which is unaware of its purpose or existence.
- The GKE node pool instance count is manually increased during the holidays because of high traffic loads. The Terraform script keeps scaling it back down, causing performance issues for customers.
- The GKE node pool has auto-upgrade enabled, which is supposed to reduce manual maintenance, yet it keeps getting downgraded by your Terraform scripts to older, unsupported versions.
Master control plane upgrades are already understood
As part of the value-add of the platform, Google automatically upgrades the GKE master control plane portion of the Kubernetes cluster.
The ‘min_master_version’ attribute of the google_container_cluster Terraform resource was designed for this purpose: it sets a minimum version rather than an exact one, so background upgrades do not force a change in the Terraform plan.
Therefore, there is no need to include this attribute in the ‘ignore_changes’ list.
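As a rough sketch of what this looks like in configuration, the cluster below sets a floor on the control plane version and declares no lifecycle block for it. The resource name, cluster name, location, and the variable name are placeholders chosen for illustration and are not taken from the article's main.tf.

```hcl
# Hypothetical variable; supply a version valid for your location.
variable "min_master_version" {
  description = "Lowest acceptable control plane version"
  type        = string
}

resource "google_container_cluster" "primary" {
  name               = "my-gke-cluster" # placeholder
  location           = "us-central1-a"  # placeholder
  initial_node_count = 1

  # A minimum, not a pin: control plane upgrades above this value
  # performed by Google are not expected to show up as a plan diff.
  min_master_version = var.min_master_version
}
```

The version that is actually running is exposed separately through the read-only master_version attribute of the same resource.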
Ignore cluster label changes
External services may be required to set labels at the GKE cluster level. This can be part of an ownership/chargeback initiative, or even for services such as Anthos Service Mesh that append a “mesh_id”.
If you do not use ignore_changes on “resource_labels”, your Terraform scripts will remove these additional labels. With ignore_changes set on resource_labels, Terraform ignores differences in that attribute when planning, so externally added labels are left in place.
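A minimal sketch of the lifecycle block for this case might look like the following. The resource name, cluster name, location, and label values are placeholders for illustration only.

```hcl
resource "google_container_cluster" "primary" {
  name               = "my-gke-cluster" # placeholder
  location           = "us-central1-a"  # placeholder
  initial_node_count = 1

  # Labels applied when the cluster is created.
  resource_labels = {
    managed_by = "terraform" # placeholder label
  }

  lifecycle {
    # Do not plan an update when labels are added or edited
    # outside of this configuration.
    ignore_changes = [
      resource_labels,
    ]
  }
}
```

Keep in mind that ignore_changes covers the whole attribute, so later edits to resource_labels in the configuration are ignored on update as well. Terraform also accepts index notation for map elements, such as resource_labels["mesh_id"], if only specific keys should be left alone.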
Below is an example of manually changing labels and seeing that it has no effect on the Terraform plan.
    project_id=$(gcloud config get project)
    project_number=$(gcloud projects list --filter="id=$project_id" --format="value(projectNumber)")

    # setup variables for cluster name and location (region or zone)
    gcloud container clusters list
    cluster_name=xxxxxx
    location_flag="--zone=xxxx" # OR --region=xxxx

    # show current labels
    resource_labels=$(gcloud container clusters describe $cluster_name $location_flag --format="value(resourceLabels)" | sed 's/;/,/g')
    echo "current resourceLabel: $resource_labels"

    # add label
    gcloud container clusters update $cluster_name $location_flag --update-labels="color=red,$resource_labels"

    $ terraform plan
    ...
    No changes. Your infrastructure matches the configuration.
    ...
Ignore node pool instance count scaling
Company policy may allow node pool instance counts to be tweaked manually during periods of unexpectedly high load, or scaled down to save costs during low-traffic months. Use ignore_changes on the ‘initial_node_count’ and ‘node_count’ attributes of the google_container_node_pool resource so that these manual adjustments do not show up as changes in the plan.
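A sketch of a node pool that tolerates manual resizing could look like this. The pool name, cluster name, location, and starting count are placeholders, not values from the article's main.tf.

```hcl
resource "google_container_node_pool" "primary_nodes" {
  name     = "primary-pool"   # placeholder
  cluster  = "my-gke-cluster" # placeholder
  location = "us-central1-a"  # placeholder

  # Starting size; later manual resizes are left alone.
  node_count = 1

  lifecycle {
    # Ignore drift in both count attributes so a manual resize
    # does not trigger a scale-back or a recreation of the pool.
    ignore_changes = [
      initial_node_count,
      node_count,
    ]
  }
}
```

Both attributes are listed because a manual resize can alter either value as seen by Terraform, and a change to initial_node_count would otherwise force the pool to be recreated.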
Below is an example of manually changing node pool instance counts and seeing that it has no effect on the Terraform plan.
    # setup variable for node pool name
    gcloud container node-pools list --cluster $cluster_name $location_flag
    node_pool_name=xxxxx

    # get current count
    current_node_count=$(gcloud container clusters describe $cluster_name $location_flag --format="value(currentNodeCount)")

    # increase by 1
    ((current_node_count++))
    gcloud container clusters resize $cluster_name $location_flag --node-pool $node_pool_name --num-nodes $current_node_count --quiet

    $ terraform plan
    ...
    No changes. Your infrastructure matches the configuration.
    ...
Ignore node pool version changes
If your GKE cluster has auto-upgrade enabled for the node pool, Google will perform upgrades during valid maintenance windows. To keep these upgrades from appearing as changes in Terraform, include “version” in the ignore_changes list of the google_container_node_pool resource.
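The following sketch pairs auto-upgrade with an ignored version attribute. As before, the pool name, cluster name, and location are placeholders used only for illustration.

```hcl
resource "google_container_node_pool" "primary_nodes" {
  name       = "primary-pool"   # placeholder
  cluster    = "my-gke-cluster" # placeholder
  location   = "us-central1-a"  # placeholder
  node_count = 1

  management {
    auto_repair  = true
    auto_upgrade = true # Google upgrades the nodes in the background
  }

  lifecycle {
    # Do not plan a downgrade after the nodes have been upgraded
    # outside of Terraform.
    ignore_changes = [
      version,
    ]
  }
}
```

For a single node pool that should tolerate both manual resizing and background upgrades, the count attributes and version would go together in one ignore_changes list.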
Below is an example of manually upgrading the node pool and seeing that it has no effect on the Terraform plan.
    # check if 'autoUpgrade' set to true
    gcloud container clusters describe $cluster_name $location_flag --format="value(nodePools.management)"

    # available node pool versions
    gcloud container get-server-config --format="yaml(validNodeVersions)" $location_flag
    node_version="1.xx.yy-gke.zz"

    # upgrade node pool
    gcloud container clusters upgrade $cluster_name $location_flag --node-pool $node_pool_name --cluster-version $node_version --quiet

    $ terraform plan
    ...
    No changes. Your infrastructure matches the configuration.
    ...
REFERENCES
fabianlee GitHub, project code for this article
HashiCorp ref, lifecycle ignore_changes
Dave Storey, how and when to ignore lifecycle changes in Terraform
HashiCorp ref, Manage Resource lifecycle
Stack Overflow, example scenarios why you would use ignore_changes
NOTES
Forcing an upgrade of the master control plane

    # variables for cluster name and location (region or zone)
    gcloud container clusters list
    cluster_name=xxxxxx
    location_flag="--zone=xxxx" # OR --region=xxxx

    # variable for new control plane version
    gcloud container get-server-config --format="yaml(validMasterVersions)" $location_flag
    new_version=1.xx.y-gke.zzzz

    # do control plane upgrade
    gcloud container clusters upgrade $cluster_name $location_flag --cluster-version="$new_version" --quiet