Kubernetes: major version upgrade of Anthos GKE on-prem from 1.10 to 1.11

Anthos GKE on-prem is a managed platform that brings GKE clusters to on-premises datacenters. In this article, I walk through the steps required to upgrade from Anthos 1.10 to 1.11 on VMware.

These instructions assume you used the Ansible scripts and Seed VM described in my previous Anthos 1.10 installation article.

 

Overview

The proper order for a major-version upgrade is:

  • Download the newer gkeadm tool
  • Upgrade the Admin Workstation
  • Install new full bundle for upgrade
  • Upgrade the User Clusters
  • Upgrade the Admin Cluster

Log in to the Seed VM (from host)

The initial seed VM is the guest used to create the Admin Workstation.

cd anthos-nested-esx-manual
export project_path=$(realpath .)

# log in to the initial seed VM used to create the Admin Workstation
ssh -i ./tf-kvm-seedvm/id_rsa ubuntu@192.168.140.220

Download the newer gkeadm tool (from the Seed VM)

Download the newer 1.11 gkeadm tool.

cd ~/seedvm
gsutil cp gs://gke-on-prem-release/gkeadm/1.11.0-gke.543/linux/gkeadm ./gkeadm1100

chmod +x gkeadm1100
./gkeadm1100 version
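
If you expect to repeat this download for later releases, the version string can be kept in one variable. This is a minimal sketch that assumes the bucket layout stays the same as in the gsutil command above; the gkeadm_url function name is my own and not part of any Google tooling.

```shell
#!/bin/sh
# Build the Cloud Storage URL of the gkeadm binary for one release.
# Assumption: the bucket layout matches the gsutil command used in this article.
gkeadm_url() {
  version="$1"   # for example 1.11.0-gke.543
  printf 'gs://gke-on-prem-release/gkeadm/%s/linux/gkeadm\n' "$version"
}

anthos_version="1.11.0-gke.543"

# Print the download command instead of running it, so it can be reviewed first.
echo "gsutil cp $(gkeadm_url "$anthos_version") ./gkeadm1100"
```

The sketch only prints the command; paste the printed line into the Seed VM shell once it looks right.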

Upgrade the Admin Workstation (from the Seed VM)

Use the admin-ws-config.yaml that was used to initially set up the Admin Workstation, along with the generated Admin Workstation info file (which matches the name of the Admin Workstation).

$ ./gkeadm1100 upgrade admin-workstation --config admin-ws-config.yaml --info-file gke-admin-ws

Using config file "admin-ws-config.yaml"...
Running validations...
- Validation Category: Tools
    - [SUCCESS] gcloud
    - [SUCCESS] ssh
    - [SUCCESS] ssh-keygen
    - [SUCCESS] scp

- Validation Category: Config Check
    - [SUCCESS] Config

- Validation Category: Internet Access
    - [SUCCESS] Internet access to required domains

- Validation Category: GCP Access
    - [SUCCESS] Read access to GKE on-prem GCS bucket

- Validation Category: vCenter
    - [SUCCESS] Credentials
    - [SUCCESS] vCenter Version
    - [SUCCESS] ESXi Version
    - [SUCCESS] Datacenter
    - [SUCCESS] Datastore
    - [SUCCESS] Resource Pool
    - [SUCCESS] Folder
    - [SUCCESS] Network

All validation results were SUCCESS.
Upgrading admin workstation "gke-admin-ws" from version "1.10.0-gke.194" to version "1.11.0-gke.543"...
Generating local backup of admin workstation VM "gke-admin-ws"...  DONE
Downloading OS image "gs://gke-on-prem-release/admin-appliance/1.11.0-gke.543/gke-on-prem-admin-appliance-vsphere-1.11.0-gke.543.ova"...
 [==================================================>]   8.95GB/8.95GB
Image saved to /home/ubuntu/seedvm/gke-on-prem-admin-appliance-vsphere-1.11.0-gke.543.ova
Verifying image gke-on-prem-admin-appliance-vsphere-1.11.0-gke.543.ova...  
DONE
Setting up OS image as a VM template in vSphere...
...
[07-05-22 02:00:20] Uploading OS image "gke-on-prem-admin-appliance-vsphere-1.11.0-gke.543" to vSphere...OK
Do not cancel (double ctrl-c) while the admin workstation "gke-admin-ws" is being decommissioned. Doing so may result in an unrecoverable state.
Decommissioning original admin workstation VM "gke-admin-ws"...  DONE
Do not cancel (double ctrl-c) once the new admin workstation VM has been created. Doing so may result in an unrecoverable state.
Creating admin workstation VM "gke-admin-ws-1-11-0-gke-543-1651888821"...  
DONE
Waiting for admin workstation VM "gke-admin-ws-1-11-0-gke-543-1651888821" to be 
assigned an IP....  DONE

******************************************
Admin workstation VM successfully created:
- Name:    gke-admin-ws-1-11-0-gke-543-1651888821
- IP:      192.168.140.221
- SSH Key: /home/ubuntu/.ssh/gke-admin-workstation
******************************************
Deleting admin workstation VM "gke-admin-ws"...  DONE
Renaming new admin workstation "gke-admin-ws-1-11-0-gke-543-1651888821" to "gke-admin-ws"
Printing gkectl and docker versions on admin workstation...
gkectl version
gkectl 1.11.0-gke.543 (git-7e4c4c24a)
Add --kubeconfig to get more version information.

docker version
Client:
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.8
 Git commit:        20.10.7-0ubuntu5~20.04.2~anthos1
 Built:             Wed Dec  8 15:14:53 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.8
  Git commit:       20.10.7-0ubuntu5~20.04.2~anthos1
  Built:            Fri Nov 12 16:42:06 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.8-0ubuntu0~20.04.1~anthos1.1
  GitCommit:        
 runc:
  Version:          1.0.0~rc95-0ubuntu1~20.04.1~anthos1
  GitCommit:        
 docker-init:
  Version:          0.19.0
  GitCommit:        


Checking NTP server on admin workstation...
timedatectl
               Local time: Sat 2022-05-07 02:09:35 UTC
           Universal time: Sat 2022-05-07 02:09:35 UTC
                 RTC time: Sat 2022-05-07 02:09:35    
                Time zone: Etc/UTC (UTC, +0000)       
System clock synchronized: yes                        
              NTP service: active                     
          RTC in local TZ: no                         

Getting component access service account...

Preparing "credential.yaml" for gkectl...

Copying files to admin workstation...
    - vcenter.ca.pem
    - anthos-allowlisted.json
    - /tmp/gke-on-prem-vcenter-credentials104537762/credential.yaml

Updating admin-cluster.yaml for gkectl...

********************************************************************
Admin workstation is ready to use.

WARNING: file already exists at "/home/ubuntu/seedvm/gke-admin-ws". Overwriting.
Admin workstation information saved to /home/ubuntu/seedvm/gke-admin-ws
This file is required for future upgrades
SSH into the admin workstation with the following command:
ssh -i /home/ubuntu/.ssh/gke-admin-workstation ubuntu@192.168.140.221
********************************************************************

This command backs up the files on your current Admin Workstation (kubeconfig, root certs, and json files), then creates a newer Admin Workstation and copies those files back onto it.  The backing vmdk disk for the Admin Workstation (‘dataDiskName’ in admin-ws-config.yaml) is re-attached to this new VM.

As the command output shows, a new VM with a timestamped name appears temporarily in vCenter. Once the older Admin Workstation is deleted, the new VM is renamed to the original name.

A local file listing shows the backup archive that was created.  If the Admin Workstation upgrade fails, you can extract this archive over a fresh Admin Workstation VM to recover.

$ ls -l *.gz
-rw-r--r-- 1 ubuntu ubuntu 2076012 May 7 01:28 gke-admin-ws-backup.tar.gz
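
Before relying on the backup, it can be worth confirming the archive is readable. This is a small sketch using standard tar; the list_backup function name is my own, and the demo builds a throwaway archive with a placeholder file because the real gke-admin-ws-backup.tar.gz exists only on the Seed VM.

```shell
#!/bin/sh
# List the contents of a gzip-compressed tar archive without extracting it.
# Returns non-zero if the archive is missing or cannot be read by tar.
list_backup() {
  archive="$1"
  if [ ! -f "$archive" ]; then
    echo "missing archive: $archive" >&2
    return 1
  fi
  tar -tzf "$archive"
}

# Demo with a throwaway archive; the file inside is only a placeholder.
demo_dir=$(mktemp -d)
echo "placeholder" > "$demo_dir/kubeconfig"
tar -czf "$demo_dir/demo-backup.tar.gz" -C "$demo_dir" kubeconfig
list_backup "$demo_dir/demo-backup.tar.gz"
rm -rf "$demo_dir"
```

On the Seed VM the call would be list_backup gke-admin-ws-backup.tar.gz.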

# exit from seed VM console, back to host
$ exit

Setup minimal utilities on Admin WS (from host)

The Admin Workstation has been recreated, so OS settings and utilities such as govc/k9s are no longer there.  Reinstall them using the Ansible playbook.

ansible-playbook playbook_adminws.yml

Test ssh to Admin WS (from host)

The private ssh key was carried over to the upgraded 1.11 Admin Workstation, but the host fingerprint has changed.  Clear the old fingerprint, validate the login, and reset the ssh server timeout.

cd $project_path/needed_on_adminws

# clear old fingerprint to AdminWS
ssh-keygen -f ~/.ssh/known_hosts -R 192.168.140.221

# login to new Admin WS
ssh -i $project_path/needed_on_adminws/gke-admin-workstation ubuntu@192.168.140.221

# uptime will be low, because the VM was just created
uptime

# reset the ssh server timeout (destroyed during rebuild)
./adminws_ssh_increase_timeout.sh

# back to host
exit

Install full bundle for upgrade (from the Admin WS)

To do an upgrade, the newer full bundle needs to be downloaded and prepared.

# login to new Admin WS
ssh -i $project_path/needed_on_adminws/gke-admin-workstation ubuntu@192.168.140.221

# view current bundles already downloaded locally
$ ls /var/lib/gke/bundles
gke-onprem-vsphere-1.11.0-gke.543-full.tgz gke-onprem-vsphere-1.11.0-gke.543.tgz

# view bundles currently in use by admin and user clusters
# will show older versions
$ gkectl version --kubeconfig /home/ubuntu/kubeconfig --details

gkectl version: 1.11.0-gke.543 (git-7e4c4c24a)

onprem user cluster controller version: 1.10.0-gke.194

current admin cluster version: 1.10.0-gke.194

current user cluster versions (VERSION: CLUSTER_NAMES):
- 1.10.0-gke.194: user1

available admin cluster versions:
- 1.10.0-gke.194

available user cluster versions:
- 1.10.0-gke.194

Info: The admin workstation and gkectl is NOT ready to upgrade to "1.12" yet, because there are "1.10" clusters.
Info: The admin cluster can't be upgraded to "1.11", because there are still "1.10" user clusters.
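
The same comparison can be scripted if you want to gate later steps on it. This is a rough sketch that parses a saved copy of the output above; the minor_of and field_after helper names are my own, and the sample text is trimmed from the listing in this article.

```shell
#!/bin/sh
# Extract MAJOR.MINOR from a version such as 1.11.0-gke.543
minor_of() {
  printf '%s\n' "$1" | cut -d. -f1,2
}

# Read "gkectl version --details" output on stdin and print the first
# word that follows "LABEL: ".
field_after() {
  label="$1"
  sed -n "s/^${label}: *\([^ ]*\).*/\1/p" | head -n 1
}

# Sample lines copied from the output shown in this article.
sample='gkectl version: 1.11.0-gke.543 (git-7e4c4c24a)
current admin cluster version: 1.10.0-gke.194'

tool=$(printf '%s\n' "$sample" | field_after 'gkectl version')
admin=$(printf '%s\n' "$sample" | field_after 'current admin cluster version')

if [ "$(minor_of "$tool")" != "$(minor_of "$admin")" ]; then
  echo "admin cluster ($admin) is behind gkectl ($tool)"
fi
```

In practice you would pipe the live command output into field_after instead of using the sample variable.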

This shows us that the Admin and User clusters are still using 1.10.0-gke.194. Now we need to prepare the 1.11 full bundle.

# full bundle already found locally in /var/lib/gke/bundles
# assign appropriate permissions
$ sudo chmod ugo+r /var/lib/gke/bundles/*.tgz

$ gkectl prepare --bundle-path /var/lib/gke/bundles/gke-onprem-vsphere-1.11.0-gke.543-full.tgz --kubeconfig /home/ubuntu/kubeconfig

- Validation Category: Config Check
    - [SUCCESS] Config

- Validation Category: OS Images
    - [FAILURE] Admin cluster OS images exist: os images [gke-on-prem-ubuntu-1.11.0-gke.543] 
don't exist, please run prepare to upload os images.

- Validation Category: Internet Access
    - [SUCCESS] Internet access to required domains

- Validation Category: GCP
    - [SUCCESS] GCP service
    - [SUCCESS] GCP service account

- Validation Category: Container Registry
    - [SUCCESS] Docker registry access

- Validation Category: VCenter
    - [SUCCESS] Credentials
    - [SUCCESS] vCenter Version
    - [SUCCESS] ESXi Version
    - [SUCCESS] Datacenter
    - [SUCCESS] Datastore
    - [SUCCESS] Resource pool
    - [SUCCESS] Folder
    - [SUCCESS] Network

Some validation results were FAILURE or UNKNOWN. Check report above.
Logging in to gcr.io/gke-on-prem-release
Finished preparing the container images.
Using image file: "/tmp/gke-on-prem-bundle-cache/1007121/gke-on-prem-ubuntu-1.11.0-gke.543.ova"
Setting up OS image as a VM template in vSphere...
[07-05-22 14:25:14] Uploading OS image "gke-on-prem-ubuntu-1.11.0-gke.543" to vSphere...OK
Using image file: "/tmp/gke-on-prem-bundle-cache/1007121/gke-on-prem-cos-1.11.0-gke.543.ova"
Setting up OS image as a VM template in vSphere...
[07-05-22 14:30:30] Uploading OS image "gke-on-prem-cos-1.11.0-gke.543" to vSphere...OK
Finished preparing the OS images.
    Applying Bundle CRD YAML...  DONE
    Applying Bundle CRs...  DONE
Applied bundle in the admin cluster.
Successfully prepared the environment.

We are using the full bundle, which contains the large binary images needed for a major upgrade. In contrast, the image binaries typically do not change in a minor upgrade, so the regular bundle is all that is needed.

If you use the regular bundle for major upgrades, gkectl will download the full bundle as part of its upgrade processing in later steps. So, for major upgrades you may as well prepare the full bundle now.
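
The choice between the two bundle files can be expressed as a small helper. This is a sketch only: pick_bundle is a hypothetical name, and the file name pattern is taken from the directory listing earlier in this section.

```shell
#!/bin/sh
# Print the path of the full bundle if it is readable, otherwise fall back
# to the regular bundle. Returns non-zero when neither file is readable.
pick_bundle() {
  dir="$1"
  version="$2"
  full="${dir}/gke-onprem-vsphere-${version}-full.tgz"
  regular="${dir}/gke-onprem-vsphere-${version}.tgz"
  if [ -r "$full" ]; then
    printf '%s\n' "$full"
  elif [ -r "$regular" ]; then
    printf '%s\n' "$regular"
  else
    echo "no readable bundle for ${version} in ${dir}" >&2
    return 1
  fi
}

# Demo against a temporary directory with empty placeholder files.
demo_dir=$(mktemp -d)
touch "$demo_dir/gke-onprem-vsphere-1.11.0-gke.543.tgz" \
      "$demo_dir/gke-onprem-vsphere-1.11.0-gke.543-full.tgz"
pick_bundle "$demo_dir" "1.11.0-gke.543"
rm -rf "$demo_dir"
```

On the Admin Workstation the result could be passed to --bundle-path, for example with pick_bundle /var/lib/gke/bundles 1.11.0-gke.543. The readability test is the reason the chmod step above matters.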

Upgrade the User Clusters (from the Admin WS)

Prerequisites

Before upgrading, the cluster must be registered under Anthos > Clusters in the Cloud Console (https://console.cloud.google.com). There must also be at least one free IP address in user-block.yaml, because each replacement worker node is created before the node it replaces is deleted.
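
A quick way to estimate the free addresses is to count the entries in the IP block file and subtract the current node count. This sketch assumes the block file lists each address on a line of the form "- ip: ADDRESS"; the function names and the demo addresses and hostnames are hypothetical and not taken from my environment.

```shell
#!/bin/sh
# Count the "- ip:" entries in an IP block file.
count_block_ips() {
  grep -c -E '^[[:space:]]*-[[:space:]]*ip:' "$1" || true
}

# Free addresses = addresses in the block file minus nodes already using one.
free_ips() {
  block_file="$1"
  node_count="$2"
  total=$(count_block_ips "$block_file")
  echo $(( total - node_count ))
}

# Demo block file with four made-up entries.
demo_file=$(mktemp)
cat > "$demo_file" <<'EOF'
blocks:
  - netmask: 255.255.255.0
    gateway: 192.168.140.1
    ips:
    - ip: 192.168.140.231
      hostname: user-host1
    - ip: 192.168.140.232
      hostname: user-host2
    - ip: 192.168.140.233
      hostname: user-host3
    - ip: 192.168.140.234
      hostname: user-host4
EOF

# Three worker nodes are running, so one address remains for the upgrade.
echo "free IPs: $(free_ips "$demo_file" 3)"
rm -f "$demo_file"
```

The node count could come from kubectl get nodes --no-headers | wc -l against the User Cluster kubeconfig.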

Check current version and image

If you used my previous article to install Anthos 1.10, then you are already using ‘ubuntu_containerd’ for the osImageType.  dockershim is deprecated and will be removed in Kubernetes 1.24, so if you are still on the Ubuntu docker image it is important to plan your migration away from it.

$ kubectl get nodes -o custom-columns="NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion,RUNTIME:.status.nodeInfo.containerRuntimeVersion,IMAGE:.status.nodeInfo.osImage"
NAME         VERSION            RUNTIME              IMAGE
user-host1   v1.21.5-gke.1200   containerd://1.5.8   Ubuntu 20.04.3 LTS
user-host2   v1.21.5-gke.1200   containerd://1.5.8   Ubuntu 20.04.3 LTS
user-host3   v1.21.5-gke.1200   containerd://1.5.8   Ubuntu 20.04.3 LTS
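
If you have many nodes, the RUNTIME column can be checked with awk rather than by eye. This is a sketch that reads the table format shown above from stdin; the function name is my own, and the user-hostX row with a docker runtime is a made-up example added so the filter has something to report.

```shell
#!/bin/sh
# Read the custom-columns table on stdin and print the NAME of every node
# whose RUNTIME (third column) does not start with containerd://
non_containerd_nodes() {
  awk 'NR > 1 && $3 !~ /^containerd:\/\// { print $1 }'
}

# First data row is from this article; the second row is hypothetical.
sample='NAME         VERSION            RUNTIME              IMAGE
user-host1   v1.21.5-gke.1200   containerd://1.5.8   Ubuntu 20.04.3 LTS
user-hostX   v1.21.5-gke.1200   docker://20.10.7     Ubuntu 20.04.3 LTS'

printf '%s\n' "$sample" | non_containerd_nodes
```

An empty result means every node is already on containerd.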

Upgrade

Run the gkectl command as shown below using the Admin Cluster kubeconfig and the config used to originally set up the User Cluster.

$ gkectl upgrade cluster --kubeconfig /home/ubuntu/kubeconfig --config user-cluster.yaml -v 3

Reading config with version "v1"
- Validation Category: Config Check
    - [SUCCESS] Config

- Validation Category: OS Images
    - [SUCCESS] User OS images exist

- Validation Category: Cluster Health
    Running validation check for "Admin cluster health"... |
    - [SUCCESS] Admin cluster health
    Running validation check for "Admin PDB"... |
W0507 19:03:00.064837    8374 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in
    - [SUCCESS] Admin PDB
    - [SUCCESS] User cluster health
    Running validation check for "User PDB"... |
W0507 19:03:01.269006    8374 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in
    - [SUCCESS] User PDB

- Validation Category: Reserved IPs
    - [SUCCESS] Admin cluster reserved IP for upgrading cluster
    - [SUCCESS] User cluster reserved IP for upgrading a user cluster

- Validation Category: GCP
    - [SUCCESS] GCP service
    - [SUCCESS] GCP service account

- Validation Category: Container Registry
    - [SUCCESS] Docker registry access

- Validation Category: VCenter
    - [SUCCESS] Credentials
    - [SUCCESS] VSphere CSI Driver

All validation results were SUCCESS.
Upgrading to bundle version: "1.11.0-gke.543"
Updating platform to "1.11.0-gke.543"... |
W0507 19:05:52.222554    8374 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in
Updating platform to "1.11.0-gke.543"... /
...
Updating platform to "1.11.0-gke.543"...  DONE
Reading config with version "v1"

Seesaw Upgrade Summary:
OS Image updated (old -> new): "gke-on-prem-ubuntu-1.10.0-gke.194" -> "gke-on-prem-ubuntu-1.11.0-gke.543"

Upgrading loadbalancer "seesaw-for-user1"
Deleting LB VM:  seesaw-for-user1-hmxgk4w4ng-1...  DONE
Creating new LB VMs...  DONE
Saved upgraded Seesaw group information of "seesaw-for-user1" to file: seesaw-for-user1.yaml
Waiting LBs to become ready...  DONE
Updating create-config secret...  DONE
Loadbalancer "seesaw-for-user1" is successfully upgraded.
Skipping admin cluster backup since clusterBackup section is not set in admin cluster seed config
Waiting for user cluster "user1" to be ready... \
Waiting for user cluster "user1" to be ready...  DONE
    Cluster is running...
    Creating or updating user cluster control plane workloads: deploying 
user-kube-apiserver-base, user-control-plane-base, 
user-control-plane-clusterapi-vsphere, user-control-plane-etcddefrag: 0/1 
statefulsets are ready...
    Creating or updating user cluster control plane workloads: deploying 
user-control-plane-base, user-control-plane-clusterapi-vsphere, 
user-control-plane-etcddefrag...
    Creating or updating user cluster control plane workloads: deploying 
user-control-plane-etcddefrag...
    Creating or updating user cluster control plane workloads: 13/15 pods are ready...
    Creating or updating node pools: pool-1: hasn't been seen by controller yet...
    Creating or updating node pools: pool-1: 1/3 replicas are updated...
    Creating or updating node pools: pool-1: Creating or updating node pool...
    Creating or updating node pools: pool-1: 2/3 replicas are updated...
    Creating or updating node pools: pool-1: Creating or updating node pool...
    Creating or updating node pools: pool-1: 4/3 replicas show up...
    Creating or updating node pools: pool-1: Creating or updating node pool...
    Creating or updating addon workloads: 3/4 machines are ready...
    Cluster is running...
Skipping admin cluster backup since clusterBackup section is not set in admin cluster seed config
Done upgrading user cluster user1.
Done upgrade.

This upgrades the load balancers first, then the User Cluster control plane and finally the User Cluster worker nodes.

During this process, in vCenter you will see the newer template being cloned as newer worker nodes are spun up to replace the older versions.

Running “kubectl get nodes” during the upgrade shows nodes being swapped out serially as new nodes are brought in and older ones deleted. There are small windows of time when there are N+1 worker nodes.

Validation of new version

When done, you should see output similar to below, where the VERSION is now v1.22 (was v1.21) and the Ubuntu image is 20.04.4 (was 20.04.3).

$ kubectl get nodes -o custom-columns="NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion,RUNTIME:.status.nodeInfo.containerRuntimeVersion,IMAGE:.status.nodeInfo.osImage" 
NAME         VERSION           RUNTIME              IMAGE
user-host2   v1.22.8-gke.200   containerd://1.5.8   Ubuntu 20.04.4 LTS
user-host4   v1.22.8-gke.200   containerd://1.5.8   Ubuntu 20.04.4 LTS
user-host5   v1.22.8-gke.200   containerd://1.5.8   Ubuntu 20.04.4 LTS

If this upgrade fails half-way through, invoke the exact same command again with the “--skip-validation-all” flag added to resume the upgrade.
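
The retry pattern can be wrapped in a small function. This is a sketch rather than a recommendation to retry blindly, since you would normally read the failure output before resuming; run_with_resume and fake_upgrade are names I made up, and fake_upgrade only stands in for gkectl so the example runs anywhere.

```shell
#!/bin/sh
# Run a command; if it fails, run it once more with an extra flag appended.
# Usage: run_with_resume EXTRA_FLAG COMMAND [ARGS...]
run_with_resume() {
  resume_flag="$1"
  shift
  if "$@"; then
    return 0
  fi
  echo "first attempt failed, retrying with ${resume_flag}" >&2
  "$@" "$resume_flag"
}

# Stand-in for gkectl: succeeds only when the resume flag is present.
fake_upgrade() {
  for arg in "$@"; do
    if [ "$arg" = "--skip-validation-all" ]; then
      echo "resumed"
      return 0
    fi
  done
  echo "failed" >&2
  return 1
}

run_with_resume --skip-validation-all fake_upgrade cluster
```

With the real tool the call would look like run_with_resume --skip-validation-all gkectl upgrade cluster --kubeconfig /home/ubuntu/kubeconfig --config user-cluster.yaml.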

Upgrade the Admin Cluster (from the Admin WS)

Prerequisites

There needs to be at least one free IP address in admin-block.yaml to accommodate the serial creation of new master nodes.

You also need to make sure the certificates on the Admin Cluster master have not expired by running “sudo kubeadm certs check-expiration” on the master node, which is described in detail in the docs. The commands are listed in the NOTES section at the end of this article.

Check current version and image

The Admin Cluster is still at the older version.

KUBECONFIG=kubeconfig kubectl get nodes -o custom-columns="NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion,RUNTIME:.status.nodeInfo.containerRuntimeVersion,IMAGE:.status.nodeInfo.osImage"
NAME          VERSION            RUNTIME              IMAGE
admin-host1   v1.21.5-gke.1200   containerd://1.5.8   Ubuntu 20.04.3 LTS
admin-host2   v1.21.5-gke.1200   containerd://1.5.8   Ubuntu 20.04.3 LTS
admin-host3   v1.21.5-gke.1200   containerd://1.5.8   Ubuntu 20.04.3 LTS
admin-host4   v1.21.5-gke.1200   containerd://1.5.8   Ubuntu 20.04.3 LTS

Upgrade

Run the gkectl command as shown below using the Admin Cluster kubeconfig and the config used to originally set up the Admin Cluster.

$ gkectl upgrade admin --kubeconfig kubeconfig --config admin-cluster.yaml -v 3

Reading config with version "v1"
Reading bundle at path: "/var/lib/gke/bundles/gke-onprem-vsphere-1.11.0-gke.543-full.tgz".
Admin cluster is healthy. Proceeding with upgrade.
- Validation Category: Config Check
    - [SUCCESS] Config

- Validation Category: OS Images
    - [SUCCESS] Admin OS images exist

- Validation Category: Cluster Health
    Running validation check for "Admin cluster health"... |
    - [SUCCESS] Admin cluster health
    Running validation check for "Admin PDB"... |
W0507 20:44:27.193427    9638 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in
    - [SUCCESS] Admin PDB
    - [SUCCESS] All user clusters health

- Validation Category: Reserved IPs
    - [SUCCESS] Admin cluster reserved IP for upgrading cluster

- Validation Category: GCP
    - [SUCCESS] GCP service
    - [SUCCESS] GCP service account

- Validation Category: Container Registry
    - [SUCCESS] Docker registry access

- Validation Category: VCenter
    - [SUCCESS] Credentials

All validation results were SUCCESS.
Upgrading to bundle version "1.11.0-gke.543"
Reading config with version "v1"

Seesaw Upgrade Summary:
OS Image updated (old -> new): "gke-on-prem-ubuntu-1.10.0-gke.194" -> "gke-on-prem-ubuntu-1.11.0-gke.543"

Upgrading loadbalancer "seesaw-for-gke-admin"
Deleting LB VM:  seesaw-for-gke-admin-zh82srm57z-1...  DONE
Creating new LB VMs...  DONE
Saved upgraded Seesaw group information of "seesaw-for-gke-admin" to file: seesaw-for-gke-admin.yaml
Waiting LBs to become ready...  DONE
Updating create-config secret...  DONE
Loadbalancer "seesaw-for-gke-admin" is successfully upgraded.
Skipping admin cluster backup since clusterBackup section is not set in admin cluster seed config
Creating cluster "gkectl" ...
DEBUG: docker/images.go:67] Pulling image: gcr.io/gke-on-prem-release/kindest/node:v0.11.1-gke.32-v1.22.6-gke.2100 ...
 ✓ Ensuring node image (gcr.io/gke-on-prem-release/kindest/node:v0.11.1-gke.32-v1.22.6-gke.2100) 🖼 
 ✓ Preparing nodes 📦  
 ✓ Writing configuration 📜 
 ✓ Starting control-plane 🕹️ 
 ✓ Installing CNI 🔌 
 ✓ Installing StorageClass 💾 
 ✓ Waiting ≤ 5m0s for control-plane = Ready ⏳ 
 • Ready after 36s 💚
    Waiting for external cluster control plane to be healthy...  DONE
Applying admin bundle to external cluster
    Applying Bundle CRD YAML...  DONE
    Applying Bundle CRs...  DONE
...
   Waiting for external cluster cluster-api to be ready...  DONE
    Restoring checkpoint state in the external cluster...  DONE
    Creating cluster object gke-admin-prcjz in kind cluster...  DONE
    Creating master...  DONE
    Updating kind cluster object with master endpoint...  DONE
Creating internal cluster
    Getting internal cluster kubeconfig...  DONE
Applying admin bundle to internal cluster
    Applying Bundle CRD YAML...  DONE
    Applying Bundle CRs...  DONE
...
Applying admin master bundle components...  DONE
    Creating master...  DONE
    Updating admin cluster checkpoint...  DONE
    Updating external cluster object with master endpoint...  DONE
Creating internal cluster
    Getting internal cluster kubeconfig...  DONE
    Waiting for internal cluster control plane to be healthy... /
    Waiting for internal cluster control plane to be healthy...  DONE
Applying admin bundle to internal cluster
    Applying Bundle CRD YAML...  DONE
    Applying Bundle CRs... -
...
    Rebooting admin control plane machine...  DONE
    Pivoting objects from kind to internal cluster...  DONE
    Waiting for internal cluster control plane to be healthy...  DONE
    Waiting for internal cluster cluster-api to be ready...  DONE
Waiting for admin addon and user master nodes in the internal cluster to become ready...  
DONE
    Waiting for control plane to be ready...  DONE
    Waiting for kube-apiserver VIP to be configured on the internal cluster...  DONE
Applying admin node bundle components...  DONE
Creating node Machines in internal cluster...  DONE
...
Pruning unwanted admin node bundle components... /
Applying admin addon bundle to internal cluster...  DONE
Updating platform to "1.11.0-gke.543"... \
Updating platform to "1.11.0-gke.543"...  DONE
    Waiting for admin cluster system workloads to be ready...  DONE
Waiting for admin cluster machines and pods to be ready...  DONE
Pruning unwanted admin base bundle components... |
...
Pruning unwanted admin addon bundle components...  DONE
    Waiting for admin cluster system workloads to be ready...  DONE
Waiting for admin cluster machines and pods to be ready...  DONE
Skipping admin cluster backup since clusterBackup section is not set in admin cluster seed config
Trigger reconcile on user cluster 'user1/user1-gke-onprem-mgmt' to upgrade its user master VMs to the same version "1.11.0-gke.543" as the admin cluster
Waiting for reconcile to complete...  DONE
Cleaning up external cluster...  DONE
Done upgrading admin cluster.

This upgrades the load balancers first, then the Admin Cluster nodes.

Running “kubectl get nodes” during the upgrade shows node versions being swapped out serially as new nodes are brought in and older ones deleted. There are small windows of time when there are N+1 nodes.

# called half-way through the upgrade process
# notice 3 nodes at older version, and 2 at newer version
$ KUBECONFIG=kubeconfig kubectl get nodes -o custom-columns="NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion,RUNTIME:.status.nodeInfo.containerRuntimeVersion,IMAGE:.status.nodeInfo.osImage"
NAME          VERSION            RUNTIME              IMAGE
admin-host1   v1.22.8-gke.200    containerd://1.5.8   Ubuntu 20.04.4 LTS
admin-host2   v1.21.5-gke.1200   containerd://1.5.8   Ubuntu 20.04.3 LTS
admin-host3   v1.21.5-gke.1200   containerd://1.5.8   Ubuntu 20.04.3 LTS
admin-host4   v1.21.5-gke.1200   containerd://1.5.8   Ubuntu 20.04.3 LTS
admin-host5   v1.22.8-gke.200    containerd://1.5.8   Ubuntu 20.04.4 LTS
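
To track progress without counting rows by hand, the VERSION column can be tallied. This is a sketch that reads the table from stdin; tally_versions is my own name, and the sample is the half-way listing shown above.

```shell
#!/bin/sh
# Read the custom-columns table on stdin and print "VERSION COUNT" per line,
# sorted by version string.
tally_versions() {
  awk 'NR > 1 { count[$2]++ } END { for (v in count) print v, count[v] }' | sort
}

sample='NAME          VERSION            RUNTIME              IMAGE
admin-host1   v1.22.8-gke.200    containerd://1.5.8   Ubuntu 20.04.4 LTS
admin-host2   v1.21.5-gke.1200   containerd://1.5.8   Ubuntu 20.04.3 LTS
admin-host3   v1.21.5-gke.1200   containerd://1.5.8   Ubuntu 20.04.3 LTS
admin-host4   v1.21.5-gke.1200   containerd://1.5.8   Ubuntu 20.04.3 LTS
admin-host5   v1.22.8-gke.200    containerd://1.5.8   Ubuntu 20.04.4 LTS'

printf '%s\n' "$sample" | tally_versions
# prints:
# v1.21.5-gke.1200 3
# v1.22.8-gke.200 2
```

During a live upgrade the kubectl command above can be piped straight into tally_versions.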

Admin Cluster upgrades are resumable (with caveats) starting with Anthos 1.10. See the Google documentation for ‘gkectl repair admin-master’ details, or contact Google Support for assistance.

Validation of new version

When done, you should see output similar to below, where the VERSION is now v1.22 (was v1.21) and the Ubuntu image is 20.04.4 (was 20.04.3).

$ KUBECONFIG=kubeconfig kubectl get nodes -o custom-columns="NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion,RUNTIME:.status.nodeInfo.containerRuntimeVersion,IMAGE:.status.nodeInfo.osImage"
NAME          VERSION           RUNTIME              IMAGE
admin-host1   v1.22.8-gke.200   containerd://1.5.8   Ubuntu 20.04.4 LTS
admin-host2   v1.22.8-gke.200   containerd://1.5.8   Ubuntu 20.04.4 LTS
admin-host4   v1.22.8-gke.200   containerd://1.5.8   Ubuntu 20.04.4 LTS
admin-host5   v1.22.8-gke.200   containerd://1.5.8   Ubuntu 20.04.4 LTS

 

REFERENCES

Anthos upgrade guide

Anthos downloads

Anthos release notes

Google ref, using MetalLB for loadbalancing (versus older SeeSaw)

Google ref, import GKE OS image when direct access to ESX not provided

fabianlee.org, install Anthos 1.10

NOTES

Checking for expired certs on master before admin cluster upgrade

KUBECONFIG=$(realpath kubeconfig)

kubectl --kubeconfig "${KUBECONFIG}" get secrets -n kube-system sshkeys \
  -o jsonpath='{.data.vsphere_tmp}' | base64 -d > ~/.ssh/admin-cluster.key && chmod 600 ~/.ssh/admin-cluster.key

export MASTER_NODE_IP=$(kubectl --kubeconfig "${KUBECONFIG}" get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}' --selector='node-role.kubernetes.io/master')

# login and run
ssh -i ~/.ssh/admin-cluster.key ubuntu@"${MASTER_NODE_IP}"
$ sudo kubeadm certs check-expiration

Testing ingress using SeeSaw before upgrade

my_service_ip=$(kubectl get services my-service --output=jsonpath="{.spec.loadBalancerIP}")
curl http://$my_service_ip/hello

Viewing IP addresses of User and Admin SeeSaw LB from vsphere level

# quote the patterns so the shell does not expand them against local files
govc find vm -name 'seesaw-for-user*'
ulb_name=$(govc find vm -name 'seesaw-for-user*' | sed -e 's#.*/##')
govc vm.info -json "$ulb_name" | jq ".VirtualMachines[0].Guest.Net[].IpConfig.IpAddress[].IpAddress"

govc find vm -name 'seesaw-for-gke-admin*'
alb_name=$(govc find vm -name 'seesaw-for-gke-admin*' | sed -e 's#.*/##')
govc vm.info -json "$alb_name" | jq ".VirtualMachines[0].Guest.Net[].IpConfig.IpAddress[].IpAddress"