If you are running gkectl commands that scale or modify your worker nodes and hit a problem, the first place to look is the gkectl logs. If that is not enough, there are a couple of CRDs that can assist in troubleshooting.
First, make sure you have executed gkectl with high verbosity “-v5”, and examine the logs on the Admin Workstation at “/home/ubuntu/.config/gke-on-prem/logs”.
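As a rough sketch of that first step, the helper functions below pick the most recently modified file in the log directory and print the lines that look like failures. The function names, the GKECTL_LOG_DIR variable, and the 'error|fail' search pattern are illustrative assumptions, not part of gkectl, and the sketch assumes the directory holds plain log files.

```shell
#!/usr/bin/env bash
# Log directory from the text above; GKECTL_LOG_DIR is a hypothetical override.
GKECTL_LOG_DIR="${GKECTL_LOG_DIR:-/home/ubuntu/.config/gke-on-prem/logs}"

# Print the path of the most recently modified entry in the log directory.
latest_gkectl_log() {
  local dir="${1:-$GKECTL_LOG_DIR}"
  local newest
  # ls -t sorts newest first; head keeps only the first entry
  newest="$(ls -t "$dir" 2>/dev/null | head -n 1)"
  [ -n "$newest" ] && printf '%s/%s\n' "$dir" "$newest"
}

# Print numbered lines from the newest log that mention an error or failure.
# The search pattern is an assumption about how problems are worded in the log.
scan_gkectl_log() {
  local log
  log="$(latest_gkectl_log "$1")" || return 1
  grep -inE 'error|fail' "$log"
}
```

Running scan_gkectl_log with no arguments on the Admin Workstation searches the default directory; pass a different directory as the first argument if your logs live elsewhere.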
If you need to dig deeper into why new or rebuilt worker nodes are not being provisioned, then look into the ‘machinedeployment’ and ‘machine’ CRDs.
# will show on-premise provider details and expected replica counts
kubectl describe machinedeployment
Then examine the events on each ‘machine’ object, which might indicate an error during the creation or modification of a worker node VM.
# shows list of 'machine'
kubectl describe machines
# look at details of specific machine
# e.g. ipam error if IP cannot be allocated
kubectl describe machine <machineId>
# view any events involving Machine
kubectl get events --field-selector involvedObject.kind=Machine --all-namespaces
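When there are many events, it can help to narrow the last command to warnings only and sort them by time. The wrapper below is a minimal sketch: machine_warnings is a hypothetical function name, and it assumes your current kubectl context already points at the cluster that holds the Machine objects. Any extra arguments are passed through to kubectl.

```shell
# List only Warning events for Machine objects, ordered by last timestamp.
# 'machine_warnings' is an illustrative name, not a gkectl or kubectl command.
machine_warnings() {
  kubectl get events --all-namespaces \
    --field-selector involvedObject.kind=Machine,type=Warning \
    --sort-by=.lastTimestamp "$@"
}
```

For example, machine_warnings -o wide adds the extra event columns to the output.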
If one of the ‘machine’ objects has an error, it can be deleted, and the Admin Cluster will attempt to recreate it.
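A minimal sketch of that delete-and-check step follows. The function name recreate_machine is hypothetical, the machine name is whatever 'kubectl describe machines' reported, and the sketch assumes the current kubectl context and namespace are the ones where the Machine objects live.

```shell
# Delete one failed Machine object, then list the machines so the
# replacement can be observed. 'recreate_machine' is an illustrative name.
recreate_machine() {
  local machine_id="$1"
  if [ -z "$machine_id" ]; then
    echo "usage: recreate_machine <machineId>" >&2
    return 1
  fi
  # Stop here if the delete fails (wrong name, wrong context, no permission)
  kubectl delete machine "$machine_id" || return 1
  kubectl get machines
}
```

Call it as recreate_machine <machineId>, then re-run the describe and events commands to confirm that the new machine comes up cleanly.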
Also, if additional static IP entries are required to support new node replicas, these can be added manually by editing the cluster as below.
# if new static IP definitions required
kubectl edit cluster