K3s is deployed by default with a metrics-server, but if you have a multi-node cluster it will fail unless you add the names of all the nodes to the kube-apiserver certificate. Symptoms of this problem include:
- metrics-server deployment will throw x509 errors in its log
- Error when you try to run “kubectl top pods”
- No evaluation of HorizontalPodAutoscaler based on cpu/memory utilization.
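As a rough way to confirm the first symptom, the sketch below counts the lines mentioning x509 in whatever log text is piped into it. The helper name count_x509 is hypothetical (not part of K3s or kubectl), and the commented usage lines assume the metrics-server deployment lives in the kube-system namespace, as it does in the NOTES section at the end.

```shell
# count_x509: hypothetical helper, prints how many lines on stdin mention x509.
count_x509() {
  # grep -c prints the count but exits non-zero when the count is 0; mask that.
  grep -ci 'x509' || true
}

# Usage against a live cluster (assumes kubectl is already configured):
#   kubectl logs -n kube-system deployment/metrics-server | count_x509
# The other two symptoms can be checked directly with:
#   kubectl top pods
#   kubectl get hpa --all-namespaces
```

A non-zero count together with a failing ‘kubectl top pods’ suggests this is the problem described here.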
Root problem
If you are on a single-node K3s cluster, then all K3s components are running on the same host and therefore the certificate SAN name is always valid. The metrics-server must communicate with the kubelet of each host, and since the K3s kubelet, kube-apiserver, and metrics-server are all on the same hostname (e.g. ‘mymaster’), all using the same certificate (serving-kube-apiserver.crt), the certificate is evaluated as valid.
But on a multi-node K3s cluster, when the metrics-server securely reaches out to the kubelet of one of the nodes (e.g. ‘mynode1’) for metrics, the name may not match any SAN name in the kubelet certificate, and the certificate will be evaluated as invalid.
You can view the SAN names of the certificate being used by the kubelet (an integrated subprocess of kube-apiserver in K3s) by running the following command from the K3s master.
sudo openssl x509 -in /var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt -text -noout | grep "Subject Alternative Name:" -A1
If the SAN names do not include the hostnames and/or IP of all your cluster nodes, then you will have the x509 errors described.
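Comparing the SAN list against the node list by eye gets tedious on larger clusters. The following is a minimal sketch of automating that comparison: it reads the openssl output on stdin and prints every name passed as an argument that is not in the SAN list. The function name missing_sans is hypothetical, and the parsing assumes the usual openssl layout of comma-separated "DNS:" and "IP Address:" entries.

```shell
# missing_sans: hypothetical helper. Reads the "Subject Alternative Name"
# lines from stdin and prints each argument that is NOT listed as a SAN.
missing_sans() {
  # Normalize "DNS:a, IP Address:1.2.3.4" into one bare entry per line.
  sans=$(tr ',' '\n' | sed -e 's/^[[:space:]]*//' \
                           -e 's/^DNS://' \
                           -e 's/^IP Address://' \
                           -e 's/[[:space:]]*$//')
  for name in "$@"; do
    # -x matches whole lines only, -F treats the name as a fixed string.
    printf '%s\n' "$sans" | grep -qxF -- "$name" || printf '%s\n' "$name"
  done
}

# Usage on the K3s master (assumes kubectl is already configured):
#   sudo openssl x509 -in /var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt -text -noout \
#     | grep "Subject Alternative Name:" -A1 \
#     | missing_sans $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}')
```

Any name printed is a candidate for the tls-san list described in the next section; empty output means every node name passed in is already covered.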
Resolution
The metrics-server reuses the certificate in use by the kubelet/kube-apiserver. So, we need to modify the certificate used by the kube-apiserver to add the names of the worker nodes to the SAN list of the certificate.
This can be done with the ‘tls-san’ flag. If you are using “/etc/rancher/k3s/config.yaml” to configure the K3s master, then add the hostname and/or IP of your worker nodes in the cluster.
tls-san: [ "node1", "node2", "192.168.122.214", "192.168.122.215" ]
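If you would rather generate that line than type it, here is a small sketch that builds it from a list of names and IPs. The helper name tls_san_line is hypothetical; it only prints text in the same format as the line above and does not edit config.yaml for you.

```shell
# tls_san_line: hypothetical helper. Prints a tls-san line for config.yaml
# built from the hostnames/IPs given as arguments.
tls_san_line() {
  entries=""
  for name in "$@"; do
    # Append each entry in double quotes, separated by ", ".
    entries="${entries:+$entries, }\"$name\""
  done
  printf 'tls-san: [ %s ]\n' "$entries"
}

# Usage (assumes kubectl is already configured); node names, then internal IPs:
#   tls_san_line \
#     $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}') \
#     $(kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}')
```

Review the generated line before pasting it into “/etc/rancher/k3s/config.yaml”, since it will also include the master itself.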
Then run the certificate rotation process from the K3s master to regenerate the api-server certificate.
sudo systemctl stop k3s
sudo k3s --debug certificate rotate --service api-server
sudo systemctl start k3s
This will add these entries to the SAN list of the certificate and resolve the x509 errors coming from metrics-server when trying to communicate with other nodes.
You can validate the SAN list change on the certificate using the openssl command from the “problem” section above.
Client Validation
After giving the metrics-server a couple of minutes to stabilize and gather information from the nodes in your cluster, ‘kubectl top’ should work if the problem is resolved.
kubectl top pods
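Rather than re-running the command by hand while the metrics-server settles, you could poll for it. This is a minimal sketch of a generic retry loop; the function name retry and the attempt count and delay in the usage line are assumptions chosen to cover roughly the couple of minutes mentioned above.

```shell
# retry: hypothetical helper. Runs a command up to $1 times, sleeping $2
# seconds between attempts. Returns 0 on the first success, 1 otherwise.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Only sleep if another attempt is still to come.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage: try for about two minutes (12 attempts, 10 seconds apart):
#   retry 12 10 kubectl top pods
```

If the loop still fails after the certificate has been regenerated, recheck the SAN list and the metrics-server log before trying any of the workarounds below.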
Comment on other resolutions
There are a lot of other threads and GitHub issues where people have problems with the K3s metrics-server collecting data, and the suggested workarounds include:
- Setting ‘spec.template.spec.hostNetwork=true’
- Adding the ‘--kubelet-insecure-tls’ flag to ‘.spec.template.spec.containers[0].args’
- Disabling the K3s metrics bundle with ‘--disable metrics-server’ when invoking K3s, and then installing the metrics-server from its official site
- Adding ‘requestheader-allowed-names’ flag as kube-apiserver-arg
I did not find any of these necessary once I added all the worker nodes to the SAN names of the certificate as described.
REFERENCES
kubernetes-sigs, metrics-server and its requirements (such as ability to reach kubelet securely)
k3s docs, kubelet-arg flag for customization
kubernetes docs, kubelet and its arguments
k3s-io wiki, K3s certificate rotation
k3s-docs, tls-san flag in config.yaml
k3s issues, tls-san multiple or with commas
k3s issues, tls-san csv format
k3s issues, need tls-san for joining cluster
github issue, using kubelet-preferred-address-points
github issue, k3s metrics-server hostNetwork does NOT need to be enabled
github issue, k3s requestheader-allowed-names for metrics-server fix
NOTES
Show the certificate being used by kube-apiserver, using openssl to connect to the kube-apiserver
IP_and_port=$(yq '.clusters[0].cluster.server' < $KUBECONFIG | sed 's#https://##')
IP=$(echo $IP_and_port | cut -d: -f1)
# show SAN names of certificate
echo | openssl s_client -showcerts -servername $IP -connect $IP_and_port 2>/dev/null | openssl x509 -inform pem -noout -text | grep "Subject Alternative Name" -A1
Show arguments being used by metrics-server deployment
$ kubectl get deployment metrics-server -n kube-system -o=yaml | yq '.spec.template.spec.containers[0].args'
- --cert-dir=/tmp
- --secure-port=10250
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --kubelet-use-node-status-port
- --metric-resolution=15s
- --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305