I hit an interesting issue with a Kubernetes 1.19.2 cluster on IBM Z, specifically across a pair of Ubuntu containers running on an IBM Secure Service Container (SSC) LPAR in IBM's cloud.
One of my colleagues had just upgraded the SSC software on that particular LPAR, known as Hosting Appliance, and I was performing some post-upgrade checks.
Having set the KUBECONFIG variable: -
export KUBECONFIG=~/davehay_k8s.conf
I checked the running pods: -
kubectl get pods --all-namespaces
NAMESPACE          NAME                                                  READY   STATUS      RESTARTS   AGE
default            hello-world-nginx-74bbbf57b4-8kzpb                    0/1     Error       0          25d
default            hello-world-nginx-deploy-to-cluster-dc8x2-pod-8fqzm   0/8     Completed   0          25d
default            hello-world-nginx-source-to-image-b2t7r-pod-l8mmm     0/2     Completed   0          25d
kube-system        coredns-f9fd979d6-tbp67                               0/1     Error       0          26d
kube-system        coredns-f9fd979d6-v9thn                               0/1     Error       0          26d
kube-system        etcd-b23976de6423                                     1/1     Running     1          26d
kube-system        kube-apiserver-b23976de6423                           1/1     Running     1          26d
kube-system        kube-controller-manager-b23976de6423                  1/1     Running     1          26d
kube-system        kube-proxy-cq5sg                                      1/1     Running     1          26d
kube-system        kube-proxy-qfg6v                                      1/1     Running     1          26d
kube-system        kube-scheduler-b23976de6423                           1/1     Running     1          26d
tekton-pipelines   tekton-pipelines-controller-587569588b-jv6hc          0/1     Error       0          25d
tekton-pipelines   tekton-pipelines-webhook-655cf7f8bb-nhlld             0/1     Error       0          25d
and noticed that several pods, including the CoreDNS and Tekton Pipelines components, were in an Error state.
Initially, I tried to simply remove the erroring pods: -
kubectl delete pod coredns-f9fd979d6-v9thn --namespace kube-system
kubectl delete pod coredns-f9fd979d6-tbp67 --namespace kube-system
kubectl delete pod tekton-pipelines-controller-587569588b-jv6hc --namespace tekton-pipelines
kubectl delete pod tekton-pipelines-webhook-655cf7f8bb-nhlld --namespace tekton-pipelines
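As an aside, with this many failing pods, the delete commands can be generated rather than typed one at a time. A minimal sketch: the heredoc below stands in for live kubectl output, and you would pipe the generated commands to sh to actually run them: -

```shell
# Generate "kubectl delete pod" commands for every pod whose STATUS column
# reads Error. The heredoc is a stand-in for the output of:
#   kubectl get pods --all-namespaces --no-headers
# Pipe the generated commands to sh to actually run them.
awk '$4 == "Error" { printf "kubectl delete pod %s --namespace %s\n", $2, $1 }' <<'EOF'
kube-system        coredns-f9fd979d6-v9thn                     0/1   Error     0   26d
kube-system        etcd-b23976de6423                           1/1   Running   1   26d
tekton-pipelines   tekton-pipelines-webhook-655cf7f8bb-nhlld   0/1   Error     0   25d
EOF
```

which prints a delete command for each of the two Error pods and skips the healthy etcd pod.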
but that didn't seem to help much: -
kubectl get pods --all-namespaces
NAMESPACE          NAME                                                  READY   STATUS              RESTARTS   AGE
default            hello-world-nginx-deploy-to-cluster-dc8x2-pod-8fqzm   0/8     Completed           0          25d
default            hello-world-nginx-source-to-image-b2t7r-pod-l8mmm     0/2     Completed           0          25d
kube-system        coredns-f9fd979d6-xl7nx                               0/1     ContainerCreating   0          7m35s
kube-system        coredns-f9fd979d6-zd62l                               0/1     ContainerCreating   0          7m21s
kube-system        etcd-b23976de6423                                     1/1     Running             1          26d
kube-system        kube-apiserver-b23976de6423                           1/1     Running             1          26d
kube-system        kube-controller-manager-b23976de6423                  1/1     Running             1          26d
kube-system        kube-proxy-cq5sg                                      1/1     Running             1          26d
kube-system        kube-proxy-qfg6v                                      1/1     Running             1          26d
kube-system        kube-scheduler-b23976de6423                           1/1     Running             1          26d
tekton-pipelines   tekton-pipelines-controller-587569588b-mm9d9          0/1     ContainerCreating   0          9m49s
tekton-pipelines   tekton-pipelines-webhook-655cf7f8bb-h772d             0/1     ContainerCreating   0          9m32s
I dug into the health of one of the CoreDNS pods: -
kubectl describe pod coredns-f9fd979d6-zd62l --namespace kube-system
which, in part, showed: -
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m17s default-scheduler Successfully assigned kube-system/coredns-f9fd979d6-zd62l to b23976de6423
Warning FailedCreatePodSandBox 4m16s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "2afd79ad7a4aa926050bdb72affc567949f5dba1f07803020bd64bcbfe2de27b" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 4m14s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8c628d3e6851acadc25fcd4a4121bd6bbfa6638557a91464fbd724c98bfec40b" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 4m12s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "ac5787fc3163e5216feceedbaaa16862ffea0e79d8ffc70951a531c625bd5424" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 4m10s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "115475e415ad5a74442639f3731a050608d0409191486a518e0b62d5ffde1756" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 4m8s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "2b6c81b54301793d87d0e426a35b44c41f44ce9f768d0f85cc89dbb7391baa5b" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 4m6s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "9f8279ab67680e7b14d43d4ea109c0440527fccf1d2d06f3737e0c5ff38c9b82" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 4m4s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "6788f48b5408ced802f77111e8b1f2968c4368228996e2fb946375638b8ca473" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 4m2s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "c1daa71473cdadfeb187962b918902890bfd90981a96907fdc60b2937cc3ece4" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 4m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "bee12244bd8e1070633226913f94cb0faae6de820b0f745fd905308b92a22b0d" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Normal SandboxChanged 3m53s (x12 over 4m15s) kubelet Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 3m52s (x4 over 3m58s) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "3a0c18424d9c640e3d33467866852801982bd07fc919a323e290aae6852f7d04" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
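The file the kubelet is complaining about, /run/flannel/subnet.env, is written by the flanneld daemon when it starts, and the CNI plugin reads it to configure each pod's network. On a healthy node it looks something like this (the values here are common Flannel defaults, not taken from this cluster): -

```
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
```

So the events boil down to: no flanneld running on the node means no subnet lease file, which means no pod networking.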
Working on the principle that the problem MIGHT lie with the Flannel networking plugin, mainly because no Flannel pods appeared anywhere in the pod listing, whether running or failing, I redeployed it: -
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
podsecuritypolicy.policy/psp.flannel.unprivileged created
Warning: rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole
clusterrole.rbac.authorization.k8s.io/flannel created
Warning: rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds-amd64 created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created
which made a positive difference: -
kubectl get pods --all-namespaces
NAMESPACE          NAME                                                  READY   STATUS              RESTARTS   AGE
default            hello-world-nginx-deploy-to-cluster-dc8x2-pod-8fqzm   0/8     Completed           0          25d
default            hello-world-nginx-source-to-image-b2t7r-pod-l8mmm     0/2     Completed           0          25d
kube-system        coredns-f9fd979d6-xl7nx                               0/1     ContainerCreating   0          9m5s
kube-system        coredns-f9fd979d6-zd62l                               0/1     ContainerCreating   0          8m51s
kube-system        etcd-b23976de6423                                     1/1     Running             1          26d
kube-system        kube-apiserver-b23976de6423                           1/1     Running             1          26d
kube-system        kube-controller-manager-b23976de6423                  1/1     Running             1          26d
kube-system        kube-flannel-ds-s390x-ttl4n                           1/1     Running             0          4s
kube-system        kube-flannel-ds-s390x-wpx2h                           1/1     Running             0          4s
kube-system        kube-proxy-cq5sg                                      1/1     Running             1          26d
kube-system        kube-proxy-qfg6v                                      1/1     Running             1          26d
kube-system        kube-scheduler-b23976de6423                           1/1     Running             1          26d
tekton-pipelines   tekton-pipelines-controller-587569588b-mm9d9          0/1     ContainerCreating   0          11m
tekton-pipelines   tekton-pipelines-webhook-655cf7f8bb-h772d             0/1     ContainerCreating   0          11m
kubectl get pods --all-namespaces
NAMESPACE          NAME                                                  READY   STATUS      RESTARTS   AGE
default            hello-world-nginx-deploy-to-cluster-dc8x2-pod-8fqzm   0/8     Completed   0          25d
default            hello-world-nginx-source-to-image-b2t7r-pod-l8mmm     0/2     Completed   0          25d
kube-system        coredns-f9fd979d6-xl7nx                               1/1     Running     0          9m56s
kube-system        coredns-f9fd979d6-zd62l                               1/1     Running     0          9m42s
kube-system        etcd-b23976de6423                                     1/1     Running     1          26d
kube-system        kube-apiserver-b23976de6423                           1/1     Running     1          26d
kube-system        kube-controller-manager-b23976de6423                  1/1     Running     1          26d
kube-system        kube-flannel-ds-s390x-ttl4n                           1/1     Running     0          55s
kube-system        kube-flannel-ds-s390x-wpx2h                           1/1     Running     0          55s
kube-system        kube-proxy-cq5sg                                      1/1     Running     1          26d
kube-system        kube-proxy-qfg6v                                      1/1     Running     1          26d
kube-system        kube-scheduler-b23976de6423                           1/1     Running     1          26d
tekton-pipelines   tekton-pipelines-controller-587569588b-mm9d9          1/1     Running     0          12m
tekton-pipelines   tekton-pipelines-webhook-655cf7f8bb-h772d             1/1     Running     0          11m
I then re-ran the script that creates a Tekton deployment: -
# Create the tutorial-service Service Account
kubectl apply -f ./serviceaccounts/create_tutorial_service_account.yaml
# Create the clusterrole and clusterrolebinding
kubectl apply -f ./roles/create_cluster_role.yaml
# Create the Tekton Resource aligned to the Git repository
kubectl apply -f ./resources/git.yaml
# Create the Tekton Task that creates the Docker image from the GitHub repository
kubectl apply -f ./tasks/source-to-image.yaml
# Create the Tekton Task that deploys the built image using kubectl
kubectl apply -f ./tasks/deploy-using-kubectl.yaml
# Create the Tekton Pipeline that runs the two tasks
kubectl apply -f ./pipelines/build-and-deploy-pipeline.yaml
# Create the Tekton PipelineRun that runs the Pipeline
kubectl apply -f ./runs/pipeline-run.yaml
# Display the Pipeline logs
tkn pipelines logs
and checked the resulting deployment: -
kubectl get deployments
NAME                READY   UP-TO-DATE   AVAILABLE   AGE
hello-world-nginx   1/1     1            1           17m
service: -
kubectl get services
NAME                TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
hello-world-nginx   NodePort    10.97.37.211   <none>        80:30674/TCP   17m
kubernetes          ClusterIP   10.96.0.1      <none>        443/TCP        26d
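The NodePort is the number after the colon in the PORT(S) column, 30674 here, and it's what the cURL test further down uses. If you want to pull it out programmatically, a sketch along these lines works, with the echo standing in for the live kubectl output: -

```shell
# Extract the NodePort from a saved "kubectl get services" line; normally
# you would pipe the live output in. The sed expression captures the
# digits between ":" and "/TCP".
echo 'hello-world-nginx   NodePort    10.97.37.211   <none>        80:30674/TCP   17m' \
  | sed -E 's/.*:([0-9]+)\/TCP.*/\1/'
# prints 30674
```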
and nodes: -
kubectl get nodes
NAME           STATUS   ROLES    AGE   VERSION
68bc83cf0d09   Ready    <none>   26d   v1.19.2
b23976de6423   Ready    master   26d   v1.19.2
I then used cURL against the node's IP address and the NodePort to validate the Nginx pod: -
curl http://192.168.32.142:30674
<html>
<head>
<title>Hello World</title>
</head>
<body>
<div class="info">
<p>
<h2>
<span>Welcome to IBM Cloud Kubernetes Service with Hyper Protect ...</span>
</h2>
</p>
<p>
<h2>
<span>and your first Docker application built by Tekton Pipelines and Triggers ...</span>
</h2>
</p>
<p>
<h2>
<span>Message of the Day .... Drink More Herbal Tea!!</span>
</h2>
</p>
<p>
<h2>
<span>( and, of course, Hello World! )</span>
</h2>
</p>
</div>
</body>
</html>
So we're now good to go ......