Tuesday, 22 December 2020

Kubernetes on IBM Z - Flannel says "No"

I hit an interesting issue with a Kubernetes 1.19.2 cluster on IBM Z - specifically, a cluster running across a pair of Ubuntu containers on an IBM Secure Service Container (SSC) LPAR on a Z box in IBM's cloud.

One of my colleagues had just upgraded the SSC software - known as the Hosting Appliance - on that particular LPAR, and I was performing some post-upgrade checks.

Having set the KUBECONFIG variable: -

export KUBECONFIG=~/davehay_k8s.conf 

I checked the running pods: -

kubectl get pods --all-namespaces

NAMESPACE          NAME                                                  READY   STATUS      RESTARTS   AGE
default            hello-world-nginx-74bbbf57b4-8kzpb                    0/1     Error       0          25d
default            hello-world-nginx-deploy-to-cluster-dc8x2-pod-8fqzm   0/8     Completed   0          25d
default            hello-world-nginx-source-to-image-b2t7r-pod-l8mmm     0/2     Completed   0          25d
kube-system        coredns-f9fd979d6-tbp67                               0/1     Error       0          26d
kube-system        coredns-f9fd979d6-v9thn                               0/1     Error       0          26d
kube-system        etcd-b23976de6423                                     1/1     Running     1          26d
kube-system        kube-apiserver-b23976de6423                           1/1     Running     1          26d
kube-system        kube-controller-manager-b23976de6423                  1/1     Running     1          26d
kube-system        kube-proxy-cq5sg                                      1/1     Running     1          26d
kube-system        kube-proxy-qfg6v                                      1/1     Running     1          26d
kube-system        kube-scheduler-b23976de6423                           1/1     Running     1          26d
tekton-pipelines   tekton-pipelines-controller-587569588b-jv6hc          0/1     Error       0          25d
tekton-pipelines   tekton-pipelines-webhook-655cf7f8bb-nhlld             0/1     Error       0          25d


and noticed that five pods, including the Tekton Pipelines components, were in Error.
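As an aside, with a listing this long it can help to filter the output down to just the failing pods. A minimal sketch - run here against a saved snippet of the listing above, so it works without a cluster; against a live cluster you would pipe `kubectl get pods --all-namespaces` straight into the awk filter instead: -

```shell
# Save a snippet of the "kubectl get pods --all-namespaces" listing,
# then filter it to pods in Error.
# In the all-namespaces output, STATUS is the fourth column.
cat <<'EOF' > pods.txt
NAMESPACE          NAME                                           READY   STATUS      RESTARTS   AGE
default            hello-world-nginx-74bbbf57b4-8kzpb             0/1     Error       0          25d
kube-system        coredns-f9fd979d6-tbp67                        0/1     Error       0          26d
kube-system        etcd-b23976de6423                              1/1     Running     1          26d
EOF
awk '$4 == "Error"' pods.txt
```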

Initially, I tried to simply remove the erroring pods: -

kubectl delete pod coredns-f9fd979d6-v9thn --namespace kube-system

kubectl delete pod coredns-f9fd979d6-tbp67 --namespace kube-system

kubectl delete pod tekton-pipelines-controller-587569588b-jv6hc --namespace tekton-pipelines

kubectl delete pod tekton-pipelines-webhook-655cf7f8bb-nhlld --namespace tekton-pipelines

but that didn't seem to help much: -

kubectl get pods --all-namespaces

NAMESPACE          NAME                                                  READY   STATUS              RESTARTS   AGE
default            hello-world-nginx-deploy-to-cluster-dc8x2-pod-8fqzm   0/8     Completed           0          25d
default            hello-world-nginx-source-to-image-b2t7r-pod-l8mmm     0/2     Completed           0          25d
kube-system        coredns-f9fd979d6-xl7nx                               0/1     ContainerCreating   0          7m35s
kube-system        coredns-f9fd979d6-zd62l                               0/1     ContainerCreating   0          7m21s
kube-system        etcd-b23976de6423                                     1/1     Running             1          26d
kube-system        kube-apiserver-b23976de6423                           1/1     Running             1          26d
kube-system        kube-controller-manager-b23976de6423                  1/1     Running             1          26d
kube-system        kube-proxy-cq5sg                                      1/1     Running             1          26d
kube-system        kube-proxy-qfg6v                                      1/1     Running             1          26d
kube-system        kube-scheduler-b23976de6423                           1/1     Running             1          26d
tekton-pipelines   tekton-pipelines-controller-587569588b-mm9d9          0/1     ContainerCreating   0          9m49s
tekton-pipelines   tekton-pipelines-webhook-655cf7f8bb-h772d             0/1     ContainerCreating   0          9m32s

I dug into the health of one of the CoreDNS pods: -

kubectl describe pod coredns-f9fd979d6-zd62l --namespace kube-system

which, in part, showed: -

Type     Reason                  Age                     From               Message
----     ------                  ----                    ----               -------
Normal   Scheduled               4m17s                   default-scheduler  Successfully assigned kube-system/coredns-f9fd979d6-zd62l to b23976de6423
Warning  FailedCreatePodSandBox  4m16s                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "2afd79ad7a4aa926050bdb72affc567949f5dba1f07803020bd64bcbfe2de27b" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning  FailedCreatePodSandBox  4m14s                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8c628d3e6851acadc25fcd4a4121bd6bbfa6638557a91464fbd724c98bfec40b" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning  FailedCreatePodSandBox  4m12s                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "ac5787fc3163e5216feceedbaaa16862ffea0e79d8ffc70951a531c625bd5424" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning  FailedCreatePodSandBox  4m10s                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "115475e415ad5a74442639f3731a050608d0409191486a518e0b62d5ffde1756" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning  FailedCreatePodSandBox  4m8s                    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "2b6c81b54301793d87d0e426a35b44c41f44ce9f768d0f85cc89dbb7391baa5b" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning  FailedCreatePodSandBox  4m6s                    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "9f8279ab67680e7b14d43d4ea109c0440527fccf1d2d06f3737e0c5ff38c9b82" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning  FailedCreatePodSandBox  4m4s                    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "6788f48b5408ced802f77111e8b1f2968c4368228996e2fb946375638b8ca473" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning  FailedCreatePodSandBox  4m2s                    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "c1daa71473cdadfeb187962b918902890bfd90981a96907fdc60b2937cc3ece4" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning  FailedCreatePodSandBox  4m                      kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "bee12244bd8e1070633226913f94cb0faae6de820b0f745fd905308b92a22b0d" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Normal   SandboxChanged          3m53s (x12 over 4m15s)  kubelet            Pod sandbox changed, it will be killed and re-created.
Warning  FailedCreatePodSandBox  3m52s (x4 over 3m58s)   kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "3a0c18424d9c640e3d33467866852801982bd07fc919a323e290aae6852f7d04" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
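Every one of those events points at the same root cause: /run/flannel/subnet.env is missing. When the flannel daemon is healthy on a node, it writes that file, and the flannel CNI plugin then reads it to learn the node's pod subnet. For reference, a healthy file looks something like this - these are illustrative default values, not captured from this cluster: -

```shell
# Illustrative /run/flannel/subnet.env - written by flanneld, read by the CNI plugin.
# Flannel's default pod CIDR shown; actual values depend on the cluster's network config.
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.1.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
```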

Working on the principle that the problem MIGHT lie with the Flannel networking plugin - mainly because no Flannel pods appeared at all, running or failing, in the pod listing - I redeployed it: -

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml

podsecuritypolicy.policy/psp.flannel.unprivileged created
Warning: rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole
clusterrole.rbac.authorization.k8s.io/flannel created
Warning: rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds-amd64 created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created
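One thing worth noting from that output: the manifest creates one DaemonSet per CPU architecture (amd64, arm64, arm, ppc64le, s390x), but only the s390x one actually places pods on this cluster, because each DaemonSet is pinned to its architecture with a scheduling constraint along these lines - a sketch from the s390x DaemonSet; older revisions of the manifest use the beta.kubernetes.io/arch label key instead: -

```yaml
# Per-architecture scheduling constraint in kube-flannel.yml (s390x DaemonSet);
# the node label key varies between manifest revisions.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/arch
              operator: In
              values:
                - s390x
```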

which made a positive difference: -

kubectl get pods --all-namespaces

NAMESPACE          NAME                                                  READY   STATUS              RESTARTS   AGE
default            hello-world-nginx-deploy-to-cluster-dc8x2-pod-8fqzm   0/8     Completed           0          25d
default            hello-world-nginx-source-to-image-b2t7r-pod-l8mmm     0/2     Completed           0          25d
kube-system        coredns-f9fd979d6-xl7nx                               0/1     ContainerCreating   0          9m5s
kube-system        coredns-f9fd979d6-zd62l                               0/1     ContainerCreating   0          8m51s
kube-system        etcd-b23976de6423                                     1/1     Running             1          26d
kube-system        kube-apiserver-b23976de6423                           1/1     Running             1          26d
kube-system        kube-controller-manager-b23976de6423                  1/1     Running             1          26d
kube-system        kube-flannel-ds-s390x-ttl4n                           1/1     Running             0          4s
kube-system        kube-flannel-ds-s390x-wpx2h                           1/1     Running             0          4s
kube-system        kube-proxy-cq5sg                                      1/1     Running             1          26d
kube-system        kube-proxy-qfg6v                                      1/1     Running             1          26d
kube-system        kube-scheduler-b23976de6423                           1/1     Running             1          26d
tekton-pipelines   tekton-pipelines-controller-587569588b-mm9d9          0/1     ContainerCreating   0          11m
tekton-pipelines   tekton-pipelines-webhook-655cf7f8bb-h772d             0/1     ContainerCreating   0          11m

and, a short while later, the pods all came up Running: -

kubectl get pods --all-namespaces

NAMESPACE          NAME                                                  READY   STATUS      RESTARTS   AGE
default            hello-world-nginx-deploy-to-cluster-dc8x2-pod-8fqzm   0/8     Completed   0          25d
default            hello-world-nginx-source-to-image-b2t7r-pod-l8mmm     0/2     Completed   0          25d
kube-system        coredns-f9fd979d6-xl7nx                               1/1     Running     0          9m56s
kube-system        coredns-f9fd979d6-zd62l                               1/1     Running     0          9m42s
kube-system        etcd-b23976de6423                                     1/1     Running     1          26d
kube-system        kube-apiserver-b23976de6423                           1/1     Running     1          26d
kube-system        kube-controller-manager-b23976de6423                  1/1     Running     1          26d
kube-system        kube-flannel-ds-s390x-ttl4n                           1/1     Running     0          55s
kube-system        kube-flannel-ds-s390x-wpx2h                           1/1     Running     0          55s
kube-system        kube-proxy-cq5sg                                      1/1     Running     1          26d
kube-system        kube-proxy-qfg6v                                      1/1     Running     1          26d
kube-system        kube-scheduler-b23976de6423                           1/1     Running     1          26d
tekton-pipelines   tekton-pipelines-controller-587569588b-mm9d9          1/1     Running     0          12m
tekton-pipelines   tekton-pipelines-webhook-655cf7f8bb-h772d             1/1     Running     0          11m

I then re-ran the script that creates a Tekton deployment: -

# Create the tutorial-service Service Account

kubectl apply -f ./serviceaccounts/create_tutorial_service_account.yaml

# Create the clusterrole and clusterrolebinding

kubectl apply -f ./roles/create_cluster_role.yaml

# Create the Tekton Resource aligned to the Git repository

kubectl apply -f ./resources/git.yaml

# Create the Tekton Task that creates the Docker image from the GitHub repository

kubectl apply -f ./tasks/source-to-image.yaml

# Create the Tekton Task that pushes the built image to Docker Hub

kubectl apply -f ./tasks/deploy-using-kubectl.yaml

# Create the Tekton Pipeline that runs the two tasks

kubectl apply -f ./pipelines/build-and-deploy-pipeline.yaml

# Create the Tekton PipelineRun that runs the Pipeline

kubectl apply -f ./runs/pipeline-run.yaml

# Display the Pipeline logs

tkn pipelines logs

and checked the resulting deployment: -

kubectl get deployments

NAME                READY   UP-TO-DATE   AVAILABLE   AGE
hello-world-nginx   1/1     1            1           17m

service: -

kubectl get services

NAME                TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
hello-world-nginx   NodePort    10.97.37.211   <none>        80:30674/TCP   17m
kubernetes          ClusterIP   10.96.0.1      <none>        443/TCP        26d

and nodes: -

kubectl get nodes

NAME           STATUS   ROLES    AGE   VERSION
68bc83cf0d09   Ready    <none>   26d   v1.19.2
b23976de6423   Ready    master   26d   v1.19.2

I then used cURL to validate the Nginx pod: -

curl http://192.168.32.142:30674

<html>
  <head>
    <title>Hello World</title>
  </head>
  <body>
    <div class="info">
      <p>
        <h2>
          <span>Welcome to IBM Cloud Kubernetes Service with Hyper Protect ...</span>
        </h2>
      </p>
      <p>
        <h2>
          <span>and your first Docker application built by Tekton Pipelines and Triggers ...</span>
        </h2>
      </p>
      <p>
        <h2>
          <span>Message of the Day .... Drink More Herbal Tea!!</span>
        </h2>
      </p>
      <p>
        <h2>
          <span>( and, of course, Hello World! )</span>
        </h2>
      </p>
    </div>
  </body>
</html>

So we're now good to go ......

