Tuesday, 22 December 2020

Kubernetes on IBM Z - Flannel says "No"

I hit an interesting issue with a Kubernetes (K8s) 1.19.2 cluster running on IBM Z, specifically across a pair of Ubuntu containers running on an IBM Secure Service Container (SSC) LPAR on a Z box in IBM's cloud.

One of my colleagues had just upgraded the SSC software, known as Hosting Appliance, on that particular LPAR, and I was performing some post-upgrade checks.

Having set the KUBECONFIG variable: -

export KUBECONFIG=~/davehay_k8s.conf 

I checked the running pods: -

kubectl get pods --all-namespaces

NAMESPACE          NAME                                                  READY   STATUS              RESTARTS   AGE
default            hello-world-nginx-74bbbf57b4-8kzpb                    0/1     Error               0          25d
default            hello-world-nginx-deploy-to-cluster-dc8x2-pod-8fqzm   0/8     Completed           0          25d
default            hello-world-nginx-source-to-image-b2t7r-pod-l8mmm     0/2     Completed           0          25d
kube-system        coredns-f9fd979d6-tbp67                               0/1     Error               0          26d
kube-system        coredns-f9fd979d6-v9thn                               0/1     Error               0          26d
kube-system        etcd-b23976de6423                                     1/1     Running             1          26d
kube-system        kube-apiserver-b23976de6423                           1/1     Running             1          26d
kube-system        kube-controller-manager-b23976de6423                  1/1     Running             1          26d
kube-system        kube-proxy-cq5sg                                      1/1     Running             1          26d
kube-system        kube-proxy-qfg6v                                      1/1     Running             1          26d
kube-system        kube-scheduler-b23976de6423                           1/1     Running             1          26d
tekton-pipelines   tekton-pipelines-controller-587569588b-jv6hc          0/1     Error               0          25d
tekton-pipelines   tekton-pipelines-webhook-655cf7f8bb-nhlld             0/1     Error               0          25d


and noticed that four pods, the two CoreDNS pods and the two Tekton Pipelines components, were all in Error.
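Spotting the unhealthy pods by eye gets harder as the list grows, so here is a rough sketch of a filter for it. The function name unhealthy_pods is my own invention, and it assumes the six-column layout that --all-namespaces --no-headers produces (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE): -

```shell
# Hypothetical helper: reads `kubectl get pods --all-namespaces --no-headers`
# output on stdin and prints "namespace name status" for every pod whose
# STATUS (column 4) is neither Running nor Completed.
unhealthy_pods() {
  awk 'NF >= 4 && $4 != "Running" && $4 != "Completed" { print $1, $2, $4 }'
}

# Usage against a live cluster:
#   kubectl get pods --all-namespaces --no-headers | unhealthy_pods
```

Reading from stdin rather than calling kubectl inside the function means it can also be pointed at a saved copy of the output.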

Initially, I tried to simply remove the erroring pods: -

kubectl delete pod coredns-f9fd979d6-v9thn --namespace kube-system

kubectl delete pod coredns-f9fd979d6-tbp67 --namespace kube-system

kubectl delete pod tekton-pipelines-controller-587569588b-jv6hc --namespace tekton-pipelines

kubectl delete pod tekton-pipelines-webhook-655cf7f8bb-nhlld --namespace tekton-pipelines

but that didn't seem to help much: -

kubectl get pods --all-namespaces

NAMESPACE          NAME                                                  READY   STATUS              RESTARTS   AGE
default            hello-world-nginx-deploy-to-cluster-dc8x2-pod-8fqzm   0/8     Completed           0          25d
default            hello-world-nginx-source-to-image-b2t7r-pod-l8mmm     0/2     Completed           0          25d
kube-system        coredns-f9fd979d6-xl7nx                               0/1     ContainerCreating   0          7m35s
kube-system        coredns-f9fd979d6-zd62l                               0/1     ContainerCreating   0          7m21s
kube-system        etcd-b23976de6423                                     1/1     Running             1          26d
kube-system        kube-apiserver-b23976de6423                           1/1     Running             1          26d
kube-system        kube-controller-manager-b23976de6423                  1/1     Running             1          26d
kube-system        kube-proxy-cq5sg                                      1/1     Running             1          26d
kube-system        kube-proxy-qfg6v                                      1/1     Running             1          26d
kube-system        kube-scheduler-b23976de6423                           1/1     Running             1          26d
tekton-pipelines   tekton-pipelines-controller-587569588b-mm9d9          0/1     ContainerCreating   0          9m49s
tekton-pipelines   tekton-pipelines-webhook-655cf7f8bb-h772d             0/1     ContainerCreating   0          9m32s

I dug into the health of one of the CoreDNS pods: -

kubectl describe pod coredns-f9fd979d6-zd62l --namespace kube-system

which, in part, showed: -

Type     Reason                  Age                     From               Message
----     ------                  ----                    ----               -------
Normal   Scheduled               4m17s                   default-scheduler  Successfully assigned kube-system/coredns-f9fd979d6-zd62l to b23976de6423
Warning  FailedCreatePodSandBox  4m16s                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "2afd79ad7a4aa926050bdb72affc567949f5dba1f07803020bd64bcbfe2de27b" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning  FailedCreatePodSandBox  4m14s                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8c628d3e6851acadc25fcd4a4121bd6bbfa6638557a91464fbd724c98bfec40b" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning  FailedCreatePodSandBox  4m12s                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "ac5787fc3163e5216feceedbaaa16862ffea0e79d8ffc70951a531c625bd5424" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning  FailedCreatePodSandBox  4m10s                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "115475e415ad5a74442639f3731a050608d0409191486a518e0b62d5ffde1756" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning  FailedCreatePodSandBox  4m8s                    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "2b6c81b54301793d87d0e426a35b44c41f44ce9f768d0f85cc89dbb7391baa5b" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning  FailedCreatePodSandBox  4m6s                    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "9f8279ab67680e7b14d43d4ea109c0440527fccf1d2d06f3737e0c5ff38c9b82" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning  FailedCreatePodSandBox  4m4s                    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "6788f48b5408ced802f77111e8b1f2968c4368228996e2fb946375638b8ca473" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning  FailedCreatePodSandBox  4m2s                    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "c1daa71473cdadfeb187962b918902890bfd90981a96907fdc60b2937cc3ece4" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning  FailedCreatePodSandBox  4m                      kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "bee12244bd8e1070633226913f94cb0faae6de820b0f745fd905308b92a22b0d" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Normal   SandboxChanged          3m53s (x12 over 4m15s)  kubelet            Pod sandbox changed, it will be killed and re-created.
Warning  FailedCreatePodSandBox  3m52s (x4 over 3m58s)   kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "3a0c18424d9c640e3d33467866852801982bd07fc919a323e290aae6852f7d04" network for pod "coredns-f9fd979d6-zd62l": networkPlugin cni failed to set up pod "coredns-f9fd979d6-zd62l_kube-system" network: open /run/flannel/subnet.env: no such file or directory
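Every one of those warnings ends with the same complaint, which is easy to miss in a wall of container IDs. As a minimal sketch (the function name sandbox_failure_causes is hypothetical, and it assumes the cause is whatever follows the last " network: " on the line, as it is in the events above), the distinct causes can be pulled out like this: -

```shell
# Hypothetical helper: reads `kubectl describe pod` output on stdin and prints
# each distinct trailing cause of a FailedCreatePodSandBox event once.
sandbox_failure_causes() {
  # Keep only the sandbox failures, strip everything up to the last
  # " network: ", then de-duplicate what is left.
  sed -n '/FailedCreatePodSandBox/s/.* network: //p' | sort -u
}

# Usage against a live cluster:
#   kubectl describe pod coredns-f9fd979d6-zd62l --namespace kube-system | sandbox_failure_causes
```

For the events above, that should boil down to the single missing /run/flannel/subnet.env message, a file that the Flannel daemon writes when it starts.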

Working on the principle that the problem MIGHT be with the Flannel networking plugin, mainly because no Flannel pods appeared at all in the pod list, whether running or failing, I redeployed it: -

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml

podsecuritypolicy.policy/psp.flannel.unprivileged created
Warning: rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole
clusterrole.rbac.authorization.k8s.io/flannel created
Warning: rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds-amd64 created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created

which made a positive difference: -

kubectl get pods --all-namespaces

NAMESPACE          NAME                                                  READY   STATUS              RESTARTS   AGE
default            hello-world-nginx-deploy-to-cluster-dc8x2-pod-8fqzm   0/8     Completed           0          25d
default            hello-world-nginx-source-to-image-b2t7r-pod-l8mmm     0/2     Completed           0          25d
kube-system        coredns-f9fd979d6-xl7nx                               0/1     ContainerCreating   0          9m5s
kube-system        coredns-f9fd979d6-zd62l                               0/1     ContainerCreating   0          8m51s
kube-system        etcd-b23976de6423                                     1/1     Running             1          26d
kube-system        kube-apiserver-b23976de6423                           1/1     Running             1          26d
kube-system        kube-controller-manager-b23976de6423                  1/1     Running             1          26d
kube-system        kube-flannel-ds-s390x-ttl4n                           1/1     Running             0          4s
kube-system        kube-flannel-ds-s390x-wpx2h                           1/1     Running             0          4s
kube-system        kube-proxy-cq5sg                                      1/1     Running             1          26d
kube-system        kube-proxy-qfg6v                                      1/1     Running             1          26d
kube-system        kube-scheduler-b23976de6423                           1/1     Running             1          26d
tekton-pipelines   tekton-pipelines-controller-587569588b-mm9d9          0/1     ContainerCreating   0          11m
tekton-pipelines   tekton-pipelines-webhook-655cf7f8bb-h772d             0/1     ContainerCreating   0          11m

and, about a minute later: -

kubectl get pods --all-namespaces

NAMESPACE          NAME                                                  READY   STATUS      RESTARTS   AGE
default            hello-world-nginx-deploy-to-cluster-dc8x2-pod-8fqzm   0/8     Completed   0          25d
default            hello-world-nginx-source-to-image-b2t7r-pod-l8mmm     0/2     Completed   0          25d
kube-system        coredns-f9fd979d6-xl7nx                               1/1     Running     0          9m56s
kube-system        coredns-f9fd979d6-zd62l                               1/1     Running     0          9m42s
kube-system        etcd-b23976de6423                                     1/1     Running     1          26d
kube-system        kube-apiserver-b23976de6423                           1/1     Running     1          26d
kube-system        kube-controller-manager-b23976de6423                  1/1     Running     1          26d
kube-system        kube-flannel-ds-s390x-ttl4n                           1/1     Running     0          55s
kube-system        kube-flannel-ds-s390x-wpx2h                           1/1     Running     0          55s
kube-system        kube-proxy-cq5sg                                      1/1     Running     1          26d
kube-system        kube-proxy-qfg6v                                      1/1     Running     1          26d
kube-system        kube-scheduler-b23976de6423                           1/1     Running     1          26d
tekton-pipelines   tekton-pipelines-controller-587569588b-mm9d9          1/1     Running     0          12m
tekton-pipelines   tekton-pipelines-webhook-655cf7f8bb-h772d             1/1     Running     0          11m
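Rather than re-running the command by hand until everything settles, the check can be scripted. This is only a sketch: all_pods_ready is a name I made up, and it assumes the same six-column --all-namespaces --no-headers layout, treating Completed pods as fine and everything else as ready only when STATUS is Running and READY reads n/n: -

```shell
# Hypothetical helper: reads `kubectl get pods --all-namespaces --no-headers`
# output on stdin. Exits 0 only if every pod that is not Completed is
# Running with all of its containers ready (READY column n/n).
all_pods_ready() {
  awk 'BEGIN { bad = 0 }
       NF >= 4 && $4 != "Completed" {
         split($3, ready, "/")
         if ($4 != "Running" || ready[1] != ready[2]) bad = 1
       }
       END { exit bad }'
}

# Usage against a live cluster, polling every five seconds:
#   until kubectl get pods --all-namespaces --no-headers | all_pods_ready; do sleep 5; done
```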

I then re-ran the script that creates a Tekton deployment: -

# Create the tutorial-service Service Account
kubectl apply -f ./serviceaccounts/create_tutorial_service_account.yaml

# Create the clusterrole and clusterrolebinding
kubectl apply -f ./roles/create_cluster_role.yaml

# Create the Tekton Resource aligned to the Git repository
kubectl apply -f ./resources/git.yaml

# Create the Tekton Task that creates the Docker image from the GitHub repository
kubectl apply -f ./tasks/source-to-image.yaml

# Create the Tekton Task that deploys the built image to the cluster using kubectl
kubectl apply -f ./tasks/deploy-using-kubectl.yaml

# Create the Tekton Pipeline that runs the two tasks
kubectl apply -f ./pipelines/build-and-deploy-pipeline.yaml

# Create the Tekton PipelineRun that runs the Pipeline
kubectl apply -f ./runs/pipeline-run.yaml

# Display the Pipeline logs
tkn pipelines logs

and checked the resulting deployment: -

kubectl get deployments

NAME                READY   UP-TO-DATE   AVAILABLE   AGE
hello-world-nginx   1/1     1            1           17m

service: -

kubectl get services

NAME                TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
hello-world-nginx   NodePort    10.97.37.211   <none>        80:30674/TCP   17m
kubernetes          ClusterIP   10.96.0.1      <none>        443/TCP        26d

and nodes: -

kubectl get nodes

NAME           STATUS   ROLES    AGE   VERSION
68bc83cf0d09   Ready    <none>   26d   v1.19.2
b23976de6423   Ready    master   26d   v1.19.2
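With only two nodes it is easy to see that both are Ready, but the same check can be scripted for a post-upgrade run. A small sketch, assuming the five-column layout of kubectl get nodes --no-headers (NAME, STATUS, ROLES, AGE, VERSION); the function name nodes_not_ready is hypothetical: -

```shell
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and prints "name status" for every node whose STATUS (column 2) is not
# exactly "Ready". Prints nothing when all nodes are Ready.
nodes_not_ready() {
  awk 'NF >= 2 && $2 != "Ready" { print $1, $2 }'
}

# Usage against a live cluster:
#   kubectl get nodes --no-headers | nodes_not_ready
```

Note that a cordoned node reports a status such as Ready,SchedulingDisabled, so this deliberately strict comparison would list it too.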

I then used cURL to validate the Nginx pod: -

curl http://192.168.32.142:30674

<html>
  <head>
    <title>Hello World</title>
  </head>
  <body>
    <div class="info">
      <p>
        <h2>
          <span>Welcome to IBM Cloud Kubernetes Service with Hyper Protect ...</span>
        </h2>
      </p>
      <p>
        <h2>
          <span>and your first Docker application built by Tekton Pipelines and Triggers ...</span>
        </h2>
      </p>
      <p>
        <h2>
          <span>Message of the Day .... Drink More Herbal Tea!!</span>
        </h2>
      </p>
      <p>
        <h2>
          <span>( and, of course, Hello World! )</span>
        </h2>
      </p>
    </div>
  </body>
</html>
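The port in that cURL command is the NodePort from the service listing, i.e. the number after the colon in 80:30674/TCP. As a minimal sketch (node_port is a hypothetical name, and it assumes a single port mapping in the PORT(S) column), it can be extracted with plain shell parameter expansion: -

```shell
# Hypothetical helper: prints the node port from a PORT(S) value such as
# "80:30674/TCP" (service port, colon, node port, slash, protocol).
node_port() {
  ports=$1
  ports=${ports#*:}              # drop the service port and colon, leaving 30674/TCP
  printf '%s\n' "${ports%%/*}"   # drop the protocol suffix, leaving 30674
}

# Usage:
#   node_port 80:30674/TCP
```

Against a live cluster, kubectl can return the same value directly via --output jsonpath='{.spec.ports[0].nodePort}' on the service; the node's IP address still has to come from elsewhere.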

So we're now good to go ......
