Monday, 28 June 2021

Docker Content Trust and the Case of the PEBCAK

 I'm doing some work with trusted/signed container images at present, and am using IBM Container Registry (ICR) as my ... container registry.

I'm doing the actual build/tag/push from an Ubuntu 20.04 VM, having logged into ICR: -

docker login us.icr.io --username iamapikey

and having set my Bash variables to enable Docker Content Trust (DCT): -

export DOCKER_CONTENT_TRUST=1
export DOCKER_CONTENT_TRUST_SERVER=https://us.icr.io:4443/

I then built / tagged: -

docker build --no-cache -t us.icr.io/foobarsnafu/hello_world_nginx_dct_june_2021:latest -f Dockerfile .

and pushed my image: -

docker push us.icr.io/foobarsnafu/hello_world_nginx_dct_june_2021:latest

Whilst this appeared to work, it ultimately failed: -

The push refers to repository [us.icr.io/foobarsnafu/hello_world_nginx_dct_june_2021]
d0471711ab1a: Pushed 
5dbe8c3d30af: Pushed 
67780d477478: Pushed 
5db88766b0e0: Pushed 
36dfa50192c8: Pushed 
8506b073cd53: Pushed 
468af79aab10: Pushed 
fbf82c12d86e: Pushed 
4dc20fbc0e8d: Pushed 
b831cc3ae47e: Pushed 
ace0eda3e3be: Pushed 
latest: digest: sha256:bfcadd198529d842b97dcd633f7b0b65fbcdca4599886a172a31eff0543f3f9d size: 2610
Signing and pushing trust metadata
unable to reach trust server at this time: 301.

I checked and re-checked my steps, and then turned to my faithful friend, Google .... and found this: -


in which the person raising the issue said this: -

After a day and a half of debugging and redoing everything from scratch several times I've figured out that the problem was the trailing slash in the notary server url.

That was revelatory .... notice that I'd previously typed: -

export DOCKER_CONTENT_TRUST_SERVER=https://us.icr.io:4443/

Once I changed my DCT URL to: -

export DOCKER_CONTENT_TRUST_SERVER=https://us.icr.io:4443

( i.e. no trailing slash )

everything was copacetic 😹
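
For the record, here's the full working sequence, using the same example namespace as above ( a recap sketch rather than a definitive recipe ): -

# log in to ICR with an IAM API key
docker login us.icr.io --username iamapikey

# enable Docker Content Trust - note: no trailing slash on the server URL
export DOCKER_CONTENT_TRUST=1
export DOCKER_CONTENT_TRUST_SERVER=https://us.icr.io:4443

# build, tag and push - the trust metadata is signed and pushed at the end
docker build --no-cache -t us.icr.io/foobarsnafu/hello_world_nginx_dct_june_2021:latest -f Dockerfile .
docker push us.icr.io/foobarsnafu/hello_world_nginx_dct_june_2021:latest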

Easy when you know how ...

Wednesday, 23 June 2021

More fun with containerd and the ctr tool

 As per previous posts, I've been tinkering ( gosh, I love that word ) with containerd and Kata 2.0 a lot recently.

Having deployed containerd and Kata 2.0 into my Kubernetes 1.21 environment, I am happily creating Pods using the Kata runtime: -

vi nginx-kata.yaml 

apiVersion: v1
kind: Pod
metadata:
  name: nginx-kata
spec:
  runtimeClassName: kata
  containers:
  - name: nginx
    image: nginx

kubectl apply -f nginx-kata.yaml

pod/nginx-kata created
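
As a quick sanity check that the Pod really has picked up the Kata runtime class, the spec can be queried directly ( a sketch ): -

kubectl get pod nginx-kata -o jsonpath='{.spec.runtimeClassName}'

kata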

and then using tools such as crictl to see what's going on ( on the K8s Compute Node ) : -

vi /etc/crictl.yaml

runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false

crictl pods

POD ID              CREATED             STATE               NAME                                      NAMESPACE           ATTEMPT
dc02fc7c89641       10 minutes ago      Ready               nginx-kata                                default             0
d1f8ce098f089       2 days ago          Ready               coredns-558bd4d5db-qwfl9                  kube-system         0
7f3783a919973       2 days ago          Ready               coredns-558bd4d5db-54k58                  kube-system         0
5dca2e336f243       2 days ago          Ready               calico-kube-controllers-cc8959d7f-xwggk   kube-system         0
9a9bca9a8e611       2 days ago          Ready               calico-node-wvn6q                         kube-system         0
d98889f80a38a       2 days ago          Ready               kube-proxy-bc897                          kube-system         0

crictl ps

CONTAINER ID        IMAGE               CREATED             STATE               NAME                      ATTEMPT             POD ID
efefdc0c91767       4f380adfc10f4       11 minutes ago      Running             nginx                     0                   dc02fc7c89641
7bc01b279a245       296a6d5035e2d       2 days ago          Running             coredns                   0                   d1f8ce098f089
693fae66b8cff       296a6d5035e2d       2 days ago          Running             coredns                   0                   7f3783a919973
ebd27f6028920       2ce0e04399aca       2 days ago          Running             calico-kube-controllers   0                   5dca2e336f243
dac664782bbac       ebc659140e762       2 days ago          Running             calico-node               0                   9a9bca9a8e611
fc7b02adf0cfe       38ddd85fe90e0       2 days ago          Running             kube-proxy                0                   d98889f80a38a

but I also wanted to see what was going on using the ctr tool, which ships with containerd.

I tried this: -

ctr container list

but that only returned: -

CONTAINER    IMAGE    RUNTIME    

I even tried directing ctr to the same endpoint as crictl : -

ctr --address /run/containerd/containerd.sock containers list

which similarly returned: -

CONTAINER    IMAGE    RUNTIME    

I dug about online for a bit and found this: -


in the containerd GitHub repo, which said, in part: -

containerd has namespaces: https://github.com/containerd/containerd/blob/master/README.md#namespaces

ctr --namespace k8s.io containers ls

Once I amended my command: -

ctr --namespace k8s.io container list

all was well: -

CONTAINER                                                           IMAGE                                                                                              RUNTIME                  
5dca2e336f24335693c6e6e36bfa9448b77f62ebd24dc6ff7dfbad6046b4e451    k8s.gcr.io/pause:3.2                                                                               io.containerd.runc.v2    
693fae66b8cffa4a6f2502704e6fb3ae581a85154284158122801e272182a480    k8s.gcr.io/coredns/coredns:v1.8.0                                                                  io.containerd.runc.v2    
7bc01b279a245902df8a74daa2647158992ca304313772002d3b92f989c832b4    k8s.gcr.io/coredns/coredns:v1.8.0                                                                  io.containerd.runc.v2    
7e9debe26471498d06a99403ade2b46e9ce8ef50cade17a2a44efddcacb7ec70    sha256:021ecb3cb5348375a201bc8e2fe97c04da8c675a89185ae5bb597f7b2bdd2097                            io.containerd.runc.v2    
7f3783a9199736931fa2b97915238e024f17ecb456c2c83e89520e4c0e4de6f3    k8s.gcr.io/pause:3.2                                                                               io.containerd.runc.v2    
9a9bca9a8e61143df776733aca2503094fa6ab7560f06c9a6809942f16418cd0    k8s.gcr.io/pause:3.2                                                                               io.containerd.runc.v2    
a0f5f0b540e2bcde26ab4cb7ea887718202252ec9abe35b8be3bc351c9163d2c    sha256:021ecb3cb5348375a201bc8e2fe97c04da8c675a89185ae5bb597f7b2bdd2097                            io.containerd.runc.v2    
d1f8ce098f089ffc85065d003c803223bd89f9508b5f78fccfcb7942b1a17f4d    k8s.gcr.io/pause:3.2                                                                               io.containerd.runc.v2    
d89404d155c09d8b2d5134989675593b2d45a3fe2b7972c1aba83f990af92dca    docker.io/calico/pod2daemon-flexvol:v3.18.4                                                        io.containerd.runc.v2    
d98889f80a38aa7b5dde86f7560b3a5791f2e621896063f991a994e79b84cce1    k8s.gcr.io/pause:3.2                                                                               io.containerd.runc.v2    
dac664782bbacf4e9d531c79d8702ce7070b7cbf6ef0e0f6936036c574c1d946    docker.io/calico/node:v3.18.4                                                                      io.containerd.runc.v2    
dc02fc7c8964173ac4d9273590191079b9aa943de62071cebdf31784dbe28b89    sha256:80d28bedfe5dec59da9ebf8e6260224ac9008ab5c11dbbe16ee3ba3e4439ac2c                            io.containerd.kata.v2    
ebd27f60289200020add88b0e09de8347521ddd9a9710f1dd5ea1b824be38ea4    sha256:2ce0e04399acab807c909223153f44dcd197765af5eb1e0a858acaf8869b27e4                            io.containerd.runc.v2    
efefdc0c917678590423c193f59ab3bcce02c7282fe876422506fd45e9693967    docker.io/library/nginx@sha256:8f7dcfc0d8c01c5b66a49f1f33803c959a354fabb4d0128e6144a7732c8e70eb    io.containerd.kata.v2    
fc7b02adf0cfe264f02a8dffe6ba938377b129eb51a471257d58f524cf05351f    k8s.gcr.io/kube-proxy:v1.21.0                                                                      io.containerd.runc.v2    
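
As an aside, if you're not sure which namespaces exist on a given node, ctr can list them too ( a quick sketch, assuming the default containerd socket ) - on a Kubernetes node like this one, k8s.io should show up: -

ctr namespaces list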


Thursday, 17 June 2021

Networking notworking on Ubuntu ?

 Whilst helping a colleague dig into some issues he was seeing pulling an IBM WebSphere Liberty image from Docker Hub, I realised that my Ubuntu box was missing a couple of useful utilities, including nslookup and traceroute.

In the latter case, I did have traceroute6 but not the IP v4 equivalent.

Easily fixed ...

apt-get update && apt-get install -y dnsutils traceroute

and now we're good to go: -

which nslookup

/usr/bin/nslookup

nslookup -version

nslookup 9.16.1-Ubuntu

which traceroute

/usr/sbin/traceroute

traceroute --version

Modern traceroute for Linux, version 2.1.0
Copyright (c) 2016  Dmitry Butskoy,   License: GPL v2 or any later
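
And a quick, hypothetical test of both, against Docker Hub's registry endpoint: -

nslookup registry-1.docker.io

traceroute -4 registry-1.docker.io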

Monday, 14 June 2021

Design, build, and deploy universal application images

 From IBM ( for whom I've worked for ~30 years ) : -

Building high-quality container images and their corresponding pod specifications is the foundation for Kubernetes to effectively run and manage an application in production. There are numerous ways to build images, so knowing where to start can be confusing.
This learning path introduces you to the universal application image (UAI). A UAI is an image that uses Red Hat’s Universal Base Image (UBI) as its foundation, includes the application being deployed, and also adds extra elements that make it more secure and scalable in Kubernetes and Red Hat OpenShift.
Specifically, a universal application image:
    Is built from a Red Hat UBI
    Can run on Kubernetes and OpenShift
    Does not require any Red Hat licensing, so it’s freely distributable
    Includes qualities that make it run more efficiently
    Is supported by Red Hat when run in OpenShift
The articles in this learning path describe best practices for packaging an application, highlighting elements that are critical to include in designing the image, performing the build, and deploying the application.

Design, build, and deploy universal application images

Thursday, 10 June 2021

Tinkering with containerd and the ctr tool

 Some notes from a recent tinkering with containerd and ctr ...

which ctr

/usr/bin/ctr

ctr version

Client:
  Version:  1.4.4-0ubuntu1~20.04.2
  Revision:
  Go version: go1.13.8
Server:
  Version:  1.4.4-0ubuntu1~20.04.2
  Revision:
  UUID: 47a84416-93a1-4934-b850-fecb8dddf519

Pull an image

ctr image pull docker.io/library/nginx:latest -u davidhay1969

docker.io/library/nginx:latest:                                                   resolved       |++++++++++++++++++++++++++++++++++++++|
index-sha256:6d75c99af15565a301e48297fa2d121e15d80ad526f8369c526324f0f7ccb750:    exists         |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:61191087790c31e43eb37caa10de1135b002f10c09fdda7fa8a5989db74033aa: exists         |++++++++++++++++++++++++++++++++++++++|
layer-sha256:351ad75a6cfabc7f2e103963945ff803d818f0bdcf604fd2072a0eefd6674bde:    exists         |++++++++++++++++++++++++++++++++++++++|
layer-sha256:596b1d696923618bec6ff5376cc9aed03a3724bc75b6c03221fd877b62046d05:    exists         |++++++++++++++++++++++++++++++++++++++|
layer-sha256:30afc0b18f67ae8441c2d26e356693009bb8927ab7e3bce05d5ed99531c9c1d4:    exists         |++++++++++++++++++++++++++++++++++++++|
layer-sha256:febe5bd23e98102ed5ff64b8f5987f516a945745c08bbcf2c61a50fb6e7b2257:    exists         |++++++++++++++++++++++++++++++++++++++|
layer-sha256:8283eee92e2f756bd57f96ea295e332ab9031724267d4f939de1f7d19fe9611a:    exists         |++++++++++++++++++++++++++++++++++++++|
config-sha256:d1a364dc548d5357f0da3268c888e1971bbdb957ee3f028fe7194f1d61c6fdee:   exists         |++++++++++++++++++++++++++++++++++++++|
layer-sha256:69692152171afee1fd341febc390747cfca2ff302f2881d8b394e786af605696:    exists         |++++++++++++++++++++++++++++++++++++++|
elapsed: 1.3 s                                                                    total:   0.0 B (0.0 B/s)                                         
unpacking linux/amd64 sha256:6d75c99af15565a301e48297fa2d121e15d80ad526f8369c526324f0f7ccb750...
done

List images

ctr image list

docker.io/library/nginx:latest              application/vnd.docker.distribution.manifest.list.v2+json sha256:6d75c99af15565a301e48297fa2d121e15d80ad526f8369c526324f0f7ccb750 51.3 MiB  linux/386,linux/amd64,linux/arm/v5,linux/arm/v7,linux/arm64/v8,linux/mips64le,linux/ppc64le,linux/s390x              -      

Create a container ( in background mode via -d )

ctr run --net-host -d --rm -t docker.io/library/nginx:latest nginx

Nothing returned
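
Because the container was started with --net-host, nginx should be listening directly on the host's port 80, which gives a quick way to confirm it's alive ( assuming nothing else already owns that port ): -

curl -sI http://localhost:80 | head -1

HTTP/1.1 200 OK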

List running containers

ctr container list

CONTAINER    IMAGE                                                RUNTIME
nginx        docker.io/library/nginx:latest    io.containerd.runc.v2    

List tasks

ctr task list

TASK                 PID     STATUS    
nginx    1287661    RUNNING

List Linux processes

ps aux | grep containerd | grep -v grep

root       39604  0.8  1.6 1287024 67348 ?       Ssl  Jun08  18:11 /usr/bin/containerd
root     1287636  0.0  0.1 111852  7952 ?        Sl   01:44   0:00 /usr/bin/containerd-shim-runc-v2 -namespace default -id nginx -address /run/containerd/containerd.sock

Inspect task

ctr task ps nginx

PID        INFO
1287661    -
1287712    -
1287713    -

Attempt to remove task

ctr task delete nginx

ERRO[0000] unable to delete nginx                        error="task must be stopped before deletion: running: failed precondition"
ctr: task must be stopped before deletion: running: failed precondition

Attempt to remove container

ctr container delete nginx

ERRO[0000] failed to delete container "nginx"            error="cannot delete a non stopped container: {running 0 0001-01-01 00:00:00 +0000 UTC}"
ctr: cannot delete a non stopped container: {running 0 0001-01-01 00:00:00 +0000 UTC}

Kill the task

ctr task kill nginx

Nothing returned

Attempt to remove task

ctr task delete nginx

Nothing returned

Attempt to remove container

ctr container delete nginx

Nothing returned

Create a container ( in foreground mode via -t, with a shell )

- note that the container automatically terminates, and is removed, upon exit, via the --rm remove switch

ctr run --net-host --rm -t docker.io/library/nginx:latest nginx sh

#

Inspect Nginx configuration ( inside container )

cat /etc/nginx/nginx.conf

user  nginx;
worker_processes  auto;
error_log  /var/log/nginx/error.log notice;
pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';
    access_log  /var/log/nginx/access.log  main;
    sendfile        on;
    #tcp_nopush     on;
    keepalive_timeout  65;
    #gzip  on;
    include /etc/nginx/conf.d/*.conf;
}

Exit container

exit

Create a container ( in foreground mode via -t, with a shell, mounting a local /k8s directory into the container as /k8s )

mkdir /k8s

echo "Hello World!" >> /k8s/greeting.txt

ctr run --net-host --mount type=bind,src=/k8s,dst=/k8s,options=rbind --rm -t docker.io/library/nginx:latest nginx sh

#

Display greeting from inside container

cat /k8s/greeting.txt

Hello World!

Exit container

exit

Wednesday, 9 June 2021

Tinkering with OpenLDAP on Docker on Ubuntu

 Following a discussion with a colleague on Slack, I thought I'd remind myself how OpenLDAP works as a service running inside a container, via the Docker container runtime interface (CRI).

Using this for inspiration: -

Docker image for OpenLDAP support

I pulled the requisite image from Docker Hub: -

docker pull osixia/openldap:1.5.0 -u davidhay1969:<DOCKER TOKEN>

and created a container: -

docker run --detach -p 3389:389 osixia/openldap:1.5.0 

Note that I'm using port mapping via -p 3389:389 to map the external ( host ) port of 3389 to the internal ( container ) port of 389

This allows me to run the container without needing to run it in privileged mode ( as Unix typically blocks non-root processes from listening on ports lower than 1,024 ).

Once the container was running happily: -

docker ps -a

CONTAINER ID   IMAGE                   COMMAND                 CREATED          STATUS          PORTS                            NAMES
23a39685da58   osixia/openldap:1.5.0   "/container/tool/run"   20 minutes ago   Up 20 minutes   636/tcp, 0.0.0.0:3389->389/tcp   agitated_mendel
55de9ae1b94a   busybox                 "sh"                    2 days ago       Created                                          nostalgic_mclean
da6a3136a33e   busybox                 "sh"                    13 days ago      Created                                          happy_swirles

I installed ldap-utils to give me the ldapsearch command: -

apt-get install -y ldap-utils

and then ran ldapsearch against the container via the mapped port: -

ldapsearch -H ldap://localhost:3389 -D cn=admin,dc=example,dc=org -w admin -b dc=example,dc=org

Note that I'm using the default credentials of admin / admin and would, of course, change these if this were a real-world environment .....
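
For a slightly more targeted query, the same connection details work with a search filter and attribute list ( a hypothetical example - it assumes some inetOrgPerson entries have actually been added ): -

ldapsearch -H ldap://localhost:3389 -D cn=admin,dc=example,dc=org -w admin -b dc=example,dc=org "(objectClass=inetOrgPerson)" cn mail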

Bash and fun with escape characters

Whilst writing the world's simplest Bash script to: -

  • delete a pod
  • deploy a new pod
  • list the running pods
  • describe the newly deployed pod

I wanted to add newline characters into my echo statements, to make things more readable.

I've written before about echo -e so was just double-checking my understanding via the command-line ...

I entered: -

echo -e "Hello World!\n"

but, instead of a friendly greeting, I saw: -

-bash: !\n: event not found

Wait, what now ?

Yeah, of course, inside double quotes I'd entered the magical sequence of: -

pling backslash n

which Bash treats as a history expansion ( hence "event not found" ) and which, obviously, isn't what I meant to do ....

Easy solution - stick in a space character ...

echo -e "Hello World! \n"

Hello World! 
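
There are tidier fixes too; single quotes suppress history expansion altogether, and set +H switches it off for the session ( a sketch ): -

echo -e 'Hello World!\n'    # single quotes - the pling is left alone

set +H                      # or disable history expansion entirely
echo -e "Hello World!\n"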

So here's the finished script: -

#!/bin/bash

clear

echo -e "Deleting existing busybox-kata pod\n"

kubectl delete pod busybox-kata

echo -e "\nDeploying new busybox-kata pod\n"

kubectl apply -f busybox-kata.yaml 

echo -e "\nSleeping ...\n"

sleep 10

echo -e "\nChecking pods ...\n"

kubectl get pods

echo -e "\nSleeping ...\n"

sleep 5

echo -e "\nDescribing busybox-kata pod to see all is good ...\n"

kubectl describe pod busybox-kata | tail -8

Sunday, 6 June 2021

Doh, Kubernetes fails to run due to a lack of ...

Having started the build of a new K8s 1.21 cluster: -

kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=$private_ip --apiserver-cert-extra-sans=$public_ip --kubernetes-version ${KUBE_VERSION}

I saw: -

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
- 'crictl --runtime-endpoint /run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint /run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

I did a spot of digging in the system log ( /var/log/syslog ) and found this: -

Jun  6 12:17:53 hurlinux2 containerd[39485]: time="2021-06-06T12:17:53.210883120-07:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-apiserver-telicity1.fyre.ibm.com,Uid:599ab88dc99dd5fbdb7c6a92e4e965ba,Namespace:kube-system,Attempt:0,} failed, error" error="failed to create containerd task: failed to create shim: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/177b079efbe1b71b65586a467313d1a15e802a2b2d323a7ed974d5d2e99e33f5/log.json: no such file or directory): exec: \"runc\": executable file not found in $PATH: unknown"
Jun  6 12:17:53 hurlinux2 kubelet[41466]: E0606 12:17:53.211759   41466 remote_runtime.go:116] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/177b079efbe1b71b65586a467313d1a15e802a2b2d323a7ed974d5d2e99e33f5/log.json: no such file or directory): exec: \"runc\": executable file not found in $PATH: unknown"

which made me think "Hmmmm, wonder what I forgot ..."

A quick trip to Aptitude ...

apt-get install runc

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  runc
0 upgraded, 1 newly installed, 0 to remove and 8 not upgraded.
Need to get 4,018 kB of archives.
After this operation, 15.7 MB of additional disk space will be used.
Get:1 http://us.archive.ubuntu.com/ubuntu focal-updates/main amd64 runc amd64 1.0.0~rc93-0ubuntu1~20.04.2 [4,018 kB]
Fetched 4,018 kB in 1s (4,037 kB/s)
Selecting previously unselected package runc.
(Reading database ... 107797 files and directories currently installed.)
Preparing to unpack .../runc_1.0.0~rc93-0ubuntu1~20.04.2_amd64.deb ...
Unpacking runc (1.0.0~rc93-0ubuntu1~20.04.2) ...
Setting up runc (1.0.0~rc93-0ubuntu1~20.04.2) ...
Processing triggers for man-db (2.9.1-1) ...

and a kubeadm reset ( to clear down the borked cluster creation ) and I was then able to re-run kubeadm init as before, and we're good to go .....
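
For good measure, it's worth confirming that runc really is installed and on the PATH before re-running the init ( a quick sanity check ): -

which runc

runc --version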


TIL: How do I find the requisite information to join a Compute Node to an existing K8s Cluster ?

Having built a K8s 1.21 cluster a week or so back, I'd removed my Compute Node with: -

kubeadm reset

( run on the Compute Node itself )

I then wanted to find the command that I'd previously used to join the Compute Node to the cluster.

Now kubeadm init generates a token etc. which lives for 24 hours.

So finding the command from my shell history or from the documentation ain't gonna cut it.

Thankfully, we have this: -

kubeadm token create --print-join-command

which generates output such as this: -

kubeadm join 10.51.16.135:6443 --token isahtxb.nfv74gu4yxbxxq2j --discovery-token-ca-cert-hash sha256:1375f426f376b99240ed34bf952f4c026cc0afaad4adbba816187ff5bcc384b6

with which I can then join the Compute Node into the cluster, and we're back up and running ...
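
It's also worth knowing that the existing bootstrap tokens, and their expiry times, can be listed with: -

kubeadm token list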

Thanks as ever to StackOverflow, who've got my back with: -

How do I find the join command for kubeadm on the master?

Why won't Kubernetes kubelet come up ?

 After an unscheduled reboot of the VMs that host my K8s cluster, I was struggling to work out why the kubelet wasn't starting properly.

I ran systemctl start kubelet.service to start it and then checked the status with systemctl status kubelet.service which showed: -

● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Sun 2021-06-06 00:35:01 PDT; 3s ago
       Docs: https://kubernetes.io/docs/home/
   Main PID: 82478 (kubelet)
      Tasks: 7 (limit: 2279)
     Memory: 14.6M
     CGroup: /system.slice/kubelet.service
             └─82478 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/conf>
Jun 06 00:35:01 garble1.domain.com systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jun 06 00:35:01 garble1.domain.com kubelet[82478]: I0606 00:35:01.836881   82478 server.go:197] "Warning: For remote container runtime, --pod-infra-container-image is i>
Jun 06 00:35:01 garble1.domain.com kubelet[82478]: I0606 00:35:01.866762   82478 server.go:440] "Kubelet version" kubeletVersion="v1.21.0"
Jun 06 00:35:01 garble1.domain.com kubelet[82478]: I0606 00:35:01.867455   82478 server.go:851] "Client rotation is on, will bootstrap in background"
Jun 06 00:35:01 garble1.domain.com kubelet[82478]: I0606 00:35:01.870367   82478 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-clie>
Jun 06 00:35:01 garble1.domain.com kubelet[82478]: I0606 00:35:01.873004   82478 dynamic_cafile_content.go:167] Starting client-ca-bundle::/etc/kubernetes/pki/ca.crt

which looked OK.

I checked again: -

systemctl status kubelet.service

and saw: -

● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: activating (auto-restart) (Result: exit-code) since Sun 2021-06-06 00:35:22 PDT; 8s ago
       Docs: https://kubernetes.io/docs/home/
    Process: 82505 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
   Main PID: 82505 (code=exited, status=1/FAILURE)
Jun 06 00:35:22 garble1.domain.com systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Jun 06 00:35:22 garble1.domain.com systemd[1]: kubelet.service: Failed with result 'exit-code'.

which looked not so good.

I then checked the syslog with: -

tail -f /var/log/syslog

and saw, amongst many other things, this: -

Jun  6 00:40:27 garble1 kubelet[83211]: E0606 00:40:27.104582   83211 server.go:292] "Failed to run kubelet" err="failed to run Kubelet: running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename\t\t\t\tType\t\tSize\tUsed\tPriority /swap.img                               file\t\t4194300\t0\t-2]"


Of course, the VMs had been rebooted ... so swap was back on ....

A quick trip to swapoff with: -

swapoff -a

and we're back in the game.
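
Of course, swapoff -a only lasts until the next reboot; to stop swap coming straight back, the swap entry in /etc/fstab wants commenting out as well ( a hedged sketch - check the file before and after ): -

# comment out any line that mounts swap, keeping a backup of the original file
sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab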

kubectl get nodes

NAME                     STATUS   ROLES                  AGE     VERSION
garble1.domain.com   Ready    control-plane,master   3d13h   v1.21.0
garble2.domain.com   Ready    <none>                 3d13h   v1.21.0

crictl pods

POD ID              CREATED             STATE               NAME                                             NAMESPACE           ATTEMPT
c3969548182d6       17 seconds ago      Ready               calico-node-nl2g2                                kube-system         0
bd06ccb126620       18 seconds ago      Ready               kube-proxy-ht4mq                                 kube-system         0
5a31b04c1d01a       18 seconds ago      Ready               kube-scheduler-garble1.domain.com            kube-system         0
ac6e59ccb87f1       25 seconds ago      Ready               kube-controller-manager-garble1.domain.com   kube-system         0
d2ece5d26441e       35 seconds ago      Ready               kube-apiserver-garble1.domain.com            kube-system         0
10019ac4de96d       45 seconds ago      Ready               etcd-garble1.domain.com                      kube-system         0

Wrangling Kubernetes using crictl

I needed to find a way to remove a bunch of NotReady pods from my K8s 1.21 cluster, on both the Control Plane and Compute Nodes.

Simples, use crictl

A useful StackOverflow post - Master not pods in NotReady status - gave me this: -

crictl pods|grep NotReady|cut -f1 -d" "|xargs -L 1 -I {} -t crictl rmp {}

but, of course, I had to remember how to tell crictl to look at the requisite endpoint - specifically at unix:///var/run/containerd/containerd.sock 

This was easy; I created /etc/crictl.yaml 

runtime-endpoint: unix:///var/run/containerd/containerd.sock
image-endpoint: unix:///var/run/containerd/containerd.sock
timeout: 10
debug: false

and was off to the races ...
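
For my own future reference, here's that pipeline broken down stage by stage: -

# crictl pods                        - list every pod sandbox that containerd knows about
# grep NotReady                      - keep only the rows in NotReady state
# cut -f1 -d" "                      - take the first column i.e. the pod ID
# xargs -L 1 -I {} -t crictl rmp {}  - remove each pod by ID ( -t echoes each command as it runs )

crictl pods|grep NotReady|cut -f1 -d" "|xargs -L 1 -I {} -t crictl rmp {}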

Thursday, 3 June 2021

Apple Remote - tell the telly to turn the heck off !

One nice feature of the Apple TV ( I have a couple of 'em ) is that they can turn the HDMI-attached TV on when you hit the [Menu] button on the  remote ... which is nice ....

But, of course, there's no off button .....

Wait, what now ?

So how do I turn off the TV when I'm done ? Go find and use the "normal" TV remote like a cave person ?

Or.....

Hit the "Siri' button - the one with the mic logo and say something like "Please turn off" ...

And that's it - one remote to rule them all, one remote to find them ( no, that's AirTags, fool ), one remote to bring them all and, in the Dark Mode, to bind them



Wednesday, 2 June 2021

Now that I did not know - using pushd and popd to navigate the Bourne Again Shell (BASH)

Further tinkering with Kata Containers etc. led me here: -

Install cri-tools

You can install the cri-tools from source code:

$ go get github.com/kubernetes-incubator/cri-tools

$ pushd $GOPATH/src/github.com/kubernetes-incubator/cri-tools

$ make

$ sudo -E make install

$ popd

I'd seen references to pushd and popd before but decided to unleash Google fu to know a little more ....

Navigating the Bash shell with pushd and popd

Pushd and popd are the fastest navigational commands you've never heard of.

The pushd and popd commands are built-in features of the Bash shell to help you "bookmark" directories for quick navigation between locations on your hard drive. You might already feel that the terminal is an impossibly fast way to navigate your computer; in just a few key presses, you can go anywhere on your hard drive, attached storage, or network share. But that speed can break down when you find yourself going back and forth between directories, or when you get "lost" within your filesystem. Those are precisely the problems pushd and popd can help you solve.
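
A tiny worked example, just to prove it to myself ( a sketch ): -

pushd /etc          # remember where we are, and jump to /etc
pushd /var/log      # remember /etc, and jump to /var/log
dirs -v             # show the directory stack
popd                # back to /etc
popd                # and back to where we started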

So now I'm learning .....

Tuesday, 1 June 2021

Inspecting certificates using OpenSSL and a variant of grep

In the context of: -

Building Kubernetes on Linux on IBM Z - it's a matter of trust ...

today I learned (TIL) that one could use egrep to examine x509 certificates: -

echo | openssl s_client -connect storage.googleapis.com:443 | egrep "^subject=|^issuer="

depth=2 OU = GlobalSign Root CA - R2, O = GlobalSign, CN = GlobalSign
verify return:1
depth=1 C = US, O = Google Trust Services, CN = GTS CA 1O1
verify return:1
depth=0 C = US, ST = California, L = Mountain View, O = Google LLC, CN = *.storage.googleapis.com
verify return:1
DONE
subject=/C=US/ST=California/L=Mountain View/O=Google LLC/CN=*.storage.googleapis.com
issuer=/C=US/O=Google Trust Services/CN=GTS CA 1O1

which is good to know 😁
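
And a closely related sketch, piping the same connection through openssl x509 to pull out the validity dates as well: -

echo | openssl s_client -connect storage.googleapis.com:443 2>/dev/null | openssl x509 -noout -subject -issuer -dates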

Building Kubernetes on Linux on IBM Z - it's a matter of trust ...

 One of my colleagues saw an interesting issue when trying to build a new Kubernetes cluster on an Ubuntu Linux environment ( on IBM Z ).

For the record, we're running Kubernetes inside Ubuntu containers which are hosted, via runq, on a Secure Service Container (SSC) logical partition (LPAR). In this scenario, we're using docker as the container runtime inside the Ubuntu container which is running inside the SSC LPAR ( nested FTW ).

However, the specific issue seen when running commands such as: -

kubeadm init --pod-network-cidr=192.168.0.0/16 --ignore-preflight-errors=all

wasn't directly related to the use of runq or the SSC LPAR.

Instead, the command returned: -

[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[WARNING ImagePull]: failed to pull image k8s.gcr.io/kube-apiserver:v1.20.7: output: Error response from daemon: Get https://k8s.gcr.io/v2/: x509: certificate signed by unknown authority
, error: exit status 1
[WARNING ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.20.7: output: Error response from daemon: Get https://k8s.gcr.io/v2/: x509: certificate signed by unknown authority
, error: exit status 1
[WARNING ImagePull]: failed to pull image k8s.gcr.io/kube-scheduler:v1.20.7: output: Error response from daemon: Get https://k8s.gcr.io/v2/: x509: certificate signed by unknown authority
, error: exit status 1
[WARNING ImagePull]: failed to pull image k8s.gcr.io/kube-proxy:v1.20.7: output: Error response from daemon: Get https://k8s.gcr.io/v2/: x509: certificate signed by unknown authority
, error: exit status 1
[WARNING ImagePull]: failed to pull image k8s.gcr.io/pause:3.2: output: Error response from daemon: Get https://k8s.gcr.io/v2/: x509: certificate signed by unknown authority
, error: exit status 1
[WARNING ImagePull]: failed to pull image k8s.gcr.io/etcd:3.4.13-0: output: Error response from daemon: Get https://k8s.gcr.io/v2/: x509: certificate signed by unknown authority
, error: exit status 1
[WARNING ImagePull]: failed to pull image k8s.gcr.io/coredns:1.7.0: output: Error response from daemon: Get https://k8s.gcr.io/v2/: x509: certificate signed by unknown authority
, error: exit status 1

Interestingly, an article on LinkedIn led me to the solution: -


In part, the author had me reload/restart the Docker service: -

service docker reload
service docker restart

Prior to this, I'd also checked some of the missing pre-requisite steps: -

apt-get install -y ca-certificates gnupg2 curl apt-transport-https

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

plus opening up some firewall ports, using iptables : -

iptables -A INPUT -p tcp -m tcp --dport 6443 -j ACCEPT
iptables -A INPUT -p tcp -m tcp --dport 10250 -j ACCEPT

However, I suspect that the combination of: -

apt-get install -y ca-certificates gnupg2 curl apt-transport-https

and: -

service docker reload
service docker restart

did the trick.
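
Borrowing the OpenSSL / egrep trick from the post above, one quick way to sanity-check that the host now trusts k8s.gcr.io's certificate chain ( a sketch ): -

echo | openssl s_client -connect k8s.gcr.io:443 | egrep "^subject=|^issuer=|Verify return code"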

One nice thing that I learned ( TIL ) was: -

kubeadm config images pull

to test whether kubeadm can get its required images before starting the init process: -

I0601 10:02:47.536650   25480 version.go:251] remote version is much newer: v1.21.1; falling back to: stable-1.20
[config/images] Pulled k8s.gcr.io/kube-apiserver:v1.20.7
[config/images] Pulled k8s.gcr.io/kube-controller-manager:v1.20.7
[config/images] Pulled k8s.gcr.io/kube-scheduler:v1.20.7
[config/images] Pulled k8s.gcr.io/kube-proxy:v1.20.7
[config/images] Pulled k8s.gcr.io/pause:3.2
[config/images] Pulled k8s.gcr.io/etcd:3.4.13-0
[config/images] Pulled k8s.gcr.io/coredns:1.7.0

Having pulled the images normally, without any trust exceptions, kubeadm init ran happily .....

Kata Containers and Ubuntu Linux - lessons learned - 4/many

Building on the series: -

Kata Containers and Ubuntu Linux - lessons learned - 1/many

Kata Containers and Ubuntu Linux - lessons learned - 2/many

Kata Containers and Ubuntu Linux - lessons learned - 3/many - a WIP

I've also had some fun n' games trying to build various components of Kata Containers 2.0 under Ubuntu, including the kernel that's used within the guest Virtual Machine (VM) ...

Problem

Building the kernel: -

sudo ./build-kernel.sh build

fails with: -

/home/hayd/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel/../scripts/lib.sh: line 25: go: command not found
~/go/src/github.com/kata-containers/tests /home/hayd/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel
/home/hayd/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel
INFO: Config version: 85
INFO: Kernel version: 5.10.25
***
*** Configuration file ".config" not found!
***
*** Please run some configurator (e.g. "make oldconfig" or
*** "make menuconfig" or "make xconfig").
***
make: *** [Makefile:697: .config] Error 1

Similarly: -

sudo ./build-kernel.sh setup

fails with: -

/home/hayd/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel/../scripts/lib.sh: line 25: go: command not found
~/go/src/github.com/kata-containers/tests /home/hayd/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel
/home/hayd/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel
INFO: Config version: 85
INFO: Kernel version: 5.10.25
INFO: kernel path does not exist, will download kernel
linux-5.10.25.tar.xz: OK
INFO: kernel tarball already downloaded
linux-5.10.25.tar.xz: OK
INFO: Apply patches from /home/hayd/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel/patches/5.10.x
INFO: Found 2 patches
INFO: Apply /home/hayd/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel/patches/5.10.x/0001-arm64-mmu-compared-with-linear-start-physical-addres.patch
INFO: Apply /home/hayd/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel/patches/5.10.x/0001-NO-UPSTREAM-9P-always-use-cached-inode-to-fill-in-v9.patch
INFO: Constructing config from fragments: /home/hayd/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel/configs/fragments/x86_64/.config
/bin/sh: 1: flex: not found
make[1]: *** [scripts/Makefile.host:9: scripts/kconfig/lexer.lex.c] Error 127
make: *** [Makefile:602: allnoconfig] Error 2

In both cases, the problem is missing pre-requisites ...

Solution

Install the missing prereqs e.g. flex, bison etc.

apt-get update && apt-get --with-new-pkgs upgrade -y 

apt-get install -y docker.io make flex bison libelf-dev

etc.
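
The "go: command not found" message in both failures also suggests that Go needs to be installed and on the PATH; on Ubuntu the simplest ( hedged ) option is the distro package, although the Kata build docs may call for a specific Go version: -

apt-get install -y golang-go

go version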

Kata Containers and Ubuntu Linux - lessons learned - 3/many - a WIP

Following on from: -

Kata Containers and Ubuntu Linux - lessons learned - 1/many

and: -

Kata Containers and Ubuntu Linux - lessons learned - 2/many

here's one I've yet to solve ....

Having overcome the earlier issues with Kata Containers 2.0 on Ubuntu Linux 20.04 running under VMware Fusion 12 on macOS 11 ( phew ), I'm hitting a similar/different issue ...

Problem

Starting a container using Kata 2: -

sudo ctr run --rm --tty --runtime io.containerd.kata.v2 docker.io/library/ubuntu:latest ubuntu

returns: -

[sudo] password for hayd: 
ctr: failed to create shim: failed to launch qemu: exit status 1, error messages from qemu log: qemu-system-x86_64: error: failed to set MSR 0x48d to 0x5600000016
qemu-system-x86_64: ../target/i386/kvm.c:2701: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
: unknown

From a fair amount of Google fu, I'm thinking that this is a nested virtualisation issue i.e. I'm using a nice stack of software here ....

Top

  • macOS 11.4 Big Sur
  • VMware Fusion 12.1.2
  • Ubuntu 20.04.2
  • kernel 5.4.0-73-generic
  • containerd 1.5.2
  • kata-runtime 2.1.0
  • qemu 4.2-3ubuntu6.16

Bottom

I've not yet got to the bottom of this ...... watch this space ....
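
In the meantime, a couple of checks I'm using to confirm whether the guest actually exposes the virtualisation extensions ( a sketch ): -

egrep -c '(vmx|svm)' /proc/cpuinfo             # non-zero means the vCPU advertises VT-x / AMD-V
cat /sys/module/kvm_intel/parameters/nested    # Y ( or 1 ) means nested KVM is enabled, if kvm_intel is loaded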

Kata Containers and Ubuntu Linux - lessons learned - 2/many

Following on from  Kata Containers and Ubuntu Linux - lessons learned - 1/many here's another one ...

Testing the Kata Containers 2.0 runtime environment was failing, again using Ubuntu 20.04 on VMware Fusion on macOS 11 ...

Problem

sudo kata-runtime kata-check

returns: -

WARN[0000] Not running network checks as super user      arch=amd64 name=kata-runtime pid=1283 source=runtime
ERRO[0000] CPU property not found                        arch=amd64 description="Virtualization support" name=vmx pid=1283 source=runtime type=flag
WARN[0000] modprobe insert module failed                 arch=amd64 error="exit status 1" module=kvm_intel name=kata-runtime output="modprobe: ERROR: could not insert 'kvm_intel': Operation not supported\n" pid=1283 source=runtime
ERRO[0000] kernel property not found                     arch=amd64 description="Intel KVM" name=kvm_intel pid=1283 source=runtime type=module
ERRO[0000] ERROR: System is not capable of running Kata Containers  arch=amd64 name=kata-runtime pid=1283 source=runtime
ERROR: System is not capable of running Kata Containers

Solution

Update the VM configuration to support virtualisation via "Enable hypervisor applications in this virtual machine"
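
Once the VM has been powered off, reconfigured and restarted, re-running the check should ( hopefully ) confirm the fix: -

sudo kata-runtime kata-check

System is capable of running Kata Containers
System can currently create Kata Containers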



Kata Containers and Ubuntu Linux - lessons learned - 1/many

 This is the first of a few consecutive posts about my recent experiences with Kata Containers and Ubuntu, running on various platforms including my Mac ( via VMware Fusion 12 ).

I'm building up a list of "lessons learned" here, so that I can come back and find them when I need them ....


Firstly, having installed Kata 2.0 as the underlying container runtime for containerd, I was testing the container creation process on an Ubuntu 20.04 VM running under Fusion on my Mac ....

Problem 

sudo ctr run --rm --tty --runtime io.containerd.kata.v2 docker.io/library/ubuntu:latest ubuntu

returns: -

ctr: open /dev/vhost-vsock: no such device: unknown

Debugging with: -

sudo modprobe /dev/vhost-vsock

threw up: -

modprobe: FATAL: Module /dev/vhost-vsock not found in directory /lib/modules/5.4.0-73-generic

Solution

So the default VMware Tools equivalent ( open-vm-tools ) was getting in the way: -

Stop the Service

sudo service open-vm-tools stop

Unload the module that's grabbing vsock

sudo modprobe -r vmw_vsock_vmci_transport

Load the vhost_vsock module

sudo modprobe vhost_vsock

Job done
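
To double-check that the module really is loaded and the device node now exists: -

lsmod | grep vhost_vsock

ls -l /dev/vhost-vsock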

Alternative

Uninstall open-vm-tools

sudo apt-get remove --auto-remove open-vm-tools

More to follow ....
