Monday, 28 June 2021

Docker Content Trust and the Case of the PEBCAK

 I'm doing some work with trusted/signed container images at present, and am using IBM Container Registry (ICR) as my ... container registry.

I'm doing the actual build/tag/push from an Ubuntu 20.04 VM, having logged into ICR: -

docker login us.icr.io --username iamapikey

and having set my Bash variables to enable Docker Content Trust (DCT): -

export DOCKER_CONTENT_TRUST=1
export DOCKER_CONTENT_TRUST_SERVER=https://us.icr.io:4443/

I then built / tagged: -

docker build --no-cache -t us.icr.io/foobarsnafu/hello_world_nginx_dct_june_2021:latest -f Dockerfile .

and pushed my image: -

docker push us.icr.io/foobarsnafu/hello_world_nginx_dct_june_2021:latest

Whilst this appeared to work, it ultimately failed: -

The push refers to repository [us.icr.io/foobarsnafu/hello_world_nginx_dct_june_2021]
d0471711ab1a: Pushed 
5dbe8c3d30af: Pushed 
67780d477478: Pushed 
5db88766b0e0: Pushed 
36dfa50192c8: Pushed 
8506b073cd53: Pushed 
468af79aab10: Pushed 
fbf82c12d86e: Pushed 
4dc20fbc0e8d: Pushed 
b831cc3ae47e: Pushed 
ace0eda3e3be: Pushed 
latest: digest: sha256:bfcadd198529d842b97dcd633f7b0b65fbcdca4599886a172a31eff0543f3f9d size: 2610
Signing and pushing trust metadata
unable to reach trust server at this time: 301.

I checked and re-checked my steps, and then turned to my faithful friend, Google .... and found this: -


in which the person raising the issue said this: -

After a day and a half of debugging and redoing everything from scratch several times I've figured out that the problem was the trailing slash in the notary server url.

That was revelatory .... notice that I'd previously typed: -

export DOCKER_CONTENT_TRUST_SERVER=https://us.icr.io:4443/

Once I changed my DCT URL to: -

export DOCKER_CONTENT_TRUST_SERVER=https://us.icr.io:4443

( i.e. no trailing slash )

everything was copacetic 😹
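
For the record, here's the full working sequence, using the same example namespace as above ( a recap sketch rather than a definitive recipe ): -

# log in to ICR with an IAM API key
docker login us.icr.io --username iamapikey

# enable Docker Content Trust - note: no trailing slash on the server URL
export DOCKER_CONTENT_TRUST=1
export DOCKER_CONTENT_TRUST_SERVER=https://us.icr.io:4443

# build, tag and push - the trust metadata is signed and pushed at the end
docker build --no-cache -t us.icr.io/foobarsnafu/hello_world_nginx_dct_june_2021:latest -f Dockerfile .
docker push us.icr.io/foobarsnafu/hello_world_nginx_dct_june_2021:latest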

Easy when you know how ...

Wednesday, 23 June 2021

More fun with containerd and the ctr tool

 As per previous posts, I've been tinkering ( gosh, I love that word ) with containerd and Kata 2.0 a lot recently.

Having deployed containerd and Kata 2.0 into my Kubernetes 1.21 environment, I am happily creating Pods using the Kata runtime: -

vi nginx-kata.yaml 

apiVersion: v1
kind: Pod
metadata:
  name: nginx-kata
spec:
  runtimeClassName: kata
  containers:
  - name: nginx
    image: nginx

kubectl apply -f nginx-kata.yaml

pod/nginx-kata created
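
As a quick sanity check that the Pod really has picked up the Kata runtime class, the spec can be queried directly ( a sketch ): -

kubectl get pod nginx-kata -o jsonpath='{.spec.runtimeClassName}'

kata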

and then using tools such as crictl to see what's going on ( on the K8s Compute Node ) : -

vi /etc/crictl.yaml

runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false

crictl pods

POD ID              CREATED             STATE               NAME                                      NAMESPACE           ATTEMPT
dc02fc7c89641       10 minutes ago      Ready               nginx-kata                                default             0
d1f8ce098f089       2 days ago          Ready               coredns-558bd4d5db-qwfl9                  kube-system         0
7f3783a919973       2 days ago          Ready               coredns-558bd4d5db-54k58                  kube-system         0
5dca2e336f243       2 days ago          Ready               calico-kube-controllers-cc8959d7f-xwggk   kube-system         0
9a9bca9a8e611       2 days ago          Ready               calico-node-wvn6q                         kube-system         0
d98889f80a38a       2 days ago          Ready               kube-proxy-bc897                          kube-system         0

crictl ps

CONTAINER ID        IMAGE               CREATED             STATE               NAME                      ATTEMPT             POD ID
efefdc0c91767       4f380adfc10f4       11 minutes ago      Running             nginx                     0                   dc02fc7c89641
7bc01b279a245       296a6d5035e2d       2 days ago          Running             coredns                   0                   d1f8ce098f089
693fae66b8cff       296a6d5035e2d       2 days ago          Running             coredns                   0                   7f3783a919973
ebd27f6028920       2ce0e04399aca       2 days ago          Running             calico-kube-controllers   0                   5dca2e336f243
dac664782bbac       ebc659140e762       2 days ago          Running             calico-node               0                   9a9bca9a8e611
fc7b02adf0cfe       38ddd85fe90e0       2 days ago          Running             kube-proxy                0                   d98889f80a38a

but I also wanted to see what was going on using the ctr tool, which ships with containerd.

I tried this: -

ctr container list

but that only returned: -

CONTAINER    IMAGE    RUNTIME    

I even tried directing ctr to the same endpoint as crictl : -

ctr --address /run/containerd/containerd.sock containers list

which similarly returned: -

CONTAINER    IMAGE    RUNTIME    

I dug about online for a bit and found this: -


in the containerd GitHub repo, which said, in part: -

containerd has namespaces: https://github.com/containerd/containerd/blob/master/README.md#namespaces

ctr --namespace k8s.io containers ls

Once I amended my command: -

ctr --namespace k8s.io container list

all was well: -

CONTAINER                                                           IMAGE                                                                                              RUNTIME                  
5dca2e336f24335693c6e6e36bfa9448b77f62ebd24dc6ff7dfbad6046b4e451    k8s.gcr.io/pause:3.2                                                                               io.containerd.runc.v2    
693fae66b8cffa4a6f2502704e6fb3ae581a85154284158122801e272182a480    k8s.gcr.io/coredns/coredns:v1.8.0                                                                  io.containerd.runc.v2    
7bc01b279a245902df8a74daa2647158992ca304313772002d3b92f989c832b4    k8s.gcr.io/coredns/coredns:v1.8.0                                                                  io.containerd.runc.v2    
7e9debe26471498d06a99403ade2b46e9ce8ef50cade17a2a44efddcacb7ec70    sha256:021ecb3cb5348375a201bc8e2fe97c04da8c675a89185ae5bb597f7b2bdd2097                            io.containerd.runc.v2    
7f3783a9199736931fa2b97915238e024f17ecb456c2c83e89520e4c0e4de6f3    k8s.gcr.io/pause:3.2                                                                               io.containerd.runc.v2    
9a9bca9a8e61143df776733aca2503094fa6ab7560f06c9a6809942f16418cd0    k8s.gcr.io/pause:3.2                                                                               io.containerd.runc.v2    
a0f5f0b540e2bcde26ab4cb7ea887718202252ec9abe35b8be3bc351c9163d2c    sha256:021ecb3cb5348375a201bc8e2fe97c04da8c675a89185ae5bb597f7b2bdd2097                            io.containerd.runc.v2    
d1f8ce098f089ffc85065d003c803223bd89f9508b5f78fccfcb7942b1a17f4d    k8s.gcr.io/pause:3.2                                                                               io.containerd.runc.v2    
d89404d155c09d8b2d5134989675593b2d45a3fe2b7972c1aba83f990af92dca    docker.io/calico/pod2daemon-flexvol:v3.18.4                                                        io.containerd.runc.v2    
d98889f80a38aa7b5dde86f7560b3a5791f2e621896063f991a994e79b84cce1    k8s.gcr.io/pause:3.2                                                                               io.containerd.runc.v2    
dac664782bbacf4e9d531c79d8702ce7070b7cbf6ef0e0f6936036c574c1d946    docker.io/calico/node:v3.18.4                                                                      io.containerd.runc.v2    
dc02fc7c8964173ac4d9273590191079b9aa943de62071cebdf31784dbe28b89    sha256:80d28bedfe5dec59da9ebf8e6260224ac9008ab5c11dbbe16ee3ba3e4439ac2c                            io.containerd.kata.v2    
ebd27f60289200020add88b0e09de8347521ddd9a9710f1dd5ea1b824be38ea4    sha256:2ce0e04399acab807c909223153f44dcd197765af5eb1e0a858acaf8869b27e4                            io.containerd.runc.v2    
efefdc0c917678590423c193f59ab3bcce02c7282fe876422506fd45e9693967    docker.io/library/nginx@sha256:8f7dcfc0d8c01c5b66a49f1f33803c959a354fabb4d0128e6144a7732c8e70eb    io.containerd.kata.v2    
fc7b02adf0cfe264f02a8dffe6ba938377b129eb51a471257d58f524cf05351f    k8s.gcr.io/kube-proxy:v1.21.0                                                                      io.containerd.runc.v2    
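
As an aside, if you're not sure which namespaces exist on a given node, ctr can list them too ( a quick sketch, assuming the default containerd socket ) - on a Kubernetes node like this one, k8s.io should show up: -

ctr namespaces list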


Thursday, 17 June 2021

Networking notworking on Ubuntu ?

 Whilst helping a colleague dig into some issues he was seeing pulling an IBM WebSphere Liberty image from Docker Hub, I realised that my Ubuntu box was missing a couple of useful utilities, including nslookup and traceroute.

In the latter case, I did have traceroute6 but not the IP v4 equivalent.

Easily fixed ...

apt-get update && apt-get install -y dnsutils traceroute

and now we're good to go: -

which nslookup

/usr/bin/nslookup

nslookup -version

nslookup 9.16.1-Ubuntu

which traceroute

/usr/sbin/traceroute

traceroute --version

Modern traceroute for Linux, version 2.1.0
Copyright (c) 2016  Dmitry Butskoy,   License: GPL v2 or any later
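
And a quick, hypothetical test of both, against Docker Hub's registry endpoint: -

nslookup registry-1.docker.io

traceroute -4 registry-1.docker.io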

Monday, 14 June 2021

Design, build, and deploy universal application images

 From IBM ( for whom I've worked for ~30 years ) : -

Building high-quality container images and their corresponding pod specifications is the foundation for Kubernetes to effectively run and manage an application in production. There are numerous ways to build images, so knowing where to start can be confusing.
This learning path introduces you to the universal application image (UAI). A UAI is an image that uses Red Hat’s Universal Base Image (UBI) as its foundation, includes the application being deployed, and also adds extra elements that make it more secure and scalable in Kubernetes and Red Hat OpenShift.
Specifically, a universal application image:
    Is built from a Red Hat UBI
    Can run on Kubernetes and OpenShift
    Does not require any Red Hat licensing, so it’s freely distributable
    Includes qualities that make it run more efficiently
    Is supported by Red Hat when run in OpenShift
The articles in this learning path describe best practices for packaging an application, highlighting elements that are critical to include in designing the image, performing the build, and deploying the application.

Design, build, and deploy universal application images

Thursday, 10 June 2021

Tinkering with containerd and the ctr tool

 Some notes from a recent tinkering with containerd and ctr ...

which ctr

/usr/bin/ctr

ctr version

Client:
  Version:  1.4.4-0ubuntu1~20.04.2
  Revision:
  Go version: go1.13.8
Server:
  Version:  1.4.4-0ubuntu1~20.04.2
  Revision:
  UUID: 47a84416-93a1-4934-b850-fecb8dddf519

Pull an image

ctr image pull docker.io/library/nginx:latest -u davidhay1969

docker.io/library/nginx:latest:                                                   resolved       |++++++++++++++++++++++++++++++++++++++|
index-sha256:6d75c99af15565a301e48297fa2d121e15d80ad526f8369c526324f0f7ccb750:    exists         |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:61191087790c31e43eb37caa10de1135b002f10c09fdda7fa8a5989db74033aa: exists         |++++++++++++++++++++++++++++++++++++++|
layer-sha256:351ad75a6cfabc7f2e103963945ff803d818f0bdcf604fd2072a0eefd6674bde:    exists         |++++++++++++++++++++++++++++++++++++++|
layer-sha256:596b1d696923618bec6ff5376cc9aed03a3724bc75b6c03221fd877b62046d05:    exists         |++++++++++++++++++++++++++++++++++++++|
layer-sha256:30afc0b18f67ae8441c2d26e356693009bb8927ab7e3bce05d5ed99531c9c1d4:    exists         |++++++++++++++++++++++++++++++++++++++|
layer-sha256:febe5bd23e98102ed5ff64b8f5987f516a945745c08bbcf2c61a50fb6e7b2257:    exists         |++++++++++++++++++++++++++++++++++++++|
layer-sha256:8283eee92e2f756bd57f96ea295e332ab9031724267d4f939de1f7d19fe9611a:    exists         |++++++++++++++++++++++++++++++++++++++|
config-sha256:d1a364dc548d5357f0da3268c888e1971bbdb957ee3f028fe7194f1d61c6fdee:   exists         |++++++++++++++++++++++++++++++++++++++|
layer-sha256:69692152171afee1fd341febc390747cfca2ff302f2881d8b394e786af605696:    exists         |++++++++++++++++++++++++++++++++++++++|
elapsed: 1.3 s                                                                    total:   0.0 B (0.0 B/s)                                         
unpacking linux/amd64 sha256:6d75c99af15565a301e48297fa2d121e15d80ad526f8369c526324f0f7ccb750...
done

List images

ctr image list

docker.io/library/nginx:latest              application/vnd.docker.distribution.manifest.list.v2+json sha256:6d75c99af15565a301e48297fa2d121e15d80ad526f8369c526324f0f7ccb750 51.3 MiB  linux/386,linux/amd64,linux/arm/v5,linux/arm/v7,linux/arm64/v8,linux/mips64le,linux/ppc64le,linux/s390x              -      

Create a container ( in background mode via -d )

ctr run --net-host -d --rm -t docker.io/library/nginx:latest nginx

Nothing returned
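
Because the container was started with --net-host, nginx should be listening directly on the host's port 80, which gives a quick way to confirm it's alive ( assuming nothing else already owns that port ): -

curl -sI http://localhost:80 | head -1

HTTP/1.1 200 OK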

List running containers

ctr container list

CONTAINER    IMAGE                                                RUNTIME
nginx        docker.io/library/nginx:latest    io.containerd.runc.v2    

List tasks

ctr task list

TASK                 PID     STATUS    
nginx    1287661    RUNNING

List Linux processes

ps aux | grep containerd | grep -v grep

root       39604  0.8  1.6 1287024 67348 ?       Ssl  Jun08  18:11 /usr/bin/containerd
root     1287636  0.0  0.1 111852  7952 ?        Sl   01:44   0:00 /usr/bin/containerd-shim-runc-v2 -namespace default -id nginx -address /run/containerd/containerd.sock

Inspect task

ctr task ps nginx

PID        INFO
1287661    -
1287712    -
1287713    -

Attempt to remove task

ctr task delete nginx

ERRO[0000] unable to delete nginx                        error="task must be stopped before deletion: running: failed precondition"
ctr: task must be stopped before deletion: running: failed precondition

Attempt to remove container

ctr container delete nginx

ERRO[0000] failed to delete container "nginx"            error="cannot delete a non stopped container: {running 0 0001-01-01 00:00:00 +0000 UTC}"
ctr: cannot delete a non stopped container: {running 0 0001-01-01 00:00:00 +0000 UTC}

Kill the task

ctr task kill nginx

Nothing returned

Attempt to remove task

ctr task delete nginx

Nothing returned

Attempt to remove container

ctr container delete nginx

Nothing returned

Create a container ( in foreground mode via -t, with a shell )

- note that the container automatically terminates, and is removed, upon exit, via the --rm remove switch

ctr run --net-host --rm -t docker.io/library/nginx:latest nginx sh

#

Inspect Nginx configuration ( inside container )

cat /etc/nginx/nginx.conf

user  nginx;
worker_processes  auto;
error_log  /var/log/nginx/error.log notice;
pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';
    access_log  /var/log/nginx/access.log  main;
    sendfile        on;
    #tcp_nopush     on;
    keepalive_timeout  65;
    #gzip  on;
    include /etc/nginx/conf.d/*.conf;
}

Exit container

exit

Create a container ( in foreground mode via -t, with a shell, mounting a local /k8s directory into the container as /k8s )

mkdir /k8s

echo "Hello World!" >> /k8s/greeting.txt

ctr run --net-host --mount type=bind,src=/k8s,dst=/k8s,options=rbind --rm -t docker.io/library/nginx:latest nginx sh

#

Display greeting from inside container

cat /k8s/greeting.txt

Hello World!

Exit container

exit

Wednesday, 9 June 2021

Tinkering with OpenLDAP on Docker on Ubuntu

 Following a discussion with a colleague on Slack, I thought I'd remind myself how OpenLDAP works as a service running inside a container, via the Docker container runtime interface (CRI).

Using this for inspiration: -

Docker image for OpenLDAP support

I pulled the requisite image from Docker Hub: -

docker pull osixia/openldap:1.5.0 -u davidhay1969:<DOCKER TOKEN>

and created a container: -

docker run --detach -p 3389:389 osixia/openldap:1.5.0 

Note that I'm using port mapping via -p 3389:389 to map the external ( host ) port of 3389 to the internal ( container ) port of 389

This allows me to run the container without needing to run it in privileged mode ( as Unix typically blocks non-root processes from listening on ports lower than 1,024 ).

Once the container was running happily: -

docker ps -a

CONTAINER ID   IMAGE                   COMMAND                 CREATED          STATUS          PORTS                            NAMES
23a39685da58   osixia/openldap:1.5.0   "/container/tool/run"   20 minutes ago   Up 20 minutes   636/tcp, 0.0.0.0:3389->389/tcp   agitated_mendel
55de9ae1b94a   busybox                 "sh"                    2 days ago       Created                                          nostalgic_mclean
da6a3136a33e   busybox                 "sh"                    13 days ago      Created                                          happy_swirles

I installed ldap-utils to give me the ldapsearch command: -

apt-get install -y ldap-utils

and then ran ldapsearch against the container via the mapped port: -

ldapsearch -H ldap://localhost:3389 -D cn=admin,dc=example,dc=org -w admin -b dc=example,dc=org

Note that I'm using the default credentials of admin / admin and would, of course, change these if this were a real-world environment .....
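
For a slightly more targeted query, the same connection details work with a search filter and attribute list ( a hypothetical example - it assumes some inetOrgPerson entries have actually been added ): -

ldapsearch -H ldap://localhost:3389 -D cn=admin,dc=example,dc=org -w admin -b dc=example,dc=org "(objectClass=inetOrgPerson)" cn mail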

Bash and fun with escape characters

Whilst writing the world's simplest Bash script to: -

  • delete a pod
  • deploy a new pod
  • list the running pods
  • describe the newly deployed pod

I wanted to add newline characters into my echo statements, to make things more readable.

I've written before about echo -e so was just double-checking my understanding via the command-line ...

I entered: -

echo -e "Hello World!\n"

but, instead of a friendly greeting, I saw: -

-bash: !\n: event not found

Wait, what now ?

Yeah, of course, inside double quotes I'd entered the magical sequence of: -

pling backslash n

which Bash treats as a history expansion ( hence "event not found" ) and which, obviously, isn't what I meant to do ....

Easy solution - stick in a space character ...

echo -e "Hello World! \n"

Hello World! 
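
There are tidier fixes too; single quotes suppress history expansion altogether, and set +H switches it off for the session ( a sketch ): -

echo -e 'Hello World!\n'    # single quotes - the pling is left alone

set +H                      # or disable history expansion entirely
echo -e "Hello World!\n"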

So here's the finished script: -

#!/bin/bash

clear

echo -e "Deleting existing busybox-kata pod\n"

kubectl delete pod busybox-kata

echo -e "\nDeploying new busybox-kata pod\n"

kubectl apply -f busybox-kata.yaml 

echo -e "\nSleeping ...\n"

sleep 10

echo -e "\nChecking pods ...\n"

kubectl get pods

echo -e "\nSleeping ...\n"

sleep 5

echo -e "\nDescribing busybox-kata pod to see all is good ...\n"

kubectl describe pod busybox-kata | tail -8

Sunday, 6 June 2021

Doh, Kubernetes fails to run due to a lack of ...

Having started the build of a new K8s 1.21 cluster: -

kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=$private_ip --apiserver-cert-extra-sans=$public_ip --kubernetes-version ${KUBE_VERSION}

I saw: -

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
- 'crictl --runtime-endpoint /run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint /run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

I did a spot of digging in the system log ( /var/log/syslog ) and found this: -

Jun  6 12:17:53 hurlinux2 containerd[39485]: time="2021-06-06T12:17:53.210883120-07:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-apiserver-telicity1.fyre.ibm.com,Uid:599ab88dc99dd5fbdb7c6a92e4e965ba,Namespace:kube-system,Attempt:0,} failed, error" error="failed to create containerd task: failed to create shim: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/177b079efbe1b71b65586a467313d1a15e802a2b2d323a7ed974d5d2e99e33f5/log.json: no such file or directory): exec: \"runc\": executable file not found in $PATH: unknown"
Jun  6 12:17:53 hurlinux2 kubelet[41466]: E0606 12:17:53.211759   41466 remote_runtime.go:116] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/177b079efbe1b71b65586a467313d1a15e802a2b2d323a7ed974d5d2e99e33f5/log.json: no such file or directory): exec: \"runc\": executable file not found in $PATH: unknown"

which made me think "Hmmmm, wonder what I forgot ..."

A quick trip to Aptitude ...

apt-get install runc

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  runc
0 upgraded, 1 newly installed, 0 to remove and 8 not upgraded.
Need to get 4,018 kB of archives.
After this operation, 15.7 MB of additional disk space will be used.
Get:1 http://us.archive.ubuntu.com/ubuntu focal-updates/main amd64 runc amd64 1.0.0~rc93-0ubuntu1~20.04.2 [4,018 kB]
Fetched 4,018 kB in 1s (4,037 kB/s)
Selecting previously unselected package runc.
(Reading database ... 107797 files and directories currently installed.)
Preparing to unpack .../runc_1.0.0~rc93-0ubuntu1~20.04.2_amd64.deb ...
Unpacking runc (1.0.0~rc93-0ubuntu1~20.04.2) ...
Setting up runc (1.0.0~rc93-0ubuntu1~20.04.2) ...
Processing triggers for man-db (2.9.1-1) ...

and a kubeadm reset ( to clear down the borked cluster creation ) and I was then able to re-run kubeadm init as before, and we're good to go .....
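
For good measure, it's worth confirming that runc really is installed and on the PATH before re-running the init ( a quick sanity check ): -

which runc

runc --version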


TIL: How do I find the requisite information to join a Compute Node to an existing K8s Cluster ?

Having built a K8s 1.21 cluster a week or so back, I'd removed my Compute Node with: -

kubeadm reset

( run on the Compute Node itself )

I then wanted to find the command that I'd previously used to join the Compute Node to the cluster.

Now kubeadm init generates a token etc. which lives for 24 hours.

So finding the command from my shell history or from the documentation ain't gonna cut it.

Thankfully, we have this: -

kubeadm token create --print-join-command

which generates output such as this: -

kubeadm join 10.51.16.135:6443 --token isahtxb.nfv74gu4yxbxxq2j --discovery-token-ca-cert-hash sha256:1375f426f376b99240ed34bf952f4c026cc0afaad4adbba816187ff5bcc384b6

with which I can then join the Compute Node into the cluster, and we're back up and running ...
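
It's also worth knowing that the existing bootstrap tokens, and their expiry times, can be listed with: -

kubeadm token list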

Thanks as ever to StackOverflow, who've got my back with: -

How do I find the join command for kubeadm on the master?

Why won't Kubernetes kubelet come up ?

 After an unscheduled reboot of the VMs that host my K8s cluster, I was struggling to work out why the kubelet wasn't starting properly.

I ran systemctl start kubelet.service to start it and then checked the status with systemctl status kubelet.service which showed: -

● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Sun 2021-06-06 00:35:01 PDT; 3s ago
       Docs: https://kubernetes.io/docs/home/
   Main PID: 82478 (kubelet)
      Tasks: 7 (limit: 2279)
     Memory: 14.6M
     CGroup: /system.slice/kubelet.service
             └─82478 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/conf>
Jun 06 00:35:01 garble1.domain.com systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jun 06 00:35:01 garble1.domain.com kubelet[82478]: I0606 00:35:01.836881   82478 server.go:197] "Warning: For remote container runtime, --pod-infra-container-image is i>
Jun 06 00:35:01 garble1.domain.com kubelet[82478]: I0606 00:35:01.866762   82478 server.go:440] "Kubelet version" kubeletVersion="v1.21.0"
Jun 06 00:35:01 garble1.domain.com kubelet[82478]: I0606 00:35:01.867455   82478 server.go:851] "Client rotation is on, will bootstrap in background"
Jun 06 00:35:01 garble1.domain.com kubelet[82478]: I0606 00:35:01.870367   82478 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-clie>
Jun 06 00:35:01 garble1.domain.com kubelet[82478]: I0606 00:35:01.873004   82478 dynamic_cafile_content.go:167] Starting client-ca-bundle::/etc/kubernetes/pki/ca.crt

which looked OK.

I checked again: -

systemctl status kubelet.service

and saw: -

● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: activating (auto-restart) (Result: exit-code) since Sun 2021-06-06 00:35:22 PDT; 8s ago
       Docs: https://kubernetes.io/docs/home/
    Process: 82505 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
   Main PID: 82505 (code=exited, status=1/FAILURE)
Jun 06 00:35:22 garble1.domain.com systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Jun 06 00:35:22 garble1.domain.com systemd[1]: kubelet.service: Failed with result 'exit-code'.

which looked not so good.

I then checked the syslog with: -

tail -f /var/log/syslog

and saw, amongst many other things, this: -

Jun  6 00:40:27 garble1 kubelet[83211]: E0606 00:40:27.104582   83211 server.go:292] "Failed to run kubelet" err="failed to run Kubelet: running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename\t\t\t\tType\t\tSize\tUsed\tPriority /swap.img                               file\t\t4194300\t0\t-2]"


Of course, the VMs had been rebooted ... so swap was back on ....

A quick trip to swapoff with: -

swapoff -a

and we're back in the game.
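
Of course, swapoff -a only lasts until the next reboot; to stop swap coming straight back, the swap entry in /etc/fstab wants commenting out as well ( a hedged sketch - check the file before and after ): -

# comment out any line that mounts swap, keeping a backup of the original file
sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab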

kubectl get nodes

NAME                     STATUS   ROLES                  AGE     VERSION
garble1.domain.com   Ready    control-plane,master   3d13h   v1.21.0
garble2.domain.com   Ready    <none>                 3d13h   v1.21.0

crictl pods

POD ID              CREATED             STATE               NAME                                             NAMESPACE           ATTEMPT
c3969548182d6       17 seconds ago      Ready               calico-node-nl2g2                                kube-system         0
bd06ccb126620       18 seconds ago      Ready               kube-proxy-ht4mq                                 kube-system         0
5a31b04c1d01a       18 seconds ago      Ready               kube-scheduler-garble1.domain.com            kube-system         0
ac6e59ccb87f1       25 seconds ago      Ready               kube-controller-manager-garble1.domain.com   kube-system         0
d2ece5d26441e       35 seconds ago      Ready               kube-apiserver-garble1.domain.com            kube-system         0
10019ac4de96d       45 seconds ago      Ready               etcd-garble1.domain.com                      kube-system         0

Wrangling Kubernetes using crictl

I needed to find a way to remove a bunch of NotReady pods from my K8s 1.21 cluster, on both the Control Plane and Compute Nodes.

Simples, use crictl

A useful StackOverflow post - Master not pods in NotReady status - gave me this: -

crictl pods|grep NotReady|cut -f1 -d" "|xargs -L 1 -I {} -t crictl rmp {}

but, of course, I had to remember how to tell crictl to look at the requisite endpoint - specifically at unix:///var/run/containerd/containerd.sock 

This was easy; I created /etc/crictl.yaml 

runtime-endpoint: unix:///var/run/containerd/containerd.sock
image-endpoint: unix:///var/run/containerd/containerd.sock
timeout: 10
debug: false

and was off to the races ...
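
For my own future reference, here's that pipeline broken down stage by stage: -

# crictl pods                        - list every pod sandbox that containerd knows about
# grep NotReady                      - keep only the rows in NotReady state
# cut -f1 -d" "                      - take the first column i.e. the pod ID
# xargs -L 1 -I {} -t crictl rmp {}  - remove each pod by ID ( -t echoes each command as it runs )

crictl pods|grep NotReady|cut -f1 -d" "|xargs -L 1 -I {} -t crictl rmp {}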

Thursday, 3 June 2021

Apple Remote - tell the telly to turn the heck off !

One nice feature of the Apple TV ( I have a couple of 'em ) is that they can turn the HDMI-attached TV on when you hit the [Menu] button on the  remote ... which is nice ....

But, of course, there's no off button .....

Wait, what now ?

So how do I turn off the TV when I'm done ? Go find and use the "normal" TV remote like a cave person ?

Or.....

Hit the "Siri' button - the one with the mic logo and say something like "Please turn off" ...

And that's it - one remote to rule them all, one remote to find them ( no, that's AirTags, fool ), one remote to bring them all and, in the Dark Mode, to bind them



Wednesday, 2 June 2021

Now that I did not know - using pushd and popd to navigate the Bourne Again Shell (BASH)

Further tinkering with Kata Containers etc. led me here: -

Install cri-tools

You can install the cri-tools from source code:

$ go get github.com/kubernetes-incubator/cri-tools

$ pushd $GOPATH/src/github.com/kubernetes-incubator/cri-tools

$ make

$ sudo -E make install

$ popd

I'd seen references to pushd and popd before but decided to unleash Google fu to know a little more ....

Navigating the Bash shell with pushd and popd

Pushd and popd are the fastest navigational commands you've never heard of.

The pushd and popd commands are built-in features of the Bash shell to help you "bookmark" directories for quick navigation between locations on your hard drive. You might already feel that the terminal is an impossibly fast way to navigate your computer; in just a few key presses, you can go anywhere on your hard drive, attached storage, or network share. But that speed can break down when you find yourself going back and forth between directories, or when you get "lost" within your filesystem. Those are precisely the problems pushd and popd can help you solve.
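
A tiny worked example, just to prove it to myself ( a sketch ): -

pushd /etc          # remember where we are, and jump to /etc
pushd /var/log      # remember /etc, and jump to /var/log
dirs -v             # show the directory stack
popd                # back to /etc
popd                # and back to where we started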

So now I'm learning .....

Tuesday, 1 June 2021

Inspecting certificates using OpenSSL and a variant of grep

In the context of: -

Building Kubernetes on Linux on IBM Z - it's a matter of trust ...

today I learned (TIL) that one could use egrep to examine x509 certificates: -

echo | openssl s_client -connect storage.googleapis.com:443 | egrep "^subject=|^issuer="

depth=2 OU = GlobalSign Root CA - R2, O = GlobalSign, CN = GlobalSign
verify return:1
depth=1 C = US, O = Google Trust Services, CN = GTS CA 1O1
verify return:1
depth=0 C = US, ST = California, L = Mountain View, O = Google LLC, CN = *.storage.googleapis.com
verify return:1
DONE
subject=/C=US/ST=California/L=Mountain View/O=Google LLC/CN=*.storage.googleapis.com
issuer=/C=US/O=Google Trust Services/CN=GTS CA 1O1

which is good to know 😁
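
And a closely related sketch, piping the same connection through openssl x509 to pull out the validity dates as well: -

echo | openssl s_client -connect storage.googleapis.com:443 2>/dev/null | openssl x509 -noout -subject -issuer -dates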

Building Kubernetes on Linux on IBM Z - it's a matter of trust ...

 One of my colleagues saw an interesting issue when trying to build a new Kubernetes cluster on an Ubuntu Linux environment ( on IBM Z ).

For the record, we're running Kubernetes inside Ubuntu containers which are hosted, via runq, on a Secure Service Container (SSC) logical partition (LPAR). In this scenario, we're using docker as the container runtime inside the Ubuntu container which is running inside the SSC LPAR ( nested FTW ).

However, the specific issue seen when running commands such as: -

kubeadm init --pod-network-cidr=192.168.0.0/16 --ignore-preflight-errors=all

wasn't directly related to the use of runq or the SSC LPAR.

Instead, the command returned: -

[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[WARNING ImagePull]: failed to pull image k8s.gcr.io/kube-apiserver:v1.20.7: output: Error response from daemon: Get https://k8s.gcr.io/v2/: x509: certificate signed by unknown authority
, error: exit status 1
[WARNING ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.20.7: output: Error response from daemon: Get https://k8s.gcr.io/v2/: x509: certificate signed by unknown authority
, error: exit status 1
[WARNING ImagePull]: failed to pull image k8s.gcr.io/kube-scheduler:v1.20.7: output: Error response from daemon: Get https://k8s.gcr.io/v2/: x509: certificate signed by unknown authority
, error: exit status 1
[WARNING ImagePull]: failed to pull image k8s.gcr.io/kube-proxy:v1.20.7: output: Error response from daemon: Get https://k8s.gcr.io/v2/: x509: certificate signed by unknown authority
, error: exit status 1
[WARNING ImagePull]: failed to pull image k8s.gcr.io/pause:3.2: output: Error response from daemon: Get https://k8s.gcr.io/v2/: x509: certificate signed by unknown authority
, error: exit status 1
[WARNING ImagePull]: failed to pull image k8s.gcr.io/etcd:3.4.13-0: output: Error response from daemon: Get https://k8s.gcr.io/v2/: x509: certificate signed by unknown authority
, error: exit status 1
[WARNING ImagePull]: failed to pull image k8s.gcr.io/coredns:1.7.0: output: Error response from daemon: Get https://k8s.gcr.io/v2/: x509: certificate signed by unknown authority
, error: exit status 1

Interestingly, an article on LinkedIn led me to the solution: -


In part, the author had me reload/restart the Docker service: -

service docker reload
service docker restart

Prior to this, I'd also checked some of the missing pre-requisite steps: -

apt-get install -y ca-certificates gnupg2 curl apt-transport-https

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

plus opening up some firewall ports, using iptables : -

iptables -A INPUT -p tcp -m tcp --dport 6443 -j ACCEPT
iptables -A INPUT -p tcp -m tcp --dport 10250 -j ACCEPT

However, I suspect that the combination of: -

apt-get install -y ca-certificates gnupg2 curl apt-transport-https

and: -

service docker reload
service docker restart

did the trick.
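
Borrowing the OpenSSL / egrep trick from the post above, one quick way to sanity-check that the host now trusts k8s.gcr.io's certificate chain ( a sketch ): -

echo | openssl s_client -connect k8s.gcr.io:443 | egrep "^subject=|^issuer=|Verify return code"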

One nice thing that I learned ( TIL ) was: -

kubeadm config images pull

to test whether kubeadm can get its required images before starting the init process: -

I0601 10:02:47.536650   25480 version.go:251] remote version is much newer: v1.21.1; falling back to: stable-1.20
[config/images] Pulled k8s.gcr.io/kube-apiserver:v1.20.7
[config/images] Pulled k8s.gcr.io/kube-controller-manager:v1.20.7
[config/images] Pulled k8s.gcr.io/kube-scheduler:v1.20.7
[config/images] Pulled k8s.gcr.io/kube-proxy:v1.20.7
[config/images] Pulled k8s.gcr.io/pause:3.2
[config/images] Pulled k8s.gcr.io/etcd:3.4.13-0
[config/images] Pulled k8s.gcr.io/coredns:1.7.0

Having pulled the images normally, without any trust exceptions, kubeadm init ran happily .....

Kata Containers and Ubuntu Linux - lessons learned - 4/many

Building on the series: -

Kata Containers and Ubuntu Linux - lessons learned - 1/many

Kata Containers and Ubuntu Linux - lessons learned - 2/many

Kata Containers and Ubuntu Linux - lessons learned - 3/many - a WIP

I've also had some fun n' games trying to build various components of Kata Containers 2.0 under Ubuntu, including the kernel that's used within the guest Virtual Machine (VM) ...

Problem

Building the kernel: -

sudo ./build-kernel.sh build

fails with: -

/home/hayd/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel/../scripts/lib.sh: line 25: go: command not found
~/go/src/github.com/kata-containers/tests /home/hayd/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel
/home/hayd/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel
INFO: Config version: 85
INFO: Kernel version: 5.10.25
***
*** Configuration file ".config" not found!
***
*** Please run some configurator (e.g. "make oldconfig" or
*** "make menuconfig" or "make xconfig").
***
make: *** [Makefile:697: .config] Error 1

Similarly: -

sudo ./build-kernel.sh setup

fails with: -

/home/hayd/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel/../scripts/lib.sh: line 25: go: command not found
~/go/src/github.com/kata-containers/tests /home/hayd/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel
/home/hayd/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel
INFO: Config version: 85
INFO: Kernel version: 5.10.25
INFO: kernel path does not exist, will download kernel
linux-5.10.25.tar.xz: OK
INFO: kernel tarball already downloaded
linux-5.10.25.tar.xz: OK
INFO: Apply patches from /home/hayd/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel/patches/5.10.x
INFO: Found 2 patches
INFO: Apply /home/hayd/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel/patches/5.10.x/0001-arm64-mmu-compared-with-linear-start-physical-addres.patch
INFO: Apply /home/hayd/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel/patches/5.10.x/0001-NO-UPSTREAM-9P-always-use-cached-inode-to-fill-in-v9.patch
INFO: Constructing config from fragments: /home/hayd/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel/configs/fragments/x86_64/.config
/bin/sh: 1: flex: not found
make[1]: *** [scripts/Makefile.host:9: scripts/kconfig/lexer.lex.c] Error 127
make: *** [Makefile:602: allnoconfig] Error 2

In both cases, the problem is missing pre-requisites ...

Solution

Install the missing prereqs e.g. flex, bison etc.

apt-get update && apt-get --with-new-pkgs upgrade -y 

apt-get install -y docker.io make flex bison libelf-dev

etc.
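
The "go: command not found" message in both failures also suggests that Go needs to be installed and on the PATH; on Ubuntu the simplest ( hedged ) option is the distro package, although the Kata build docs may call for a specific Go version: -

apt-get install -y golang-go

go version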

Kata Containers and Ubuntu Linux - lessons learned - 3/many - a WIP

Following on from: -

Kata Containers and Ubuntu Linux - lessons learned - 1/many

and: -

Kata Containers and Ubuntu Linux - lessons learned - 2/many

here's one I've yet to solve ....

Having overcome the earlier issues with Kata Containers 2.0 on Ubuntu Linux 20.04 running under VMware Fusion 12 on macOS 11 ( phew ), I'm hitting a similar/different issue ...

Problem

Starting a container using Kata 2: -

sudo ctr run --rm --tty --runtime io.containerd.kata.v2 docker.io/library/ubuntu:latest ubuntu

returns: -

[sudo] password for hayd: 
ctr: failed to create shim: failed to launch qemu: exit status 1, error messages from qemu log: qemu-system-x86_64: error: failed to set MSR 0x48d to 0x5600000016
qemu-system-x86_64: ../target/i386/kvm.c:2701: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
: unknown

From a fair amount of Google fu, I'm thinking that this is a nested virtualisation issue i.e. I'm using a nice stack of software here ....

Top

  • macOS 11.4 Big Sur
  • VMware Fusion 12.1.2
  • Ubuntu 20.04.2
  • kernel 5.4.0-73-generic
  • containerd 1.5.2
  • kata-runtime 2.1.0
  • qemu 4.2-3ubuntu6.16

Bottom

I've not yet got to the bottom of this ...... watch this space ....
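
In the meantime, a couple of checks I'm using to confirm whether the guest actually exposes the virtualisation extensions ( a sketch ): -

egrep -c '(vmx|svm)' /proc/cpuinfo             # non-zero means the vCPU advertises VT-x / AMD-V
cat /sys/module/kvm_intel/parameters/nested    # Y ( or 1 ) means nested KVM is enabled, if kvm_intel is loaded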

Kata Containers and Ubuntu Linux - lessons learned - 2/many

Following on from  Kata Containers and Ubuntu Linux - lessons learned - 1/many here's another one ...

Testing the Kata Containers 2.0 runtime environment was failing, again using Ubuntu 20.04 on VMware Fusion on macOS 11 ...

Problem

sudo kata-runtime kata-check

returns: -

WARN[0000] Not running network checks as super user      arch=amd64 name=kata-runtime pid=1283 source=runtime
ERRO[0000] CPU property not found                        arch=amd64 description="Virtualization support" name=vmx pid=1283 source=runtime type=flag
WARN[0000] modprobe insert module failed                 arch=amd64 error="exit status 1" module=kvm_intel name=kata-runtime output="modprobe: ERROR: could not insert 'kvm_intel': Operation not supported\n" pid=1283 source=runtime
ERRO[0000] kernel property not found                     arch=amd64 description="Intel KVM" name=kvm_intel pid=1283 source=runtime type=module
ERRO[0000] ERROR: System is not capable of running Kata Containers  arch=amd64 name=kata-runtime pid=1283 source=runtime
ERROR: System is not capable of running Kata Containers

Solution

Update the VM configuration to support virtualisation via "Enable hypervisor applications in this virtual machine"
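
Once the VM has been powered off, reconfigured and restarted, re-running the check should ( hopefully ) confirm the fix: -

sudo kata-runtime kata-check

System is capable of running Kata Containers
System can currently create Kata Containers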



Kata Containers and Ubuntu Linux - lessons learned - 1/many

 This is the first of a few consecutive posts about my recent experiences with Kata Containers and Ubuntu, running on various platforms including my Mac ( via VMware Fusion 12 ).

I'm building up a list of "lessons learned" here, so that I can come back and find them when I need them ....


Firstly, having installed Kata 2.0 as the underlying container runtime for containerd, I was testing the container creation process on an Ubuntu 20.04 VM running under Fusion on my Mac ....

Problem 

sudo ctr run --rm --tty --runtime io.containerd.kata.v2 docker.io/library/ubuntu:latest ubuntu

returns: -

ctr: open /dev/vhost-vsock: no such device: unknown

Debugging with: -

sudo modprobe /dev/vhost-vsock

threw up: -

modprobe: FATAL: Module /dev/vhost-vsock not found in directory /lib/modules/5.4.0-73-generic

Solution

So the default VMware Tools equivalent ( open-vm-tools ) was getting in the way: -

Stop the Service

sudo service open-vm-tools stop

Unload the module that's grabbing vsock

sudo modprobe -r vmw_vsock_vmci_transport

Load the vhost_vsock module

sudo modprobe vhost_vsock

Job done
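
To double-check that the module really is loaded and the device node now exists: -

lsmod | grep vhost_vsock

ls -l /dev/vhost-vsock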

Alternative

Uninstall open-vm-tools

sudo apt-get remove --auto-remove open-vm-tools

More to follow ....
