Wednesday, 28 July 2021

Fun and games with sed and unterminated commands in Jenkins

So it took me ~3 hours to fix a bug that should've taken ~10 minutes ...

I was trying to mitigate an issue with one of our Alpine Linux-based images, where our IBM Container Registry (ICR) Vulnerability Advisor (VA) tool was (rightly) complaining about our exposure to CVE-2021-36159 with apk-tools.

I knew that the mitigation was to update the Dockerfile to ensure that this package was updated.

However, given that I'm building from someone else's GH project, where I don't control the Dockerfile per se, I wanted to have my Jenkins Pipeline job amend the Dockerfile "in flight".

So the Dockerfile had the line: -

FROM alpine:3.13 AS run

so I used a bit of sed sweetness to add: -

RUN apk --no-cache upgrade apk-tools

How hard can it be ?

I even tested it using Bash: -

echo "RUN apk --no-cache add procps" > /tmp/foo.txt

cat /tmp/foo.txt

RUN apk --no-cache add procps

sed -i 's/RUN apk --no-cache add procps/RUN apk --no-cache add procps\nRUN apk --no-cache upgrade apk-tools/g' /tmp/foo.txt

cat /tmp/foo.txt

RUN apk --no-cache add procps
RUN apk --no-cache upgrade apk-tools

Easy right ?

Nah, not with Bash embedded in Groovy via a Jenkinsfile ...

Each and every time I ran my Pipeline, the sed command threw up: -

10:45:38  sed: -e expression #1, char 61: unterminated `s' command

etc.

I tried various incarnations .... with different separators, including / and ;, but to no avail.
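With hindsight, I'm fairly sure the root cause is that Groovy's double-quoted strings interpret \n themselves, so sed was being handed an expression with a real newline in the middle of the s command. The same symptom can be reproduced from plain Bash by using $'...' to force an actual newline into the expression - a minimal sketch: -

sed -i $'s/RUN apk --no-cache add procps/RUN apk --no-cache add procps\nRUN apk --no-cache upgrade apk-tools/g' /tmp/foo.txt

which fails with the self-same unterminated `s' command error.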

The internet had the answer, as per usual ....

specifically this: -

2. Insert lines using Regular expression

which provided the following example: -

sed '/PATTERN/ i <LINE-TO-BE-ADDED>' FILE.txt
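For the record, i inserts the new line before each matching line, whereas a ( which is what I ended up using ) appends it after: -

sed '/PATTERN/ i BEFORE-LINE' FILE.txt   # BEFORE-LINE lands above each match
sed '/PATTERN/ a AFTER-LINE' FILE.txt    # AFTER-LINE lands below each match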

I tested this manually: -

echo "RUN apk --no-cache add procps" > /tmp/foo.txt

cat /tmp/foo.txt

RUN apk --no-cache add procps

sed -i "/RUN apk --no-cache add procps/a RUN apk --no-cache upgrade apk-tools" /tmp/foo.txt

cat /tmp/foo.txt

RUN apk --no-cache add procps
RUN apk --no-cache upgrade apk-tools

And, better still, it worked within Jenkins: -

11:42:04  Step 14/29 : RUN apk --no-cache add procps
11:42:04   ---> Running in 9505a71400bb
11:42:04  fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/main/s390x/APKINDEX.tar.gz
11:42:04  fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/community/s390x/APKINDEX.tar.gz
11:42:04  (1/5) Installing libintl (0.20.2-r2)
11:42:04  (2/5) Installing ncurses-terminfo-base (6.2_p20210109-r0)
11:42:04  (3/5) Installing ncurses-libs (6.2_p20210109-r0)
11:42:04  (4/5) Installing libproc (3.3.16-r0)
11:42:04  (5/5) Installing procps (3.3.16-r0)
11:42:04  Executing busybox-1.32.1-r6.trigger
11:42:04  OK: 7 MiB in 19 packages

11:42:05  Removing intermediate container 9505a71400bb
11:42:05   ---> a4947b0b1d8d
11:42:05  Step 15/29 : RUN apk --no-cache upgrade apk-tools

which is nice :-)

I've raised an issue with the original project's GH repo, as it'd be better to get apk-tools upgraded at "source" so to speak, but I'm rather happy with my experience - every day is, indeed, a school day.


Tuesday, 27 July 2021

Fun and games with sudo and Go in Ubuntu 20.04

Whilst reviewing a colleague's documentation, where he was describing how to code in Go, I noticed a discrepancy in the way that things work when one runs a command as a non-root user vs. running as root, via Super User Do ( sudo ).

Having downloaded / unpacked Go thusly: -

sudo wget -c https://golang.org/dl/go1.16.6.linux-amd64.tar.gz -O - | sudo tar -xz -C /usr/local

resulting in Go being installed in /usr/local/go with the actual go binary being in /usr/local/go/bin

ls -al /usr/local/go/bin

total 17128
drwxr-xr-x  2 root root     4096 Jul 12 20:04 .
drwxr-xr-x 10 root root     4096 Jul 12 20:01 ..
-rwxr-xr-x  1 root root 14072999 Jul 12 20:04 go
-rwxr-xr-x  1 root root  3453176 Jul 12 20:04 gofmt

we needed to run a build ( of containerd ) as root, via sudo make and sudo make install

However, this failed: -

cd ~/containerd

sudo make

+ bin/ctr

/bin/sh: 1: go: not found

make: *** [Makefile:219: bin/ctr] Error 127

even though I'd confirmed that go was installed: -

which go

/usr/local/go/bin/go

ls -al `which go`

-rwxr-xr-x 1 root root 14072999 Jul 12 20:04 /usr/local/go/bin/go

go version

go version go1.16.6 linux/amd64

as I'd previously added it to my PATH via this line in ~/.profile : -

export PATH=$PATH:/usr/local/go/bin

Of course, the wrinkle was that I was running go as root, via sudo ....

This totally helped: -

bash profile works for user but not sudo

specifically the answer that had me validate the PATH in both cases: -

echo 'echo $PATH' | sh

/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/go/bin

echo 'echo $PATH' | sudo sh 

/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin

Note the difference .... ( apart from /usr/games of course ) ?

The answer then had me alias sudo in ~/.bashrc as follows: -

alias sudo='sudo env PATH=$PATH'

validated with: -

alias

alias alert='notify-send --urgency=low -i "$([ $? = 0 ] && echo terminal || echo error)" "$(history|tail -n1|sed -e '\''s/^\s*[0-9]\+\s*//;s/[;&|]\s*alert$//'\'')"'
alias egrep='egrep --color=auto'
alias fgrep='fgrep --color=auto'
alias grep='grep --color=auto'
alias hist='history | cut -c 8-'
alias l='ls -CF'
alias la='ls -A'
alias ll='ls -alF'
alias ls='ls --color=auto'
alias sudo='sudo env PATH=$PATH'

and now the validation works: -

echo 'echo $PATH' | sh

/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/go/bin

echo 'echo $PATH' | sudo sh 

/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/go/bin

and, better still, the build works: -

cd ~/containerd/

sudo make
+ bin/ctr
+ bin/containerd
+ bin/containerd-stress
+ bin/containerd-shim
+ bin/containerd-shim-runc-v1
+ bin/containerd-shim-runc-v2
+ binaries

sudo make install

+ install bin/ctr bin/containerd bin/containerd-stress bin/containerd-shim bin/containerd-shim-runc-v1 bin/containerd-shim-runc-v2

which is nice

Also, make me a sandwich .....
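PS For future me: the reason sudo "loses" the PATH in the first place is the secure_path setting in /etc/sudoers, which replaces the caller's PATH entirely. An alternative to the alias, therefore, would be to add Go's bin directory to that setting ( via visudo, naturally ) - something like: -

sudo visudo

Defaults    secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/usr/local/go/bin"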

Sunday, 25 July 2021

Nesting VMs - not quite as cosy as it sounds....

I wrote about this a few months back: -

Kata Containers and Ubuntu Linux - lessons learned - 3/many - a WIP

in the context of VM nesting being a pain ....

For context, I'm trying ( and failing ) to get Kata Containers fully running inside an Ubuntu VM running on VMware Fusion on my Mac.

This is what I currently have: -

Host: macOS 11.5 Big Sur

Virtualisation: VMware Fusion 12.1.2

Guest: Ubuntu 20.04.2 LTS

Kernel: 5.4.0-80-generic #90-Ubuntu SMP

kata-runtime  : 2.1.0

QEMU: 5.2.0

I've been experimenting with various container runtimes here, including containerd and CRI-O.

However, each and every time, I'm hitting the same nested virtualisation issue.

Most recently, when I try and use crictl runp to start a pod sandbox using the Kata 2.0 runtime: -

sudo crictl runp test/testdata/sandbox_config.json

FATA[0002] run pod sandbox: rpc error: code = Unknown desc = CreateContainer failed: failed to launch qemu: exit status 1, error messages from qemu log: qemu-system-x86_64: error: failed to set MSR 0x48d to 0x5600000016
qemu-system-x86_64: ../target/i386/kvm.c:2701: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
: unknown 

I've tried various hacks to mitigate this but I just cannot get past it ...
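For the record, the checks I've been using to convince myself that nested virtualisation is at least switched on inside the guest ( kvm-ok comes from the cpu-checker package ): -

egrep -c '(vmx|svm)' /proc/cpuinfo

kvm-ok

cat /sys/module/kvm_intel/parameters/nested

where a non-zero count from the first, and a Y ( or 1 ) from the last, suggest that the basic plumbing is in place.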

More digging is required, or this combination is a bust - thankfully I have many other options, including IBM Cloud Virtual Servers.......

More to come ....

For a future me - remembering to stop VMware Tools service when running Kata Containers on Ubuntu on VMware on macOS

I keep writing and re-writing this script, because I forgot to store it somewhere memorable ....

The problem to be solved is that the Kata Containers runtime, aka kata-runtime, will not start unless I remember to stop the VMware Tools service, aka open-vm-tools.service and unload a few dependent modules.

At present, I'm running Kata via a snap installation - I'm not terribly keen on that approach, as there seem to be too many fiddly configuration things to do, but I'll look at that another day, and perhaps raise an issue in the Kata repo.

Meantime, this is what I have: -

/snap/kata-containers/current/usr/bin/kata-runtime -version

kata-runtime  : 2.1.0
   commit   : 0f822919268e4095dd9bdbbb2351248b53746501
   OCI specs: 1.0.1-dev

and, when I run the check tool: -

/snap/kata-containers/current/usr/bin/kata-runtime check

I get this: -

WARN[0000] Not running network checks as super user      arch=amd64 name=kata-runtime pid=1431 source=runtime
WARN[0000] modprobe insert module failed                 arch=amd64 error="exit status 1" module=vhost_vsock name=kata-runtime output="modprobe: ERROR: could not insert 'vhost_vsock': Device or resource busy\n" pid=1431 source=runtime
ERRO[0000] kernel property not found                     arch=amd64 description="Host Support for Linux VM Sockets" name=vhost_vsock pid=1431 source=runtime type=module
System is capable of running Kata Containers
System can currently create Kata Containers

The vhost_vsock module cannot be loaded, as evidenced by this: -

modprobe vhost_vsock

modprobe: ERROR: could not insert 'vhost_vsock': Device or resource busy

lsmod | grep vhost

vhost_net              32768  0
tap                    24576  1 vhost_net
vhost                  49152  1 vhost_net

Somewhere I realised / discovered / found out that VMware Tools, more specifically open-vm-tools, was responsible :-)

Given that, for my use cases at the moment, I don't need VMware Tools for e.g. GUI interactions between guest and host, shared folders etc., I chose to simply disable it.

I've got a script for that: -

~/stop_tools.sh 

#!/bin/bash
# Stop the VMware Tools service
systemctl stop open-vm-tools.service
# Unload the VMware vSock transports, and then the vhost modules,
# so that vhost_vsock can subsequently be loaded
rmmod vmw_vsock_virtio_transport_common
rmmod vmw_vsock_vmci_transport
rmmod vsock
rmmod vhost_net
rmmod vhost

So I can run my script: -

~/stop_tools.sh 

and check that neither vhost* nor vsock* modules are now loaded: -

lsmod | grep vs

<NOTHING RETURNED>

lsmod | grep vh

<NOTHING RETURNED>

and then load the vhost_vsock module: -

modprobe vhost_vsock

and check the loaded modules: -

lsmod | grep vs

vhost_vsock            24576  0
vmw_vsock_virtio_transport_common    32768  1 vhost_vsock
vhost                  49152  1 vhost_vsock
vsock                  36864  2 vmw_vsock_virtio_transport_common,vhost_vsock

lsmod | grep vh

vhost_vsock            24576  0
vmw_vsock_virtio_transport_common    32768  1 vhost_vsock
vhost                  49152  1 vhost_vsock
vsock                  36864  2 vmw_vsock_virtio_transport_common,vhost_vsock

and, even more importantly, use Kata: -

/snap/kata-containers/current/usr/bin/kata-runtime check

WARN[0000] Not running network checks as super user      arch=amd64 name=kata-runtime pid=1911 source=runtime
System is capable of running Kata Containers
System can currently create Kata Containers

/snap/kata-containers/current/usr/bin/kata-runtime kata-check

WARN[0000] Not running network checks as super user      arch=amd64 name=kata-runtime pid=1926 source=runtime
System is capable of running Kata Containers
System can currently create Kata Containers

I could probably hack a better solution and/or uninstall VMware Tools but .....
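For completeness, if I ever do want VMware Tools back without a reboot, the inverse script would presumably look something like this ( an untested sketch on my part - a reboot achieves the same ): -

#!/bin/bash
# put things back the way VMware likes them
rmmod vhost_vsock
modprobe vmw_vsock_vmci_transport
modprobe vhost_net
systemctl start open-vm-tools.service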

Friday, 23 July 2021

A reprise - growing disks in Ubuntu

I've written this many times over the years, with many different Linux distributions, but had a need to REDO FROM START yesterday, growing the disk within an Ubuntu VM running on VMware Fusion.

So, for my future self, this is what worked for me ( using Ubuntu 20.04.2 LTS ).

Look at the current disk

df -kmh /

Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv   19G  8.5G  9.2G  49% /

Increase VM from 20 to 50 GB

This is easily done via the VM's settings in VMware Fusion - I grew the disk from 20 GB to 50 GB whilst the VM was shut down, and then booted it up.

Inspect the current partition layout

fdisk /dev/sda -l

GPT PMBR size mismatch (41943039 != 104857599) will be corrected by write.
Disk /dev/sda: 50 GiB, 53687091200 bytes, 104857600 sectors
Disk model: VMware Virtual S
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: DA7D21AD-C6E5-4549-B753-391F2EB88C8E

Device       Start      End  Sectors Size Type
/dev/sda1     2048     4095     2048   1M BIOS boot
/dev/sda2     4096  2101247  2097152   1G Linux filesystem
/dev/sda3  2101248 41940991 39839744  19G Linux filesystem

Create a new partition

I did this via CLI, but could've quite easily used fdisk interactively. Also, I took the defaults on start, end, sectors, partition type etc.

(
echo "n"        # new partition
echo -e "\n"    # accept defaults for partition number / first sector / last sector
echo -e "\n"    # ( each echo -e "\n" actually sends a couple of Enter presses;
echo -e "\n"    #   any extras just land harmlessly at the fdisk prompt )
echo "w"        # write the new partition table
) | fdisk /dev/sda
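If the kernel grumbles about failing to re-read the partition table - the disk being very much in use - then partprobe ( from the parted package ) or a reboot should persuade it: -

partprobe /dev/sda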

Inspect the new partition layout

fdisk /dev/sda -l

Disk /dev/sda: 50 GiB, 53687091200 bytes, 104857600 sectors
Disk model: VMware Virtual S
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: DA7D21AD-C6E5-4549-B753-391F2EB88C8E

Device        Start       End  Sectors Size Type
/dev/sda1      2048      4095     2048   1M BIOS boot
/dev/sda2      4096   2101247  2097152   1G Linux filesystem
/dev/sda3   2101248  41940991 39839744  19G Linux filesystem
/dev/sda4  41940992 104857566 62916575  30G Linux filesystem

Create a Physical Volume using the new sda4 partition

pvcreate /dev/sda4

  Physical volume "/dev/sda4" successfully created.

Extend the existing Volume Group to use the new sda4 partition  

vgextend /dev/ubuntu-vg/ /dev/sda4

  Volume group "ubuntu-vg" successfully extended

Extend the Logical Volume to fit

- Note that I deliberately chose the value of 29G to fit inside the newly added 30GB of disk

lvextend -L +29G /dev/ubuntu-vg/ubuntu-lv

  Size of logical volume ubuntu-vg/ubuntu-lv changed from <19.00 GiB (4863 extents) to <48.00 GiB (12287 extents).
  Logical volume ubuntu-vg/ubuntu-lv successfully resized.
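As an aside, had I wanted the logical volume to swallow all of the free space, rather than hand-picking 29G, I believe the extents-based form does it in one go: -

lvextend -l +100%FREE /dev/ubuntu-vg/ubuntu-lv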

Resize the file-system to fit  

resize2fs /dev/mapper/ubuntu--vg-ubuntu--lv

resize2fs 1.45.5 (07-Jan-2020)
Filesystem at /dev/mapper/ubuntu--vg-ubuntu--lv is mounted on /; on-line resizing required
old_desc_blocks = 3, new_desc_blocks = 6
The filesystem on /dev/mapper/ubuntu--vg-ubuntu--lv is now 12581888 (4k) blocks long.

Look at the current disk

df -kmh /

Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv   48G  8.5G   37G  19% /

Celebrate !


Thursday, 15 July 2021

Gah, problems building Kata Containers but totally self-inflicted

Whilst trying to build the guest kernel for Kata Containers, I kept hitting a problem which appeared to be with the make command.

As an example: -

./build-kernel.sh setup

/root/go/github.com/src/tools/packaging/kernel/../scripts/lib.sh: line 30: pushd: /root/go/src/github.com/kata-containers/tests: No such file or directory
/root/go/github.com/src/tools/packaging/kernel/../scripts/lib.sh: line 31: .ci/install_yq.sh: No such file or directory
/root/go/github.com/src/tools/packaging/kernel/../scripts/lib.sh: line 32: popd: directory stack empty
INFO: Config version: 85
INFO: Kernel version: 5.10.25
INFO: /root/go/github.com/src/tools/packaging/kernel/kata-linux-5.10.25-85 already exist
Kernel source ready: /root/go/github.com/src/tools/packaging/kernel/kata-linux-5.10.25-85

I assumed that this was a missing pre-requisite on my part, harking back to a previous post: -

Kata Containers and Ubuntu Linux - lessons learned - 4/many

Well, kinda but not quite ...

I'm not 100% sure what led me to the realisation but I did finally notice this in the above message: -

/root/go/src/github.com/kata-containers/tests

'cos I'd (inadvertently) cloned the Kata Containers repo into ... 

/root/go/github.com/src

Not sure why but I did ...

The developer guide does make it clear: -

$ go get -d -u github.com/kata-containers/kata-containers
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/src/runtime
$ make && sudo -E PATH=$PATH make install

Build and install the Kata Containers runtime

Once I did it properly: -

git clone git@github.com:kata-containers/kata-containers.git $GOPATH/src/github.com

cd $GOPATH/src/github.com/tools/packaging/kernel

./build-kernel.sh setup

./build-kernel.sh build

all was well, funnily enough
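As an aside, I believe the pushd / install_yq.sh noise from the setup step was lib.sh looking for the kata-containers/tests repo, which expects to live alongside the main one - so cloning that too should quieten things down: -

git clone https://github.com/kata-containers/tests.git $GOPATH/src/github.com/kata-containers/tests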

😮‍💨😮‍💨😮‍💨😮‍💨😮‍💨😮‍💨

Saturday, 10 July 2021

Following up - arrays and string munging in Rust

Following up on And here we go - more Rust By Example, I had some fun munging a string into elements of an array in Rust.

This is what I ended up with: -

fn main() {
    let image_string: &str = "docker.io/davidhay1969/hello-world-nginx:latest";
    // placeholder values - each gets overwritten by the loop below
    let mut image_array: [&str; 4] = ["registry", "namespace", "repository", "tag"];
    let mut index = 0;
    // split on either '/' or ':' and drop each piece into the array
    for part in image_string.split(&['/', ':'][..]) {
        image_array[index] = part;
        index += 1;
    }
    for i in 0..image_array.len() {
        println!("Index {} value {}", i, image_array[i]);
    }
}

and this is how it looks when I run it: -

cargo run

    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/hello_world`
Index 0 value docker.io
Index 1 value davidhay1969
Index 2 value hello-world-nginx
Index 3 value latest

noting that I'm doing this in Microsoft Visual Studio Code 

I did get some useful inspiration for this from a number of other sources.

Now to add this functionality into my work with Kata Containers ..... YAY!

Monday, 5 July 2021

And here we go - more Rust By Example

Following on from my earlier post: -

Learning Rust - because Kata

here's a simpler version of my function: -

fn main() {
    for part in "docker.io/davidhay1969/hello-world-nginx:latest".split(&['/', ':'][..]) {
        println!("{}", part);
    }
}

where I'm splitting the container image name: -

docker.io/davidhay1969/hello-world-nginx:latest

So when I create this on my Mac: -

vi hello.rs

fn main() {
    for part in "docker.io/davidhay1969/hello-world-nginx:latest".split(&['/', ':'][..]) {
        println!("{}", part);
    }
}

and compile it: -

rustc hello.rs

and run it: -

./hello

docker.io
davidhay1969
hello-world-nginx
latest

Nice

PS Thanks to How can I split a string (String or &str) on more than one delimiter?

Learning Rust - because Kata

As per previous posts here, I've been working with Kata Containers these past few months and, as such, have been tinkering with Rust, because a core part of the Kata project, namely the kata-agent, is written in that particular new ( to me ) language.

Right now, I'm looking at a mechanism to parse container image names, separating out the registry, namespace, repository and tag.

Therefore, I needed to find a way to parse a string e.g. docker.io/davidhay1969/hello-world-nginx:latest into a series of individual strings: -

docker.io

davidhay1969

hello-world-nginx

latest

I started with dot net perls which introduced me to the split() function, and then proceeded from there ...

I then found the Rust By Example site, which allowed me to test out my learnings interactively ...

I started with, of course, Hello World! 

// This is a comment, and is ignored by the compiler
// You can test this code by clicking the "Run" button over there ->
// or if you prefer to use your keyboard, you can use the "Ctrl + Enter" shortcut
// This code is editable, feel free to hack it!
// You can always return to the original code by clicking the "Reset" button ->
// This is the main function
fn main() {
    // Statements here are executed when the compiled binary is called
    // Print text to the console
    let full_name="Dave;Hay";
    let names = full_name.split(';');
    print!("Hello ");
    for n in names {
        print!("{} ",n);
    }
}

One nice thing is that I can make AND test changes within the web page, rather than having to code/test/code/test in, say, Visual Studio Code.

When I click 'Run', here we go ...

Hello Dave Hay 

and then amended it to meet my specific requirement wrt container images: -

// This is a comment, and is ignored by the compiler
// You can test this code by clicking the "Run" button over there ->
// or if you prefer to use your keyboard, you can use the "Ctrl + Enter" shortcut

// This code is editable, feel free to hack it!
// You can always return to the original code by clicking the "Reset" button ->

// This is the main function
fn main() {
    // Statements here are executed when the compiled binary is called

    // Print text to the console
    let full_container_image="docker.io/davidhay1969/hello-world-nginx:latest";
    let names = full_container_image.split('/');
    println!("The image is :-");
    for n in names {
        println!("{} ",n);
    }
}

When I click 'Run', here we go ...

The image is :-
docker.io 
davidhay1969 
hello-world-nginx:latest 

Obviously, I need to add some logic to handle the colon ( : ) between the repository and tag, but that's easy enough to do ....

Sunday, 4 July 2021

Fun n' games breaking, and then fixing, containerd

By virtue of the fact that I was hacking around with my Kubernetes 1.21 / containerd 1.4.4 / Kata 2.0 environment today, I managed to rather impressively break the containerd runtime.

In even better news, I managed to fix it again, albeit with a lot of help from my friends - well, to be specific, from GitHub ....

I think I managed to break things by replacing a pair of binaries underlying containerd and Kata : -

containerd-shim-kata-v2

kata-runtime

whilst I had a pod running using the Kata runtime ...

By the time I realised I'd broken things, it was kinda too late ....

The containerd runtime refused to ... well, run, and things weren't looking too good for me - I was even considering cluster rebuild time .....

I'd tried: -

systemctl stop containerd.service

systemctl start containerd.service

systemctl stop kubelet.service

systemctl start kubelet.service

and even a reboot, but no dice....

Whilst debugging, I did check the containerd service: -

systemctl status containerd.service --no-pager --full

● containerd.service - containerd container runtime
     Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
     Active: activating (start-pre) since Sun 2021-07-04 01:04:39 PDT; 1ms ago
       Docs: https://containerd.io
Cntrl PID: 6922 ((modprobe))
      Tasks: 0
     Memory: 0B
     CGroup: /system.slice/containerd.service
             └─6922 (modprobe)
Jul 04 01:04:39 sideling2.fyre.ibm.com systemd[1]: Stopped containerd container runtime.
Jul 04 01:04:39 sideling2.fyre.ibm.com systemd[1]: Starting containerd container runtime...

systemctl status containerd.service --no-pager --full

● containerd.service - containerd container runtime
     Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
     Active: activating (auto-restart) (Result: exit-code) since Sun 2021-07-04 01:04:39 PDT; 2s ago
       Docs: https://containerd.io
    Process: 6922 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
    Process: 6929 ExecStart=/usr/bin/containerd (code=exited, status=1/FAILURE)
   Main PID: 6929 (code=exited, status=1/FAILURE)

as well as the kubelet service: -

systemctl status kubelet.service --no-pager --full

● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Sun 2021-07-04 01:18:53 PDT; 4s ago
       Docs: https://kubernetes.io/docs/home/
   Main PID: 11243 (kubelet)
      Tasks: 7 (limit: 4616)
     Memory: 15.0M
     CGroup: /system.slice/kubelet.service
             └─11243 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock --pod-infra-container-image=k8s.gcr.io/pause:3.4.1

Jul 04 01:18:53 sideling2.fyre.ibm.com systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jul 04 01:18:54 sideling2.fyre.ibm.com kubelet[11243]: I0704 01:18:54.014459   11243 server.go:197] "Warning: For remote container runtime, --pod-infra-container-image is ignored in kubelet, which should be set in that remote runtime instead"
Jul 04 01:18:54 sideling2.fyre.ibm.com kubelet[11243]: I0704 01:18:54.044083   11243 server.go:440] "Kubelet version" kubeletVersion="v1.21.0"
Jul 04 01:18:54 sideling2.fyre.ibm.com kubelet[11243]: I0704 01:18:54.044870   11243 server.go:851] "Client rotation is on, will bootstrap in background"
Jul 04 01:18:54 sideling2.fyre.ibm.com kubelet[11243]: I0704 01:18:54.062425   11243 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
Jul 04 01:18:54 sideling2.fyre.ibm.com kubelet[11243]: I0704 01:18:54.064001   11243 dynamic_cafile_content.go:167] Starting client-ca-bundle::/etc/kubernetes/pki/ca.crt

and, a few seconds later: -

systemctl status kubelet.service --no-pager --full

● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: activating (auto-restart) (Result: exit-code) since Sun 2021-07-04 01:18:59 PDT; 4s ago
       Docs: https://kubernetes.io/docs/home/
    Process: 11243 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
   Main PID: 11243 (code=exited, status=1/FAILURE)

Jul 04 01:18:59 sideling2.fyre.ibm.com systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Jul 04 01:18:59 sideling2.fyre.ibm.com systemd[1]: kubelet.service: Failed with result 'exit-code'.

but didn't draw too many conclusions ...
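For the record, journalctl is another useful source of clues at this point: -

journalctl -u containerd.service --no-pager --since "10 minutes ago"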

However, I got more insight when I tried / failed to run containerd interactively: -

/usr/bin/containerd

which, in part, failed with: -

FATA[2021-07-04T01:04:22.466954937-07:00] Failed to run CRI service                     error="failed to recover state: failed to reserve sandbox name \"nginx-kata_default_ed1e725f-8bfc-4c8f-89e0-3477c6a4c451_0\": name \"nginx-kata_default_ed1e725f-8bfc-4c8f-89e0-3477c6a4c451_0\" is reserved for \"3e08cb56351c79ec0e854f6fa0ac1e272c251a6dea29602a63a4cd994abf54a2\""

Now, given that the pod that had been running when I replaced two of the in-use binaries was called nginx-kata, I suspect that the problem might be that the pod was still running, despite containerd not working, and effectively locking the pod sandbox - as per the above.

Thankfully, this GH issue in the containerd project came to my rescue : -

'failed to reserve sandbox name' error after hard reboot #1014

Following the advice therein, I tried / failed to see what was going on using ctr : -

ctr --namespace k8s.io containers list

ctr: failed to dial "/run/containerd/containerd.sock": connection error: desc = "transport: error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused"

I even tried the Hail Mary approach of nuking the errant container: -

ctr --namespace k8s.io containers rm 3e08cb56351c79ec0e854f6fa0ac1e272c251a6dea29602a63a4cd994abf54a2

but to no avail: -

ctr: failed to dial "/run/containerd/containerd.sock": connection error: desc = "transport: error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused"

I even got very brutal: -

find /var/lib/containerd/io.containerd.runtime.v2.task/k8s.io/|grep 3e08cb56351c79ec0e854f6fa0ac1e272c251a6dea29602a63a4cd994abf54a2

/var/lib/containerd/io.containerd.runtime.v2.task/k8s.io/3e08cb56351c79ec0e854f6fa0ac1e272c251a6dea29602a63a4cd994abf54a2

rm -Rf /var/lib/containerd/io.containerd.runtime.v2.task/k8s.io/3e08cb56351c79ec0e854f6fa0ac1e272c251a6dea29602a63a4cd994abf54a2

but again to no avail.

Reading and re-reading the aforementioned GH issues, I noted that one of the responders said, in part: -

you can start containerd with disable_plugins = [ cri ] in the
/etc/containerd/config.toml

to which someone else responded: -

PS: the config option for the workaround is disabled_plugins, not disable_plugins

I checked out the file: -

vi /etc/containerd/config.toml

and noted: -

version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
plugin_dir = ""
disabled_plugins = []
required_plugins = []
oom_score = 0
...


so I amended it: -

version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
plugin_dir = ""
disabled_plugins = ["cri"]
required_plugins = []
oom_score = 0
...

and tried again to manually start containerd: -

/usr/bin/containerd

which borked with: -

containerd: invalid disabled plugin URI "cri" expect io.containerd.x.vx

Remembering that the URI had changed from plain old cri to io.containerd.grpc.v1.cri, I changed it again: -

version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
plugin_dir = ""
disabled_plugins = ["io.containerd.grpc.v1.cri"]
required_plugins = []
oom_score = 0
...

and was able to start containerd manually: -

/usr/bin/containerd

INFO[2021-07-04T09:31:50.507911132-07:00] starting containerd                           revision= version="1.4.4-0ubuntu1~20.04.2"
INFO[2021-07-04T09:31:50.556285374-07:00] loading plugin "io.containerd.content.v1.content"...  type=io.containerd.content.v1
INFO[2021-07-04T09:31:50.556487705-07:00] loading plugin "io.containerd.snapshotter.v1.aufs"...  type=io.containerd.snapshotter.v1
INFO[2021-07-04T09:31:50.568920159-07:00] loading plugin "io.containerd.snapshotter.v1.btrfs"...  type=io.containerd.snapshotter.v1
INFO[2021-07-04T09:31:50.570026092-07:00] skip loading plugin "io.containerd.snapshotter.v1.btrfs"...  error="path /var/lib/containerd/io.containerd.snapshotter.v1.btrfs (xfs) must be a btrfs filesystem to be used with the btrfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
INFO[2021-07-04T09:31:50.570139702-07:00] loading plugin "io.containerd.snapshotter.v1.devmapper"...  type=io.containerd.snapshotter.v1
WARN[2021-07-04T09:31:50.570350493-07:00] failed to load plugin io.containerd.snapshotter.v1.devmapper  error="devmapper not configured"
INFO[2021-07-04T09:31:50.570398653-07:00] loading plugin "io.containerd.snapshotter.v1.native"...  type=io.containerd.snapshotter.v1
INFO[2021-07-04T09:31:50.570478494-07:00] loading plugin "io.containerd.snapshotter.v1.overlayfs"...  type=io.containerd.snapshotter.v1
INFO[2021-07-04T09:31:50.571028515-07:00] loading plugin "io.containerd.snapshotter.v1.zfs"...  type=io.containerd.snapshotter.v1
INFO[2021-07-04T09:31:50.571796367-07:00] skip loading plugin "io.containerd.snapshotter.v1.zfs"...  error="path /var/lib/containerd/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
INFO[2021-07-04T09:31:50.571859377-07:00] loading plugin "io.containerd.metadata.v1.bolt"...  type=io.containerd.metadata.v1
WARN[2021-07-04T09:31:50.571936777-07:00] could not use snapshotter devmapper in metadata plugin  error="devmapper not configured"
INFO[2021-07-04T09:31:50.571969917-07:00] metadata content store policy set             policy=shared
INFO[2021-07-04T09:31:50.572636609-07:00] loading plugin "io.containerd.differ.v1.walking"...  type=io.containerd.differ.v1
INFO[2021-07-04T09:31:50.572707090-07:00] loading plugin "io.containerd.gc.v1.scheduler"...  type=io.containerd.gc.v1
INFO[2021-07-04T09:31:50.573169651-07:00] loading plugin "io.containerd.service.v1.introspection-service"...  type=io.containerd.service.v1
INFO[2021-07-04T09:31:50.573381772-07:00] loading plugin "io.containerd.service.v1.containers-service"...  type=io.containerd.service.v1
INFO[2021-07-04T09:31:50.573488782-07:00] loading plugin "io.containerd.service.v1.content-service"...  type=io.containerd.service.v1
INFO[2021-07-04T09:31:50.573642362-07:00] loading plugin "io.containerd.service.v1.diff-service"...  type=io.containerd.service.v1
INFO[2021-07-04T09:31:50.573717172-07:00] loading plugin "io.containerd.service.v1.images-service"...  type=io.containerd.service.v1
INFO[2021-07-04T09:31:50.573778242-07:00] loading plugin "io.containerd.service.v1.leases-service"...  type=io.containerd.service.v1
INFO[2021-07-04T09:31:50.574107503-07:00] loading plugin "io.containerd.service.v1.namespaces-service"...  type=io.containerd.service.v1
INFO[2021-07-04T09:31:50.574211764-07:00] loading plugin "io.containerd.service.v1.snapshots-service"...  type=io.containerd.service.v1
INFO[2021-07-04T09:31:50.574268314-07:00] loading plugin "io.containerd.runtime.v1.linux"...  type=io.containerd.runtime.v1
INFO[2021-07-04T09:31:50.574727045-07:00] loading plugin "io.containerd.runtime.v2.task"...  type=io.containerd.runtime.v2
DEBU[2021-07-04T09:31:50.575140177-07:00] loading tasks in namespace                    namespace=k8s.io
INFO[2021-07-04T09:31:50.585638375-07:00] loading plugin "io.containerd.monitor.v1.cgroups"...  type=io.containerd.monitor.v1
INFO[2021-07-04T09:31:50.586886318-07:00] loading plugin "io.containerd.service.v1.tasks-service"...  type=io.containerd.service.v1
INFO[2021-07-04T09:31:50.587187039-07:00] loading plugin "io.containerd.internal.v1.restart"...  type=io.containerd.internal.v1
INFO[2021-07-04T09:31:50.587704810-07:00] loading plugin "io.containerd.grpc.v1.containers"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.588049332-07:00] loading plugin "io.containerd.grpc.v1.content"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.588408763-07:00] loading plugin "io.containerd.grpc.v1.diff"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.588663644-07:00] loading plugin "io.containerd.grpc.v1.events"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.588885984-07:00] loading plugin "io.containerd.grpc.v1.healthcheck"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.589144744-07:00] loading plugin "io.containerd.grpc.v1.images"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.589440975-07:00] loading plugin "io.containerd.grpc.v1.leases"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.589764946-07:00] loading plugin "io.containerd.grpc.v1.namespaces"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.590026197-07:00] loading plugin "io.containerd.internal.v1.opt"...  type=io.containerd.internal.v1
INFO[2021-07-04T09:31:50.590392338-07:00] loading plugin "io.containerd.grpc.v1.snapshots"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.590632959-07:00] loading plugin "io.containerd.grpc.v1.tasks"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.590854780-07:00] loading plugin "io.containerd.grpc.v1.version"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.591069010-07:00] loading plugin "io.containerd.grpc.v1.introspection"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.591965292-07:00] serving...                                    address=/run/containerd/containerd.sock.ttrpc
INFO[2021-07-04T09:31:50.592328083-07:00] serving...                                    address=/run/containerd/containerd.sock
DEBU[2021-07-04T09:31:50.592592984-07:00] sd notification                               error="<nil>" notified=false state="READY=1"
INFO[2021-07-04T09:31:50.592869195-07:00] containerd successfully booted in 0.088386s  

Having "proved" that this worked, I quit the interactive containerd process, and instead started the service: -

systemctl start containerd.service

which started OK.

I was then able to use ctr : -

ctr --namespace k8s.io containers list

and: -

ctr --namespace k8s.io containers list|grep 3e08cb56351c79ec0e854f6fa0ac1e272c251a6dea29602a63a4cd994abf54a2

before using the nuke option: -

ctr --namespace k8s.io containers rm 3e08cb56351c79ec0e854f6fa0ac1e272c251a6dea29602a63a4cd994abf54a2

Having nuked the errant container, I backed out the changes to the configuration file: -

vi /etc/containerd/config.toml 

version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
plugin_dir = ""
disabled_plugins = []
required_plugins = []
oom_score = 0
...

and restarted containerd : -

systemctl restart containerd.service

and things were back to normal.

Can you say "Yay!" ? I bet you can ....

Friday, 2 July 2021

Learning new stuff each and every day - turning Markdown into man pages

I'm tinkering with a container tool called skopeo at present, and am building it on an Ubuntu 20.04 box.

One thing that I'd not realised, until I was looking to contribute some changes back to the project, was that there's a tool that turns Markdown ( .md ) into man pages, meaning that in-line documentation can be generated on the fly.

I found this out whilst building skopeo with the make command, which borked with: -

CGO_CFLAGS="" CGO_LDFLAGS="-L/usr/lib/x86_64-linux-gnu -lgpgme -lassuan -lgpg-error" GO111MODULE=on go build -mod=vendor "-buildmode=pie" -ldflags '-X main.gitCommit=85546491235c78cf51efa1ca060f1d582d5e1ab1 ' -gcflags "" -tags "  " -o bin/skopeo ./cmd/skopeo
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo-login.1.md  | /root/go/bin/go-md2man -in /dev/stdin -out docs/skopeo-login.1
/bin/sh: 1: /root/go/bin/go-md2man: not found
make: *** [Makefile:143: docs/skopeo-login.1] Error 127

What I didn't realise is that I was missing a key dependency - go-md2man - which is described thusly: -

go-md2man. Converts markdown into roff (man pages). Uses blackfriday to process markdown into man pages. Usage: ./md2man -in /path/to/markdownfile.md ...

This was easily fixed: -

apt-get install -y go-md2man

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  go-md2man
0 upgraded, 1 newly installed, 0 to remove and 26 not upgraded.
Need to get 655 kB of archives.
After this operation, 2,047 kB of additional disk space will be used.
Get:1 http://us.archive.ubuntu.com/ubuntu focal/universe amd64 go-md2man amd64 1.0.10+ds-1 [655 kB]
Fetched 655 kB in 1s (898 kB/s)    
Selecting previously unselected package go-md2man.
(Reading database ... 116777 files and directories currently installed.)
Preparing to unpack .../go-md2man_1.0.10+ds-1_amd64.deb ...
Unpacking go-md2man (1.0.10+ds-1) ...
Setting up go-md2man (1.0.10+ds-1) ...
Processing triggers for man-db (2.9.1-1) ...

and then validated: -

which go-md2man

/usr/bin/go-md2man
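Out of curiosity, one can also run go-md2man by hand to see what it produces, taking one of the skopeo docs as a representative example ( output path mine ): -

go-md2man -in docs/skopeo-copy.1.md -out /tmp/skopeo-copy.1

man /tmp/skopeo-copy.1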

More importantly, my build completed: -

make clean

rm -rf bin docs/*.1

make

CGO_CFLAGS="" CGO_LDFLAGS="-L/usr/lib/x86_64-linux-gnu -lgpgme -lassuan -lgpg-error" GO111MODULE=on go build -mod=vendor "-buildmode=pie" -ldflags '-X main.gitCommit=85546491235c78cf51efa1ca060f1d582d5e1ab1 ' -gcflags "" -tags "  " -o bin/skopeo ./cmd/skopeo
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo-login.1.md  | /usr/bin/go-md2man -in /dev/stdin -out docs/skopeo-login.1
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo-list-tags.1.md  | /usr/bin/go-md2man -in /dev/stdin -out docs/skopeo-list-tags.1
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo-sync.1.md  | /usr/bin/go-md2man -in /dev/stdin -out docs/skopeo-sync.1
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo-copy.1.md  | /usr/bin/go-md2man -in /dev/stdin -out docs/skopeo-copy.1
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo-standalone-sign.1.md  | /usr/bin/go-md2man -in /dev/stdin -out docs/skopeo-standalone-sign.1
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo-standalone-verify.1.md  | /usr/bin/go-md2man -in /dev/stdin -out docs/skopeo-standalone-verify.1
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo-inspect.1.md  | /usr/bin/go-md2man -in /dev/stdin -out docs/skopeo-inspect.1
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo-manifest-digest.1.md  | /usr/bin/go-md2man -in /dev/stdin -out docs/skopeo-manifest-digest.1
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo-logout.1.md  | /usr/bin/go-md2man -in /dev/stdin -out docs/skopeo-logout.1
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo.1.md  | /usr/bin/go-md2man -in /dev/stdin -out docs/skopeo.1
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo-delete.1.md  | /usr/bin/go-md2man -in /dev/stdin -out docs/skopeo-delete.1
