Wednesday, 28 July 2021

Fun and games with sed and unterminated commands in Jenkins

So it took me ~3 hours to fix a bug that should've taken ~10 minutes ...

I was trying to mitigate an issue with one of our Alpine Linux-based images, where our IBM Container Registry (ICR) Vulnerability Advisor (VA) tool was (rightly) complaining about our exposure to CVE-2021-36159 with apk-tools.

I knew that the mitigation was to update the Dockerfile to ensure that this package was updated.

However, given that I'm building from someone else's GH project, where I don't control the Dockerfile per se, I wanted to have my Jenkins Pipeline job amend the Dockerfile "in flight".

So the Dockerfile had the line: -

FROM alpine:3.13 AS run

so I used a bit of sed sweetness to add: -

RUN apk --no-cache upgrade apk-tools

How hard can it be ?

I even tested it using Bash: -

echo "RUN apk --no-cache add procps" > /tmp/foo.txt

cat /tmp/foo.txt

RUN apk --no-cache add procps

sed -i 's/RUN apk --no-cache add procps/RUN apk --no-cache add procps\nRUN apk --no-cache upgrade apk-tools/g' /tmp/foo.txt

cat /tmp/foo.txt

RUN apk --no-cache add procps
RUN apk --no-cache upgrade apk-tools

Easy right ?

Nah, not with Bash embedded in Groovy via a Jenkinsfile ...

Each and every time I ran my Pipeline, the sed command threw up: -

10:45:38  sed: -e expression #1, char 61: unterminated `s' command

etc.

I tried various different incarnations .... with different separators, including / and ; but to no avail.

The internet had the answer, as per usual ....


specifically this: -

2. Insert lines using Regular expression

which provided the following example: -

sed '/PATTERN/ i <LINE-TO-BE-ADDED>' FILE.txt

I tested this manually: -

echo "RUN apk --no-cache add procps" > /tmp/foo.txt

cat /tmp/foo.txt

RUN apk --no-cache add procps

sed -i "/RUN apk --no-cache add procps/a RUN apk --no-cache upgrade apk-tools" /tmp/foo.txt

cat /tmp/foo.txt

RUN apk --no-cache add procps
RUN apk --no-cache upgrade apk-tools
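A likely explanation for the original failure, which I'm inferring rather than quoting from the Jenkinsfile: Groovy treats \n inside a string as an escape, so by the time the shell and sed see the expression the \n is already a real line break, and GNU sed gives up on an s command when it hits an unescaped newline. If the Jenkins expression matched the one tested above, char 61 is exactly where the text before the \n ends. The sketch below assumes GNU sed and uses a throwaway temporary file; the variable names are mine.

```shell
# Assumption: GNU sed (the one-line "a text" form and -i are GNU behaviour).
tmp=$(mktemp)
echo "RUN apk --no-cache add procps" > "$tmp"

# An s command containing a REAL newline, which is what sed receives if
# something upstream has already converted \n into a line break.
broken_expr='s/RUN apk --no-cache add procps/RUN apk --no-cache add procps
RUN apk --no-cache upgrade apk-tools/g'

# Capture only sed's error message; the failure itself is expected here.
sed_error=$(LC_ALL=C sed "$broken_expr" "$tmp" 2>&1 >/dev/null) || true
echo "$sed_error"

# The append (a) command needs no \n at all, so there is nothing to mangle.
sed -i '/RUN apk --no-cache add procps/a RUN apk --no-cache upgrade apk-tools' "$tmp"
line_count=$(wc -l < "$tmp")
cat "$tmp"
```

Note that the example from the internet uses i, which inserts before the matching line, whereas a appends after it, which is what I wanted here.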

And, better still, it worked within Jenkins: -

11:42:04  Step 14/29 : RUN apk --no-cache add procps
11:42:04   ---> Running in 9505a71400bb
11:42:04  fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/main/s390x/APKINDEX.tar.gz
11:42:04  fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/community/s390x/APKINDEX.tar.gz
11:42:04  (1/5) Installing libintl (0.20.2-r2)
11:42:04  (2/5) Installing ncurses-terminfo-base (6.2_p20210109-r0)
11:42:04  (3/5) Installing ncurses-libs (6.2_p20210109-r0)
11:42:04  (4/5) Installing libproc (3.3.16-r0)
11:42:04  (5/5) Installing procps (3.3.16-r0)
11:42:04  Executing busybox-1.32.1-r6.trigger
11:42:04  OK: 7 MiB in 19 packages

11:42:05  Removing intermediate container 9505a71400bb
11:42:05   ---> a4947b0b1d8d
11:42:05  Step 15/29 : RUN apk --no-cache upgrade apk-tools

which is nice :-)

I've raised an issue with the original project's GH repo, as it'd be better to get apk-tools upgraded at "source" so to speak, but I'm rather happy with my experience - every day is, indeed, a school day


Tuesday, 27 July 2021

Fun and games with sudo and Go in Ubuntu 20.04

Whilst reviewing a colleague's documentation, where he was describing how to code in Go, I noticed a discrepancy in the way that things work when one runs a command as a non-root user vs. running as root, via Super User Do ( sudo )

Having downloaded / unpacked Go thusly: -

sudo wget -c https://golang.org/dl/go1.16.6.linux-amd64.tar.gz -O - | sudo tar -xz -C /usr/local

resulting in Go being installed in /usr/local/go with the actual go binary being in /usr/local/go/bin

ls -al /usr/local/go/bin

total 17128
drwxr-xr-x  2 root root     4096 Jul 12 20:04 .
drwxr-xr-x 10 root root     4096 Jul 12 20:01 ..
-rwxr-xr-x  1 root root 14072999 Jul 12 20:04 go
-rwxr-xr-x  1 root root  3453176 Jul 12 20:04 gofmt

we needed to run a build ( of containerd ) as root, via sudo make and sudo make install

However, this failed: -

cd ~/containerd

sudo make

+ bin/ctr

/bin/sh: 1: go: not found

make: *** [Makefile:219: bin/ctr] Error 127

even though I'd confirmed that go was installed: -

which go

/usr/local/go/bin/go

ls -al `which go`

-rwxr-xr-x 1 root root 14072999 Jul 12 20:04 /usr/local/go/bin/go

go version

go version go1.16.6 linux/amd64

as I'd previously added it to my PATH via this line in ~/.profile : -

export PATH=$PATH:/usr/local/go/bin

Of course, the wrinkle was that I'm running go as root, via sudo ....

This totally helped: -

bash profile works for user but not sudo

specifically the answer that had me validate the PATH in both cases: -

echo 'echo $PATH' | sh

/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/go/bin

echo 'echo $PATH' | sudo sh 

/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin

Note the difference .... ( apart from /usr/games of course ) ?
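To make the difference easier to spot, here's a small sketch that lists the directories present in my PATH but missing from the one that sudo hands over. The two values are copied from the output above; the variable names are mine. On Ubuntu the sudo value typically comes from the secure_path setting in /etc/sudoers, although I'm assuming that rather than having checked it on this box.

```shell
# PATH values copied from the two echo commands above
user_path='/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/go/bin'
sudo_path='/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin'

missing_dirs=""
old_ifs=$IFS
IFS=':'                      # split the user PATH on colons
for dir in $user_path; do
  case ":$sudo_path:" in
    *":$dir:"*) ;;           # present in both, nothing to report
    *) missing_dirs="${missing_dirs:+$missing_dirs }$dir" ;;
  esac
done
IFS=$old_ifs

echo "Only in the user PATH: $missing_dirs"
```

The last entry reported is /usr/local/go/bin, which is why sudo make could not find go.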

The answer then had me alias sudo in ~/.bashrc as follows: -

alias sudo='sudo env PATH=$PATH'

validated with: -

alias

alias alert='notify-send --urgency=low -i "$([ $? = 0 ] && echo terminal || echo error)" "$(history|tail -n1|sed -e '\''s/^\s*[0-9]\+\s*//;s/[;&|]\s*alert$//'\'')"'
alias egrep='egrep --color=auto'
alias fgrep='fgrep --color=auto'
alias grep='grep --color=auto'
alias hist='history | cut -c 8-'
alias l='ls -CF'
alias la='ls -A'
alias ll='ls -alF'
alias ls='ls --color=auto'
alias sudo='sudo env PATH=$PATH'

and now the validation works: -

echo 'echo $PATH' | sh

/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/go/bin

echo 'echo $PATH' | sudo sh 

/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/go/bin

and, better still, the build works: -

cd ~/containerd/

sudo make
+ bin/ctr
+ bin/containerd
+ bin/containerd-stress
+ bin/containerd-shim
+ bin/containerd-shim-runc-v1
+ bin/containerd-shim-runc-v2
+ binaries

sudo make install

+ install bin/ctr bin/containerd bin/containerd-stress bin/containerd-shim bin/containerd-shim-runc-v1 bin/containerd-shim-runc-v2

which is nice

Also, make me a sandwich .....

Sunday, 25 July 2021

Nesting VMs - not quite as cosy as it sounds....

I wrote about this a few months back: -

Kata Containers and Ubuntu Linux - lessons learned - 3/many - a WIP

in the context of VM nesting being a pain ....

For context, I'm trying ( and failing ) to get Kata Containers fully running inside an Ubuntu VM running on VMware Fusion on my Mac.

This is what I currently have: -

Host: macOS 11.5 Big Sur

Virtualisation: VMware Fusion 12.1.2

Guest: Ubuntu 20.04.2 LTS

Kernel: 5.4.0-80-generic #90-Ubuntu SMP

kata-runtime  : 2.1.0

QEMU: 5.2.0

I've been experimenting with various container runtimes here, including containerd and CRI-O 

However, each and every time I'm hitting the same nested virtualisation issue

Most recently, when I try to use crictl runp to start a pod sandbox using the Kata 2.0 runtime: -

sudo crictl runp test/testdata/sandbox_config.json

FATA[0002] run pod sandbox: rpc error: code = Unknown desc = CreateContainer failed: failed to launch qemu: exit status 1, error messages from qemu log: qemu-system-x86_64: error: failed to set MSR 0x48d to 0x5600000016
qemu-system-x86_64: ../target/i386/kvm.c:2701: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
: unknown 

I've tried various hacks to mitigate this but I just cannot get past it ...

More digging is required, or this combination is a bust - thankfully I have many other options, including IBM Cloud Virtual Servers.......

More to come ....

For a future me - remembering to stop VMware Tools service when running Kata Containers on Ubuntu on VMware on macOS

I keep writing and re-writing this script, because I forgot to store it somewhere memorable ....

The problem to be solved is that the Kata Containers runtime, aka kata-runtime, will not start unless I remember to stop the VMware Tools service, aka open-vm-tools.service and unload a few dependent modules.

At present, I'm running Kata via a snap installation - I'm not terribly keen on that approach, as there seem to be too many fiddly configuration things to do, but I'll look at that another day, and perhaps raise an issue in the Kata repo.

Meantime, this is what I have: -

/snap/kata-containers/current/usr/bin/kata-runtime -version

kata-runtime  : 2.1.0

   commit   : 0f822919268e4095dd9bdbbb2351248b53746501

   OCI specs: 1.0.1-dev

and, when I run the check tool: -

/snap/kata-containers/current/usr/bin/kata-runtime check

I get this: -

WARN[0000] Not running network checks as super user      arch=amd64 name=kata-runtime pid=1431 source=runtime
WARN[0000] modprobe insert module failed                 arch=amd64 error="exit status 1" module=vhost_vsock name=kata-runtime output="modprobe: ERROR: could not insert 'vhost_vsock': Device or resource busy\n" pid=1431 source=runtime
ERRO[0000] kernel property not found                     arch=amd64 description="Host Support for Linux VM Sockets" name=vhost_vsock pid=1431 source=runtime type=module
System is capable of running Kata Containers
System can currently create Kata Containers

The vhost_vsock module cannot be loaded, as evidenced by this: -

modprobe vhost_vsock

modprobe: ERROR: could not insert 'vhost_vsock': Device or resource busy

lsmod | grep vhost

vhost_net              32768  0
tap                    24576  1 vhost_net
vhost                  49152  1 vhost_net

Somewhere I realised / discovered / found out that VMware Tools, more specifically open-vm-tools, was responsible :-)

Given that, for my use cases at the moment, I don't need VMware Tools for e.g. GUI interactions between guest and host, shared folders etc., I chose to simply disable it.

I've got a script for that: -

~/stop_tools.sh 

#!/bin/bash
systemctl stop open-vm-tools.service
rmmod vmw_vsock_virtio_transport_common
rmmod vmw_vsock_vmci_transport
rmmod vsock
rmmod vhost_net
rmmod vhost

So I can run my script: -

~/stop_tools.sh 

and check that neither vhost* nor vsock* modules are now loaded: -

lsmod | grep vs

<NOTHING RETURNED>

lsmod | grep vh

<NOTHING RETURNED>

and then load the vhost_vsock module: -

modprobe vhost_vsock

and check the loaded modules: -

lsmod | grep vs

vhost_vsock            24576  0
vmw_vsock_virtio_transport_common    32768  1 vhost_vsock
vhost                  49152  1 vhost_vsock
vsock                  36864  2 vmw_vsock_virtio_transport_common,vhost_vsock

lsmod | grep vh

vhost_vsock            24576  0
vmw_vsock_virtio_transport_common    32768  1 vhost_vsock
vhost                  49152  1 vhost_vsock
vsock                  36864  2 vmw_vsock_virtio_transport_common,vhost_vsock
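One thing worth noting is that grep vs and grep vh match anywhere on the line, including the "Used by" column, which is why the plain vhost module shows up in the grep vs output. A slightly stricter check, sketched below, only looks at the module name in the first column. The sample text is copied from the output above; list_vsock_modules is a hypothetical helper name of mine.

```shell
# Sample lsmod output copied from the post
lsmod_sample='vhost_vsock            24576  0
vmw_vsock_virtio_transport_common    32768  1 vhost_vsock
vhost                  49152  1 vhost_vsock
vsock                  36864  2 vmw_vsock_virtio_transport_common,vhost_vsock'

# Print only module NAMES (first column) that contain vhost or vsock
list_vsock_modules() {
  awk '$1 ~ /vhost|vsock/ { print $1 }'
}

module_count=$(printf '%s\n' "$lsmod_sample" | list_vsock_modules | wc -l)
printf '%s\n' "$lsmod_sample" | list_vsock_modules
```

On a real system this would be used as lsmod | list_vsock_modules, with no output meaning that none of the modules are loaded.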

and, even more importantly, use Kata: -

/snap/kata-containers/current/usr/bin/kata-runtime check

WARN[0000] Not running network checks as super user      arch=amd64 name=kata-runtime pid=1911 source=runtime
System is capable of running Kata Containers
System can currently create Kata Containers

/snap/kata-containers/current/usr/bin/kata-runtime kata-check

WARN[0000] Not running network checks as super user      arch=amd64 name=kata-runtime pid=1926 source=runtime
System is capable of running Kata Containers
System can currently create Kata Containers

I could probably hack a better solution and/or uninstall VMware Tools but .....

Friday, 23 July 2021

A reprise - growing disks in Ubuntu

 I've written this many times over the years, with many different Linux distributions, but had a need to REDO FROM START yesterday, growing the disk within an Ubuntu VM running on VMware Fusion.

So, for my future self, this is what worked for me ( using Ubuntu 20.04.2 LTS )

Look at the current disk

df -kmh /

Filesystem                         Size  Used Avail Use% Mounted on

/dev/mapper/ubuntu--vg-ubuntu--lv   19G  8.5G  9.2G  49% /

Increase VM from 20 to 50 GB

This is easily done via VMware Fusion: -


I grew the disk from 20 GB to 50 GB whilst the VM was shut down, and then booted it up.

Inspect the current partition layout

fdisk /dev/sda -l

GPT PMBR size mismatch (41943039 != 104857599) will be corrected by write.

Disk /dev/sda: 50 GiB, 53687091200 bytes, 104857600 sectors

Disk model: VMware Virtual S

Units: sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disklabel type: gpt

Disk identifier: DA7D21AD-C6E5-4549-B753-391F2EB88C8E


Device       Start      End  Sectors Size Type

/dev/sda1     2048     4095     2048   1M BIOS boot

/dev/sda2     4096  2101247  2097152   1G Linux filesystem

/dev/sda3  2101248 41940991 39839744  19G Linux filesystem

Create a new partition

I did this via CLI, but could've quite easily used fdisk interactively. Also, I took the defaults on start, end, sectors, partition type etc.

(

echo "n"

echo -e "\n"

echo -e "\n"

echo -e "\n"

echo "w"

) | fdisk /dev/sda
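As far as I can tell, each echo -e "\n" above sends two line breaks rather than one ( the \n plus echo's own ), so fdisk receives a few extra blank answers, which it appears to simply ignore at its command prompt. A tighter way of producing the same answers, sketched here with a hypothetical helper name, is a single printf with one blank line per prompt. It assumes fdisk asks three questions after n on a GPT disk ( partition number, first sector, last sector ).

```shell
# Emit the answers for fdisk: n, three defaults, then w to write
fdisk_answers() {
  printf 'n\n\n\n\nw\n'
}

# Count the answer lines without touching any disk
answer_lines=$(fdisk_answers | wc -l)
echo "fdisk would receive $answer_lines lines"

# Real use (destructive, needs root), left commented out on purpose:
# fdisk_answers | fdisk /dev/sda
```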

Inspect the new partition layout

fdisk /dev/sda -l

Disk /dev/sda: 50 GiB, 53687091200 bytes, 104857600 sectors

Disk model: VMware Virtual S

Units: sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disklabel type: gpt

Disk identifier: DA7D21AD-C6E5-4549-B753-391F2EB88C8E


Device        Start       End  Sectors Size Type

/dev/sda1      2048      4095     2048   1M BIOS boot

/dev/sda2      4096   2101247  2097152   1G Linux filesystem

/dev/sda3   2101248  41940991 39839744  19G Linux filesystem

/dev/sda4  41940992 104857566 62916575  30G Linux filesystem

Create a Physical Volume using the new sda4 partition

pvcreate /dev/sda4

  Physical volume "/dev/sda4" successfully created.

Extend the existing Volume Group to use the new sda4 partition  

vgextend /dev/ubuntu-vg/ /dev/sda4

  Volume group "ubuntu-vg" successfully extended

Extend the Logical Volume to fit

- Note that I deliberately chose the value of 29G to fit inside the newly added 30GB of disk

lvextend -L +29G /dev/ubuntu-vg/ubuntu-lv

  Size of logical volume ubuntu-vg/ubuntu-lv changed from <19.00 GiB (4863 extents) to <48.00 GiB (12287 extents).

  Logical volume ubuntu-vg/ubuntu-lv successfully resized.
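The numbers in that output can be sanity-checked with a little shell arithmetic. This sketch assumes LVM's default 4 MiB extent size ( which is consistent with 4863 extents being just under 19 GiB ) and takes the sector count for sda4 from the fdisk output above; the variable names are mine.

```shell
extent_mib=4            # assumption: default LVM extent size of 4 MiB
old_extents=4863        # from the lvextend output
new_extents=12287       # from the lvextend output

added_extents=$((new_extents - old_extents))
added_gib=$((added_extents * extent_mib / 1024))
echo "Added $added_extents extents = $added_gib GiB"

sda4_sectors=62916575   # from the fdisk output, 512-byte sectors
sda4_mib=$((sda4_sectors * 512 / 1024 / 1024))
echo "sda4 holds roughly $sda4_mib MiB"
```

So 29G fits comfortably, leaving about 1 GiB of the new partition unallocated. If the aim is simply to use everything, lvextend also accepts -l +100%FREE, although I haven't used that here.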

Resize the file-system to fit  

resize2fs /dev/mapper/ubuntu--vg-ubuntu--lv

resize2fs 1.45.5 (07-Jan-2020)

Filesystem at /dev/mapper/ubuntu--vg-ubuntu--lv is mounted on /; on-line resizing required

old_desc_blocks = 3, new_desc_blocks = 6

The filesystem on /dev/mapper/ubuntu--vg-ubuntu--lv is now 12581888 (4k) blocks long.

Look at the current disk

df -kmh /

Filesystem                         Size  Used Avail Use% Mounted on

/dev/mapper/ubuntu--vg-ubuntu--lv   48G  8.5G   37G  19% /

Celebrate !


Thursday, 15 July 2021

Gah, problems building Kata Containers but totally self-inflicted

 Whilst trying to build the kernel using Kata Containers, I kept hitting a problem which appeared to be with the make command.

As an example: -

./build-kernel.sh setup

/root/go/github.com/src/tools/packaging/kernel/../scripts/lib.sh: line 30: pushd: /root/go/src/github.com/kata-containers/tests: No such file or directory
/root/go/github.com/src/tools/packaging/kernel/../scripts/lib.sh: line 31: .ci/install_yq.sh: No such file or directory
/root/go/github.com/src/tools/packaging/kernel/../scripts/lib.sh: line 32: popd: directory stack empty
INFO: Config version: 85
INFO: Kernel version: 5.10.25
INFO: /root/go/github.com/src/tools/packaging/kernel/kata-linux-5.10.25-85 already exist
Kernel source ready: /root/go/github.com/src/tools/packaging/kernel/kata-linux-5.10.25-85

I assumed that this was a missing pre-requisite on my part, harking back to a previous post: -

Kata Containers and Ubuntu Linux - lessons learned - 4/many

Well, kinda but not quite ...

I'm not 100% sure what led me to the realisation but I did finally notice this in the above message: -

/root/go/src/github.com/kata-containers/tests

'cos I'd (inadvertently) cloned the Kata Containers repo into ... 

/root/go/github.com/src

Not sure why but I did ...
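For a future me, a tiny pre-flight check along these lines would have caught the transposed directories before the build scripts did. It is only a sketch; check_clone_path is a hypothetical helper of mine, not something from the Kata repo.

```shell
# Report whether a clone directory sits under $GOPATH/src/github.com
check_clone_path() {
  gopath=$1
  clone_dir=$2
  case "$clone_dir" in
    "$gopath"/src/github.com/*) echo "ok" ;;
    *) echo "unexpected: $clone_dir is not under $gopath/src/github.com" ;;
  esac
}

# The location I actually used, and the one the scripts were looking for
check_clone_path /root/go /root/go/github.com/src
check_clone_path /root/go /root/go/src/github.com/kata-containers/kata-containers
```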

The developer guide does make it clear: -

$ go get -d -u github.com/kata-containers/kata-containers
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/src/runtime
$ make && sudo -E PATH=$PATH make install

Build and install the Kata Containers runtime

Once I did it properly: -

git clone git@github.com:kata-containers/kata-containers.git $GOPATH/src/github.com

cd $GOPATH/src/github.com/tools/packaging/kernel

./build-kernel.sh setup

./build-kernel.sh build

all was well, funnily enough

😮‍💨😮‍💨😮‍💨😮‍💨😮‍💨😮‍💨

Saturday, 10 July 2021

Following up - arrays and string munging in Rust

 Following up on And here we go - more Rust By Example I had some fun munging a string into elements of an array in Rust.

This is what I ended up with: -

fn main() {
    let image_string: &str = "docker.io/davidhay1969/hello-world-nginx:latest";
    // Placeholder labels, overwritten below by the parts of the image name
    let mut image_array: [&str; 4] = ["registry", "namespace", "repository", "tag"];
    let mut index = 0;
    // Split on both '/' and ':' (assumes the name has at most four parts)
    for part in image_string.split(&['/', ':'][..]) {
        image_array[index] = part;
        index += 1;
    }
    for i in 0..image_array.len() {
        println!("Index {} value {}", i, image_array[i]);
    }
}

and this is how it looks when I run it: -

cargo run

    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/hello_world`
Index 0 value docker.io
Index 1 value davidhay1969
Index 2 value hello-world-nginx
Index 3 value latest

noting that I'm doing this in Microsoft Visual Studio Code 

I did get some useful inspiration for this from, amongst others: -






Now to add this functionality into my work with Kata Containers ..... YAY!

Monday, 5 July 2021

And here we go - more Rust By Example

Following on from my earlier post: -

Learning Rust - because Kata

here's a simpler version of my function: -

fn main() {

    for part in "docker.io/davidhay1969/hello-world-nginx:latest".split(&['/', ':'][..]) {

        println!("{}", part);

    }

}

where I'm splitting the container image name: -

docker.io/davidhay1969/hello-world-nginx:latest

So when I create this on my Mac: -

vi hello.rs

fn main() {

    for part in "docker.io/davidhay1969/hello-world-nginx:latest".split(&['/', ':'][..]) {

        println!("{}", part);

    }

}

and compile it: -

rustc hello.rs

and run it: -

./hello

docker.io

davidhay1969

hello-world-nginx

latest

Nice

PS Thanks to How can I split a string (String or &str) on more than one delimiter?

Learning Rust - because Kata

As per previous posts here, I've been working with Kata Containers these past few months, and, as such, have been tinkering with Rust because a core part of the Kata project, namely the kata-agent, is written in that particular new ( to me ) language.

Right now, I'm looking at a mechanism to parse container image names, separating out the registry, namespace, repository and tag.

Therefore, I needed to find a way to parse a string e.g. docker.io/davidhay1969/hello-world-nginx:latest into a series of individual strings: -

docker.io

davidhay1969

hello-world-nginx

latest

I started with Dot Net Perls, which introduced me to the split() function, and then proceeded from there ...

I then found the Rust By Example site, which allowed me to test out my learnings interactively ...

I started with, of course, Hello World! 

// This is a comment, and is ignored by the compiler
// You can test this code by clicking the "Run" button over there ->
// or if you prefer to use your keyboard, you can use the "Ctrl + Enter" shortcut
// This code is editable, feel free to hack it!
// You can always return to the original code by clicking the "Reset" button ->
// This is the main function
fn main() {
    // Statements here are executed when the compiled binary is called
    // Print text to the console
    let full_name="Dave;Hay";
    let names = full_name.split(';');
    print!("Hello ");
    for n in names {
        print!("{} ",n);
    }
}

One nice thing is that I can make AND test changes within the web page, rather than having to code/test/code/test in, say, Visual Studio Code.

When I click 'Run', here we go ...

Hello Dave Hay 

and then amended it to meet my specific requirement wrt container images: -

// This is a comment, and is ignored by the compiler
// You can test this code by clicking the "Run" button over there ->
// or if you prefer to use your keyboard, you can use the "Ctrl + Enter" shortcut

// This code is editable, feel free to hack it!
// You can always return to the original code by clicking the "Reset" button ->

// This is the main function
fn main() {
    // Statements here are executed when the compiled binary is called

    // Print text to the console
    let full_container_image="docker.io/davidhay1969/hello-world-nginx:latest";
    let names = full_container_image.split('/');
    println!("The image is :-");
    for n in names {
        println!("{} ",n);
    }
}

When I click 'Run', here we go ...

The image is :-
docker.io 
davidhay1969 
hello-world-nginx:latest 

Obviously, I need to add some logic to handle the colon ( : ) between the repository and tag, but that's easy enough to do ....

Sunday, 4 July 2021

Fun n' games breaking, and then fixing, containerd

By virtue of the fact that I was hacking around with my Kubernetes 1.21 / containerd 1.4.4 / Kata 2.0 environment today, I managed to rather impressively break the containerd runtime.

In even better news, I managed to fix it again, albeit with a lot of help from my friends - well, to be specific, from GitHub ....

I think I managed to break things by replacing a pair of binaries underlying containerd and Kata : -

containerd-shim-kata-v2

kata-runtime

whilst I had a pod running using the Kata runtime ...

By the time I realised I'd broken things, it was kinda too late ....

The containerd runtime refused to ... well, run, and things weren't looking too good for me - I was even considering cluster rebuild time .....

I'd tried: -

systemctl stop containerd.service

systemctl start containerd.service

systemctl stop kubelet.service

systemctl start kubelet.service

and even a reboot, but no dice....

Whilst debugging, I did check the containerd service: -

systemctl status containerd.service --no-pager --full

● containerd.service - containerd container runtime
     Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
     Active: activating (start-pre) since Sun 2021-07-04 01:04:39 PDT; 1ms ago
       Docs: https://containerd.io
Cntrl PID: 6922 ((modprobe))
      Tasks: 0
     Memory: 0B
     CGroup: /system.slice/containerd.service
             └─6922 (modprobe)
Jul 04 01:04:39 sideling2.fyre.ibm.com systemd[1]: Stopped containerd container runtime.
Jul 04 01:04:39 sideling2.fyre.ibm.com systemd[1]: Starting containerd container runtime...

systemctl status containerd.service --no-pager --full

● containerd.service - containerd container runtime
     Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
     Active: activating (auto-restart) (Result: exit-code) since Sun 2021-07-04 01:04:39 PDT; 2s ago
       Docs: https://containerd.io
    Process: 6922 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
    Process: 6929 ExecStart=/usr/bin/containerd (code=exited, status=1/FAILURE)
   Main PID: 6929 (code=exited, status=1/FAILURE)

as well as the kubelet service: -

systemctl status kubelet.service --no-pager --full

   ● kubelet.service - kubelet: The Kubernetes Node Agent
        Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
       Drop-In: /etc/systemd/system/kubelet.service.d
                └─10-kubeadm.conf
        Active: active (running) since Sun 2021-07-04 01:18:53 PDT; 4s ago
          Docs: https://kubernetes.io/docs/home/
      Main PID: 11243 (kubelet)
         Tasks: 7 (limit: 4616)
        Memory: 15.0M
        CGroup: /system.slice/kubelet.service
                └─11243 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock --pod-infra-container-image=k8s.gcr.io/pause:3.4.1

   Jul 04 01:18:53 sideling2.fyre.ibm.com systemd[1]: Started kubelet: The Kubernetes Node Agent.
   Jul 04 01:18:54 sideling2.fyre.ibm.com kubelet[11243]: I0704 01:18:54.014459   11243 server.go:197] "Warning: For remote container runtime, --pod-infra-container-image is ignored in kubelet, which should be set in that remote runtime instead"
   Jul 04 01:18:54 sideling2.fyre.ibm.com kubelet[11243]: I0704 01:18:54.044083   11243 server.go:440] "Kubelet version" kubeletVersion="v1.21.0"
   Jul 04 01:18:54 sideling2.fyre.ibm.com kubelet[11243]: I0704 01:18:54.044870   11243 server.go:851] "Client rotation is on, will bootstrap in background"
   Jul 04 01:18:54 sideling2.fyre.ibm.com kubelet[11243]: I0704 01:18:54.062425   11243 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
   Jul 04 01:18:54 sideling2.fyre.ibm.com kubelet[11243]: I0704 01:18:54.064001   11243 dynamic_cafile_content.go:167] Starting client-ca-bundle::/etc/kubernetes/pki/ca.crt
   root@sideling2:~# systemctl status kubelet.service --no-pager --full
   ● kubelet.service - kubelet: The Kubernetes Node Agent
        Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
       Drop-In: /etc/systemd/system/kubelet.service.d
                └─10-kubeadm.conf
        Active: activating (auto-restart) (Result: exit-code) since Sun 2021-07-04 01:18:59 PDT; 4s ago
          Docs: https://kubernetes.io/docs/home/
       Process: 11243 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
      Main PID: 11243 (code=exited, status=1/FAILURE)

   Jul 04 01:18:59 sideling2.fyre.ibm.com systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
   Jul 04 01:18:59 sideling2.fyre.ibm.com systemd[1]: kubelet.service: Failed with result 'exit-code'.

but didn't draw too many conclusions ...

However, I got more insight when I tried / failed to run containerd interactively: -

/usr/bin/containerd

which, in part, failed with: -

FATA[2021-07-04T01:04:22.466954937-07:00] Failed to run CRI service                     error="failed to recover state: failed to reserve sandbox name \"nginx-kata_default_ed1e725f-8bfc-4c8f-89e0-3477c6a4c451_0\": name \"nginx-kata_default_ed1e725f-8bfc-4c8f-89e0-3477c6a4c451_0\" is reserved for \"3e08cb56351c79ec0e854f6fa0ac1e272c251a6dea29602a63a4cd994abf54a2\""

Now, given that the pod that had been running when I replaced two of the in-use binaries was called nginx-kata, I suspect that the problem was that the pod was still running, despite containerd not working, and was effectively locking the pod sandbox name - as per the above.
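The ID that the sandbox name is reserved for is the useful bit of that message, so here's a small sketch for pulling it out rather than copying it by hand. It assumes the ID is the only run of 64 lowercase hex characters in the message, which holds for the error above; the variable names are mine.

```shell
# Error text copied from the containerd output above
err='failed to recover state: failed to reserve sandbox name \"nginx-kata_default_ed1e725f-8bfc-4c8f-89e0-3477c6a4c451_0\": name \"nginx-kata_default_ed1e725f-8bfc-4c8f-89e0-3477c6a4c451_0\" is reserved for \"3e08cb56351c79ec0e854f6fa0ac1e272c251a6dea29602a63a4cd994abf54a2\"'

# Extract the 64-character hex ID that holds the reservation
container_id=$(printf '%s\n' "$err" | grep -oE '[0-9a-f]{64}')
echo "$container_id"
```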

Thankfully, this GH issue in the containerd project came to my rescue : -

'failed to reserve sandbox name' error after hard reboot #1014

Following the advice therein, I tried / failed to see what was going on using ctr : -

ctr --namespace k8s.io containers list

ctr: failed to dial "/run/containerd/containerd.sock": connection error: desc = "transport: error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused"

I even tried the Hail Mary approach of nuking the errant container: -

ctr --namespace k8s.io containers rm 3e08cb56351c79ec0e854f6fa0ac1e272c251a6dea29602a63a4cd994abf54a2

but to no avail: -

ctr: failed to dial "/run/containerd/containerd.sock": connection error: desc = "transport: error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused"

I even got very brutal: -

find /var/lib/containerd/io.containerd.runtime.v2.task/k8s.io/|grep 3e08cb56351c79ec0e854f6fa0ac1e272c251a6dea29602a63a4cd994abf54a2

/var/lib/containerd/io.containerd.runtime.v2.task/k8s.io/3e08cb56351c79ec0e854f6fa0ac1e272c251a6dea29602a63a4cd994abf54a2

rm -Rf /var/lib/containerd/io.containerd.runtime.v2.task/k8s.io/3e08cb56351c79ec0e854f6fa0ac1e272c251a6dea29602a63a4cd994abf54a2

but again to no avail.

Reading and re-reading the aforementioned GH issue, I noted that one of the responders said, in part: -

you can start containerd with disable_plugins = [ cri ] in the
/etc/containerd/config.toml

to which someone else responded: -

PS: the config option for the workaround is disabled_plugins, not disable_plugins

I checked out the file: -

vi /etc/containerd/config.toml

and noted: -

version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
plugin_dir = ""
disabled_plugins = []
required_plugins = []
oom_score = 0
...


so I amended it: -

version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
plugin_dir = ""
disabled_plugins = ["cri"]
required_plugins = []
oom_score = 0
...

and tried again to manually start containerd: -

/usr/bin/containerd

which borked with: -

containerd: invalid disabled plugin URI "cri" expect io.containerd.x.vx

I remembered that the URI had changed from plain old cri to io.containerd.grpc.v1.cri, so I changed it again: -

version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
plugin_dir = ""
disabled_plugins = ["io.containerd.grpc.v1.cri"]
required_plugins = []
oom_score = 0
...
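I made that change by hand in vi, but the same edit can be scripted. The sketch below works on a temporary copy holding the first few lines of the config shown above, not on /etc/containerd/config.toml itself, and assumes GNU sed for -i; the variable names are mine.

```shell
# Work on a throwaway copy, never the live config
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
plugin_dir = ""
disabled_plugins = []
required_plugins = []
oom_score = 0
EOF

# Replace the whole disabled_plugins line with the full plugin URI
sed -i 's|^disabled_plugins = .*|disabled_plugins = ["io.containerd.grpc.v1.cri"]|' "$cfg"

disabled_line=$(grep '^disabled_plugins' "$cfg")
echo "$disabled_line"
```

Setting the line back to disabled_plugins = [] re-enables the CRI plugin afterwards.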

and was able to start containerd manually: -

/usr/bin/containerd

INFO[2021-07-04T09:31:50.507911132-07:00] starting containerd                           revision= version="1.4.4-0ubuntu1~20.04.2"
INFO[2021-07-04T09:31:50.556285374-07:00] loading plugin "io.containerd.content.v1.content"...  type=io.containerd.content.v1
INFO[2021-07-04T09:31:50.556487705-07:00] loading plugin "io.containerd.snapshotter.v1.aufs"...  type=io.containerd.snapshotter.v1
INFO[2021-07-04T09:31:50.568920159-07:00] loading plugin "io.containerd.snapshotter.v1.btrfs"...  type=io.containerd.snapshotter.v1
INFO[2021-07-04T09:31:50.570026092-07:00] skip loading plugin "io.containerd.snapshotter.v1.btrfs"...  error="path /var/lib/containerd/io.containerd.snapshotter.v1.btrfs (xfs) must be a btrfs filesystem to be used with the btrfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
INFO[2021-07-04T09:31:50.570139702-07:00] loading plugin "io.containerd.snapshotter.v1.devmapper"...  type=io.containerd.snapshotter.v1
WARN[2021-07-04T09:31:50.570350493-07:00] failed to load plugin io.containerd.snapshotter.v1.devmapper  error="devmapper not configured"
INFO[2021-07-04T09:31:50.570398653-07:00] loading plugin "io.containerd.snapshotter.v1.native"...  type=io.containerd.snapshotter.v1
INFO[2021-07-04T09:31:50.570478494-07:00] loading plugin "io.containerd.snapshotter.v1.overlayfs"...  type=io.containerd.snapshotter.v1
INFO[2021-07-04T09:31:50.571028515-07:00] loading plugin "io.containerd.snapshotter.v1.zfs"...  type=io.containerd.snapshotter.v1
INFO[2021-07-04T09:31:50.571796367-07:00] skip loading plugin "io.containerd.snapshotter.v1.zfs"...  error="path /var/lib/containerd/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
INFO[2021-07-04T09:31:50.571859377-07:00] loading plugin "io.containerd.metadata.v1.bolt"...  type=io.containerd.metadata.v1
WARN[2021-07-04T09:31:50.571936777-07:00] could not use snapshotter devmapper in metadata plugin  error="devmapper not configured"
INFO[2021-07-04T09:31:50.571969917-07:00] metadata content store policy set             policy=shared
INFO[2021-07-04T09:31:50.572636609-07:00] loading plugin "io.containerd.differ.v1.walking"...  type=io.containerd.differ.v1
INFO[2021-07-04T09:31:50.572707090-07:00] loading plugin "io.containerd.gc.v1.scheduler"...  type=io.containerd.gc.v1
INFO[2021-07-04T09:31:50.573169651-07:00] loading plugin "io.containerd.service.v1.introspection-service"...  type=io.containerd.service.v1
INFO[2021-07-04T09:31:50.573381772-07:00] loading plugin "io.containerd.service.v1.containers-service"...  type=io.containerd.service.v1
INFO[2021-07-04T09:31:50.573488782-07:00] loading plugin "io.containerd.service.v1.content-service"...  type=io.containerd.service.v1
INFO[2021-07-04T09:31:50.573642362-07:00] loading plugin "io.containerd.service.v1.diff-service"...  type=io.containerd.service.v1
INFO[2021-07-04T09:31:50.573717172-07:00] loading plugin "io.containerd.service.v1.images-service"...  type=io.containerd.service.v1
INFO[2021-07-04T09:31:50.573778242-07:00] loading plugin "io.containerd.service.v1.leases-service"...  type=io.containerd.service.v1
INFO[2021-07-04T09:31:50.574107503-07:00] loading plugin "io.containerd.service.v1.namespaces-service"...  type=io.containerd.service.v1
INFO[2021-07-04T09:31:50.574211764-07:00] loading plugin "io.containerd.service.v1.snapshots-service"...  type=io.containerd.service.v1
INFO[2021-07-04T09:31:50.574268314-07:00] loading plugin "io.containerd.runtime.v1.linux"...  type=io.containerd.runtime.v1
INFO[2021-07-04T09:31:50.574727045-07:00] loading plugin "io.containerd.runtime.v2.task"...  type=io.containerd.runtime.v2
DEBU[2021-07-04T09:31:50.575140177-07:00] loading tasks in namespace                    namespace=k8s.io
INFO[2021-07-04T09:31:50.585638375-07:00] loading plugin "io.containerd.monitor.v1.cgroups"...  type=io.containerd.monitor.v1
INFO[2021-07-04T09:31:50.586886318-07:00] loading plugin "io.containerd.service.v1.tasks-service"...  type=io.containerd.service.v1
INFO[2021-07-04T09:31:50.587187039-07:00] loading plugin "io.containerd.internal.v1.restart"...  type=io.containerd.internal.v1
INFO[2021-07-04T09:31:50.587704810-07:00] loading plugin "io.containerd.grpc.v1.containers"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.588049332-07:00] loading plugin "io.containerd.grpc.v1.content"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.588408763-07:00] loading plugin "io.containerd.grpc.v1.diff"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.588663644-07:00] loading plugin "io.containerd.grpc.v1.events"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.588885984-07:00] loading plugin "io.containerd.grpc.v1.healthcheck"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.589144744-07:00] loading plugin "io.containerd.grpc.v1.images"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.589440975-07:00] loading plugin "io.containerd.grpc.v1.leases"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.589764946-07:00] loading plugin "io.containerd.grpc.v1.namespaces"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.590026197-07:00] loading plugin "io.containerd.internal.v1.opt"...  type=io.containerd.internal.v1
INFO[2021-07-04T09:31:50.590392338-07:00] loading plugin "io.containerd.grpc.v1.snapshots"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.590632959-07:00] loading plugin "io.containerd.grpc.v1.tasks"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.590854780-07:00] loading plugin "io.containerd.grpc.v1.version"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.591069010-07:00] loading plugin "io.containerd.grpc.v1.introspection"...  type=io.containerd.grpc.v1
INFO[2021-07-04T09:31:50.591965292-07:00] serving...                                    address=/run/containerd/containerd.sock.ttrpc
INFO[2021-07-04T09:31:50.592328083-07:00] serving...                                    address=/run/containerd/containerd.sock
DEBU[2021-07-04T09:31:50.592592984-07:00] sd notification                               error="<nil>" notified=false state="READY=1"
INFO[2021-07-04T09:31:50.592869195-07:00] containerd successfully booted in 0.088386s  

Having "proved" that this worked, I quit the interactive containerd process, and instead started the service: -

systemctl start containerd.service

which started OK.

I was then able to use ctr to list the containers in the k8s.io namespace: -

ctr --namespace k8s.io containers list

and to check that the errant container was amongst them: -

ctr --namespace k8s.io containers list|grep 3e08cb56351c79ec0e854f6fa0ac1e272c251a6dea29602a63a4cd994abf54a2

before using the nuke option: -

ctr --namespace k8s.io containers rm 3e08cb56351c79ec0e854f6fa0ac1e272c251a6dea29602a63a4cd994abf54a2
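
For what it's worth, the list, grep and rm steps above could be wrapped into one guarded helper. This is only a sketch: the function name remove_container is made up for illustration (it is not part of ctr), and it assumes that ctr is on the PATH and that k8s.io is the right namespace, as it was here.

```shell
# Hypothetical helper (not part of ctr): remove a container from the
# k8s.io namespace only if "containers list" actually shows it.
remove_container() {
  id="$1"
  # Same check as the manual list + grep, with the match output discarded.
  if ctr --namespace k8s.io containers list | grep -- "$id" >/dev/null; then
    ctr --namespace k8s.io containers rm "$id"
  else
    echo "container $id not found in k8s.io; nothing to remove" >&2
    return 1
  fi
}
```

It would be called with the full container ID, e.g. remove_container 3e08cb56351c79ec0e854f6fa0ac1e272c251a6dea29602a63a4cd994abf54a2, and returns non-zero rather than calling rm if the ID isn't listed.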

Having nuked the errant container, I backed out the changes to the configuration file: -

vi /etc/containerd/config.toml 

version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
plugin_dir = ""
disabled_plugins = []
required_plugins = []
oom_score = 0
...

and restarted containerd : -

systemctl restart containerd.service

and things were back to normal.
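
A quick way to confirm "back to normal" after a restart is to ask systemd directly. A minimal sketch, where check_service is a hypothetical helper name and the only real command used is systemctl is-active: -

```shell
# Hypothetical helper: report whether a systemd unit is active.
# "systemctl is-active --quiet" exits 0 only when the unit is active.
check_service() {
  unit="$1"
  if systemctl is-active --quiet "$unit"; then
    echo "$unit is active"
  else
    echo "$unit is NOT active" >&2
    return 1
  fi
}
```

e.g. check_service containerd.service straight after the systemctl restart.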

Can you say "Yay!" ? I bet you can ....

Friday, 2 July 2021

Learning new stuff each and every day - turning Markdown into man pages

I'm tinkering with a container tool called skopeo at present, and am building it on an Ubuntu 20.04 box.

One thing that I'd not realised, until I was looking to contribute some changes back to the project, was that there's a tool that turns Markdown ( .md ) into man pages, meaning that in-line documentation can be generated on the fly.

I found this out whilst building skopeo with the make command, which borked with: -

CGO_CFLAGS="" CGO_LDFLAGS="-L/usr/lib/x86_64-linux-gnu -lgpgme -lassuan -lgpg-error" GO111MODULE=on go build -mod=vendor "-buildmode=pie" -ldflags '-X main.gitCommit=85546491235c78cf51efa1ca060f1d582d5e1ab1 ' -gcflags "" -tags "  " -o bin/skopeo ./cmd/skopeo
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo-login.1.md  | /root/go/bin/go-md2man -in /dev/stdin -out docs/skopeo-login.1
/bin/sh: 1: /root/go/bin/go-md2man: not found
make: *** [Makefile:143: docs/skopeo-login.1] Error 127
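
The Error 127 that make reports is the shell's exit status for a command that it could not find. A small sketch to see this in isolation, using a deliberately bogus path (made up for illustration, not a real location): -

```shell
# Run a command that does not exist and capture the shell's exit status.
status=0
sh -c '/nonexistent/bin/go-md2man -in /dev/stdin' 2>/dev/null || status=$?
echo "exit status: $status"
```

This should report an exit status of 127, matching what make printed.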

What I didn't realise was that I was missing a key dependency - go-md2man - which is described as: -

go-md2man. Converts markdown into roff (man pages). Uses blackfriday to process markdown into man pages. Usage ./md2man -in /path/to/markdownfile.md ...

This was easily fixed: -

apt-get install -y go-md2man

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  go-md2man
0 upgraded, 1 newly installed, 0 to remove and 26 not upgraded.
Need to get 655 kB of archives.
After this operation, 2,047 kB of additional disk space will be used.
Get:1 http://us.archive.ubuntu.com/ubuntu focal/universe amd64 go-md2man amd64 1.0.10+ds-1 [655 kB]
Fetched 655 kB in 1s (898 kB/s)    
Selecting previously unselected package go-md2man.
(Reading database ... 116777 files and directories currently installed.)
Preparing to unpack .../go-md2man_1.0.10+ds-1_amd64.deb ...
Unpacking go-md2man (1.0.10+ds-1) ...
Setting up go-md2man (1.0.10+ds-1) ...
Processing triggers for man-db (2.9.1-1) ...

and then validated: -

which go-md2man

/usr/bin/go-md2man
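
To catch this sort of thing before make gets going, the same check could be turned into a small pre-flight test. A sketch only, with require_tool being a hypothetical helper name of my own: -

```shell
# Hypothetical helper: fail early if a required build tool is not on the PATH.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found $1 at $(command -v "$1")"
  else
    echo "missing dependency: $1" >&2
    return 1
  fi
}
```

It could then be chained in front of the build, e.g. require_tool go-md2man && make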

More importantly, my build completed: -

make clean

rm -rf bin docs/*.1

make

CGO_CFLAGS="" CGO_LDFLAGS="-L/usr/lib/x86_64-linux-gnu -lgpgme -lassuan -lgpg-error" GO111MODULE=on go build -mod=vendor "-buildmode=pie" -ldflags '-X main.gitCommit=85546491235c78cf51efa1ca060f1d582d5e1ab1 ' -gcflags "" -tags "  " -o bin/skopeo ./cmd/skopeo
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo-login.1.md  | /usr/bin/go-md2man -in /dev/stdin -out docs/skopeo-login.1
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo-list-tags.1.md  | /usr/bin/go-md2man -in /dev/stdin -out docs/skopeo-list-tags.1
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo-sync.1.md  | /usr/bin/go-md2man -in /dev/stdin -out docs/skopeo-sync.1
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo-copy.1.md  | /usr/bin/go-md2man -in /dev/stdin -out docs/skopeo-copy.1
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo-standalone-sign.1.md  | /usr/bin/go-md2man -in /dev/stdin -out docs/skopeo-standalone-sign.1
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo-standalone-verify.1.md  | /usr/bin/go-md2man -in /dev/stdin -out docs/skopeo-standalone-verify.1
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo-inspect.1.md  | /usr/bin/go-md2man -in /dev/stdin -out docs/skopeo-inspect.1
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo-manifest-digest.1.md  | /usr/bin/go-md2man -in /dev/stdin -out docs/skopeo-manifest-digest.1
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo-logout.1.md  | /usr/bin/go-md2man -in /dev/stdin -out docs/skopeo-logout.1
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo.1.md  | /usr/bin/go-md2man -in /dev/stdin -out docs/skopeo.1
sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/' docs/skopeo-delete.1.md  | /usr/bin/go-md2man -in /dev/stdin -out docs/skopeo-delete.1
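
As an aside, the two sed expressions in that output appear to tidy up Markdown links between the skopeo pages before go-md2man sees them: the first drops the (skopeo...md) link target, and the second unwraps the [skopeo...] link text. A small sketch to see the effect, where the function name and the sample line are mine rather than the project's: -

```shell
# Apply the same two sed expressions that the make output shows.
strip_skopeo_links() {
  sed -e 's/\((skopeo.*\.md)\)//' -e 's/\[\(skopeo.*\)\]/\1/'
}

# Sample line, invented for illustration, with one cross-page link.
printf '%s\n' 'See [skopeo-login(1)](skopeo-login.1.md) for details.' | strip_skopeo_links
```

With one link on the line, this prints: See skopeo-login(1) for details.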

More fun with pip

Again with the Python and pip fun, this time on my Mac, where commands such as: pip3 list and: - pip3 install --upgrade pip were failing w...