Monday, 25 October 2021

IBM Cloud - OCP clusters, Ingress and Certificate Manager

So this is definitely a work-in-progress but I may have resolved an issue that I was seeing with a newly created OpenShift Container Platform (OCP) cluster.

TL;DR: the command ic cs cluster ls showed my cluster state as warning, and it never progressed to ready.

When I inspected the cluster using ic cs cluster get --cluster $cluster_name I saw: -

Ingress Subdomain:              - †   

Ingress Secret:                 - †   

Ingress Status:                 -   

Ingress Message:                -   

and: -

† Your Ingress subdomain and secret might not be ready yet. For more info by cluster type, see 'https://ibm.biz/ingress-sub' for Kubernetes or 'https://ibm.biz/ingress-sub-ocp' for OpenShift.

and, after a while, this: -

Ingress Message:                Could not upload certificates to Certificate Manager instance. Ensure you have the correct IAM permissions. For more info, see http://ibm.biz/ingress-secret   

I followed the suggested link and ended up on a page which said, in part: -

What’s happening

You create and delete a cluster multiple times, such as for automation purposes.

Every time that you create the cluster, you use either the same name or a name that is very similar to previous names that you used. When you run ibmcloud ks cluster get --cluster <cluster>, your cluster is in a normal state but no Ingress Subdomain or Ingress Secret are available.

Why it’s happening

When you create and delete a cluster that uses the same name multiple times, the Ingress subdomain for that cluster in the format <cluster_name>.<globally_unique_account_HASH>-0000.<region>.containers.appdomain.cloud is registered and unregistered each time.

The certificate for the subdomain is also generated and deleted each time. If you create and delete a cluster with the same name 5 times or more within 7 days, you might reach the Let's Encrypt Duplicate Certificate rate limit, because the same Ingress subdomain and certificate are registered every time that you create the cluster. Because very long cluster names are truncated to 24 characters in the Ingress subdomain for the cluster, you can also reach the rate limit if you use multiple cluster names that have the same first 24 characters.
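To make the truncation point concrete, here is a minimal sketch that compares two cluster names on their first 24 characters only. The helper names ingress_prefix and names_collide are hypothetical, and the truncation is approximated with cut; this is an illustration of the rule quoted above, not how IBM Cloud actually derives the subdomain.

```shell
# Hypothetical helper: keep only the first 24 characters of a cluster name,
# approximating the truncation described in the quoted text.
ingress_prefix() {
  printf '%s\n' "$1" | cut -c1-24
}

# Succeeds (exit status 0) when two names share the same first 24 characters.
names_collide() {
  [ "$(ingress_prefix "$1")" = "$(ingress_prefix "$2")" ]
}

# Two long names that differ only after the 24th character.
if names_collide "my-very-long-openshift-cluster-a" "my-very-long-openshift-cluster-b"; then
  echo "collide"
else
  echo "distinct"
fi
```

Under that assumption, the two long names above would count as the same subdomain for rate-limit purposes, even though the full names differ.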

Given that I'm writing a document guiding one through the process of deploying OCP on IBM Cloud, I have been re-using the same cluster name e.g. roks-oct2021 over and over the past few days.

Working on the hypothesis that that's the root cause, I've changed the way that I generate the cluster name for my document: -

export cluster_name="roks_$(date +%s)"

which appends the current time as seconds since the Unix epoch, so each run produces a different name. For example, run the command three times in quick succession: -

date +%s

1635176609

date +%s

1635176610

date +%s

1635176611

and note the difference.
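The same idea can be wrapped in a small function. This is only a sketch: make_cluster_name is a hypothetical helper, and the length check assumes the 24-character truncation quoted earlier, on the reasoning that a timestamp pushed past character 24 would no longer make the Ingress subdomain unique.

```shell
# Hypothetical helper: build "<prefix>_<epoch seconds>" and warn on stderr
# if the result is longer than 24 characters, because anything past that
# point is reportedly dropped from the Ingress subdomain.
make_cluster_name() {
  prefix="${1:-roks}"
  name="${prefix}_$(date +%s)"
  if [ "${#name}" -gt 24 ]; then
    echo "warning: ${name} is longer than 24 characters" >&2
  fi
  printf '%s\n' "$name"
}

cluster_name="$(make_cluster_name roks)"
export cluster_name
echo "$cluster_name"
```

With the roks prefix the name is 15 characters long (5 for roks_ plus a 10-digit timestamp), so the whole timestamp falls inside the first 24 characters.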

I've just deleted and recreated my cluster, and it's looking good thus far: -

ic cs cluster ls

OK
Name              ID                     State       Created          Workers   Location    Version                 Resource Group Name   Provider   
roks_1635176017   c5rcsngf0kf7u096q2e0   deploying   10 minutes ago   2         Frankfurt   4.8.11_1526_openshift   default               vpc-gen2   

The state shows as deploying rather than warning and, even more promisingly, the number of Workers (Compute Nodes) shows as 2 rather than 0.
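Rather than eyeballing the table each time, the State column can be pulled out with awk. This is a rough sketch that assumes the listing keeps the layout shown above, with the name in the first column and the state in the third; cluster_state is a hypothetical helper, and the sample text is copied from the output above.

```shell
# Hypothetical helper: read a cluster listing on stdin and print the State
# (third column) of the row whose first column matches the given name.
cluster_state() {
  awk -v name="$1" '$1 == name { print $3 }'
}

# Sample listing copied from the output above.
sample='Name              ID                     State       Created          Workers   Location    Version                 Resource Group Name   Provider
roks_1635176017   c5rcsngf0kf7u096q2e0   deploying   10 minutes ago   2         Frankfurt   4.8.11_1526_openshift   default               vpc-gen2'

printf '%s\n' "$sample" | cluster_state roks_1635176017
```

Against a live account, the same function could be fed from ic cs cluster ls and re-run every few minutes until the state changes.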

We'll see ....
