[BUG] crc start error doesn't exit with non-zero exit code #4284
Comments
@bobbygryzynger We display this as an error, but it is really more of a warning (we might change the messaging). Sometimes, due to slow IO or certificate regeneration, operator reconciliation takes longer than expected, but that doesn't mean the cluster is completely unusable.
Thanks @praveenkumar, understood. In this particular case, the cluster was unstable afterwards. A suggestion: make anything logged at error level exit with a non-zero code. This particular issue could be downgraded to a warning, but my experience with it suggests it should stay an error. Maybe it could remain an error with a logged suggestion on how to increase the timeout (if that's possible).
@bobbygryzynger Because for you the operators never became stable?
@praveenkumar, that's right. When I saw this, even after waiting a bit, the operators were still unstable.
In that case, if you are able to access the kube API, try https://docs.openshift.com/container-platform/4.16/support/troubleshooting/troubleshooting-operator-issues.html to see why an operator is not stable.
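For reference, a minimal sketch of the kind of checks that guide walks through, assuming the kube API is reachable; the operator name is a placeholder, not one from this report:

$ oc get clusteroperators                      # list operators with their Available/Progressing/Degraded status
$ oc describe clusteroperator <operator-name>  # inspect the conditions and messages of one operator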
@praveenkumar the operators being unstable is really a separate issue from what I'm requesting here. All I'd really like to see is that when errors occur, a non-zero exit code is produced so that my scripts can pick up on it.
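To illustrate the scripting side of this request, here is a hypothetical bash wrapper (not part of crc) that relies on crc start exiting non-zero when startup fails, which is exactly the behavior being asked for:

#!/usr/bin/env bash
# Hypothetical automation wrapper: abort the pipeline if crc start fails.
set -euo pipefail

if ! crc start; then
    echo "crc start reported a failure" >&2
    exit 1
fi
echo "cluster is up"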
@bobbygryzynger Right, and as I said, what you observe as an error should be a warning; that is an issue with our messaging. Apart from that, we already return a non-zero exit code when an error happens.
It seems to me that the cluster being unstable is worthy of an error, but I won't belabor the point.
@bobbygryzynger I am reopening this issue since we haven't made the messaging change yet.
Up to a point I do agree, but if we error out and skip the next steps, which actually update the kubeconfig with the user context, then there is no easy way to access the cluster API to debug which clusteroperator is misbehaving. That's why I consider this a warning and let the user use the API for debugging; or maybe the misbehaving operator isn't even required by the user, in which case this can be ignored.
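A rough sketch of that workflow, assuming crc start finished with the warning and the kubeconfig was updated with the user context:

$ eval $(crc oc-env)        # put the bundled oc client on PATH
$ oc get clusteroperators   # see which operator is still Progressing or Degraded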
I think the request is that when there is this "cluster not ready" message, after …
General information
Did you run crc setup before starting it (Yes/No)? Yes
CRC version
CRC status
$ crc status --log-level debug
DEBU CRC version: 2.22.1+e8068b4
DEBU OpenShift version: 4.13.3
DEBU Podman version: 4.4.4
DEBU Running 'crc status'
CRC VM:          Running
OpenShift:       Starting (v4.13.3)
RAM Usage:       9.412GB of 16.77GB
Disk Usage:      20.8GB of 79.93GB (Inside the CRC VM)
Cache Usage:     202.3GB
Cache Directory: /home/bgryzyng/.crc/cache
CRC config
Host Operating System
$ cat /etc/os-release
NAME="Fedora Linux"
...
Steps to reproduce
crc start
Expected
If an error occurs, exit code should be 1 or greater
Actual
Exit code is zero
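A quick way to confirm the behavior described above, assuming a run of crc start that prints the error-level message:

$ crc start; echo "exit code: $?"
# Observed: prints 0 even though an error-level message was logged;
# per this issue it should be 1 or greater.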
Logs