Harvester 1.x pre-upgrade check script issues #20

megabreit · 2025-01-16T18:06:18Z

I noticed some issues with the check script that are confusing, inconvenient, or simply misleading.

The Node Free Space check is not printing node names

Starting Node Free Space check...
Trying to get results from 10.x.x.x:9090
Nodes doesn't have enough free space:
10.x.x.11:9796
10.x.x.12:9796
10.x.x.46:9796
Node-Free-Space Test: Failed

Now it's up to the admin to guess the names. Please print the real node names here!
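A minimal sketch of how the IP-to-name translation could look. This is not taken from the check script: the helper name `resolve_node_name` is hypothetical, and it assumes the targets always have the form `IP:port` and that each node's InternalIP matches the address reported by the metrics endpoint.

```shell
# Hypothetical helper, not part of the actual check script.
# Reads "NAME INTERNAL-IP" pairs on stdin and prints the node name for the
# given "IP:port" target. Unknown targets are printed unchanged.
#
# Assumed way to produce the pairs (run on a node with kubectl access):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'
resolve_node_name() {
  # ${1%%:*} strips the ":port" suffix from the target.
  awk -v ip="${1%%:*}" -v target="$1" '
    $2 == ip { print $1 " (" target ")"; found = 1; exit }
    END      { if (!found) print target }
  '
}
```

With the pairs saved to a file, `resolve_node_name 10.x.x.11:9796 < nodes.txt` would print the node name followed by the original target in parentheses, so the admin sees both.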

The script terminates in the middle of execution when a volume is removed while the script is running. Please continue the script execution in this case!
Detached volumes are shown as "Degraded", regardless of how many replicas they have. This is simply wrong. This test needs to be improved.

Could this script be included in the Harvester support bundle? Even though a newer version might exist, it would improve the UX on the customer side: no need to manually download the script, find a control plane node, copy it there, execute it, and collect the output just to satisfy the requirements for a proactive case.
It could even be included in the Harvester image and executed automatically before any upgrade...

ParadoxGuitarist · 2025-01-16T21:37:39Z

Now it's up to the admin to guess the names. Please print the real node names here!

I agree that it would be nice to have the real node names, but in fairness, there's no guesswork here: just use the WebUI to correlate the node names and IP addresses. The page lists the IP addresses, afaik.

The script terminates in the middle of execution when a volume is removed while the script is running. Please continue the script execution in this case!

Can you provide output/logs for this? I don't know how to handle an exception like that without some sort of output or reproducibility.

Detached volumes are shown as "Degraded", regardless of how many replicas they have. This is simply wrong. This test needs to be improved.

This doesn't make sense to me, but maybe it will to the Suse people. I 100% had detached volumes that weren't degraded in my test cases. Again, I think you'll need to provide more info to get the ball rolling on this one.

megabreit · 2025-01-16T23:49:56Z

Are you able to access Suse cases?

megabreit · 2025-01-17T18:18:17Z

Regarding the volume removal issue, this is easy:

<snip>
Degraded Longhorn Volume found: pvc-dff48b06-7556-465e-8074-493fa61b8a7e
Degraded Longhorn Volume found: pvc-e09d1227-19f3-4811-b620-43d6d12952fe
Degraded Longhorn Volume found: pvc-e34b9b53-7846-40d0-88f5-c45b79fa6482
Error from server (NotFound): volumes.longhorn.io "pvc-e66ba4c9-4cb2-4e90-821d-8c801e8ae830" not found
harvester-root-prompt #

...and the script terminates.
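One way the loop could tolerate a volume disappearing mid-run is sketched below. It does not reproduce the actual script: the function name `check_volumes` is hypothetical, and it assumes the volumes live in the `longhorn-system` namespace and that the check reads `.status.robustness`. The point is only that a failed `kubectl get` inside an `if` condition can be reported and skipped instead of ending the run.

```shell
# Hypothetical loop, not the script's real implementation.
# Takes volume names as arguments and keeps going when one is not found.
check_volumes() {
  for vol in "$@"; do
    # A failing command in an "if" condition does not abort the script,
    # even when "set -e" is active.
    if ! out=$(kubectl get volumes.longhorn.io "$vol" -n longhorn-system \
                 -o jsonpath='{.status.robustness}' 2>&1); then
      echo "Skipping $vol: $out" >&2
      continue
    fi
    if [ "$out" = "degraded" ]; then
      echo "Degraded Longhorn Volume found: $vol"
    fi
  done
}
```

The error text from `kubectl` is kept and written to stderr, so a NotFound caused by a removed volume still shows up in the output without stopping the remaining checks.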

Regarding the detached volume issue: I ran the script on 4 clusters so far and compared the output. All detached volumes are found to be degraded, and no other volumes are degraded. All the volumes have 3 replicas and are "healthy" otherwise.
The (verbose) log is not really helpful. For one detached volume it looks like this:

[2025-01-16 14:47:58] Checking engine: pvc-425a12ef-e51a-422e-8626-09d78e7054ea-e-f2bdda1d
[2025-01-16 14:47:59] Degraded Longhorn Volume found: pvc-425a12ef-e51a-422e-8626-09d78e7054ea
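A possible way to keep detached volumes out of the degraded list is to look at the attachment state before the health state. The sketch below is an assumption about how such a check could be structured, not a description of what the script does today: `classify_volume` is a hypothetical name, and it assumes the Longhorn volume resource exposes `.status.state` (for example `attached` or `detached`) and `.status.robustness` (for example `healthy` or `degraded`).

```shell
# Hypothetical classifier, not part of the actual check script.
# $1: value of .status.state, $2: value of .status.robustness (both assumed fields).
classify_volume() {
  case "$1" in
    detached)
      # A detached volume has no running engine, so report it separately
      # instead of treating it as degraded.
      echo "detached" ;;
    *)
      case "$2" in
        healthy)  echo "healthy" ;;
        degraded) echo "degraded" ;;
        *)        echo "unknown" ;;
      esac ;;
  esac
}
```

Only volumes classified as `degraded` would then be reported as failures, while `detached` ones could be listed as informational.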
