Harvester 1.x pre-upgrade check script issues #20

Open
megabreit opened this issue Jan 16, 2025 · 3 comments
@megabreit

I noticed some issues with the check script that are confusing, inconvenient, or simply misleading.

  • The Node Free Space check is not printing node names
Starting Node Free Space check...
Trying to get results from 10.x.x.x:9090
Nodes doesn't have enough free space:
10.x.x.11:9796
10.x.x.12:9796
10.x.x.46:9796
Node-Free-Space Test: Failed

Now it's up to the admin to guess the names. Please print the real node names here!

  • The script terminates in the middle of execution when a volume is removed while the script is running. Please continue the script execution in this case!
  • Detached volumes are shown as "Degraded", regardless of how many replicas they have. This is simply wrong. This test needs to be improved.

Could this script be included in the Harvester support bundle? Even if a newer version might exist, it would improve the UX on the customer side: no need to manually download the script, find a control plane node, copy it there, execute it, and collect the output just to satisfy the requirements for a proactive case.
It could even be included in the Harvester image and executed automatically before any upgrade...

@ParadoxGuitarist
Contributor

> Now it's up to the admin to guess the names. Please print the real node names here!

I agree that it would be nice to have the real node names, but in fairness, there's no guesswork involved. Just use the WebUI and correlate the node names with the IP addresses; the page lists the IP addresses, afaik.
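For what it's worth, the same correlation could be done by the script itself. Below is a minimal sketch, assuming the input is the JSON printed by `kubectl get nodes -o json` and that each IP in the check output matches a node's `InternalIP` address. The function names and the sample data (`node-a`, `node-b`, the `10.0.0.x` addresses) are made up for illustration and are not taken from the actual check script.

```python
import json

# Hypothetical sample shaped like the output of `kubectl get nodes -o json`.
SAMPLE_NODES = json.dumps({
    "items": [
        {"metadata": {"name": "node-a"},
         "status": {"addresses": [
             {"type": "InternalIP", "address": "10.0.0.11"},
             {"type": "Hostname", "address": "node-a"}]}},
        {"metadata": {"name": "node-b"},
         "status": {"addresses": [
             {"type": "InternalIP", "address": "10.0.0.12"}]}},
    ]
})


def node_names_by_ip(nodes_json):
    """Build a dict mapping each node's InternalIP to its node name."""
    mapping = {}
    for node in json.loads(nodes_json).get("items", []):
        name = node["metadata"]["name"]
        for addr in node.get("status", {}).get("addresses", []):
            if addr.get("type") == "InternalIP":
                mapping[addr["address"]] = name
    return mapping


def label_instance(instance, names):
    """Turn 'IP:port' into 'node-name (IP:port)'; fall back to the raw value."""
    ip = instance.rsplit(":", 1)[0]
    name = names.get(ip)
    return "{} ({})".format(name, instance) if name else instance


if __name__ == "__main__":
    names = node_names_by_ip(SAMPLE_NODES)
    for instance in ["10.0.0.11:9796", "10.0.0.12:9796"]:
        print(label_instance(instance, names))
```

Falling back to the raw `IP:port` keeps the output usable even when an address cannot be matched to a node.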

> The script terminates in the middle of execution when a volume is removed while the script is running. Please continue the script execution in this case!

Can you provide output/logs for this? I don't know how to handle an exception like that without some sort of output or reproducibility.

> Detached volumes are shown as "Degraded", regardless of how many replicas they have. This is simply wrong. This test needs to be improved.

This doesn't make sense to me, but maybe it will to the SUSE people. I 100% had detached volumes that weren't degraded in my test cases. Again, I think you'll need to provide more info to get the ball rolling on this one.

@megabreit
Author

Are you able to access SUSE cases?

@megabreit
Author

Regarding the volume removal issue, this is easy:

<snip>
Degraded Longhorn Volume found: pvc-dff48b06-7556-465e-8074-493fa61b8a7e
Degraded Longhorn Volume found: pvc-e09d1227-19f3-4811-b620-43d6d12952fe
Degraded Longhorn Volume found: pvc-e34b9b53-7846-40d0-88f5-c45b79fa6482
Error from server (NotFound): volumes.longhorn.io "pvc-e66ba4c9-4cb2-4e90-821d-8c801e8ae830" not found
harvester-root-prompt #

...and the script terminates.
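The behaviour I'm asking for could look roughly like the sketch below: a volume that disappears between listing and lookup is recorded as skipped, and the loop carries on. This is only an illustration, not the script's actual code. It assumes the volumes live in the `longhorn-system` namespace and that a vanished volume shows up as `NotFound` in the `kubectl` error text, as in the output above; `fetch_volume` and `collect_volumes` are hypothetical names.

```python
import json
import subprocess


def fetch_volume(name, namespace="longhorn-system"):
    """Return the volume as a dict, or None if it no longer exists.

    Assumption: a removed volume makes kubectl exit non-zero with
    'NotFound' in stderr, as seen in the output above.
    """
    result = subprocess.run(
        ["kubectl", "get", "volumes.longhorn.io", name,
         "-n", namespace, "-o", "json"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        if "NotFound" in result.stderr:
            return None
        raise RuntimeError(result.stderr.strip())
    return json.loads(result.stdout)


def collect_volumes(names, fetch=fetch_volume):
    """Fetch every volume; skip the ones that vanished instead of aborting."""
    found, skipped = {}, []
    for name in names:
        volume = fetch(name)
        if volume is None:
            skipped.append(name)  # removed while the check was running
            continue
        found[name] = volume
    return found, skipped
```

Passing the lookup in as `fetch` makes it possible to try the loop without a cluster, for example with a stub that returns `None` for one name.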

Regarding the detached volume issue: I have run the script on 4 clusters so far and compared the output. All detached volumes are reported as degraded, and no other volumes are. All the volumes have 3 replicas and are otherwise "healthy".
The (verbose) log is not really helpful. For one detached volume it looks like this:

[2025-01-16 14:47:58] Checking engine: pvc-425a12ef-e51a-422e-8626-09d78e7054ea-e-f2bdda1d
[2025-01-16 14:47:59] Degraded Longhorn Volume found: pvc-425a12ef-e51a-422e-8626-09d78e7054ea
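I don't know how the script decides that a volume is degraded, so the following is only a guess at what a more careful test might look like. It assumes the Longhorn volume object exposes `status.state` and `status.robustness`, and that a detached volume reports a robustness other than `healthy` (for example `unknown`) simply because it is not attached. The `classify_volume` function is hypothetical.

```python
def classify_volume(volume):
    """Classify a Longhorn volume dict without treating 'detached' as degraded.

    Assumption: the object has status.state ('attached', 'detached', ...)
    and status.robustness ('healthy', 'degraded', 'faulted', 'unknown').
    """
    status = volume.get("status", {})
    state = status.get("state")
    robustness = status.get("robustness")

    if robustness == "faulted":
        return "faulted"      # a real problem, attached or not
    if state == "detached":
        return "detached"     # report separately, not as degraded
    if robustness == "degraded":
        return "degraded"
    if robustness == "healthy":
        return "healthy"
    return "unknown"


if __name__ == "__main__":
    sample = {"status": {"state": "detached", "robustness": "unknown"}}
    print(classify_volume(sample))
```

Reporting detached volumes in their own category would let the admin see them without the test failing on volumes that are merely not in use.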
