CloudWatchLogsLogGroup - stuck waiting #331
Comments
Just so I understand: it does delete the log group, but the log group comes back simply because of the order in which deletions happen and how long it takes for things like clusters to be removed? |
I don't think it comes back. It seems like either it never triggers the remove, or when it does trigger the remove the log group is still being used and the remove fails.
This message is repeated until aws-nuke gives up. |
Do you ever see a trigger remove? |
Yes:
Also I found that EKS removal was triggered after the CloudWatchLogsLogGroup trigger (here you can see the LogGroup was already in waiting status)
|
What is happening is that it is getting deleted, but the cluster recreates it before the tool detects it is gone, so the tool is waiting for a removal that has technically already happened. I'm not sure at the moment of the best way to handle this. I need to give it some thought. |
I actually think this can be solved very simply by a patch to libnuke. I'll put a PR together and make the binaries available. Ultimately what is happening is that the CloudWatch LogGroup is being deleted but coming back; however, due to how the resource matching works currently, the [...] There are two fixes here: a) we need some sort of upper threshold on "waiting" resources, and b) we need to prioritize property matching over stringer. The CloudWatch Log Group properties would mismatch due to the LastEvent and CreatedAt times, thus invalidating the match and triggering another removal attempt. However, the ultimate fix is going to be some sort of DAG for deletions. |
This might fix the issue: #332. Builds should be available here: https://github.com/ekristen/aws-nuke/actions/runs/11097585762. If you can run and test it, that would be appreciated. It's the best fix that can be done at the moment; the real fix will be a dependency graph, but that's a ways off. |
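To make the two fixes above concrete, here is a minimal Go sketch. It is not libnuke's actual code: the `Item` type, `sameResource`, `nextState`, the state names, the threshold value in `maxWaitChecks`, and the log group name are all hypothetical and only illustrate the idea of comparing properties before names and capping how long an item may stay in "waiting".

```go
package main

import "fmt"

// Item is a hypothetical stand-in for a queued resource; it is NOT libnuke's real type.
type Item struct {
	Name       string            // what a String()-style ("stringer") comparison would see
	Properties map[string]string // e.g. CreatedAt, LastEvent
	WaitChecks int               // how many times this item was polled while "waiting"
}

// maxWaitChecks is an assumed upper threshold on "waiting" polls.
const maxWaitChecks = 3

// sameResource compares properties first and only falls back to the
// name when one side has no properties at all.
func sameResource(queued, listed Item) bool {
	if len(queued.Properties) > 0 && len(listed.Properties) > 0 {
		if len(queued.Properties) != len(listed.Properties) {
			return false
		}
		for k, v := range queued.Properties {
			if lv, ok := listed.Properties[k]; !ok || lv != v {
				return false
			}
		}
		return true
	}
	return queued.Name == listed.Name
}

// nextState decides what happens to a waiting item given the latest
// listing result (nil means the resource is no longer listed).
func nextState(queued *Item, listed *Item) string {
	if listed == nil {
		return "finished"
	}
	if !sameResource(*queued, *listed) {
		return "new" // same name, different properties: treat as recreated
	}
	queued.WaitChecks++
	if queued.WaitChecks >= maxWaitChecks {
		return "failed" // stop waiting instead of polling forever
	}
	return "waiting"
}

func main() {
	// Hypothetical log group that an EKS cluster recreated after deletion.
	deleted := Item{Name: "/aws/eks/demo/cluster", Properties: map[string]string{"CreatedAt": "1"}}
	recreated := Item{Name: "/aws/eks/demo/cluster", Properties: map[string]string{"CreatedAt": "2"}}
	fmt.Println(nextState(&deleted, &recreated))
}
```

With a name-only comparison the recreated log group would look identical to the deleted one and the item would stay in "waiting"; comparing `CreatedAt` lets the sketch notice that it is a different resource.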
I will give it a try on Wednesday evening - I have an account lined up for that time that needs to be nuked. |
It seems it did not help. With the build you referenced in #332 it now just said the log group was removed but it was actually not removed. |
So it is removed, it just comes back. This at least fixed the problem of getting stuck waiting. The problem of it coming back is only ever going to be solved by a dependency-based delete. |
@martivo I think we should merge #332, as it fixes the infinite waiting problem, and then close this and track it against the DAG feature issue I have open, as I believe that's the only way to truly solve this problem. Otherwise it's pretty much a "have to run it twice" sort of scenario. Thoughts? |
I guess it's OK to merge, but then you will most surely get another issue saying that the nuke is not actually deleting all the resources. From my perspective this behaviour is OK: at least I can now decently run it twice without having to wait a long time. |
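As a rough illustration of what a dependency-based delete could look like, the sketch below orders resource types with Kahn's topological sort so that whatever recreates a resource is removed first. This is an assumption-laden example, not the planned aws-nuke design: `deletionOrder` and the `mustGoFirst` map are hypothetical, and `EKSCluster` is used only as an illustrative type name.

```go
package main

import (
	"fmt"
	"sort"
)

// deletionOrder returns an order in which every resource type appears
// after all the types listed in mustGoFirst[type] (Kahn's algorithm).
// It returns an error if the dependencies contain a cycle.
func deletionOrder(mustGoFirst map[string][]string) ([]string, error) {
	pending := map[string]int{}       // unmet prerequisites per type
	unblocks := map[string][]string{} // prerequisite -> types waiting on it
	for node, prereqs := range mustGoFirst {
		if _, ok := pending[node]; !ok {
			pending[node] = 0
		}
		for _, p := range prereqs {
			pending[node]++
			if _, ok := pending[p]; !ok {
				pending[p] = 0
			}
			unblocks[p] = append(unblocks[p], node)
		}
	}

	var ready []string
	for n, c := range pending {
		if c == 0 {
			ready = append(ready, n)
		}
	}
	sort.Strings(ready) // deterministic output despite map iteration order

	var order []string
	for len(ready) > 0 {
		n := ready[0]
		ready = ready[1:]
		order = append(order, n)
		next := unblocks[n]
		sort.Strings(next)
		for _, m := range next {
			pending[m]--
			if pending[m] == 0 {
				ready = append(ready, m)
			}
		}
	}
	if len(order) != len(pending) {
		return nil, fmt.Errorf("dependency cycle detected")
	}
	return order, nil
}

func main() {
	// Hypothetical rule: the cluster must be gone before its log group is deleted.
	order, err := deletionOrder(map[string][]string{
		"CloudWatchLogsLogGroup": {"EKSCluster"},
	})
	fmt.Println(order, err)
}
```

Deleting in this order means the log group is only removed once nothing is left to write to it, which is why a single run would then be enough.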
I have run into this issue on aws-nuke v3.32.0 |
Fix is coming this weekend. It'll prevent it from being stuck but it won't catch it if it comes back unless you run a second time for the immediate fix. |
The log group should hopefully not have a lot in it after it comes back as the resources generating the logs get deleted, so it shouldn't cost much and waiting for the next run would be acceptable. Probably costs more to have the nuke job stuck waiting indefinitely until the job times out than the cost of the logs coming back. |
Creating the same issue in this fork as was closed in the old repo rebuy-de/aws-nuke#500
The problem still exists, tested with ghcr.io/ekristen/aws-nuke:v3.23.0
nuke.log
When I re-run aws-nuke, the deletion is successful (only the CloudWatchLogsLogGroup is deleted).