WIP: nrop: controller: don't delete unused resources explictly #1029

Closed
wants to merge 11 commits into from

Conversation

ffromani
Member

@ffromani ffromani commented Oct 2, 2024

We add Controller References, so we expect the system to take care of orphaned resources.
Hence, the explicit delete logic can be removed. That logic was always suspicious: it should never have been needed, and it's a sweeping yet effective fix that could have masked real issues. It's time to address these issues and have real ownership and object cleanup instead of relying on this band-aid.

Besides the simplification, we are now down two List() calls per reconcile loop.
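
To illustrate the ownership idea this description relies on, here is a minimal, stdlib-only sketch. It is not the operator's code and it does not use the real Kubernetes API: the `Object` type and the `collectGarbage` function are hypothetical stand-ins that loosely model how dependents carrying a controller reference go away once their owner is gone.

```go
package main

import "fmt"

// Object is a hypothetical, simplified stand-in for a Kubernetes object.
type Object struct {
	Name     string
	UID      string
	OwnerUID string // UID of the controlling owner; empty means "no owner"
}

// collectGarbage loosely mimics owner-based garbage collection: it keeps
// objects that have no owner, or whose owner is still present, and drops
// the rest. It does a single pass and does not cascade across generations.
func collectGarbage(objs []Object) []Object {
	alive := make(map[string]bool, len(objs))
	for _, o := range objs {
		alive[o.UID] = true
	}
	var kept []Object
	for _, o := range objs {
		if o.OwnerUID == "" || alive[o.OwnerUID] {
			kept = append(kept, o)
		}
	}
	return kept
}

func main() {
	// Assumed example data: one owner and two dependents.
	cluster := []Object{
		{Name: "numaresourcesoperator", UID: "uid-nro"},
		{Name: "daemonset-a", UID: "uid-ds", OwnerUID: "uid-nro"},
		{Name: "machineconfig-a", UID: "uid-mc", OwnerUID: "uid-nro"},
	}

	// While the owner exists, nothing is collected.
	fmt.Println("owner present:", len(collectGarbage(cluster)), "objects kept")

	// Simulate deleting the owner: only the dependents remain in the list,
	// and they are all collected because their owner is gone.
	fmt.Println("owner deleted:", len(collectGarbage(cluster[1:])), "objects kept")
}
```

The sketch also hints at the limit of this approach: dependents are only collected when the owner itself disappears.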

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 2, 2024
@openshift-ci openshift-ci bot requested review from shajmakh and swatisehgal October 2, 2024 15:50
Contributor

openshift-ci bot commented Oct 2, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 2, 2024
@ffromani
Member Author

ffromani commented Oct 2, 2024

e2e tests:

  • extend install suite (runs on CI)
  • add explicit check to verify all expected manifests are present
  • add explicit check to ensure all expected manifests are removed once we remove the NROP object (including when we delete/recreate, probably modify existing test + add explicit test)
  • add explicit test using a NROP object with 2 nodegroups. Edit the NROP object to remove one of them, then make sure all the objects pertaining to the removed nodegroup are deleted, while the objects pertaining to the remaining one are kept

@ffromani ffromani force-pushed the avoid-explicit-delete branch from 88b8b3b to b9489aa Compare October 2, 2024 16:48
@ffromani
Member Author

ffromani commented Oct 2, 2024

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 2, 2024
@ffromani
Member Author

ffromani commented Oct 3, 2024

xref: CNF-3740

@shajmakh
Member

shajmakh commented Oct 3, 2024

If they are not explicitly deleted (or at least the ownerReference is not removed), wouldn't they still be listed under the related objects of the NRO CR?

@ffromani
Member Author

ffromani commented Oct 3, 2024

If they are not explicitly deleted (or at least the ownerReference is not removed), wouldn't they still be listed under the related objects of the NRO CR?

This is one of the questions we need to answer. The rationale for the explicit delete is fuzzy and has likely been obsolete for a few releases.

@ffromani ffromani force-pushed the avoid-explicit-delete branch from b9489aa to 0d9a026 Compare October 15, 2024 11:30
@ffromani ffromani mentioned this pull request Oct 15, 2024
@ffromani ffromani force-pushed the avoid-explicit-delete branch from 0d9a026 to c3211dc Compare October 15, 2024 13:45
@ffromani
Member Author

Good, the expected failures mean that so far the suites are working as expected. We can proceed to add the real new e2e tests.

@ffromani ffromani force-pushed the avoid-explicit-delete branch 2 times, most recently from ea2e464 to e415ac8 Compare October 16, 2024 13:35
@ffromani
Member Author

needs to be rebased on top of #1045

@ffromani ffromani force-pushed the avoid-explicit-delete branch 5 times, most recently from 5640c6d to d14d861 Compare October 17, 2024 11:24
@ffromani ffromani changed the title WIP: nrop: controller: don't delete unused resources explictly nrop: controller: don't delete unused resources explictly Oct 17, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 17, 2024
add missing GinkgoHelper() declaration to make error
trace point to the failing test, not to the shared helper.

Signed-off-by: Francesco Romani <[email protected]>
cosmetic change, no intended change in behavior

Signed-off-by: Francesco Romani <[email protected]>
We add Controller References, so we expect the system to
take care of orphaned resources.
Move the relevant code into the new package `dangling`,
which will be cleaned up and reused later.

Signed-off-by: Francesco Romani <[email protected]>
keep the happy path on the left. No intended changes in behavior.

Signed-off-by: Francesco Romani <[email protected]>
instead of expecting them from the caller.
No intended changes in behavior.

Signed-off-by: Francesco Romani <[email protected]>
Don't delete straight up orphan objects, return them.
This way we can use this code in validation.

Signed-off-by: Francesco Romani <[email protected]>
@ffromani ffromani force-pushed the avoid-explicit-delete branch from d14d861 to 675fea8 Compare October 17, 2024 11:38
@ffromani
Member Author

/hold cancel

tests are good enough now

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 17, 2024
@ffromani ffromani force-pushed the avoid-explicit-delete branch from 675fea8 to 0c6af74 Compare October 17, 2024 13:32
@ffromani
Member Author

/hold

I think I finally managed to remember. The scenario the explicit delete is supposed to cover is:

  1. NodeGroups are edited or deleted
  2. Due to item 1, Nodes (MCPs) previously managed are no longer managed
  3. The MC/DaemonSet of the previous nodegroup, now no longer relevant, are left lingering
  4. Since the NRO object still exists, these objects are never cleaned up

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 17, 2024
Signed-off-by: Francesco Romani <[email protected]>
wire in validator support for dangling objects.
Note: `pkg/validator` has been importing from `internal`
for a few releases, so it should be moved or changed.

Signed-off-by: Francesco Romani <[email protected]>
These utilities are really code we want to use internally,
so they will hardly ever be promoted to `pkg/`, and surely
not in their current form, without extensive polishing.

We can reuse this code in the e2e tests however.

Trivial code movement and minimal API changes to support
the move (e.g. renderScheduler -> render.RenderScheduler -> render.Scheduler)

Signed-off-by: Francesco Romani <[email protected]>
The uninstall tests can and should check that all
the objects managed by a NRO instance are deleted
once the NRO object itself is deleted.

Note this does not yet include edits to node groups,
because those are hard to set up in CI.

Signed-off-by: Francesco Romani <[email protected]>
WIP TBD

Signed-off-by: Francesco Romani <[email protected]>
@ffromani ffromani force-pushed the avoid-explicit-delete branch from 0c6af74 to 6eece60 Compare October 17, 2024 14:58
@ffromani ffromani changed the title nrop: controller: don't delete unused resources explictly WIP: nrop: controller: don't delete unused resources explictly Oct 17, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 17, 2024
Contributor

openshift-ci bot commented Oct 17, 2024

@ffromani: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/ci-install-e2e 6eece60 link true /test ci-install-e2e

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ffromani
Member Author

Much ado about nothing. There are good ideas in this PR, but the overall concept turned out to be wrong. I'll post more targeted PRs later to recycle the good bits from it.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

2 participants