Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

overlay node image before bootstrapping if necessary #899

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jlebon
Copy link
Member

@jlebon jlebon commented Sep 6, 2024

As per openshift/enhancements#1637, we're trying
to get rid of all OpenShift-versioned components from the bootimages.

This means that there will no longer be oc, kubelet, or crio
binaries for example, which bootstrapping obviously relies on.

To adapt to this, the OpenShift installer now ships a new
node-image-overlay.service in its bootstrap Ignition config. This
service takes care of pulling down the node image and overlaying it,
effectively updating the system to the node image version.

Here, we accordingly also adapt assisted-installer so that we run
node-image-overlay.service before starting e.g. kubelet.service and
bootkube.service.

See also: openshift/installer#8742

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 6, 2024
Copy link

openshift-ci bot commented Sep 6, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 6, 2024
@openshift-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 10, 2024
@jlebon
Copy link
Member Author

jlebon commented Dec 13, 2024

/remove-lifecycle stale

Now that openshift/enhancements#1637 has merged, we will need this. I'll leave it in draft for now until I can rebase and retest it, but feel free to start reviewing.

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 13, 2024
Because that's effectively what it is. Also use that in another case
that was manually calling `stat`.
As per openshift/enhancements#1637, we're trying
to get rid of all OpenShift-versioned components from the bootimages.

This means that there will no longer be oc, kubelet, or crio
binaries for example, which bootstrapping obviously relies on.

To adapt to this, the OpenShift installer now ships a new
`node-image-overlay.service` in its bootstrap Ignition config. This
service takes care of pulling down the node image and overlaying it,
effectively updating the system to the node image version.

Here, we accordingly also adapt assisted-installer so that we run
`node-image-overlay.service` before starting e.g. `kubelet.service` and
`bootkube.service`.

See also: openshift/installer#8742
@jlebon jlebon changed the title WIP: overlay node image before bootstrapping if necessary overlay node image before bootstrapping if necessary Dec 21, 2024
@jlebon jlebon marked this pull request as ready for review December 21, 2024 17:21
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 21, 2024
Copy link

openshift-ci bot commented Dec 21, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jlebon
Once this PR has been reviewed and has the lgtm label, please assign jhernand for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot requested review from jhernand and omertuc December 21, 2024 17:22
@jlebon
Copy link
Member Author

jlebon commented Dec 21, 2024

Rebased and verified this still works! Requires openshift/installer#8742.

Copy link

codecov bot commented Dec 21, 2024

Codecov Report

Attention: Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.

Project coverage is 52.35%. Comparing base (34b971a) to head (33d9f10).

Files with missing lines Patch % Lines
src/ops/ops.go 0.00% 3 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #899      +/-   ##
==========================================
- Coverage   55.20%   52.35%   -2.85%     
==========================================
  Files          15       14       -1     
  Lines        3286     2695     -591     
==========================================
- Hits         1814     1411     -403     
+ Misses       1292     1159     -133     
+ Partials      180      125      -55     
Files with missing lines Coverage Δ
src/ops/ops.go 43.39% <0.00%> (ø)

... and 1 file with indirect coverage changes

Copy link

openshift-ci bot commented Dec 21, 2024

@jlebon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/edge-unit-test 33d9f10 link true /test edge-unit-test
ci/prow/okd-scos-e2e-aws-ovn 33d9f10 link false /test okd-scos-e2e-aws-ovn
ci/prow/edge-e2e-oci-assisted-4-18 33d9f10 link false /test edge-e2e-oci-assisted-4-18

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@@ -669,6 +685,25 @@ func (i *installer) waitForBootkube(ctx context.Context) {
}
}

func (i *installer) waitForNodeImageOverlay(ctx context.Context) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably it can be change to just call utils.WaitForPredicate with
i.ops.SystemctlAction("is-active", "node-image-overlay.service")
WDYT?

// DryRebootHappened checks if a reboot happened according to a particular reboot marker path
// The dry run installer creates this file on "Reboot" (instead of actually rebooting)
// We use this function to check whether the given node in the cluster have already rebooted
func (o *ops) DryRebootHappened(markerPath string) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO lets leave this one as it helps to understand that we are looking for dry reboot logic, it can though call for FileExists


i.waitForNodeImageOverlay(context.Background())

if !i.ops.FileExists("/usr/bin/oc") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets make /usr/bin/oc to be a const?

// If we're in a pure RHEL/CentOS environment, we need to overlay the node image
// first to have access to e.g. oc, kubelet, cri-o, etc...
// https://github.com/openshift/enhancements/pull/1637
if !i.ops.FileExists("/usr/bin/oc") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we pull overlay the node image as a first thing? I mean we already applied bootstrap ignition at this point and this can be a little bit late for us to pull another image layer, we probably should do it before

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants