MGMT-19148: Wait for OLM operator setup jobs #940

Closed


jhernand

When the assisted installer automatically enables operators, the custom manifests may contain Kubernetes jobs that complete the setup. For example, a hypothetical storage operator can include in its custom manifests a job that waits for the storage class and marks it as the default. Currently the installer doesn't wait for these jobs to finish, and it declares the cluster ready even if they haven't completed. This patch changes the controller so that it treats jobs carrying the `agent-install.openshift.io/setup-job` label as setup jobs and waits for them to finish. The value of the label must be the name of the corresponding operator.

This change was partially done in the past, but the _Dockerfile_ wasn't
updated to use the new `mockgen` command. As a result, if one installs
`mockgen` with the command from the _Dockerfile_ and then runs `make
generate`, the resulting code doesn't build:

```
$ grep mockgen Dockerfile.assisted-installer-build
  go install github.com/golang/mock/[email protected] && \

$ go install github.com/golang/mock/[email protected]

$ make generate
go generate ..
make format
make[1]: Entering directory '/files/projects/assisted-installer/repository'
make[1]: Leaving directory '/files/projects/assisted-installer/repository'

$ make build
CGO_ENABLED=0  go build -o build/installer src/main/main.go
src/main/drymock/dry_mode_k8s_mock.go:308:36: cannot use mockController (variable of type *"go.uber.org/mock/gomock".Controller) as *"github.com/golang/mock/gomock".Controller value in argument to k8s_client.NewMockK8SClient
make: *** [Makefile:68: installer] Error 1
```

To avoid this issue this patch updates the _Dockerfile_, the generated
code and the test code so that they use the same version of the mock
package.

Note that this wasn't detected by CI because we aren't running `make
generate`.

Signed-off-by: Juan Hernandez <[email protected]>
This patch adds a new `options metav1.ListOptions` parameter to the
`ListJobs` method of the Kubernetes client interface. This new parameter
is needed to support waiting for operator setup jobs, because that
requires listing jobs in all namespaces using a label selector.
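
As a rough illustration of why the options are needed, the sketch below lists jobs across all namespaces with a label selector. It is not the client from this repository: `ListOptions` is a stand-in for `metav1.ListOptions` that models only `LabelSelector`, `fakeClient` is a hypothetical in-memory client, and the real `ListJobs` return type may differ. The sketch assumes the usual Kubernetes conventions that an empty namespace means all namespaces and that a selector consisting of just a label key matches every object carrying that label.

```go
package main

import "fmt"

// setupJobLabel is the label key used to mark operator setup jobs.
const setupJobLabel = "agent-install.openshift.io/setup-job"

// ListOptions is a stand-in for metav1.ListOptions; only the label
// selector is modelled here.
type ListOptions struct {
	LabelSelector string
}

// Job is a simplified, hypothetical stand-in for batchv1.Job.
type Job struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// fakeClient is a hypothetical in-memory client used only for this sketch.
type fakeClient struct {
	jobs []Job
}

// ListJobs takes a namespace plus options. An empty namespace means all
// namespaces. Only the "label key exists" selector form is supported.
func (c fakeClient) ListJobs(namespace string, options ListOptions) []Job {
	var result []Job
	for _, job := range c.jobs {
		if namespace != "" && job.Namespace != namespace {
			continue
		}
		if options.LabelSelector != "" {
			if _, ok := job.Labels[options.LabelSelector]; !ok {
				continue
			}
		}
		result = append(result, job)
	}
	return result
}

func main() {
	client := fakeClient{jobs: []Job{
		{Namespace: "openshift-storage", Name: "setup", Labels: map[string]string{setupJobLabel: "storage"}},
		{Namespace: "olm", Name: "other"},
	}}
	// All namespaces, restricted to jobs that carry the setup label.
	for _, job := range client.ListJobs("", ListOptions{LabelSelector: setupJobLabel}) {
		fmt.Printf("%s/%s belongs to operator %q\n", job.Namespace, job.Name, job.Labels[setupJobLabel])
	}
}
```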

In addition to adding the options, the patch also makes the mock
expectations more precise. Instead of this:

```go
mockk8sclient.EXPECT().ListJobs(gomock.Any(), gomock.Any())
```

We use this now:

```go
mockk8sclient.EXPECT().ListJobs(olmNamespace, metav1.ListOptions{})
```

That will help when introducing other tests where the parameters will be
different.

Related: https://issues.redhat.com/browse/MGMT-19148
Related: https://issues.redhat.com/browse/MGMT-19056
Signed-off-by: Juan Hernandez <[email protected]>
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 11, 2024

openshift-ci-robot commented Nov 11, 2024

@jhernand: This pull request references MGMT-19148 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.18.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 11, 2024

openshift-ci bot commented Nov 11, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jhernand

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 11, 2024

codecov bot commented Nov 11, 2024

Codecov Report

Attention: Patch coverage is 83.78378% with 18 lines in your changes missing coverage. Please review.

Project coverage is 56.01%. Comparing base (628fadb) to head (4bf87da).
Report is 28 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...taller_controller/assisted_installer_controller.go | 82.17% | 11 Missing and 7 partials ⚠️ |

@@            Coverage Diff             @@
##           master     #940      +/-   ##
==========================================
+ Coverage   55.19%   56.01%   +0.81%     
==========================================
  Files          15       15              
  Lines        3279     3381     +102     
==========================================
+ Hits         1810     1894      +84     
- Misses       1290     1301      +11     
- Partials      179      186       +7     
| Files with missing lines | Coverage Δ |
|---|---|
| .../assisted_installer_controller/operator_handler.go | 80.51% <100.00%> (+1.06%) ⬆️ |
| ...ed-installer-controller/assisted_installer_main.go | 27.10% <ø> (ø) |
| ...taller_controller/assisted_installer_controller.go | 76.39% <82.17%> (+0.48%) ⬆️ |

@jhernand jhernand force-pushed the add_support_for_olm_setup_jobs branch from 3a7f355 to 4346df0 Compare November 11, 2024 16:18
@jhernand

/test edge-e2e-ai-operator-ztp

When the assisted installer automatically enables operators, the custom
manifests may contain Kubernetes jobs that complete the setup. For
example, a hypothetical storage operator can include in its custom
manifests a job that waits for the storage class and marks it as the
default. Currently the installer doesn't wait for these jobs to finish,
and it declares the cluster ready even if they haven't completed. This
patch changes the controller so that it treats jobs carrying the
`agent-install.openshift.io/setup-job` label as setup jobs and waits for
them to finish. The value of the label must be the name of the
corresponding operator.

Related: https://issues.redhat.com/browse/MGMT-19148
Signed-off-by: Juan Hernandez <[email protected]>
@jhernand jhernand force-pushed the add_support_for_olm_setup_jobs branch from 4346df0 to 4bf87da Compare November 13, 2024 11:24

openshift-ci bot commented Nov 13, 2024

@jhernand: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@@ -1506,3 +1543,110 @@ func (c controller) SetReadyState(waitTimeout time.Duration) *models.Cluster {
func (c *controller) GetStatus() *ControllerStatus {
	return c.status
}

// waitForSetupJobs waits till the jobs are completed successfully.
func (c *controller) waitForSetupJobs(ctx context.Context) error {

Can't we make it part of waiting for the specific operator in the operator handler?

// and that will in turn stop the controller without first checking if the setup jobs have finished.
// So instead we mark the operator as progressing, and will mark it as available later, when we check
// the setup jobs.
var reportedStatus models.OperatorStatus

I believe all the setup job logic should live here: if an operator has a setup job, don't set it as available. I don't think we should split the logic.

@jhernand

I am closing this because it doesn't bring a real benefit. Operators can still include jobs to complete their setup; we will just not explicitly wait for them.
