Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Ceilometer services' status objects #561

Merged
merged 3 commits into from
Jan 17, 2025

Conversation

paramite
Copy link
Contributor

This patch joins status objects and uses prefixed contitions instead and splits hash object for each service.

This change is required as it fixes malfunctioning KSMStatus and adds missing hashing of KSM configuration.`

Copy link
Contributor

openshift-ci bot commented Dec 10, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: paramite

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@abays
Copy link
Contributor

abays commented Dec 10, 2024

Removing required status fields that exist in older APIs and adding new required fields will break OLM updates. So we shouldn't be doing what is proposed in the CR, I think.

@stuggi
Copy link
Contributor

stuggi commented Dec 10, 2024

@paramite you have to delete bin/controller-gen to get the new version 0.14.0 installed. also see the comment from @abays . This PR can not land as is

@vyzigold
Copy link
Contributor

Removing required status fields that exist in older APIs and adding new required fields will break OLM updates. So we shouldn't be doing what is proposed in the CR, I think.

Unless I missed something, this is only removing required fields, it's not adding new required fields. Is removing required fields also an issue for OLM?

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/924588bcc2944382af315ff10227e06f

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 01m 30s
podified-multinode-edpm-deployment-crc FAILURE in 1h 05m 50s
telemetry-operator-multinode-autoscaling-tempest FAILURE in 1h 36m 17s
telemetry-operator-multinode-default-telemetry FAILURE in 1h 05m 56s
functional-tests-on-osp18 FAILURE in 1h 39m 09s (non-voting)
functional-logging-tests-osp18 FAILURE in 1h 11m 29s (non-voting)
functional-graphing-tests-osp18 FAILURE in 1h 37m 58s (non-voting)
functional-metric-verification-tests-osp18 FAILURE in 1h 36m 38s (non-voting)

@abays
Copy link
Contributor

abays commented Dec 10, 2024

Removing required status fields that exist in older APIs and adding new required fields will break OLM updates. So we shouldn't be doing what is proposed in the CR, I think.

Unless I missed something, this is only removing required fields, it's not adding new required fields. Is removing required fields also an issue for OLM?

Ah, I missed the omitempty attributes because I was looking for the kubebuilder Optional tag.

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/8d2b9ccb738140f292e671ccf1194121

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 37m 11s
podified-multinode-edpm-deployment-crc FAILURE in 1h 04m 09s
telemetry-operator-multinode-autoscaling-tempest FAILURE in 1h 38m 36s
telemetry-operator-multinode-default-telemetry FAILURE in 1h 06m 26s
functional-tests-on-osp18 FAILURE in 1h 36m 12s (non-voting)
functional-logging-tests-osp18 FAILURE in 1h 08m 27s (non-voting)
functional-graphing-tests-osp18 FAILURE in 1h 35m 30s (non-voting)
functional-metric-verification-tests-osp18 FAILURE in 1h 37m 08s (non-voting)

@vyzigold
Copy link
Contributor

Some of the conditions in ceilometer CR are in "unknown" state, which is why the CI fails.

This patch moves status data about kube-state-metrics
to CeilometerStatus, because currently KSMStatus does not update
properly and if helper.PatchInstance is made to update KSMStatus
then Ceilometer objects ends in never ending reconciliation loop.
CeilometerStatus was created for the sake of consistency with KSMStatus,
which is now deprecated and unused because of the functionality issues.
Keeping the name makes for Ceilometer does not make sense any more.
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/76fac7385261481d9b28677828fed632

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 06m 03s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 19m 26s
✔️ telemetry-operator-multinode-autoscaling-tempest SUCCESS in 1h 17m 15s
telemetry-operator-multinode-default-telemetry FAILURE in 1h 08m 17s
✔️ telemetry-operator-multinode-power-monitoring SUCCESS in 49m 37s
✔️ functional-tests-on-osp18 SUCCESS in 1h 50m 14s
✔️ functional-logging-tests-osp18 SUCCESS in 1h 25m 01s
✔️ functional-graphing-tests-osp18 SUCCESS in 1h 11m 34s
✔️ functional-metric-verification-tests-osp18 SUCCESS in 1h 12m 07s

@paramite
Copy link
Contributor Author

Copy link
Contributor

@vyzigold vyzigold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise this lgtm

}

// KSMStatus defines the observed state of kube-state-metrics
// NOTE(mmagr): remove with API version increment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment will show in a description of the ksmStatus. For example when executingoc explain ceilometer. Not sure if that was intentional. An empty line between the 2 comments sholud hide the top line from users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch. That is certainly not the intention. Will fix.

@vyzigold
Copy link
Contributor

recheck

Could not find any reason related to this patch and from [1] it seems that deployments succeeded.

[1] https://logserver.rdoproject.org/61/561/fb0b0754fb1dd9063b979ddb909b458f7ebe1bd7/github-check/telemetry-operator-multinode-default-telemetry/92687e5/controller/ci-framework-data/logs/openstack-k8s-operators-openstack-must-gather/namespaces/openstack/statefulset/

The issue is with operators restarting. This is happening everywhere across the CI downstream and upstream. I tried to point out the issue to people, but I didn't have much luck, guess I just don't know who to talk to and personally I don't know what to do with it. IMO it's not good that the operators are restarting, but we're the only ones actually testing it.

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/00f1a0a2af014bfaa2e3c1ecea719792

openstack-k8s-operators-content-provider FAILURE in 14m 56s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ telemetry-operator-multinode-autoscaling-tempest SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ telemetry-operator-multinode-default-telemetry SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
✔️ telemetry-operator-multinode-power-monitoring SUCCESS in 51m 33s
⚠️ functional-tests-on-osp18 SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ functional-logging-tests-osp18 SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ functional-graphing-tests-osp18 SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ functional-metric-verification-tests-osp18 SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@paramite
Copy link
Contributor Author

recheck

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/5b03e81ea6ac48328e17c03b08c2cbd3

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 05m 41s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 18m 21s
✔️ telemetry-operator-multinode-autoscaling-tempest SUCCESS in 1h 10m 04s
telemetry-operator-multinode-default-telemetry FAILURE in 26m 05s
✔️ telemetry-operator-multinode-power-monitoring SUCCESS in 48m 50s
✔️ functional-tests-on-osp18 SUCCESS in 1h 41m 04s
functional-logging-tests-osp18 FAILURE in 31m 01s
✔️ functional-graphing-tests-osp18 SUCCESS in 1h 09m 05s
✔️ functional-metric-verification-tests-osp18 SUCCESS in 1h 11m 30s

@paramite
Copy link
Contributor Author

recheck

@paramite
Copy link
Contributor Author

/retest

@paramite
Copy link
Contributor Author

/test telemetry-operator-build-deploy-kuttl

@paramite
Copy link
Contributor Author

Seems like image pull is failing in the kuttl job:

13m Warning FailedToRetrieveImagePullSecret pod/keystone8594-account-delete-fg842 Unable to retrieve some image pull secrets (galera-openstack-dockercfg-8b97z); attempting to pull the image may not succeed.

@paramite
Copy link
Contributor Author

/test telemetry-operator-build-deploy-kuttl

@vyzigold
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Jan 17, 2025
@paramite paramite merged commit cc10815 into openstack-k8s-operators:main Jan 17, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants