Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CNF-11234: Enable RTE metrics to be scraped securely by Prometheus #1107

Merged

Conversation

rbaturov
Copy link
Contributor

This is a follow-up PR for Enable NROP metrics to be to scraped securely by Prometheus .

This PR encompasses and integrates all the needed infrastructure to enable secured communication for scraping metrics by Prometheus, now for RTE.

A follow-up PR will be posted for e2e tests later.

To validate that this PR is functioning correctly, please follow these steps:

  1. build image of the operator (make docker-build docker-push)
  2. run: make deploy
  3. Attach to one of the prometheus pods oc exec -it prometheus-k8s-0 -n openshift-monitoring /bin/bash
  4. run:
curl -v \
--cacert /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt \
-H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
https://numaresources-rte-metrics-service.numaresources.svc:2112/metrics

Copy link
Contributor

openshift-ci bot commented Dec 10, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 10, 2024
@rbaturov rbaturov marked this pull request as ready for review December 11, 2024 06:22
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 11, 2024
@openshift-ci openshift-ci bot requested review from ffromani and Tal-or December 11, 2024 06:22
@ffromani
Copy link
Member

/retest

Copy link
Member

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we need a separate manifest set. I'. mot saying I'm against it but I'd like to see if/how we can merge it with the existing manifest set

Comment on lines 29 to 40
- apiGroups:
- ""
resources:
- serviceaccounts
verbs:
- '*'
- apiGroups:
- ""
resources:
- services
verbs:
- '*'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, this is a role, hence it is ONLY granted in the NROP namespace, right? Stll this is very broad, we need to narrow down AND document in the commit message why do we need these permissions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, you are right. we want in only in the NROP namespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ffromani
I've tried to create a separate role (not clusterRole) and a RoleBinding.
But adding the label
//+kubebuilder:rbac:groups="",resources=services,verbs=*
followed by make manifests appends its in the clusterRole.
Not sure how we can avoide this and append this in the dedicated role.


for _, obj := range r.RTEMetricsManifests.ToObjects() {
// Check if the object already exists
existingObj := obj.DeepCopyObject().(client.Object)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need the cast here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'll remove the casting:
I'll get this error when using r.Client.Get:
cannot use existingObj (variable of type runtime.Object) as client.Object value in argument to r.Client.Get: runtime.Object does not implement client.Object (missing method GetAnnotations)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and why do we use DeepCopyObject() and not DeepCopy() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ffromani This because the client.Object implements DeepCopyObject() and not DeepCopy()

controllers/numaresourcesoperator_controller_test.go Outdated Show resolved Hide resolved
@rbaturov
Copy link
Contributor Author

I'm not sure we need a separate manifest set. I'. mot saying I'm against it but I'd like to see if/how we can merge it with the existing manifest set

I agree on this one.
But I coudln't figure out a more appropriate place to put this manifest.

@rbaturov rbaturov force-pushed the expose-metrics-securely-rte branch 3 times, most recently from 1efaad4 to b827ccd Compare December 15, 2024 10:14
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 15, 2024
@rbaturov rbaturov force-pushed the expose-metrics-securely-rte branch from b827ccd to 7b5b369 Compare December 15, 2024 10:20
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 15, 2024
@rbaturov rbaturov force-pushed the expose-metrics-securely-rte branch from 7b5b369 to 15b513e Compare December 15, 2024 11:23
Copy link
Member

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
/lgtm

good starting point for 4.19 and onwards

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 16, 2024
Copy link
Contributor

openshift-ci bot commented Dec 16, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, rbaturov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 16, 2024
@ffromani
Copy link
Member

/hold

need to check RBAC rules again

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 16, 2024
@ffromani
Copy link
Member

/lgtm cancel

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Dec 16, 2024
@rbaturov
Copy link
Contributor Author

/hold

need to check RBAC rules again

Regarding this, it is correct that now I have granted permission to the operator to manage service objects across the entire cluster and not narrowed it down to NROP namespace.
As I said before, I'm not sure how we can create a role not a clusterrole for this matter.

@rbaturov rbaturov force-pushed the expose-metrics-securely-rte branch from 15b513e to 3ddbf0e Compare December 19, 2024 08:33
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 19, 2024
@rbaturov rbaturov force-pushed the expose-metrics-securely-rte branch from 3ddbf0e to 63eef4a Compare December 19, 2024 08:33
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 19, 2024
@rbaturov
Copy link
Contributor Author

/retest-required

2 similar comments
@rbaturov
Copy link
Contributor Author

/retest-required

@rbaturov
Copy link
Contributor Author

/retest-required

@ffromani
Copy link
Member

/retest-required

the first retest it's a freebie, but past that we need to check why tests are failing and only when we are confident it's actually an infra issue we should retest

@rbaturov
Copy link
Contributor Author

rbaturov commented Dec 19, 2024

/retest-required

the first retest it's a freebie, but past that we need to check why tests are failing and only when we are confident it's actually an infra issue we should retest

I saw there was a timeout. I'm not sure this is the infra indeed, I wanted to give it a try before deep diving into that
Edit: actually I forgot this is the second retest. will take a look now

@rbaturov rbaturov changed the title Enable RTE metrics to be scraped securely by Prometheus CNF-11234: Enable RTE metrics to be scraped securely by Prometheus Dec 19, 2024
@openshift-ci-robot
Copy link

@rbaturov: This pull request references CNF-11234 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

This is a follow-up PR for Enable NROP metrics to be to scraped securely by Prometheus .

This PR encompasses and integrates all the needed infrastructure to enable secured communication for scraping metrics by Prometheus, now for RTE.

A follow-up PR will be posted for e2e tests later.

To validate that this PR is functioning correctly, please follow these steps:

  1. build image of the operator (make docker-build docker-push)
  2. run: make deploy
  3. Attach to one of the prometheus pods oc exec -it prometheus-k8s-0 -n openshift-monitoring /bin/bash
  4. run:
curl -v \
--cacert /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt \
-H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
https://numaresources-rte-metrics-service.numaresources.svc:2112/metrics

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@rbaturov rbaturov force-pushed the expose-metrics-securely-rte branch from 63eef4a to 91efaab Compare December 19, 2024 13:01
Copy link
Member

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

we can narrow down permissions later on

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 19, 2024
@ffromani
Copy link
Member

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 19, 2024
@rbaturov rbaturov force-pushed the expose-metrics-securely-rte branch from 91efaab to 2edf895 Compare December 22, 2024 07:29
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Dec 22, 2024
@rbaturov rbaturov force-pushed the expose-metrics-securely-rte branch from 2edf895 to a063be5 Compare December 22, 2024 07:36
@rbaturov rbaturov force-pushed the expose-metrics-securely-rte branch from a063be5 to fbeca08 Compare December 22, 2024 10:46
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 22, 2024
* Enabled metrics by default by setting --metrics-mode=httptls.
* Added an RTE metrics package and associated manifests. Currently, this includes only the Service manifest.
* The Service manifest includes a special annotation monitored by the Service CA operator. This operator generates the tls.key and tls.crt files in the rte-metrics-service-cert secret, which is consumed by the RTE worker container.
* Updated the DaemonSet configuration to include the required volume and volumeMount for accessing the metrics secret.
* Deployment of this Service is required for metrics functionality.

Signed-off-by: Ronny Baturov <[email protected]>
As part of enabling metrics for RTE, a Service resource is created during the deployment of the RTE metrics manifests by the operator. This commit grants the operator pod the necessary permissions to deploy the Service CR.

Signed-off-by: Ronny Baturov <[email protected]>
* Integrating RTE metrics manifests to be deployed by the operator
* This adds unit test for metrics components creation

Signed-off-by: Ronny Baturov <[email protected]>
Signed-off-by: Ronny Baturov <[email protected]>
@rbaturov rbaturov force-pushed the expose-metrics-securely-rte branch from fbeca08 to ae455f8 Compare December 22, 2024 10:52
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 22, 2024
@Tal-or
Copy link
Collaborator

Tal-or commented Dec 22, 2024

/lgtm
Good enough for now. Thanks!

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 22, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 1dd2f17 into openshift-kni:main Dec 22, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants