-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[release-4.18] Enable NROP metrics to be to scraped securely by Prometheus #1140
[release-4.18] Enable NROP metrics to be to scraped securely by Prometheus #1140
Conversation
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ffromani, rbaturov The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
needs rebase |
Signed-off-by: Ronny Baturov <[email protected]>
This commit consist of the following changes: * Enabled secureserving in the metrics server. * Added a Service for the deployment with a special annotation, which is monitored by the Service CA operator. This operator will generate the tls.key and tls.crt files inside the metrics-service-cert secret, which is used by manager container. * Added volume and volumeMount to enable the deployment to consume the metrics secret generated with the tls.key and tls.crt. * Added Role and RoleBinding resources to grant the necessary permissions to the Prometheus service account. * modified ServiceMonitor resource to enable Prometheus pods to scrape metrics. Most of this configuration was based on this guide: https://rhobs-handbook.netlify.app/products/openshiftmonitoring/collecting_metrics.md/ Signed-off-by: Ronny Baturov <[email protected]>
* Added the openshift.io/cluster-monitoring: "true" label to the NROP namespace. This label is required to enable Prometheus to scrape metrics from the namespace. * Added the annotation operatorframework.io/cluster-monitoring=true. This will cause the OpenShift console to display a checkbox during installation, allowing users to enable cluster monitoring for the operator. Signed-off-by: Ronny Baturov <[email protected]>
Signed-off-by: Ronny Baturov <[email protected]>
* Added a Role and RoleBinding for the manager pod, allowing the operator to manage Services exclusively within the NROP namespace. * Removed the Service-related RBAC permissions previously granted in the ClusterRole. Signed-off-by: Ronny Baturov <[email protected]>
This ensures the informer can successfully watch and list Service objects. Without this restriction, the operator will encounter the following failures: reflector.go:561] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:106: failed to list *v1.Service: services is forbidden: User "system:serviceaccount:numaresources:numaresources-controller-manager" cannot list resource "services" in API group "" at the cluster scope reflector.go:158] "Unhandled Error" err="sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:106: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:numaresources:numaresources-controller-manager\" cannot list resource \"services\" in API group \"\" at the cluster scope" logger="UnhandledError" Signed-off-by: Ronny Baturov <[email protected]>
e990e3b
to
1904cfc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
93834c9
into
openshift-kni:release-4.18
This PR is a combined backport of #1106 and #1133.
This PR encompasses all the required changes to reintegrate NROP metrics with Prometheus. This implementation uses the metrics server API from the controller-runtime.
The majority of the changes in this PR follow the guidelines outlined in the following guide:
https://rhobs-handbook.netlify.app/products/openshiftmonitoring/collecting_metrics.md/
The difference between this implementation and the guide's is that we use a
bearerTokenFile
for Prometheus authentication instead of tls.crt and tls.key. This approach uses TLS but does not implement mTLS.A follow-up PR will be issued to ensure we implement this for RTE metrics as well.
To validate that this PR is functioning correctly, please follow these steps:
make docker-build docker-push
)make deploy
oc exec -it prometheus-k8s-0 -n openshift-monitoring /bin/bash