OPA response is nil issue #6972

Closed
kirk-patton opened this issue Aug 28, 2024 · 10 comments

Comments

@kirk-patton
Contributor

@kirk-patton I would recommend using the latest version of OPA and trying to repro with that. It would also help if you could share the policy, input, etc. There were some issues around OPA returning a "nil" response in an older version, which we fixed. Not sure if you're seeing the same issue, but using the latest OPA version would help to figure that out.

Originally posted by @ashutosh-narkar in #6585 (comment)

@kirk-patton
Contributor Author

I have updated the clusters to OPA 0.66.0. Things were quiet for a few weeks, but the issue has resurfaced. I tried asking in the community Slack channel, but so far there has been no response.

The decision logger we use does not return the admission review when the response from OPA is nil. When the response is nil, should it be possible to get the admissionReview? I would like to provide more specifics, but I am having trouble getting any information at the time of the issue. I do know that a rollout restart of the OPA deployment alleviates the issue for a time (hours to a couple of days). The issue seems more prevalent in certain clusters.

@ashutosh-narkar
Member

If you could provide some details that help repro the issue that would be really helpful.

@ashutosh-narkar
Member

The decision logger we use does not return the admission review when the response from OPA is nil.

Is that part of the input?

@kirk-patton changed the title from "@kirk-patton I would recommend using the latest version of OPA and try to repro with that. Also it would help if you could share the policy, input etc. There were some issues around OPA returning a "nil" response in an older version which we fixed. Not sure if you're seeing the same issue but using the latest OPA version would help to figure that out." to "OPA response is nil issue" on Aug 29, 2024
@kirk-patton
Contributor Author

kirk-patton commented Aug 29, 2024

If you could provide some details that help repro the issue that would be really helpful.

If only I could reproduce the issue :-)

We use OPA as an admission webhook in multiple k8s clusters. We load our policies using kube-mgmt. We have a decision logger configured to log OPA policy success/failure with some details to aid in troubleshooting. If OPA returns a true/false response, we can see the k8s admissionReview just fine. But if OPA returns a nil response, the admissionReview is also nil.

Given that the decision logger does not return any useful information when the response from OPA is nil, there is not much to go on.

Does OPA expose the k8s admission review details when OPA response is nil? Or is there some other recommended way to get useful information to help determine what triggers the nil response?

From our decision logger:

	Revision    string                   `json:"revision,omitempty"`
	DecisionID  string                   `json:"decision_id,omitempty"`
	RemoteAddr  string                   `json:"remote_addr"`
	Query       string                   `json:"query"`
	Path        string                   `json:"path"`
	Timestamp   string                   `json:"timestamp"`
	Input       v1beta1.AdmissionReview  `json:"input,omitempty"` // <==== the field that comes back nil
	Result      *v1beta1.AdmissionReview `json:"result,omitempty"`
	Error       *types.ErrorV1           `json:"error,omitempty"`
	Explanation types.TraceV1            `json:"explanation,omitempty"`
	Metrics     types.MetricsV1          `json:"metrics,omitempty"`
}

We are not running into any issues when we run the unit tests on our policies. We alert when OPA returns a nil response, as it impacts the users submitting to the cluster. Performing a k8s rolling restart of the OPA deployment temporarily resolves the issue. Jobs that hit the nil OPA response submit fine after OPA is restarted.

@ashutosh-narkar
Member

Even if the policy result is nil, there should be no impact on the input included in the decision log.

But, if OPA returns a nil response, the admissionReview is also nil.

Are there any errors reported in the logs?

@kirk-patton
Contributor Author

kirk-patton commented Aug 29, 2024

Even if the policy result is nil, there should be no impact on the input included in the decision log.

But, if OPA returns a nil response, the admissionReview is also nil.

Are there any errors reported in the logs?

I have only looked at the decision-logger side-car when the issue has surfaced. I will make a point of examining the OPA logs to see if there are any clues there.

Even if the policy result is nil, there should be no impact on the input included in the decision log.

Thank you for confirming that! I will update the decision logger to spew.Dump to the console and see what that gets me.
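For anyone debugging something similar, here is a minimal sketch of that kind of check using only the Go standard library instead of spew. It assumes the decision logger receives each event as a JSON object with the `decision_id`, `path`, `input`, and `result` keys shown in the struct above. The `decisionEvent` type and `describeEvent` function are hypothetical names, not part of OPA or of the logger discussed here. Keeping `input` and `result` as raw JSON avoids losing the payload if it does not decode cleanly into `v1beta1.AdmissionReview`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decisionEvent mirrors a subset of the decision log fields quoted earlier.
// Input and Result stay as raw JSON so they can be inspected even when they
// do not match the expected AdmissionReview shape.
type decisionEvent struct {
	DecisionID string          `json:"decision_id,omitempty"`
	Path       string          `json:"path"`
	Input      json.RawMessage `json:"input,omitempty"`
	Result     json.RawMessage `json:"result,omitempty"`
}

// present reports whether a raw JSON value was supplied and is not null.
func present(raw json.RawMessage) bool {
	return len(raw) > 0 && string(raw) != "null"
}

// describeEvent decodes one event and summarizes which parts it carried.
func describeEvent(raw []byte) (string, error) {
	var ev decisionEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return "", fmt.Errorf("decode event: %w", err)
	}
	return fmt.Sprintf("decision_id=%s input=%t result=%t",
		ev.DecisionID, present(ev.Input), present(ev.Result)), nil
}

func main() {
	// Hypothetical event: the policy decision is undefined, so "result" is
	// absent, but the input is still there.
	raw := []byte(`{"decision_id":"abc","path":"system/main","input":{"kind":"AdmissionReview"}}`)
	summary, err := describeEvent(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(summary) // prints "decision_id=abc input=true result=false"
}
```

If a summary like this shows `input=true` while the typed `Input` field still ends up empty, the problem is more likely in how the logger decodes the event than in what OPA sent.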

@ashutosh-narkar
Member

Can you please also share your OPA config?

@kirk-patton
Contributor Author

I am working on gathering more information to help diagnose this issue.

@kirk-patton
Contributor Author

Looks like the issue we are seeing is the same as open-policy-agent/kube-mgmt#189

OPA is getting restarted on our AWS cluster, possibly due to Karpenter. After the restart, kube-mgmt does not reload the policies, so OPA has nothing to evaluate. The response we get back contains just the decision-id and nothing else.

We are planning on adding a livenessProbe to the kube-mgmt side-car.
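One way such a probe could work, sketched in Go: ask OPA for its policy list via `GET /v1/policies` and treat an empty list as unhealthy. This is a sketch under assumptions, not the fix adopted in this thread or in kube-mgmt. It assumes every policy arrives through kube-mgmt (policies loaded from disk at startup would also be listed), that OPA's API is reachable from the probe without authentication, and the names `policyList`, `countPolicies`, and `checkPolicies` are hypothetical. The `main` function runs against a local stand-in server rather than a real OPA instance.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// policyList matches the shape of OPA's GET /v1/policies response.
// Only the policy IDs are decoded.
type policyList struct {
	Result []struct {
		ID string `json:"id"`
	} `json:"result"`
}

// countPolicies returns how many policies the response body lists.
func countPolicies(body []byte) (int, error) {
	var pl policyList
	if err := json.Unmarshal(body, &pl); err != nil {
		return 0, fmt.Errorf("decode policy list: %w", err)
	}
	return len(pl.Result), nil
}

// checkPolicies returns an error when OPA at baseURL reports no policies.
func checkPolicies(client *http.Client, baseURL string) error {
	resp, err := client.Get(baseURL + "/v1/policies")
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	n, err := countPolicies(body)
	if err != nil {
		return err
	}
	if n == 0 {
		return errors.New("no policies loaded")
	}
	return nil
}

func main() {
	// Stand-ins for OPA: one that lost its policies after a restart, and one
	// with a single (made-up) policy loaded.
	for _, body := range []string{`{"result":[]}`, `{"result":[{"id":"example.rego"}]}`} {
		payload := body
		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprint(w, payload)
		}))
		if err := checkPolicies(srv.Client(), srv.URL); err != nil {
			fmt.Println("unhealthy:", err)
		} else {
			fmt.Println("healthy")
		}
		srv.Close()
	}
}
```

In a real deployment the check would point at the OPA address inside the pod and exit non-zero on error, so that the kubelet restarts the side-car and the policies get loaded again.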

@ashutosh-narkar
Member

I would recommend using bundles for policy and data distribution. Here's an example tutorial.
