Are opa_value_add_path() / opa_value_remove_path() safe to "use in anger"? #359
-
Greetings! With the OPA WASM runtime, the opa_value_add_path() and opa_value_remove_path() exports are available to incrementally add or remove data in the VM's space. But are they really safe to use repeatedly? The documentation (https://www.openpolicyagent.org/docs/latest/wasm/#exports) seems to indicate that "Values removed will be freed". However, running repeated remove()/add()s in a tight loop ends up leaking memory very quickly. I've readily observed this in test code (repeated add()/remove() using same path and same data). Frome what I can tell from looking at the source (https://github.com/open-policy-agent/opa/blob/main/wasm/src/value.c), the issue seems to come down to shallow opa_free()s in several places.
Although it would be painful, one might be able to prune the data bit by bit using opa_value_remove_path() starting from the leaves of the data and working back. But that wouldn't address the issue of the "path object" also leaking memory. For what it's worth, my calling convention for these functions mirrors the code in the OPA agent at https://github.com/open-policy-agent/opa/blob/main/internal/wasm/sdk/internal/wasm/vm.go#L522 and https://github.com/open-policy-agent/opa/blob/main/internal/wasm/sdk/internal/wasm/vm.go#L571. So ... are these functions ever really intended to be "used in anger"? Or (at least in the current implementation) are they best for small, one-off deltas in the data only? Under what conditions would we expect OPA itself to end up leaking memory in the same way? Thanks in advance for your time and consideration. |
Beta Was this translation helpful? Give feedback.
Replies: 2 comments 3 replies
-
Hey! Thanks for the detailed post.
I think you've provided good evidence that it isn't 😅
Those methods are called whenever a store update changes the data: all the existing Wasm VMs need to have their data updated. The code is here. From the top of my head, I cannot tell if the Wasm VMs are decommissioned when they're not returned to the pool, or if they're kept around. If they were reset in some way, the leak wouldn't happen in OPA when using Wasm bundles. But I wouldn't bet on it, it may very well be that it's leaking memory in the same way. I think the Wasm modules in OPA-run-as-server haven't really caught on, so there's a very limited user base. (If there were more users, we'd have seen reports before.) Looking at the code, you wrote
I guess that's the problem, then, since it's clearly been the intention to free the elements. At least that's how I'd read the code. I'm afraid your best bet would be to push a pull request. I suspect you've got a good understanding of what isn't happening but should be. Would you be up to that? 😃 |
Beta Was this translation helpful? Give feedback.
-
Hey! Thanks for the response and explanation. It's good to hear that this probably won't hit the OPA-run-as-server for most cases for the time being. I wasn't sure if it was transparently compiling policies to WASM VMs on the backend as an optimization. (that could be a neat feature. :) ) I thought about trying to adjust the opa_object_free() and opa_array_free() calls to recursively call opa_value_free() on their elements. The one thing that I'm not sure of is whether the values form a tree or a DAG. I'm guessing it's not a general graph since I've seen some (unrelated) rego errors that seemed to report being found due to cycles in the graph. But maybe that's just the AST and not the final value tree. If the values form a tree, then the deep-free should be straightforward. But if two different values can refer to a common third value (i.e. it has a cycle or it's a DAG), then recursively freeing values would require probably require some kind of reference counting scheme. It's doable, but a much bigger change. :( I know the values that I'd put in for policy data would just be JSON tree. But, I don't know the code well enought to know whether the same code might be used for other rego data that contains cycles. Anyone happen to know? |
Beta Was this translation helpful? Give feedback.
Hey! Thanks for the detailed post.
I think you've provided good evidence that it isn't 😅
Those methods are called whenever a store update changes the data: all the existing Wasm VMs need to have their data updated. The code is here.
From the top of my head, I cannot tell if the Wasm VMs are decommissioned when they're not returned to the pool, or if they're kept around. If they were reset in some way, the leak wouldn't happen in OPA when using Wasm bundles. But I wouldn't bet on it, it may very well be that it's leaking memory in the same way.
I think …