-
Notifications
You must be signed in to change notification settings - Fork 267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
examples(allocation): free memory after unmarshalling a result from the guest #1390
examples(allocation): free memory after unmarshalling a result from the guest #1390
Conversation
This is the benchamrks result I get by comparing main vs this PR:
I'm not sure if I did everything correct but it looks like there is still a large difference |
f5fa2bb
to
ca79698
Compare
The memory increase seems negligible (it's 1000%+, but just 500 bytes). My previous analysis is totally invalidated because I read The CPU increase I don't know how to evaluate. Is a 20µs increase a problem? If so, I would test a couple of things, to test the assumption that TinyGo using One is a "caller allocates" convention: host calls The other is a global variable for the result. In the guest: // Use this global variable to keep the return value from an external function from being GCed.
var lastResult any
// Call this function to allow GCing the return value from an external function.
func ClearLastResult() { lastResult = nil } |
I would chime in on 20us being a problem. I think it is because for example a web handler can return in less than a microsecond and tinygo is used to implement file handlers. @dmvolod seemed to have a regression fix around tinygo recently and you can tell in these benchmarks that even using things like protobuf, the whole round trip is less than the additional overhead added here. knqyf263/go-plugin#48 TL;DR; something's wrong and we shouldn't advocate (merge) this unless the overhead becomes more a percentage than a factor ;) |
3671d22
to
43eff01
Compare
hmm |
if err != nil { | ||
return 0, err | ||
} | ||
if len(results) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should never be false, basically this is a bug if the sig is I32I32_I64, so panic or return an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I add something like
if len(results) == 0 {
panic("unexpected")
}
Then it panics, but don't know exactly why
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok lemme pull this and take a look
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the problem is that the code was switched to the signature of "greeting" but the function name called was "greet" which didn't return a value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah greet
vs greeting
43eff01
to
370a8ba
Compare
@codefromthecrypt with the latest commit which implements the suggestion in tinygo-org/tinygo#2787 (comment), the results are:
|
370a8ba
to
e29ae28
Compare
if err = m.CallI32I32_V(testCtx, "greet", namePtr, nameSize); err != nil { | ||
return err | ||
// Now, we can call "greeting", which reads the string we wrote to memory! | ||
ptrSize, fnErr := m.CallI32I32_I64(testCtx, "greeting", namePtr, nameSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the main thing here is that the benchmark is doing something different than before. Due to this, it cannot be compared directly to the prior (now it is calling "greeting" but before it was calling "greet"
in some ways it looks like we are doing the same thing as tinymem now. However, mixing and matching the built-in "malloc" and "free" with an explicit version (similar to tinymem) is a little confusing I think, especially in an example. I prefer the example either use the built-in functions like it did before or manually manage, but not both. So if you think that we shouldn't use malloc/free exported by tinygo then I think we just inline the latest version of the code in tinymem into the example/benchmark, then refer to tinymem as a repo that does this for you. https://github.com/tetratelabs/tinymem/blob/main/example/greeting.go |
so to the point of @ncruces here #1390 (comment)
this is what was going on before, and kindof is still. Similar to rust, this was using the underlying malloc/free built-in imports.
This is kindof happening now, except using a map instead of a single field. IMHO if doing something more complicated (like using a map), I think the code that is almost like tinymem should be exactly like that, since that code was reviewed etc, no need to inline something almost the same. In other words, not copy/paste from my old comment, rather the latest version of it. OTOH, I don't know that the special malloc/free exports are going away any time, except possibly not exported by default. Frankly, I have lost track of the real motivation to change anything, but I also don't mind changing as long as it is coherent! |
@codefromthecrypt this whole exercise is an attempt to add some clarification about how host/guest memory should be handled for future devs that like me where looking for an example as it looks like this topic is not much clear, yet. And this PR seems to be a confirmation :) I don't have any preference about using timymem (which I discovered only today) or the builtin free/malloc and CGO so I would love to have some guidance about which one you'd like to be part of the examples. Also, should add a dedicated example and related benchmark ? |
I guess if tinymem's goal is to encode best practices in terms of allocations across the host-guest boundary, it's only natural that an example either uses it or reimplements it. My suggestions were more with the goal of root causing the benchmark regression. From the data @lburgazzoli collected it seems using CGo from TinyGo might be the culprit? And if that's so, that probably confirms that tinymem's approach is the best one, for a TinyGo guest. |
Surprisingly or not I think I am on the same page with both! Maybe took a bit for my brain to catch up. Conflicting advice is basically the opposite of clear, so that's why I think our example should not mix both in the same code. When they are mixed it is really difficult to figure out what advice is and even more unclear is the perf impacts. So, let's recap our entrypoints: We have the example which shows new users how to do allocation, and we also have a safeguard Allocation benchmark which both serves to compare more realistic interaction performance (vs wasmtime etc), and also acts in some ways as a soak test. For example, it was through the benchmark we found things breaking functionally or performance. That is a surprising helpful thing. Along the way, though, we changed the example to something it wasn't before. Specifically "greeting" vs "greet", so we can't really compare performance of the two based on the prior commit. Until the exact use case is run, both ways I don't think we can say for sure an outcome. Here's an idea, if folks have patience for it. Note: this is temporary to achieve an outcome of knowing if both approaches work or have dramatically different outcomes when compared exactly. Change the go allocation source that calls tinygo's wasm to use a CLI arg to determine the style to use. Change the tinygo source to implement both approaches to "greeting": pure CGO including go imports and also the tinymem approach. For tinymem, literally copy/paste the code from tinymem so it isn't subtly different. For example export "greeting_cgo" and "greeting_tinymem" then in the host side do both approaches switched on an arg. So, for example the cgo style would use "malloc" and tinymem style "_malloc". Another way is to make another directory, like tinygo-malloc, if it helps. Then, run the benchmark against each. You can make a manual benchmark or modify BenchmarkAllocation to allow choosing which. The main idea is benchmarking exactly the same use case, both to see the results and also that they complete without error. |
I also want to clarify from my personal POV, I didn't help make tinymem because I thought it was a better idea. This project was made defensively because tinygo were considering removing the CGO imports (those in use by go-plugin, trivy and others). I don't personally want folks to have to do memory things manually, as wasm is hard enough as it is! |
If there is not urgency, I would like to take this task |
14d3752
to
1ffb3f6
Compare
In my latest commit, I added a POC for what has been described by @codefromthecrypt in the latest comments. |
c4b5b91
to
8692e2e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for progressing this! can you paste output of bench results for these two? This will help guide the discussion as people are more likely to engage when they see the difference (vs pulling and running manually)
981580c
to
4f23ede
Compare
@codefromthecrypt I've adapted the allocation example to use CGO for the I kept the
Let me know if this matches what you had in mind. |
4f23ede
to
a35f61e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the help again. I made some requests to pare down the change to the minimum. Don't forget to update the guidance in the site directory for tinygo!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more notes as my prior ended up in the wrong place. Thanks again!
// managed by TinyGo hence it must be freed by the host. | ||
func stringToNativePtr(s string) (uint32, uint32) { | ||
if len(s) == 0 { | ||
return 0, 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this special case valid? if so add a trailing comment why (for example,// CGO.free ignores a pointer of zero
)
If not, delete the special case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I recall, calling malloc
with a size equals to 0, is implementation dependent hence I opted not to call it in such case.
I Will add a note if that's fine with you but in this case it is probably never happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example, so we can't really add suggestions to special case zero unless it will work. We don't have to consider externally provided impls of malloc/free in the example, though. What tinygo uses with this example and flags given to build is enough.
@anuraaga @dgryski do either of you happen to know what's better:
- not dodging calling malloc dependent on length
- dodging calling malloc with zero length (results in free called with zero)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here's the tinygo code in question. Personally, I prefer to not special case without saying why we are doing it. Most people won't know why they are calling malloc/free in the first place and code like this gets copy/pasted around. As best I can tell, we are assuming the result will be zero and so we are returning that instead of calling things. That would be a perf optimization, which we'd comment why.
If we don't have a good reason tied to tinygo, I prefer we don't do any special casing in this function. Without comments, special casing raises more questions than answers, especially that a zero pointer is something safe to return back to tinygo on free. For example, pointers are often memory offsets in wasm, so zero would be the initial memory position. ack in the tinygo impl the pointer is mapped, but anyway the point is now I'm having to think about if zero is a safe sentinel value for free or not. it is a lot better to leave the problem on what to do with length zero completely up to tinygo (e.g. don't special case). Cheaper to review, less code, easier example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
special case removed
ps if you feel like benchmarking "greeting" is better than "greet", feel free to switch. What I mean is let's keep only one for the allocation example. I don't care strongly which. |
a703f1d
to
2ad486f
Compare
…he guest Signed-off-by: Luca Burgazzoli <[email protected]>
2ad486f
to
586b61c
Compare
I don't have any strong opinion, so let's keep the benchmark as it is, it wont be too complex to add a new one when/if it would become a need. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll go ahead and merge this and do the site updates on own vs asking again here. Appreciate your help getting the underlying solution together and benchmarking to ensure it didn't cause a regression.
I can try to update the docs before EOW if you want me to. |
Epic effort for something that started out with improving example code. |
@lburgazzoli I raised #1429 as I think we should get this sorted to avoid a confusing comment chain over a week again. This sort of stuff is cognitively deep and best way I can find is to contain it by time to shortest as possible, as this avoids things like having to cross when people are out for holidays etc. In general, personally I try to complete everything PR of any kind within a couple days for the same reason, even if not cognitively high. Even small things can get difficult to load back into the brain if drug out over several days, much less weeks, with several people. |
@lburgazzoli sorry for just spotting this, I guess I only grasped it when looking at committed code in whole. What keeps wazero/examples/allocation/tinygo/testdata/greet.go Lines 20 to 24 in 9dd8b1b
I'm wondering if we need to add |
I'll update tinymem once above is resolved. We do tend to see things like this in syscall code, after the call like https://github.com/golang/go/blob/0d347544cbca0f42b160424f6bc2458ebcc7b3fc/src/syscall/fs_wasip1.go#L824-L829 cc @Pryz @achille-roussel |
I completely missed that usage to be honest.
I think the usage of I think the only safest option would be to use |
You're right, I managed to miss the copy. Using The only other one I can think of is to have |
A more unsafe way would be to |
Yes, that too. The only additional unsafety to that it's supposed to be read-only, and the host can change it, but the host can always change it, so… |
agree the host can always change anything. Doing anything in the guest that makes it seem like it cannot is probably not a good idea. Like adding racing stripes to a Honda, doesn't make it faster. We should make the guest code concise, cheap as possible and most importantly relevant. If the current code isn't really leaking it, or it is more complicated than an alternative to do the same. PR welcome, especially before we communicate widely a change. |
@codefromthecrypt @ncruces opened a PR with a possible implementation #1434, feel free to close it if doe snot make sense |
Signed-off-by: Luca Burgazzoli [email protected]