examples(allocation): free memory after unmarshalling a result from the guest #1390

lburgazzoli · 2023-04-21T09:40:30Z

Signed-off-by: Luca Burgazzoli [email protected]

lburgazzoli · 2023-04-21T09:42:09Z

This is the benchmark result I get by comparing main vs this PR:

➜ benchstat old.txt new.txt 
goos: linux
goarch: amd64
pkg: github.com/tetratelabs/wazero/internal/integration_test/vs/compiler
cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
                          │   old.txt    │                new.txt                 │
                          │    sec/op    │    sec/op      vs base                 │
Allocation/Compile-12       12.00m ± 15%    12.16m ±  2%          ~ (p=0.699 n=6)
Allocation/Instantiate-12   598.9µ ±  1%    622.0µ ±  3%     +3.85% (p=0.009 n=6)
Allocation/Call-12          2.031µ ±  6%   28.617µ ± 24%  +1308.99% (p=0.002 n=6)
geomean                     244.4µ          600.5µ         +145.71%

                          │   old.txt    │                new.txt                 │
                          │     B/op     │     B/op       vs base                 │
Allocation/Compile-12       2.553Mi ± 0%   2.553Mi ±  0%          ~ (p=0.818 n=6)
Allocation/Instantiate-12   348.0Ki ± 0%   348.0Ki ±  0%          ~ (p=0.626 n=6)
Allocation/Call-12            48.00 ± 0%    586.00 ± 10%  +1120.83% (p=0.002 n=6)
geomean                     34.94Ki        80.45Ki         +130.26%

                          │   old.txt   │              new.txt               │
                          │  allocs/op  │  allocs/op   vs base               │
Allocation/Compile-12       2.023k ± 0%   2.023k ± 0%        ~ (p=1.000 n=6)
Allocation/Instantiate-12    841.5 ± 0%    841.5 ± 0%        ~ (p=1.000 n=6)
Allocation/Call-12           5.000 ± 0%    6.000 ± 0%  +20.00% (p=0.002 n=6)
geomean                      204.2         217.0        +6.27%

pkg: github.com/tetratelabs/wazero/internal/integration_test/vs/interpreter
                          │   old.txt    │               new.txt                │
                          │    sec/op    │    sec/op      vs base               │
Allocation/Compile-12       2.761m ±  8%    3.238m ± 14%  +17.27% (p=0.009 n=6)
Allocation/Instantiate-12   168.9µ ± 30%    152.4µ ± 82%        ~ (p=0.589 n=6)
Allocation/Call-12          53.99µ ± 11%   107.08µ ±  9%  +98.33% (p=0.002 n=6)
geomean                     293.1µ          375.2µ        +28.01%

                          │   old.txt    │               new.txt                │
                          │     B/op     │     B/op      vs base                │
Allocation/Compile-12       1.449Mi ± 0%   1.449Mi ± 0%         ~ (p=0.589 n=6)
Allocation/Instantiate-12   222.8Ki ± 0%   222.8Ki ± 0%         ~ (p=0.186 n=6)
Allocation/Call-12          2.241Ki ± 0%   6.569Ki ± 0%  +193.18% (p=0.002 n=6)
geomean                     90.49Ki        129.5Ki        +43.13%

                          │   old.txt   │                new.txt                │
                          │  allocs/op  │  allocs/op   vs base                  │
Allocation/Compile-12       1.076k ± 0%   1.077k ± 0%         ~ (p=0.061 n=6)
Allocation/Instantiate-12    747.0 ± 0%    747.0 ± 0%         ~ (p=1.000 n=6) ¹
Allocation/Call-12           98.00 ± 0%   264.00 ± 0%  +169.39% (p=0.002 n=6)
geomean                      428.7         596.6        +39.19%
¹ all samples are equal

I'm not sure I did everything correctly, but it looks like there is still a large difference.

internal/integration_test/vs/bench_allocation.go

ncruces · 2023-04-21T12:24:56Z

The memory increase seems negligible (it's 1000%+, but just 500 bytes). My previous analysis is totally invalidated because I read Mi in the first line and assumed 500 megabytes. So please disregard that.

The CPU increase I don't know how to evaluate. Is a 20µs increase a problem?

If so, I would test a couple of things, to test the assumption that TinyGo using cgo might be the issue.

One is a "caller allocates" convention: host calls malloc with a (perhaps oversized) buffer, passes the pointer and length in, guest returns the length.
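To make the "caller allocates" convention concrete, here is a minimal sketch in plain Go that simulates the guest's linear memory with a byte slice. Everything in it (the `guest` type, its bump-allocating `malloc`, the four-argument `greeting` and the 64-byte buffer size) is hypothetical and only illustrates the shape of the calls; it is not the wazero API or the code in this PR.

```go
package main

import "fmt"

// guest simulates a Wasm guest: memory stands in for linear memory and next
// is the offset of a trivial bump allocator. Both are illustrative only.
type guest struct {
	memory []byte
	next   uint32
}

func newGuest() *guest {
	// Offset 0 is skipped so that 0 can never be a valid allocation.
	return &guest{memory: make([]byte, 4096), next: 8}
}

// malloc stands in for the guest's exported allocator (no bounds checks).
func (g *guest) malloc(size uint32) uint32 {
	ptr := g.next
	g.next += size
	return ptr
}

// greeting reads the name at namePtr/nameLen, writes "Hello, <name>!" into
// the caller-provided buffer, and returns how many bytes it wrote. Output
// that does not fit in bufCap bytes is truncated.
func (g *guest) greeting(namePtr, nameLen, bufPtr, bufCap uint32) uint32 {
	out := append([]byte("Hello, "), g.memory[namePtr:namePtr+nameLen]...)
	out = append(out, '!')
	return uint32(copy(g.memory[bufPtr:bufPtr+bufCap], out))
}

// callGreeting is the host side: it allocates both the input and the
// (oversized) output buffer, so the guest never returns a pointer it owns.
func callGreeting(g *guest, name string) string {
	namePtr := g.malloc(uint32(len(name)))
	copy(g.memory[namePtr:], name)

	const bufCap = 64
	bufPtr := g.malloc(bufCap)

	n := g.greeting(namePtr, uint32(len(name)), bufPtr, bufCap)
	return string(g.memory[bufPtr : bufPtr+n])
}

func main() {
	fmt.Println(callGreeting(newGuest(), "wazero")) // prints "Hello, wazero!"
}
```

The trade-off in this sketch is that the host has to guess an upper bound for the result; a result longer than the buffer is silently truncated unless the guest reports the size it needed.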

The other is a global variable for the result. In the guest:

// Use this global variable to keep the return value from an external function from being GCed.
var lastResult any

// Call this function to allow GCing the return value from an external function.
func ClearLastResult() { lastResult = nil }
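As a rough illustration of how that global could be used, the sketch below runs as ordinary Go (1.20 or later, for `unsafe.SliceData`) rather than as a TinyGo guest. The `greeting` and `readResult` functions are hypothetical: `greeting` returns the address as a plain integer, the way it would cross the host boundary, and `readResult` plays the host. Note that `go vet` flags the integer-to-pointer conversion; it is tolerable here only because `lastResult` still references the buffer at that point.

```go
package main

import (
	"fmt"
	"unsafe"
)

// lastResult keeps the most recent return value reachable. An integer
// address alone does not keep an object alive for the garbage collector.
var lastResult any

// ClearLastResult lets the GC reclaim the previous return value.
func ClearLastResult() { lastResult = nil }

// greeting is a hypothetical export. It returns the address and length of
// its result as plain integers.
func greeting(name string) (ptr uintptr, size uint32) {
	buf := []byte("Hello, " + name + "!")
	lastResult = buf // keep buf reachable until the host says it is done
	return uintptr(unsafe.Pointer(unsafe.SliceData(buf))), uint32(len(buf))
}

// readResult plays the host: it copies the bytes out, then releases them.
func readResult(ptr uintptr, size uint32) string {
	s := string(unsafe.Slice((*byte)(unsafe.Pointer(ptr)), size))
	ClearLastResult()
	return s
}

func main() {
	ptr, size := greeting("wazero")
	fmt.Println(readResult(ptr, size)) // prints "Hello, wazero!"
}
```

With a single global, only the most recent result is protected, so the host has to finish reading one result before calling the guest again.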

codefromthecrypt · 2023-04-21T12:39:47Z

I would chime in that 20us is a problem: for example, a web handler can return in less than a microsecond, and tinygo is used to implement file handlers.

@dmvolod seemed to have a regression fix around tinygo recently and you can tell in these benchmarks that even using things like protobuf, the whole round trip is less than the additional overhead added here. knqyf263/go-plugin#48

TL;DR; something's wrong and we shouldn't advocate (merge) this unless the overhead becomes more a percentage than a factor ;)

codefromthecrypt · 2023-04-21T13:09:10Z

hmm panic: runtime error: invalid memory address or nil pointer dereference so I think something is busted, but luckily it doesn't need the benchmark to find it. Can you have a look at TestAllocation?

internal/integration_test/vs/bench_allocation.go

codefromthecrypt · 2023-04-21T13:11:59Z

internal/integration_test/vs/runtime.go

+	if err != nil {
+		return 0, err
+	}
+	if len(results) > 0 {


this should never be false, basically this is a bug if the sig is I32I32_I64, so panic or return an error.

If I add something like

if len(results) == 0 { panic("unexpected") }

Then it panics, but don't know exactly why

ok lemme pull this and take a look

the problem is that the code was switched to the signature of "greeting" but the function name called was "greet" which didn't return a value.

ah greet vs greeting
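A small sketch of the "return an error instead of silently skipping" suggestion from this thread. The helper name `unpackPtrSize` is hypothetical, and the packing of the i64 (pointer in the upper 32 bits, size in the lower 32) is an assumption of the sketch, not something stated in this review.

```go
package main

import "fmt"

// unpackPtrSize validates the results of a call whose signature returns one
// i64 and splits it into a pointer and a size. The packing (pointer in the
// upper 32 bits, size in the lower 32) is an assumption of this sketch.
func unpackPtrSize(results []uint64) (ptr, size uint32, err error) {
	if len(results) != 1 {
		return 0, 0, fmt.Errorf("expected 1 result, got %d: wrong function called?", len(results))
	}
	return uint32(results[0] >> 32), uint32(results[0]), nil
}

func main() {
	// A function that returns nothing, like "greet", yields no results.
	if _, _, err := unpackPtrSize(nil); err != nil {
		fmt.Println("error:", err)
	}

	ptr, size, _ := unpackPtrSize([]uint64{uint64(1024)<<32 | 14})
	fmt.Println(ptr, size) // prints "1024 14"
}
```

Failing loudly here would have surfaced the "greet" vs "greeting" mix-up as an error rather than as a nil pointer dereference later on.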

lburgazzoli · 2023-04-21T14:00:15Z

@codefromthecrypt with the latest commit which implements the suggestion in tinygo-org/tinygo#2787 (comment), the results are:

➜ benchstat old.txt new.txt 
goos: linux
goarch: amd64
pkg: github.com/tetratelabs/wazero/internal/integration_test/vs/compiler
cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
                          │   old.txt   │              new.txt               │
                          │   sec/op    │   sec/op     vs base               │
Allocation/Compile-12       11.93m ± 6%   11.96m ± 4%        ~ (p=1.000 n=6)
Allocation/Instantiate-12   608.9µ ± 2%   624.0µ ± 2%   +2.47% (p=0.041 n=6)
Allocation/Call-12          2.016µ ± 2%   2.488µ ± 4%  +23.41% (p=0.002 n=6)
geomean                     244.7µ        264.8µ        +8.23%

                          │   old.txt    │               new.txt               │
                          │     B/op     │     B/op      vs base               │
Allocation/Compile-12       2.553Mi ± 0%   2.554Mi ± 0%   +0.03% (p=0.041 n=6)
Allocation/Instantiate-12   348.0Ki ± 0%   352.2Ki ± 0%   +1.22% (p=0.002 n=6)
Allocation/Call-12            48.00 ± 0%     56.00 ± 0%  +16.67% (p=0.002 n=6)
geomean                     34.94Ki        36.93Ki        +5.71%

                          │   old.txt   │              new.txt               │
                          │  allocs/op  │  allocs/op   vs base               │
Allocation/Compile-12       2.023k ± 0%   2.033k ± 0%   +0.49% (p=0.002 n=6)
Allocation/Instantiate-12    841.0 ± 0%    843.0 ± 0%   +0.24% (p=0.002 n=6)
Allocation/Call-12           5.000 ± 0%    6.000 ± 0%  +20.00% (p=0.002 n=6)
geomean                      204.1         217.5        +6.52%

pkg: github.com/tetratelabs/wazero/internal/integration_test/vs/interpreter
                          │   old.txt    │               new.txt               │
                          │    sec/op    │    sec/op     vs base               │
Allocation/Compile-12       3.017m ± 19%   2.978m ± 10%        ~ (p=1.000 n=6)
Allocation/Instantiate-12   154.9µ ± 14%   151.1µ ±  1%        ~ (p=0.310 n=6)
Allocation/Call-12          58.87µ ± 10%   65.98µ ±  3%  +12.08% (p=0.004 n=6)
geomean                     301.9µ         309.6µ         +2.55%

                          │   old.txt    │              new.txt               │
                          │     B/op     │     B/op      vs base              │
Allocation/Compile-12       1.449Mi ± 0%   1.458Mi ± 0%  +0.57% (p=0.002 n=6)
Allocation/Instantiate-12   222.8Ki ± 0%   223.0Ki ± 0%  +0.06% (p=0.002 n=6)
Allocation/Call-12          2.239Ki ± 0%   2.330Ki ± 0%  +4.06% (p=0.002 n=6)
geomean                     90.46Ki        91.86Ki       +1.55%

                          │   old.txt   │              new.txt              │
                          │  allocs/op  │  allocs/op   vs base              │
Allocation/Compile-12       1.077k ± 0%   1.087k ± 0%  +0.93% (p=0.002 n=6)
Allocation/Instantiate-12    747.0 ± 0%    748.0 ± 0%  +0.13% (p=0.002 n=6)
Allocation/Call-12           98.00 ± 0%   103.00 ± 1%  +5.10% (p=0.002 n=6)
geomean                      428.8         437.5       +2.03%

codefromthecrypt · 2023-04-21T14:29:12Z

internal/integration_test/vs/bench_allocation.go

-	if err = m.CallI32I32_V(testCtx, "greet", namePtr, nameSize); err != nil {
-		return err
+	// Now, we can call "greeting", which reads the string we wrote to memory!
+	ptrSize, fnErr := m.CallI32I32_I64(testCtx, "greeting", namePtr, nameSize)


so the main thing here is that the benchmark is doing something different than before. Due to this, it cannot be compared directly to the prior run (now it is calling "greeting" but before it was calling "greet").

codefromthecrypt · 2023-04-21T14:38:48Z

in some ways it looks like we are doing the same thing as tinymem now.

However, mixing and matching the built-in "malloc" and "free" with an explicit version (similar to tinymem) is a little confusing I think, especially in an example.

I prefer the example either use the built-in functions like it did before or manually manage, but not both.

So if you think that we shouldn't use malloc/free exported by tinygo then I think we just inline the latest version of the code in tinymem into the example/benchmark, then refer to tinymem as a repo that does this for you.

https://github.com/tetratelabs/tinymem/blob/main/example/greeting.go

codefromthecrypt · 2023-04-21T14:52:54Z

so to the point of @ncruces here #1390 (comment)

One is a "caller allocates" convention: host calls malloc with a (perhaps oversized) buffer, passes the pointer and length in, guest returns the length.

this is what was going on before, and kind of still is. Similar to rust, this was using the underlying malloc/free built-in imports.

The other is a global variable for the result. In the guest:

This is kind of happening now, except using a map instead of a single field. IMHO if doing something more complicated (like using a map), I think the code that is almost like tinymem should be exactly like that, since that code was reviewed etc, no need to inline something almost the same. In other words, not copy/paste from my old comment, rather the latest version of it.

OTOH, I don't know that the special malloc/free exports are going away any time, except possibly not exported by default. Frankly, I have lost track of the real motivation to change anything, but I also don't mind changing as long as it is coherent!

lburgazzoli · 2023-04-21T19:17:34Z

@codefromthecrypt this whole exercise is an attempt to clarify how host/guest memory should be handled, for future devs who, like me, were looking for an example, as it looks like this topic is not very clear yet.

And this PR seems to be a confirmation :)

I don't have any preference between using tinymem (which I discovered only today) and the built-in free/malloc with CGO, so I would love some guidance about which one you'd like to be part of the examples. Also, should I add a dedicated example and related benchmark?

ncruces · 2023-04-21T22:01:11Z

I guess if tinymem's goal is to encode best practices in terms of allocations across the host-guest boundary, it's only natural that an example either uses it or reimplements it.

My suggestions were more with the goal of root causing the benchmark regression. From the data @lburgazzoli collected it seems using CGo from TinyGo might be the culprit?

And if that's so, that probably confirms that tinymem's approach is the best one, for a TinyGo guest.

codefromthecrypt · 2023-04-22T06:05:24Z

Surprisingly or not I think I am on the same page with both! Maybe took a bit for my brain to catch up.

Conflicting advice is basically the opposite of clear, so that's why I think our example should not mix both in the same code. When they are mixed it is really difficult to figure out what the advice is, and even less clear what the perf impacts are.

So, let's recap our entrypoints: We have the example which shows new users how to do allocation, and we also have a safeguard Allocation benchmark which both serves to compare more realistic interaction performance (vs wasmtime etc), and also acts in some ways as a soak test. For example, it was through the benchmark that we found things breaking functionally or in performance. That is a surprisingly helpful thing.

Along the way, though, we changed the example to something it wasn't before. Specifically "greeting" vs "greet", so we can't really compare performance of the two based on the prior commit. Until the exact use case is run, both ways I don't think we can say for sure an outcome.

Here's an idea, if folks have patience for it. Note: this is temporary to achieve an outcome of knowing if both approaches work or have dramatically different outcomes when compared exactly.

Change the go allocation source that calls tinygo's wasm to use a CLI arg to determine the style to use. Change the tinygo source to implement both approaches to "greeting": pure CGO including go imports and also the tinymem approach. For tinymem, literally copy/paste the code from tinymem so it isn't subtly different.

For example export "greeting_cgo" and "greeting_tinymem" then in the host side do both approaches switched on an arg. So, for example the cgo style would use "malloc" and tinymem style "_malloc". Another way is to make another directory, like tinygo-malloc, if it helps.

Then, run the benchmark against each. You can make a manual benchmark or modify BenchmarkAllocation to allow choosing which. The main idea is benchmarking exactly the same use case, both to see the results and also that they complete without error.
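One way the host-side switch could look, as a minimal sketch. The export names "greeting_cgo", "greeting_tinymem", "malloc" and "_malloc" come from the proposal above; the `-style` flag name, the `exports` type and the `exportsFor` helper are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// exports names the guest functions that one allocation style calls.
type exports struct {
	greeting string
	malloc   string
}

// exportsFor maps a style name to the guest exports it should use.
func exportsFor(style string) (exports, error) {
	switch style {
	case "cgo":
		return exports{greeting: "greeting_cgo", malloc: "malloc"}, nil
	case "tinymem":
		return exports{greeting: "greeting_tinymem", malloc: "_malloc"}, nil
	default:
		return exports{}, fmt.Errorf("unknown style %q (want cgo or tinymem)", style)
	}
}

func main() {
	style := flag.String("style", "cgo", "allocation style: cgo or tinymem")
	flag.Parse()

	e, err := exportsFor(*style)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("call %s, allocate with %s\n", e.greeting, e.malloc)
}
```

Keeping the mapping in one function means the benchmark body stays identical for both styles, which is the point of comparing exactly the same use case.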

codefromthecrypt · 2023-04-22T06:11:15Z

I also want to clarify from my personal POV, I didn't help make tinymem because I thought it was a better idea. This project was made defensively because tinygo were considering removing the CGO imports (those in use by go-plugin, trivy and others). I don't personally want folks to have to do memory things manually, as wasm is hard enough as it is!

lburgazzoli · 2023-04-22T12:25:33Z

Here's an idea, if folks have patience for it. Note: this is temporary to achieve an outcome of knowing if both approaches work or have dramatically different outcomes when compared exactly.

Change the go allocation source that calls tinygo's wasm to use a CLI arg to determine the style to use. Change the tinygo source to implement both approaches to "greeting": pure CGO including go imports and also the tinymem approach. For tinymem, literally copy/paste the code from tinymem so it isn't subtly different.

For example export "greeting_cgo" and "greeting_tinymem" then in the host side do both approaches switched on an arg. So, for example the cgo style would use "malloc" and tinymem style "_malloc". Another way is to make another directory, like tinygo-malloc, if it helps.

Then, run the benchmark against each. You can make a manual benchmark or modify BenchmarkAllocation to allow choosing which. The main idea is benchmarking exactly the same use case, both to see the results and also that they complete without error.

If there is no urgency, I would like to take this task.

lburgazzoli · 2023-04-26T11:30:41Z

In my latest commit, I added a POC for what has been described by @codefromthecrypt in the latest comments.
Some names must probably be changed for something better but here I just want to have some feedback

codefromthecrypt

Thanks for progressing this! can you paste output of bench results for these two? This will help guide the discussion as people are more likely to engage when they see the difference (vs pulling and running manually)

lburgazzoli · 2023-05-02T10:14:52Z

@codefromthecrypt I've adapted the allocation example to use CGO for the greeting function and squashed the commits.

I kept the BenchmarkAllocationCGO so there is one bench that runs against greet and one against greeting:

➜ go test -run='^$' -bench '^BenchmarkAllocation.*' ./internal/integration_test/vs/compiler -count=6
goos: linux
goarch: amd64
pkg: github.com/tetratelabs/wazero/internal/integration_test/vs/compiler
cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz

BenchmarkAllocation/Compile-12                73          32071839 ns/op         2958373 B/op       2027 allocs/op
BenchmarkAllocation/Compile-12                73          14590089 ns/op         2953361 B/op       2022 allocs/op
BenchmarkAllocation/Compile-12                81          15130377 ns/op         2953334 B/op       2022 allocs/op
BenchmarkAllocation/Compile-12                80          14616454 ns/op         2953033 B/op       2022 allocs/op
BenchmarkAllocation/Compile-12                84          14875860 ns/op         2952820 B/op       2021 allocs/op
BenchmarkAllocation/Compile-12                79          15151539 ns/op         2952698 B/op       2021 allocs/op
BenchmarkAllocation/Instantiate-12          3987            338833 ns/op          359861 B/op        852 allocs/op
BenchmarkAllocation/Instantiate-12          3866            347517 ns/op          359903 B/op        852 allocs/op
BenchmarkAllocation/Instantiate-12          3709            362471 ns/op          359768 B/op        852 allocs/op
BenchmarkAllocation/Instantiate-12          3744            357544 ns/op          359742 B/op        852 allocs/op
BenchmarkAllocation/Instantiate-12          3633            362594 ns/op          359736 B/op        852 allocs/op
BenchmarkAllocation/Instantiate-12          3699            368921 ns/op          359766 B/op        852 allocs/op
BenchmarkAllocation/Call-12               342232              3376 ns/op              48 B/op          5 allocs/op
BenchmarkAllocation/Call-12               362686              3294 ns/op              48 B/op          5 allocs/op
BenchmarkAllocation/Call-12               366472              3247 ns/op              48 B/op          5 allocs/op
BenchmarkAllocation/Call-12               348795              3282 ns/op              48 B/op          5 allocs/op
BenchmarkAllocation/Call-12               367167              3263 ns/op              48 B/op          5 allocs/op
BenchmarkAllocation/Call-12               343881              3523 ns/op              48 B/op          5 allocs/op

BenchmarkAllocationCGO/Compile-12             79          14663107 ns/op         2952708 B/op       2022 allocs/op
BenchmarkAllocationCGO/Compile-12             80          15036263 ns/op         2952842 B/op       2021 allocs/op
BenchmarkAllocationCGO/Compile-12             81          14828323 ns/op         2952739 B/op       2022 allocs/op
BenchmarkAllocationCGO/Compile-12             81          14646740 ns/op         2952697 B/op       2021 allocs/op
BenchmarkAllocationCGO/Compile-12             80          14670556 ns/op         2952770 B/op       2022 allocs/op
BenchmarkAllocationCGO/Compile-12             81          14636888 ns/op         2952636 B/op       2021 allocs/op
BenchmarkAllocationCGO/Instantiate-12       3385            417383 ns/op          360447 B/op        852 allocs/op
BenchmarkAllocationCGO/Instantiate-12       3056            377058 ns/op          359731 B/op        852 allocs/op
BenchmarkAllocationCGO/Instantiate-12       3600            368788 ns/op          359733 B/op        852 allocs/op
BenchmarkAllocationCGO/Instantiate-12       3544            376303 ns/op          359736 B/op        852 allocs/op
BenchmarkAllocationCGO/Instantiate-12       3094            362826 ns/op          359752 B/op        852 allocs/op
BenchmarkAllocationCGO/Instantiate-12       3470            362647 ns/op          359761 B/op        852 allocs/op
BenchmarkAllocationCGO/Call-12            413808              2770 ns/op              64 B/op          7 allocs/op
BenchmarkAllocationCGO/Call-12            436496              2749 ns/op              64 B/op          7 allocs/op
BenchmarkAllocationCGO/Call-12            427888              2754 ns/op              64 B/op          7 allocs/op
BenchmarkAllocationCGO/Call-12            426919              2745 ns/op              64 B/op          7 allocs/op
BenchmarkAllocationCGO/Call-12            435013              2769 ns/op              64 B/op          7 allocs/op
BenchmarkAllocationCGO/Call-12            436152              2771 ns/op              64 B/op          7 allocs/op

PASS
ok      github.com/tetratelabs/wazero/internal/integration_test/vs/compiler     68.776s

Let me know if this matches what you had in mind.

codefromthecrypt

Thanks for the help again. I made some requests to pare down the change to the minimum. Don't forget to update the guidance in the site directory for tinygo!

internal/integration_test/vs/bench_allocation_cgo.go

examples/allocation/tinygo/testdata/greet.go

internal/integration_test/vs/runtime.go

internal/integration_test/vs/wasmedge/wasmedge_test.go

codefromthecrypt

more notes as my prior ended up in the wrong place. Thanks again!

examples/allocation/tinygo/testdata/greet.go

codefromthecrypt · 2023-05-02T11:18:48Z

examples/allocation/tinygo/testdata/greet.go

+// managed by TinyGo hence it must be freed by the host.
+func stringToNativePtr(s string) (uint32, uint32) {
+	if len(s) == 0 {
+		return 0, 0


Is this special case valid? If so, add a trailing comment saying why (for example, // CGO.free ignores a pointer of zero).
If not, delete the special case.

As far as I recall, calling malloc with a size equal to 0 is implementation-dependent, hence I opted not to call it in that case.

I will add a note if that's fine with you, but in this case it probably never happens.

This is an example, so we can't really add suggestions to special case zero unless it will work. We don't have to consider externally provided impls of malloc/free in the example, though. What tinygo uses with this example and flags given to build is enough.

@anuraaga @dgryski do either of you happen to know what's better:

not dodging calling malloc dependent on length

dodging calling malloc with zero length (results in free called with zero)

here's the tinygo code in question. Personally, I prefer to not special case without saying why we are doing it. Most people won't know why they are calling malloc/free in the first place and code like this gets copy/pasted around. As best I can tell, we are assuming the result will be zero and so we are returning that instead of calling things. That would be a perf optimization, which we'd comment why.

If we don't have a good reason tied to tinygo, I prefer we don't do any special casing in this function. Without comments, special casing raises more questions than answers, especially that a zero pointer is something safe to return back to tinygo on free. For example, pointers are often memory offsets in wasm, so zero would be the initial memory position. ack in the tinygo impl the pointer is mapped, but anyway the point is now I'm having to think about if zero is a safe sentinel value for free or not. it is a lot better to leave the problem on what to do with length zero completely up to tinygo (e.g. don't special case). Cheaper to review, less code, easier example.

https://github.com/tinygo-org/tinygo/blob/0d56dee00f49bd50eb373c02c30062a75ec28f10/src/runtime/arch_tinygowasm_malloc.go#L13-L34

special case removed

codefromthecrypt · 2023-05-02T11:26:45Z

ps if you feel like benchmarking "greeting" is better than "greet", feel free to switch. What I mean is let's keep only one for the allocation example. I don't care strongly which.

lburgazzoli · 2023-05-02T12:23:19Z

ps if you feel like benchmarking "greeting" is better than "greet", feel free to switch. What I mean is let's keep only one for the allocation example. I don't care strongly which.

I don't have a strong opinion, so let's keep the benchmark as it is; it won't be too complex to add a new one if the need arises.

codefromthecrypt

Thanks, I'll go ahead and merge this and do the site updates on my own vs asking again here. Appreciate your help getting the underlying solution together and benchmarking to ensure it didn't cause a regression.

lburgazzoli · 2023-05-02T23:10:36Z

I can try to update the docs before EOW if you want me to.

ncruces · 2023-05-02T23:39:23Z

Epic effort for something that started out with improving example code.
Thanks, and really sorry if I kinda lost track along the way!

codefromthecrypt · 2023-05-02T23:50:59Z

@lburgazzoli I raised #1429 as I think we should get this sorted to avoid another confusing comment chain lasting over a week. This sort of stuff is cognitively deep, and the best way I can find to contain it is to keep it as short in time as possible, as this avoids things like having to work around people being out for holidays etc.

In general, personally I try to complete every PR of any kind within a couple of days for the same reason, even if not cognitively high. Even small things can get difficult to load back into the brain if dragged out over several days, much less weeks, with several people.

ncruces · 2023-05-03T00:12:51Z

@lburgazzoli sorry for just spotting this, I guess I only grasped it when looking at the committed code as a whole.
But I might be missing something obvious and I'd like to get your understanding before filing an issue/PR.

What keeps message alive (i.e. not available for GC) after stringToPtr returns?

wazero/examples/allocation/tinygo/testdata/greet.go

Lines 20 to 24 in 9dd8b1b

    
// log a message to the console using _log.
func log(message string) {
	ptr, size := stringToPtr(message)
	_log(ptr, size)
}

I'm wondering if we need to add runtime.KeepAlive(message) after _log.

codefromthecrypt · 2023-05-03T00:18:04Z

I'll update tinymem once above is resolved. We do tend to see things like this in syscall code, after the call like https://github.com/golang/go/blob/0d347544cbca0f42b160424f6bc2458ebcc7b3fc/src/syscall/fs_wasip1.go#L824-L829 cc @Pryz @achille-roussel

lburgazzoli · 2023-05-03T06:09:59Z

@lburgazzoli sorry for just spotting this, I guess I only grasped it when looking at committed code in whole. But I might be missing something obvious and I'd like to get your understanding before filing an issue/PR.

What keeps message alive (i.e. not available for GC) after stringToPtr returns?

I completely missed that usage to be honest.

wazero/examples/allocation/tinygo/testdata/greet.go

Lines 20 to 24 in 9dd8b1b

// log a message to the console using _log.
func log(message string) {
	ptr, size := stringToPtr(message)
	_log(ptr, size)
}

I'm wondering if we need to add runtime.KeepAlive(message) after _log.

I think the usage of message is safe, however the implementation of stringToPtr is not as it performs a copy of the string's data but that slice goes out of scope hence the address returned by the function may become invalid if a GC is triggered.

I think the safest option would be to use stringToLeakedPtr and free the memory with defer, similar to what is done in the host code.

ncruces · 2023-05-03T12:59:44Z

You're right, I managed to miss the copy. Using stringToLeakedPtr (C.malloc) is one way.

The only other one I can think of is to have stringToPtr return unsafe.Pointer, then cast it in the caller, and runtime.KeepAlive the unsafe.Pointer.
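A minimal sketch of that second option, written as ordinary Go (1.20 or later, for `unsafe.SliceData`) so it can run outside a guest. `hostLog` is a hypothetical stand-in for the imported `_log`, the wrapper is named `logMessage` here, and `stringToPtr` is a simplified guess at the helper's shape, not the code in the example. The `//go:noinline` directive is only there to make sure the copy lives on the heap in this simulation.

```go
package main

import (
	"fmt"
	"runtime"
	"unsafe"
)

// logged collects what the stand-in host function read from memory.
var logged []string

// hostLog stands in for the imported _log: it only receives integers and
// reads the bytes at that address, as a host would read guest memory.
// go vet flags this conversion; it is part of the simulation only.
func hostLog(ptr uintptr, size uint32) {
	b := unsafe.Slice((*byte)(unsafe.Pointer(ptr)), size)
	logged = append(logged, string(b))
}

// stringToPtr copies s and returns an unsafe.Pointer to the copy, so the
// caller holds a real reference instead of a bare integer.
//
//go:noinline
func stringToPtr(s string) (unsafe.Pointer, uint32) {
	buf := []byte(s)
	return unsafe.Pointer(unsafe.SliceData(buf)), uint32(len(buf))
}

// logMessage casts the pointer at the call site and keeps it alive until
// the host call has returned.
func logMessage(message string) {
	ptr, size := stringToPtr(message)
	hostLog(uintptr(ptr), size)
	runtime.KeepAlive(ptr) // the copy must not be collected before this line
}

func main() {
	logMessage("hello")
	fmt.Println(logged) // prints "[hello]"
}
```

The design point is that the integer conversion happens as late as possible, and `runtime.KeepAlive` marks the last moment the copy is needed.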

lburgazzoli · 2023-05-03T13:24:02Z

KeepAlive

A more unsafe way would be to runtime.KeepAlive the message and return the actual data that backs the string with unsafe.StringData, but there will be dragons :)

ncruces · 2023-05-03T13:41:22Z

Yes, that too. The only additional unsafety is that it's supposed to be read-only, and the host can change it, but the host can always change it, so…

codefromthecrypt · 2023-05-03T23:15:41Z

The only additional unsafety is that it's supposed to be read-only, and the host can change it, but the host can always change it, so…

agree the host can always change anything. Doing anything in the guest that makes it seem like it cannot is probably not a good idea. Like adding racing stripes to a Honda, doesn't make it faster. We should make the guest code concise, as cheap as possible and most importantly relevant. If the current code isn't really leaking it, or it is more complicated than an alternative that does the same, a PR is welcome, especially before we communicate a change widely.

lburgazzoli · 2023-05-04T07:40:53Z

@codefromthecrypt @ncruces I opened a PR with a possible implementation, #1434; feel free to close it if it does not make sense.

lburgazzoli mentioned this pull request Apr 21, 2023

examples(allocation): free memory after unmarshalling a result from the guest #1368

Merged

lburgazzoli force-pushed the allocation-example-free branch 2 times, most recently from f5fa2bb to ca79698 Compare April 21, 2023 10:07

ncruces reviewed Apr 21, 2023

internal/integration_test/vs/bench_allocation.go

lburgazzoli force-pushed the allocation-example-free branch 2 times, most recently from 3671d22 to 43eff01 Compare April 21, 2023 13:01

codefromthecrypt reviewed Apr 21, 2023

internal/integration_test/vs/bench_allocation.go

codefromthecrypt reviewed Apr 21, 2023

lburgazzoli force-pushed the allocation-example-free branch from 43eff01 to 370a8ba Compare April 21, 2023 13:58

lburgazzoli force-pushed the allocation-example-free branch from 370a8ba to e29ae28 Compare April 21, 2023 14:02

lburgazzoli changed the title ~~examples(allocation): free memory after unmarshalling a result from the guest (#1368)~~ examples(allocation): free memory after unmarshalling a result from the guest Apr 21, 2023

codefromthecrypt reviewed Apr 21, 2023

lburgazzoli force-pushed the allocation-example-free branch 2 times, most recently from 14d3752 to 1ffb3f6 Compare April 26, 2023 11:28

lburgazzoli force-pushed the allocation-example-free branch 2 times, most recently from c4b5b91 to 8692e2e Compare April 26, 2023 12:10

codefromthecrypt reviewed Apr 26, 2023

lburgazzoli force-pushed the allocation-example-free branch from 981580c to 4f23ede Compare May 2, 2023 10:12

lburgazzoli force-pushed the allocation-example-free branch from 4f23ede to a35f61e Compare May 2, 2023 10:17

codefromthecrypt reviewed May 2, 2023

lburgazzoli force-pushed the allocation-example-free branch 4 times, most recently from a703f1d to 2ad486f Compare May 2, 2023 12:18

examples(allocation): free memory after unmarshalling a result from t…

586b61c

…he guest Signed-off-by: Luca Burgazzoli <[email protected]>

lburgazzoli force-pushed the allocation-example-free branch from 2ad486f to 586b61c Compare May 2, 2023 12:20

codefromthecrypt approved these changes May 2, 2023

codefromthecrypt marked this pull request as ready for review May 2, 2023 23:00

codefromthecrypt requested a review from mathetake as a code owner May 2, 2023 23:00

codefromthecrypt merged commit b2c11d8 into tetratelabs:main May 2, 2023

codefromthecrypt mentioned this pull request May 2, 2023

Revises instructions to latest guidance from TinyGo #1429

Merged

lburgazzoli deleted the allocation-example-free branch May 3, 2023 05:51
