examples(allocation): free memory after unmarshalling a result from the guest #1390

Merged

lburgazzoli
Contributor

Signed-off-by: Luca Burgazzoli [email protected]

@lburgazzoli
Contributor Author

These are the benchmark results I get when comparing main against this PR:

➜ benchstat old.txt new.txt 
goos: linux
goarch: amd64
pkg: github.com/tetratelabs/wazero/internal/integration_test/vs/compiler
cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
                          │   old.txt    │                new.txt                 │
                          │    sec/op    │    sec/op      vs base                 │
Allocation/Compile-12       12.00m ± 15%    12.16m ±  2%          ~ (p=0.699 n=6)
Allocation/Instantiate-12   598.9µ ±  1%    622.0µ ±  3%     +3.85% (p=0.009 n=6)
Allocation/Call-12          2.031µ ±  6%   28.617µ ± 24%  +1308.99% (p=0.002 n=6)
geomean                     244.4µ          600.5µ         +145.71%

                          │   old.txt    │                new.txt                 │
                          │     B/op     │     B/op       vs base                 │
Allocation/Compile-12       2.553Mi ± 0%   2.553Mi ±  0%          ~ (p=0.818 n=6)
Allocation/Instantiate-12   348.0Ki ± 0%   348.0Ki ±  0%          ~ (p=0.626 n=6)
Allocation/Call-12            48.00 ± 0%    586.00 ± 10%  +1120.83% (p=0.002 n=6)
geomean                     34.94Ki        80.45Ki         +130.26%

                          │   old.txt   │              new.txt               │
                          │  allocs/op  │  allocs/op   vs base               │
Allocation/Compile-12       2.023k ± 0%   2.023k ± 0%        ~ (p=1.000 n=6)
Allocation/Instantiate-12    841.5 ± 0%    841.5 ± 0%        ~ (p=1.000 n=6)
Allocation/Call-12           5.000 ± 0%    6.000 ± 0%  +20.00% (p=0.002 n=6)
geomean                      204.2         217.0        +6.27%

pkg: github.com/tetratelabs/wazero/internal/integration_test/vs/interpreter
                          │   old.txt    │               new.txt                │
                          │    sec/op    │    sec/op      vs base               │
Allocation/Compile-12       2.761m ±  8%    3.238m ± 14%  +17.27% (p=0.009 n=6)
Allocation/Instantiate-12   168.9µ ± 30%    152.4µ ± 82%        ~ (p=0.589 n=6)
Allocation/Call-12          53.99µ ± 11%   107.08µ ±  9%  +98.33% (p=0.002 n=6)
geomean                     293.1µ          375.2µ        +28.01%

                          │   old.txt    │               new.txt                │
                          │     B/op     │     B/op      vs base                │
Allocation/Compile-12       1.449Mi ± 0%   1.449Mi ± 0%         ~ (p=0.589 n=6)
Allocation/Instantiate-12   222.8Ki ± 0%   222.8Ki ± 0%         ~ (p=0.186 n=6)
Allocation/Call-12          2.241Ki ± 0%   6.569Ki ± 0%  +193.18% (p=0.002 n=6)
geomean                     90.49Ki        129.5Ki        +43.13%

                          │   old.txt   │                new.txt                │
                          │  allocs/op  │  allocs/op   vs base                  │
Allocation/Compile-12       1.076k ± 0%   1.077k ± 0%         ~ (p=0.061 n=6)
Allocation/Instantiate-12    747.0 ± 0%    747.0 ± 0%         ~ (p=1.000 n=6) ¹
Allocation/Call-12           98.00 ± 0%   264.00 ± 0%  +169.39% (p=0.002 n=6)
geomean                      428.7         596.6        +39.19%
¹ all samples are equal

I'm not sure I did everything correctly, but it looks like there is still a large difference.

@ncruces
Collaborator

ncruces commented Apr 21, 2023

The memory increase seems negligible (it's 1000%+, but just 500 bytes). My previous analysis is totally invalidated because I read Mi in the first line and assumed 500 megabytes. So please disregard that.

The CPU increase I don't know how to evaluate. Is a 20µs increase a problem?
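
For anyone weighing the percentages against the raw numbers, a minimal Go sketch of that arithmetic follows. It only recomputes the compiler Allocation/Call rows quoted earlier in this thread; the helper name `delta` is made up for illustration and is not part of wazero or benchstat.

```go
package main

// delta reports how far after moved from before, both as an absolute
// difference and as a percentage of before.
func delta(before, after float64) (abs, pct float64) {
	abs = after - before
	pct = abs / before * 100
	return abs, pct
}
```

With the B/op figures, `delta(48, 586)` gives 538 bytes and about +1120.83%. With the sec/op figures, `delta(2.031, 28.617)` gives an absolute increase of roughly 26.6µs. A large ratio on a tiny base can still be a small absolute cost, so the two are worth reading together.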

If so, I would test a couple of things, to test the assumption that TinyGo using cgo might be the issue.

One is a "caller allocates" convention: host calls malloc with a (perhaps oversized) buffer, passes the pointer and length in, guest returns the length.

The other is a global variable for the result. In the guest:

// Use this global variable to keep the return value from an external function from being GCed.
var lastResult any

// Call this function to allow GCing the return value from an external function.
func ClearLastResult() { lastResult = nil }
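
The "caller allocates" convention mentioned above can be sketched in plain Go. This is a simulation only: the `memory` type stands in for the guest's linear memory, and `guestGreeting`, `hostCall`, the fixed offset, and the buffer size are all hypothetical names and values chosen for illustration, not wazero or TinyGo APIs.

```go
package main

import "errors"

// memory stands in for the guest's linear memory (simulation only).
type memory []byte

// guestGreeting plays the guest: it writes its result into the buffer the
// caller reserved at [ptr, ptr+capacity) and returns the bytes written.
func guestGreeting(mem memory, name string, ptr, capacity uint32) (uint32, error) {
	msg := "Hello, " + name + "!"
	if uint32(len(msg)) > capacity {
		return 0, errors.New("buffer too small")
	}
	copy(mem[ptr:ptr+capacity], msg)
	return uint32(len(msg)), nil
}

// hostCall plays the host: it reserves a (perhaps oversized) buffer, passes
// the pointer and length in, and reads back only the returned length.
func hostCall(mem memory, name string) (string, error) {
	// Assumption: a real host would obtain ptr from the guest's malloc.
	const ptr, capacity = 16, 64
	n, err := guestGreeting(mem, name, ptr, capacity)
	if err != nil {
		return "", err
	}
	return string(mem[ptr : ptr+n]), nil
}
```

The guest never allocates the result itself in this sketch, so there is nothing on the guest side for the garbage collector to reclaim early. The trade-off is that the host has to guess a capacity and handle the "too small" case.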

@codefromthecrypt
Contributor

I would chime in that 20µs is a problem. For example, a web handler can return in less than a microsecond, and TinyGo is used to implement file handlers.

@dmvolod seemed to have a regression fix around tinygo recently and you can tell in these benchmarks that even using things like protobuf, the whole round trip is less than the additional overhead added here. knqyf263/go-plugin#48

TL;DR; something's wrong and we shouldn't advocate (merge) this unless the overhead becomes more a percentage than a factor ;)

@lburgazzoli lburgazzoli force-pushed the allocation-example-free branch 2 times, most recently from 3671d22 to 43eff01 Compare April 21, 2023 13:01
@codefromthecrypt
Contributor

hmm, panic: runtime error: invalid memory address or nil pointer dereference, so I think something is busted, but luckily it doesn't need the benchmark to find it. Can you have a look at TestAllocation?

if err != nil {
	return 0, err
}
if len(results) > 0 {
Contributor

@codefromthecrypt codefromthecrypt Apr 21, 2023

this should never be false, basically this is a bug if the sig is I32I32_I64, so panic or return an error.

Contributor Author

If I add something like

	if len(results) == 0 {
		panic("unexpected")
	}

Then it panics, but I don't know exactly why.

Contributor

ok lemme pull this and take a look

Contributor

the problem is that the code was switched to the signature of "greeting" but the function name called was "greet" which didn't return a value.

Contributor Author

ah greet vs greeting
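
The "panic or return an error" advice from this review thread could look roughly like the sketch below. It assumes a hypothetical `callFn` type standing in for an exported guest function and a made-up helper name `callExpectingOne`; neither is the actual helper touched by this PR.

```go
package main

import "fmt"

// callFn is a hypothetical stand-in for invoking an exported guest function.
type callFn func(params ...uint64) ([]uint64, error)

// callExpectingOne calls fn with two i32 parameters and insists on exactly
// one result, so a signature mismatch surfaces as an error instead of being
// silently skipped.
func callExpectingOne(name string, fn callFn, a, b uint32) (uint64, error) {
	results, err := fn(uint64(a), uint64(b))
	if err != nil {
		return 0, err
	}
	if len(results) != 1 {
		return 0, fmt.Errorf("%s: expected 1 result, got %d", name, len(results))
	}
	return results[0], nil
}
```

Had the helper been written this way, calling "greet" (no result) where "greeting" (one result) was intended would have produced an error naming the function rather than a nil pointer dereference later on.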

@lburgazzoli lburgazzoli force-pushed the allocation-example-free branch from 43eff01 to 370a8ba Compare April 21, 2023 13:58
@lburgazzoli
Contributor Author

@codefromthecrypt with the latest commit which implements the suggestion in tinygo-org/tinygo#2787 (comment), the results are:

➜ benchstat old.txt new.txt 
goos: linux
goarch: amd64
pkg: github.com/tetratelabs/wazero/internal/integration_test/vs/compiler
cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
                          │   old.txt   │              new.txt               │
                          │   sec/op    │   sec/op     vs base               │
Allocation/Compile-12       11.93m ± 6%   11.96m ± 4%        ~ (p=1.000 n=6)
Allocation/Instantiate-12   608.9µ ± 2%   624.0µ ± 2%   +2.47% (p=0.041 n=6)
Allocation/Call-12          2.016µ ± 2%   2.488µ ± 4%  +23.41% (p=0.002 n=6)
geomean                     244.7µ        264.8µ        +8.23%

                          │   old.txt    │               new.txt               │
                          │     B/op     │     B/op      vs base               │
Allocation/Compile-12       2.553Mi ± 0%   2.554Mi ± 0%   +0.03% (p=0.041 n=6)
Allocation/Instantiate-12   348.0Ki ± 0%   352.2Ki ± 0%   +1.22% (p=0.002 n=6)
Allocation/Call-12            48.00 ± 0%     56.00 ± 0%  +16.67% (p=0.002 n=6)
geomean                     34.94Ki        36.93Ki        +5.71%

                          │   old.txt   │              new.txt               │
                          │  allocs/op  │  allocs/op   vs base               │
Allocation/Compile-12       2.023k ± 0%   2.033k ± 0%   +0.49% (p=0.002 n=6)
Allocation/Instantiate-12    841.0 ± 0%    843.0 ± 0%   +0.24% (p=0.002 n=6)
Allocation/Call-12           5.000 ± 0%    6.000 ± 0%  +20.00% (p=0.002 n=6)
geomean                      204.1         217.5        +6.52%

pkg: github.com/tetratelabs/wazero/internal/integration_test/vs/interpreter
                          │   old.txt    │               new.txt               │
                          │    sec/op    │    sec/op     vs base               │
Allocation/Compile-12       3.017m ± 19%   2.978m ± 10%        ~ (p=1.000 n=6)
Allocation/Instantiate-12   154.9µ ± 14%   151.1µ ±  1%        ~ (p=0.310 n=6)
Allocation/Call-12          58.87µ ± 10%   65.98µ ±  3%  +12.08% (p=0.004 n=6)
geomean                     301.9µ         309.6µ         +2.55%

                          │   old.txt    │              new.txt               │
                          │     B/op     │     B/op      vs base              │
Allocation/Compile-12       1.449Mi ± 0%   1.458Mi ± 0%  +0.57% (p=0.002 n=6)
Allocation/Instantiate-12   222.8Ki ± 0%   223.0Ki ± 0%  +0.06% (p=0.002 n=6)
Allocation/Call-12          2.239Ki ± 0%   2.330Ki ± 0%  +4.06% (p=0.002 n=6)
geomean                     90.46Ki        91.86Ki       +1.55%

                          │   old.txt   │              new.txt              │
                          │  allocs/op  │  allocs/op   vs base              │
Allocation/Compile-12       1.077k ± 0%   1.087k ± 0%  +0.93% (p=0.002 n=6)
Allocation/Instantiate-12    747.0 ± 0%    748.0 ± 0%  +0.13% (p=0.002 n=6)
Allocation/Call-12           98.00 ± 0%   103.00 ± 1%  +5.10% (p=0.002 n=6)
geomean                      428.8         437.5       +2.03%

@lburgazzoli lburgazzoli force-pushed the allocation-example-free branch from 370a8ba to e29ae28 Compare April 21, 2023 14:02
@lburgazzoli lburgazzoli changed the title from "examples(allocation): free memory after unmarshalling a result from the guest (#1368)" to "examples(allocation): free memory after unmarshalling a result from the guest" Apr 21, 2023
if err = m.CallI32I32_V(testCtx, "greet", namePtr, nameSize); err != nil {
return err
// Now, we can call "greeting", which reads the string we wrote to memory!
ptrSize, fnErr := m.CallI32I32_I64(testCtx, "greeting", namePtr, nameSize)
Contributor

So the main thing here is that the benchmark is doing something different than before. Because of this, it cannot be compared directly to the prior run (now it is calling "greeting", but before it was calling "greet").
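
Part of what changed is visible in the hunk above: the call moved from a no-result signature to one returning a single i64, which carries both a pointer and a length back to the host. The thread does not spell out the bit layout, so the sketch below assumes one common convention (pointer in the upper 32 bits, size in the lower 32); `pack` and `unpack` are illustrative names only.

```go
package main

// pack combines a 32-bit pointer and a 32-bit size into one 64-bit value.
// Assumed layout: pointer in the upper half, size in the lower half.
func pack(ptr, size uint32) uint64 {
	return uint64(ptr)<<32 | uint64(size)
}

// unpack reverses pack on the receiving side.
func unpack(ptrSize uint64) (ptr, size uint32) {
	return uint32(ptrSize >> 32), uint32(ptrSize)
}
```

Because the host now has to decode the result, read the string out of guest memory, and free it, the new Call benchmark does strictly more work per iteration than the old one.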

@codefromthecrypt
Contributor

codefromthecrypt commented Apr 21, 2023

in some ways it looks like we are doing the same thing as tinymem now.

However, mixing and matching the built-in "malloc" and "free" with an explicit version (similar to tinymem) is a little confusing I think, especially in an example.

I prefer the example either use the built-in functions like it did before or manually manage, but not both.

So if you think that we shouldn't use malloc/free exported by tinygo then I think we just inline the latest version of the code in tinymem into the example/benchmark, then refer to tinymem as a repo that does this for you.

https://github.com/tetratelabs/tinymem/blob/main/example/greeting.go

@codefromthecrypt
Contributor

so to the point of @ncruces here #1390 (comment)

One is a "caller allocates" convention: host calls malloc with a (perhaps oversized) buffer, passes the pointer and length in, guest returns the length.

this is what was going on before, and kind of still is. Similar to Rust, this was using the underlying malloc/free built-in imports.

The other is a global variable for the result. In the guest:

This is kind of happening now, except using a map instead of a single field. IMHO, if we are doing something more complicated (like using a map), the code that is almost like tinymem should be exactly like it, since that code was reviewed etc.; there is no need to inline something almost the same. In other words, don't copy/paste from my old comment, but rather use the latest version of it.

OTOH, I don't know that the special malloc/free exports are going away any time, except possibly not exported by default. Frankly, I have lost track of the real motivation to change anything, but I also don't mind changing as long as it is coherent!

@lburgazzoli
Contributor Author

@codefromthecrypt this whole exercise is an attempt to clarify how host/guest memory should be handled, for future devs who, like me, were looking for an example, as it looks like this topic is not very clear yet.

And this PR seems to be a confirmation :)

I don't have any preference between using tinymem (which I discovered only today) and the built-in free/malloc with CGO, so I would love some guidance about which one you'd like to be part of the examples. Also, should I add a dedicated example and a related benchmark?

@ncruces
Collaborator

ncruces commented Apr 21, 2023

I guess if tinymem's goal is to encode best practices in terms of allocations across the host-guest boundary, it's only natural that an example either uses it or reimplements it.

My suggestions were more with the goal of root causing the benchmark regression. From the data @lburgazzoli collected it seems using CGo from TinyGo might be the culprit?

And if that's so, that probably confirms that tinymem's approach is the best one, for a TinyGo guest.

@codefromthecrypt
Contributor

codefromthecrypt commented Apr 22, 2023

Surprisingly or not I think I am on the same page with both! Maybe took a bit for my brain to catch up.

Conflicting advice is basically the opposite of clear, so that's why I think our example should not mix both in the same code. When they are mixed, it is really difficult to figure out what the advice is, and the perf impact is even less clear.

So, let's recap our entrypoints: we have the example, which shows new users how to do allocation, and we also have a safeguard Allocation benchmark, which both serves to compare more realistic interaction performance (vs wasmtime etc.) and acts in some ways as a soak test. For example, it was through the benchmark that we found things breaking functionally or in performance. That is a surprisingly helpful thing.

Along the way, though, we changed the example to something it wasn't before. Specifically "greeting" vs "greet", so we can't really compare performance of the two based on the prior commit. Until the exact use case is run, both ways I don't think we can say for sure an outcome.

Here's an idea, if folks have patience for it. Note: this is temporary to achieve an outcome of knowing if both approaches work or have dramatically different outcomes when compared exactly.

Change the go allocation source that calls tinygo's wasm to use a CLI arg to determine the style to use. Change the tinygo source to implement both approaches to "greeting": pure CGO including go imports and also the tinymem approach. For tinymem, literally copy/paste the code from tinymem so it isn't subtly different.

For example export "greeting_cgo" and "greeting_tinymem" then in the host side do both approaches switched on an arg. So, for example the cgo style would use "malloc" and tinymem style "_malloc". Another way is to make another directory, like tinygo-malloc, if it helps.

Then, run the benchmark against each. You can make a manual benchmark or modify BenchmarkAllocation to allow choosing which. The main idea is benchmarking exactly the same use case, both to see the results and also that they complete without error.
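
A host-side switch along those lines might start from a small lookup like the sketch below. The export names "greeting_cgo", "greeting_tinymem", "malloc" and "_malloc" come from the proposal above; "free" and "_free" are assumed counterparts, and the `exports` type and `exportsFor` function are hypothetical names for illustration.

```go
package main

import "fmt"

// exports names the guest functions the host should look up for one style.
type exports struct {
	greeting, malloc, free string
}

// exportsFor maps a style name (for example, taken from a CLI argument) to
// the exports that style uses.
func exportsFor(style string) (exports, error) {
	switch style {
	case "cgo":
		return exports{greeting: "greeting_cgo", malloc: "malloc", free: "free"}, nil
	case "tinymem":
		// "_free" is an assumption; only "_malloc" is named in this thread.
		return exports{greeting: "greeting_tinymem", malloc: "_malloc", free: "_free"}, nil
	}
	return exports{}, fmt.Errorf("unknown allocation style %q", style)
}
```

Keeping the rest of the host code identical and varying only these names is what makes the two benchmark runs comparable.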

@codefromthecrypt
Contributor

I also want to clarify from my personal POV, I didn't help make tinymem because I thought it was a better idea. This project was made defensively because tinygo were considering removing the CGO imports (those in use by go-plugin, trivy and others). I don't personally want folks to have to do memory things manually, as wasm is hard enough as it is!

@lburgazzoli
Contributor Author

Here's an idea, if folks have patience for it. Note: this is temporary to achieve an outcome of knowing if both approaches work or have dramatically different outcomes when compared exactly.

Change the go allocation source that calls tinygo's wasm to use a CLI arg to determine the style to use. Change the tinygo source to implement both approaches to "greeting": pure CGO including go imports and also the tinymem approach. For tinymem, literally copy/paste the code from tinymem so it isn't subtly different.

For example export "greeting_cgo" and "greeting_tinymem" then in the host side do both approaches switched on an arg. So, for example the cgo style would use "malloc" and tinymem style "_malloc". Another way is to make another directory, like tinygo-malloc, if it helps.

Then, run the benchmark against each. You can make a manual benchmark or modify BenchmarkAllocation to allow choosing which. The main idea is benchmarking exactly the same use case, both to see the results and also that they complete without error.

If there is no urgency, I would like to take this task.

@lburgazzoli lburgazzoli force-pushed the allocation-example-free branch 2 times, most recently from 14d3752 to 1ffb3f6 Compare April 26, 2023 11:28
@lburgazzoli
Contributor Author

In my latest commit, I added a POC for what has been described by @codefromthecrypt in the latest comments.
Some names must probably be changed for something better but here I just want to have some feedback

@lburgazzoli lburgazzoli force-pushed the allocation-example-free branch 2 times, most recently from c4b5b91 to 8692e2e Compare April 26, 2023 12:10
Contributor

@codefromthecrypt codefromthecrypt left a comment

Thanks for progressing this! can you paste output of bench results for these two? This will help guide the discussion as people are more likely to engage when they see the difference (vs pulling and running manually)

@lburgazzoli lburgazzoli force-pushed the allocation-example-free branch from 981580c to 4f23ede Compare May 2, 2023 10:12
@lburgazzoli
Contributor Author

@codefromthecrypt I've adapted the allocation example to use CGO for the greeting function and squashed the commits.

I kept the BenchmarkAllocationCGO so there is one bench that runs against greet and one against greeting:

➜ go test -run='^$' -bench '^BenchmarkAllocation.*' ./internal/integration_test/vs/compiler -count=6
goos: linux
goarch: amd64
pkg: github.com/tetratelabs/wazero/internal/integration_test/vs/compiler
cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz

BenchmarkAllocation/Compile-12                73          32071839 ns/op         2958373 B/op       2027 allocs/op
BenchmarkAllocation/Compile-12                73          14590089 ns/op         2953361 B/op       2022 allocs/op
BenchmarkAllocation/Compile-12                81          15130377 ns/op         2953334 B/op       2022 allocs/op
BenchmarkAllocation/Compile-12                80          14616454 ns/op         2953033 B/op       2022 allocs/op
BenchmarkAllocation/Compile-12                84          14875860 ns/op         2952820 B/op       2021 allocs/op
BenchmarkAllocation/Compile-12                79          15151539 ns/op         2952698 B/op       2021 allocs/op
BenchmarkAllocation/Instantiate-12          3987            338833 ns/op          359861 B/op        852 allocs/op
BenchmarkAllocation/Instantiate-12          3866            347517 ns/op          359903 B/op        852 allocs/op
BenchmarkAllocation/Instantiate-12          3709            362471 ns/op          359768 B/op        852 allocs/op
BenchmarkAllocation/Instantiate-12          3744            357544 ns/op          359742 B/op        852 allocs/op
BenchmarkAllocation/Instantiate-12          3633            362594 ns/op          359736 B/op        852 allocs/op
BenchmarkAllocation/Instantiate-12          3699            368921 ns/op          359766 B/op        852 allocs/op
BenchmarkAllocation/Call-12               342232              3376 ns/op              48 B/op          5 allocs/op
BenchmarkAllocation/Call-12               362686              3294 ns/op              48 B/op          5 allocs/op
BenchmarkAllocation/Call-12               366472              3247 ns/op              48 B/op          5 allocs/op
BenchmarkAllocation/Call-12               348795              3282 ns/op              48 B/op          5 allocs/op
BenchmarkAllocation/Call-12               367167              3263 ns/op              48 B/op          5 allocs/op
BenchmarkAllocation/Call-12               343881              3523 ns/op              48 B/op          5 allocs/op

BenchmarkAllocationCGO/Compile-12             79          14663107 ns/op         2952708 B/op       2022 allocs/op
BenchmarkAllocationCGO/Compile-12             80          15036263 ns/op         2952842 B/op       2021 allocs/op
BenchmarkAllocationCGO/Compile-12             81          14828323 ns/op         2952739 B/op       2022 allocs/op
BenchmarkAllocationCGO/Compile-12             81          14646740 ns/op         2952697 B/op       2021 allocs/op
BenchmarkAllocationCGO/Compile-12             80          14670556 ns/op         2952770 B/op       2022 allocs/op
BenchmarkAllocationCGO/Compile-12             81          14636888 ns/op         2952636 B/op       2021 allocs/op
BenchmarkAllocationCGO/Instantiate-12       3385            417383 ns/op          360447 B/op        852 allocs/op
BenchmarkAllocationCGO/Instantiate-12       3056            377058 ns/op          359731 B/op        852 allocs/op
BenchmarkAllocationCGO/Instantiate-12       3600            368788 ns/op          359733 B/op        852 allocs/op
BenchmarkAllocationCGO/Instantiate-12       3544            376303 ns/op          359736 B/op        852 allocs/op
BenchmarkAllocationCGO/Instantiate-12       3094            362826 ns/op          359752 B/op        852 allocs/op
BenchmarkAllocationCGO/Instantiate-12       3470            362647 ns/op          359761 B/op        852 allocs/op
BenchmarkAllocationCGO/Call-12            413808              2770 ns/op              64 B/op          7 allocs/op
BenchmarkAllocationCGO/Call-12            436496              2749 ns/op              64 B/op          7 allocs/op
BenchmarkAllocationCGO/Call-12            427888              2754 ns/op              64 B/op          7 allocs/op
BenchmarkAllocationCGO/Call-12            426919              2745 ns/op              64 B/op          7 allocs/op
BenchmarkAllocationCGO/Call-12            435013              2769 ns/op              64 B/op          7 allocs/op
BenchmarkAllocationCGO/Call-12            436152              2771 ns/op              64 B/op          7 allocs/op

PASS
ok      github.com/tetratelabs/wazero/internal/integration_test/vs/compiler     68.776s

Let me know if this matches what you had in mind.
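
To summarize the raw Call rows above without running benchstat, a median can be computed by hand. The sketch below is plain Go; `median` is an illustrative helper, not part of the repository.

```go
package main

import "sort"

// median returns the middle value of samples, averaging the two middle
// values when the count is even. The input slice is left untouched.
func median(samples []float64) float64 {
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	n := len(s)
	if n == 0 {
		return 0
	}
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}
```

Fed the six ns/op samples of each run, this gives 3288 ns/op for BenchmarkAllocation/Call and 2761.5 ns/op for BenchmarkAllocationCGO/Call. Keep in mind that the two benchmarks exercise different guest functions, so the gap is not a like-for-like comparison.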

@lburgazzoli lburgazzoli force-pushed the allocation-example-free branch from 4f23ede to a35f61e Compare May 2, 2023 10:17
Contributor

@codefromthecrypt codefromthecrypt left a comment

Thanks for the help again. I made some requests to pare down the change to the minimum. Don't forget to update the guidance in the site directory for tinygo!

internal/integration_test/vs/bench_allocation_cgo.go (outdated)
examples/allocation/tinygo/testdata/greet.go
internal/integration_test/vs/runtime.go (outdated)
internal/integration_test/vs/wasmedge/wasmedge_test.go (outdated)
Contributor

@codefromthecrypt codefromthecrypt left a comment

more notes as my prior ended up in the wrong place. Thanks again!

examples/allocation/tinygo/testdata/greet.go (outdated)
// managed by TinyGo hence it must be freed by the host.
func stringToNativePtr(s string) (uint32, uint32) {
if len(s) == 0 {
return 0, 0
Contributor

Is this special case valid? If so, add a trailing comment explaining why (for example, // CGO.free ignores a pointer of zero).
If not, delete the special case.

Contributor Author

As far as I recall, the result of calling malloc with a size of 0 is implementation-dependent, hence I opted not to call it in that case.

I will add a note if that's fine with you, but in this case it probably never happens.

Contributor

This is an example, so we can't really add suggestions to special case zero unless it will work. We don't have to consider externally provided impls of malloc/free in the example, though. What tinygo uses with this example and flags given to build is enough.

@anuraaga @dgryski do either of you happen to know what's better:

  • not dodging calling malloc dependent on length
  • dodging calling malloc with zero length (results in free called with zero)

Contributor

@codefromthecrypt codefromthecrypt May 2, 2023

here's the tinygo code in question. Personally, I prefer to not special case without saying why we are doing it. Most people won't know why they are calling malloc/free in the first place and code like this gets copy/pasted around. As best I can tell, we are assuming the result will be zero and so we are returning that instead of calling things. That would be a perf optimization, which we'd comment why.

If we don't have a good reason tied to tinygo, I prefer we don't do any special casing in this function. Without comments, special casing raises more questions than answers, especially that a zero pointer is something safe to return back to tinygo on free. For example, pointers are often memory offsets in wasm, so zero would be the initial memory position. ack in the tinygo impl the pointer is mapped, but anyway the point is now I'm having to think about if zero is a safe sentinel value for free or not. it is a lot better to leave the problem on what to do with length zero completely up to tinygo (e.g. don't special case). Cheaper to review, less code, easier example.

https://github.com/tinygo-org/tinygo/blob/0d56dee00f49bd50eb373c02c30062a75ec28f10/src/runtime/arch_tinygowasm_malloc.go#L13-L34

Contributor Author

@lburgazzoli lburgazzoli May 2, 2023

special case removed
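
To illustrate why the caller can leave the zero-length decision to the allocator, here is a toy allocator in plain Go. It is an assumption made for illustration, not a copy of TinyGo's runtime: it pins live buffers in a map, returns a zero pointer for a zero size, and treats freeing zero as a no-op. All names (`toyMalloc`, `toyFree`, `stringToToyPtr`) and the fake address scheme are hypothetical.

```go
package main

// allocs pins every live buffer so it cannot be reclaimed while the other
// side of the boundary still holds its address.
var allocs = map[uint32][]byte{}

// next is the next fake address to hand out; zero means "no allocation".
var next uint32 = 8

// toyMalloc returns a fake address for size bytes, or zero for size zero.
func toyMalloc(size uint32) uint32 {
	if size == 0 {
		return 0
	}
	ptr := next
	next += size
	allocs[ptr] = make([]byte, size)
	return ptr
}

// toyFree releases a buffer; freeing the zero pointer does nothing.
func toyFree(ptr uint32) {
	if ptr == 0 {
		return
	}
	if _, ok := allocs[ptr]; !ok {
		panic("free: invalid pointer")
	}
	delete(allocs, ptr)
}

// stringToToyPtr has no zero-length branch of its own: the allocator
// decides what an empty allocation means.
func stringToToyPtr(s string) (uint32, uint32) {
	size := uint32(len(s))
	ptr := toyMalloc(size)
	copy(allocs[ptr], s)
	return ptr, size
}
```

With an allocator shaped like this, the caller's code path is the same for empty and non-empty strings, which keeps the example shorter and avoids an unexplained sentinel.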

@codefromthecrypt
Contributor

ps if you feel like benchmarking "greeting" is better than "greet", feel free to switch. What I mean is let's keep only one for the allocation example. I don't care strongly which.

@lburgazzoli lburgazzoli force-pushed the allocation-example-free branch 4 times, most recently from a703f1d to 2ad486f Compare May 2, 2023 12:18
@lburgazzoli lburgazzoli force-pushed the allocation-example-free branch from 2ad486f to 586b61c Compare May 2, 2023 12:20
@lburgazzoli
Contributor Author

ps if you feel like benchmarking "greeting" is better than "greet", feel free to switch. What I mean is let's keep only one for the allocation example. I don't care strongly which.

I don't have a strong opinion, so let's keep the benchmark as it is. It won't be too complex to add a new one if/when the need arises.

Contributor

@codefromthecrypt codefromthecrypt left a comment

Thanks, I'll go ahead and merge this and do the site updates on my own vs asking again here. Appreciate your help getting the underlying solution together and benchmarking to ensure it didn't cause a regression.

@codefromthecrypt codefromthecrypt marked this pull request as ready for review May 2, 2023 23:00
@codefromthecrypt codefromthecrypt merged commit b2c11d8 into tetratelabs:main May 2, 2023
@lburgazzoli
Contributor Author

I can try to update the docs before EOW if you want me to.

@ncruces
Collaborator

ncruces commented May 2, 2023

Epic effort for something that started out with improving example code.
Thanks, and really sorry if I kinda lost track along the way!

@codefromthecrypt
Contributor

@lburgazzoli I raised #1429 as I think we should get this sorted to avoid another confusing comment chain stretching over a week. This sort of stuff is cognitively deep, and the best way I can find to contain it is to keep the time span as short as possible, as this avoids things like having to work around people being out for holidays etc.

In general, I personally try to complete every PR of any kind within a couple of days for the same reason, even if it is not cognitively demanding. Even small things can get difficult to load back into the brain if dragged out over several days, much less weeks, with several people.

@ncruces
Collaborator

ncruces commented May 3, 2023

@lburgazzoli sorry for just spotting this, I guess I only grasped it when looking at committed code in whole.
But I might be missing something obvious and I'd like to get your understanding before filing an issue/PR.

What keeps message alive (i.e. not available for GC) after stringToPtr returns?

// log a message to the console using _log.
func log(message string) {
	ptr, size := stringToPtr(message)
	_log(ptr, size)
}

I'm wondering if we need to add runtime.KeepAlive(message) after _log.

@codefromthecrypt
Contributor

I'll update tinymem once above is resolved. We do tend to see things like this in syscall code, after the call like https://github.com/golang/go/blob/0d347544cbca0f42b160424f6bc2458ebcc7b3fc/src/syscall/fs_wasip1.go#L824-L829 cc @Pryz @achille-roussel

@lburgazzoli lburgazzoli deleted the allocation-example-free branch May 3, 2023 05:51
@lburgazzoli
Contributor Author

@lburgazzoli sorry for just spotting this, I guess I only grasped it when looking at committed code in whole. But I might be missing something obvious and I'd like to get your understanding before filing an issue/PR.

What keeps message alive (i.e. not available for GC) after stringToPtr returns?

I completely missed that usage to be honest.

// log a message to the console using _log.
func log(message string) {
	ptr, size := stringToPtr(message)
	_log(ptr, size)
}

I'm wondering if we need to add runtime.KeepAlive(message) after _log.

I think the usage of message is safe. However, the implementation of stringToPtr is not: it copies the string's data, but that slice goes out of scope, so the address returned by the function may become invalid if a GC is triggered.

I think the safest option would be to use stringToLeakedPtr and free the memory with defer, similar to what is done in the host code.

@ncruces
Collaborator

ncruces commented May 3, 2023

You're right, I managed to miss the copy. Using stringToLeakedPtr (C.malloc) is one way.

The only other one I can think of is to have stringToPtr return unsafe.Pointer, then cast it in the caller, and runtime.KeepAlive the unsafe.Pointer.
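
A minimal sketch of that second option, written for standard Go (1.20 or later) rather than TinyGo: `hostLog` is a hypothetical stand-in for the imported _log, and `logMessage` plays the role of log. In a wasm32 guest the pointer would be narrowed to an integer before crossing the boundary, which is the point where the buffer loses its last typed reference; the simulation keeps it as unsafe.Pointer so it stays valid Go.

```go
package main

import (
	"runtime"
	"unsafe"
)

// logged records what the simulated host read.
var logged []string

// hostLog is a hypothetical stand-in for the imported _log function.
func hostLog(ptr unsafe.Pointer, size uint32) {
	logged = append(logged, string(unsafe.Slice((*byte)(ptr), size)))
}

// stringToPtr copies s into a fresh buffer and returns a typed pointer to
// it, leaving any conversion to an integer up to the caller.
func stringToPtr(s string) (unsafe.Pointer, uint32) {
	buf := []byte(s)
	return unsafe.Pointer(unsafe.SliceData(buf)), uint32(len(buf))
}

// logMessage keeps the copied buffer reachable until the host call returns.
func logMessage(message string) {
	ptr, size := stringToPtr(message)
	hostLog(ptr, size)
	runtime.KeepAlive(ptr)
}
```

The KeepAlive call sits after the host call on purpose: it marks the last point at which the buffer must still be reachable.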

@lburgazzoli
Contributor Author

KeepAlive

A more unsafe way would be to runtime.KeepAlive the message and return the actual data that backs the string with unsafe.StringData, but there will be dragons :)
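
For comparison, the no-copy variant could be sketched as follows, again in standard Go (1.20 or later, for unsafe.StringData) with a hypothetical `hostRead` standing in for the imported _log. Whether this is acceptable in a real guest depends on the host never writing through the pointer, which nothing in this sketch enforces.

```go
package main

import (
	"runtime"
	"unsafe"
)

// seen records what the simulated host read.
var seen []string

// hostRead is a hypothetical stand-in for the imported _log function.
func hostRead(ptr unsafe.Pointer, size uint32) {
	seen = append(seen, string(unsafe.Slice((*byte)(ptr), size)))
}

// logNoCopy hands the host the bytes that back the string directly and
// keeps the string itself alive until the call has returned.
func logNoCopy(message string) {
	ptr := unsafe.Pointer(unsafe.StringData(message))
	hostRead(ptr, uint32(len(message)))
	runtime.KeepAlive(message)
}
```

This saves one allocation and one copy per call, at the price of exposing memory that Go treats as immutable.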

@ncruces
Collaborator

ncruces commented May 3, 2023

Yes, that too. The only additional unsafety is that it's supposed to be read-only and the host can change it, but the host can always change it, so…

@codefromthecrypt
Contributor

The only additional unsafety is that it's supposed to be read-only and the host can change it, but the host can always change it, so…

Agreed, the host can always change anything. Doing anything in the guest that makes it seem like it cannot is probably not a good idea. Like adding racing stripes to a Honda, it doesn't make it faster. We should make the guest code concise, as cheap as possible and, most importantly, relevant. If the current code isn't really leaking it, or it is more complicated than an alternative that does the same, a PR is welcome, especially before we communicate a change widely.

@lburgazzoli
Contributor Author

@codefromthecrypt @ncruces I opened a PR with a possible implementation, #1434. Feel free to close it if it does not make sense.
