AsyncRead and AsyncWrite support #5

Open
mrzv opened this issue Sep 5, 2021 · 39 comments

@mrzv
Contributor

mrzv commented Sep 5, 2021

I'm trying to figure out how to use the async side of the code. Specifically, I'd like to replace an example that spawns a std::process::Child via a Command and interacts with its stdio through the Read/Write traits with an example that uses tokio::process::{Command, Child}, whose stdio implements the AsyncRead/AsyncWrite traits. Right now, as far as I can tell, I'm supposed to wrap them in an async function, which gets passed to BincodeAsyncClientTransport. (An example like this would be amazing, by the way.) Would it make sense, and be possible, to add support for AsyncRead/AsyncWrite directly to essrpc, to mirror the Read/Write interface?
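For reference, the blocking pattern being replaced can be sketched with only the standard library. This is an illustrative sketch, not essrpc's actual example: it uses `cat` as a stand-in child process (assuming a Unix-like system) and talks to it via the Read/Write traits, which is the same role the tokio::process version would fill through AsyncRead/AsyncWrite.

```rust
use std::io::{Read, Write};
use std::process::{Command, Stdio};

// Illustrative sketch (not essrpc's actual example): spawn a child process
// and talk to it over piped stdio via the Read/Write traits. `cat` is a
// stand-in echo process, which assumes a Unix-like system. Writing the
// whole request before reading is fine for small payloads; a real transport
// interleaves reads and writes.
fn roundtrip(payload: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut child = Command::new("cat")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Write the request and drop the handle so the child sees EOF.
    child.stdin.take().unwrap().write_all(payload)?;

    let mut response = Vec::new();
    child.stdout.take().unwrap().read_to_end(&mut response)?;
    child.wait()?;
    Ok(response)
}

fn main() {
    let echoed = roundtrip(b"hello rpc").unwrap();
    assert_eq!(echoed, b"hello rpc");
}
```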

@Electron100
Owner

That's a good question. Essrpc was originally written before async/await was fully standardized. Let me see what I can do to both document things and clean them up to match current best practices.

Electron100 self-assigned this Sep 6, 2021
@Electron100
Owner

I've started working on this on a new async branch here: https://github.com/Electron100/essrpc/tree/async. So far I've converted the Bincode/Json async transports to use AsyncRead/AsyncWrite, but I'd like to fix a few rough edges and better support traits that are already async before I merge the branch in and publish a new version.

@mrzv
Contributor Author

mrzv commented Sep 7, 2021

Great. Thanks for such a quick response. I'll check this branch out.

@mrzv
Contributor Author

mrzv commented Sep 8, 2021

I'm sure you already know this limitation — it's clear from your comment in rx_response — but I'm getting

RPCError { kind: TransportEOF, msg: "EOF during bincode deserialization", cause: None }

when I try the async code.

@Electron100
Owner

Yes, that's one of the things I want to fix before merging the branch back. But I don't actually have an example that fails currently. I'm not sure how easily yours extracts from the rest of your codebase, but if it does I'd love to include it in a test.

@mrzv
Contributor Author

mrzv commented Sep 9, 2021

I just added an example in mrzv/essrpc@66228f2. Not sure how to turn it into a test, but if you run it with

cargo run --example async-command --features bincode_transport,async_client 

it will give the error.

@Electron100
Owner

Sorry for the slow progress here -- I'm not sure where the time goes. I have added a test based on your example, and pushed changes to the async branch with a frame format around bincode so that the test passes (the transport now knows how many bytes to read), but I'm not entirely happy with it. I don't think I've fully grokked the async world: how best to share code between sync and async implementations, and how best to deal with underlying libraries (such as bincode) that expect to work with sync readers. Changing the actual wire format to support async feels gross...
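The framing idea can be sketched with plain std I/O. This is an illustrative sketch assuming a 4-byte big-endian length prefix; the actual format on the async branch may differ. Framing lets the async reader know exactly how many bytes to pull before handing them to the deserializer.

```rust
use std::io::{Cursor, Read, Write};

// Write a frame: a 4-byte big-endian length prefix followed by the payload.
fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> std::io::Result<()> {
    w.write_all(&(payload.len() as u32).to_be_bytes())?;
    w.write_all(payload)
}

// Read one frame back: first the length prefix, then exactly that many bytes.
fn read_frame<R: Read>(r: &mut R) -> std::io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let mut payload = vec![0u8; u32::from_be_bytes(len_buf) as usize];
    r.read_exact(&mut payload)?;
    Ok(payload)
}

fn main() {
    let mut buf = Cursor::new(Vec::new());
    write_frame(&mut buf, b"serialized request").unwrap();
    buf.set_position(0);
    assert_eq!(read_frame(&mut buf).unwrap(), b"serialized request");
}
```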

@mrzv
Contributor Author

mrzv commented Sep 16, 2021

Thanks. I'll check it out. I see some eprintln!s in the commits. I assume these are temporary and will go away?

As for the rest, I'm afraid I'm a complete newbie to Rust, so I don't have anything meaningful to offer. I'm still fumbling around async, so no real insight there.

Either way, many thanks for all your work on this.

@mrzv
Contributor Author

mrzv commented Sep 16, 2021

Super. This works in my code. Thanks again!

I have some thoughts on how async code could be re-organized to make the user experience better (mostly to not have to deal with switching between tokio and async-std futures), but that's probably best discussed in another issue. It's a cosmetic matter anyway.

@mrzv
Contributor Author

mrzv commented Sep 17, 2021

Two questions.

When I run cargo test in the repository, I get all sorts of errors. Are the tests supposed to run? Or are they in some incomplete state?

I'm trying to use the new code in my project, and I switched from tokio::process::Command to openssh::Command. The code works for the first few calls, or in some small restricted settings, but then eventually hangs. The trouble is I have no idea how to debug it. Do you have any advice on how to peek into what's going on inside essrpc?

@mrzv
Contributor Author

mrzv commented Sep 17, 2021

Never mind the first question. I was missing --features bincode_transport,async_client,json_transport.

Any debugging advice is appreciated.

@mrzv
Contributor Author

mrzv commented Sep 18, 2021

Ok, never mind all that. openssh was a red herring. I can reproduce the problem in your tests by simply passing a large buffer as an argument rather than returning it. The buffer needs to be a little larger than before: 8 KiB.

I've added a big_buffer_argument to tests, which just hangs, see mrzv/essrpc@5c11a29.

@Electron100
Owner

Sorry for the continually slow replies. Interestingly, I didn't get the hang at the same point you did, but I do get it if I increase the big buffer size all the way up to 256 KB. Debugging now...

BTW I usually use an old-fashioned Makefile on top of Cargo to abstract away giving the necessary feature arguments (and have one in this repo). Running make check will run cargo test --all-features (and also Clippy).

@mrzv
Contributor Author

mrzv commented Sep 18, 2021

I'm guessing it's something platform-dependent. The other concerning bit is that if you make the buffer very large (128MiB), the big_buffer test takes forever (on the order of a minute), which doesn't make a lot of sense. So something is off.

And thanks for the pointer to make check.

@Electron100
Owner

There were bugs in the parsing of the ServerTransport bincode implementation -- I hadn't properly updated it for the new framed format. I've pushed changes to the async branch. There was also a client bug where I called write when I meant write_all. Tests are all passing (and quickly) for me now, even with an enormous buffer.
There's still cleanup I want to do before merging into master. The first version of essrpc was written before async was really in Rust, and I'm still getting up to speed on both the ecosystem and the best practices.

BTW you asked about debugging. I'm not sure what your background is, but for me the debugging process for Rust is the same as it would be for C++. For a hang, I run the program under gdb. Once I repro the hang under the debugger, I break execution and run thread apply all bt to look at the backtraces of all threads and find the wedged one (and what it's doing). cargo test prints out the path of the actual binary it's executing, so I use that as an argument to rust-gdb. In this case the server thread (which doesn't currently have an option to use async, though I should add that) was blocked on a recv call. Theoretically, if an async program were not hanging on a blocking call but simply failing to make progress on all its futures, I suppose I might have to break into the poll machinery, but I haven't seen that type of issue yet.
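The write vs write_all distinction mentioned above is easy to reproduce in miniature. The sketch below (which assumes nothing about essrpc itself) spells out the loop that Write::write_all performs internally:

```rust
use std::io::{ErrorKind, Write};

// Write::write may accept only part of the buffer and returns how many
// bytes it actually took; write_all loops until everything is sent.
// This hand-rolled loop is equivalent to calling w.write_all(buf).
fn send<W: Write>(w: &mut W, buf: &[u8]) -> std::io::Result<()> {
    let mut remaining = buf;
    while !remaining.is_empty() {
        let n = w.write(remaining)?; // may write fewer bytes than requested
        if n == 0 {
            return Err(ErrorKind::WriteZero.into());
        }
        remaining = &remaining[n..];
    }
    Ok(())
}

fn main() {
    let mut out = Vec::new();
    send(&mut out, b"framed payload").unwrap();
    assert_eq!(out, b"framed payload");
}
```

Calling plain write and ignoring the returned count is exactly the kind of bug that only surfaces once payloads grow past a pipe or socket buffer.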

@mrzv
Contributor Author

mrzv commented Sep 19, 2021

Thanks!

Currently, for me, basic_json_async fails intermittently. I don't need json transport, so it's a non-issue, but just FYI, if you are not aware of it.

The speed mystery is also solved. It was caused by the missing --release flag to cargo test. With it, the tests pass in 5 seconds (with a 256 MiB buffer); without it, they take 3 minutes. I didn't expect such a difference for IO-heavy code, but it's fine.

Thanks for the pointers on debugging. They are helpful. My question was more about which parts of the code are the relevant ones to look at. I was trying to debug a remote server (running over SSH on a different machine), so attaching a debugger would be trickier. But it's all good. Thanks again.

mrzv mentioned this issue Sep 19, 2021
@Electron100
Owner

Yes, I need to make similar fixes for the json transport to what I've made in bincode.

Interesting, we're seeing very different speed characteristics. On my system (Arch Linux, i7-7700K CPU, Rust 1.54), cargo test --all-features --release takes 0.5 seconds, and debug is still only 0.6 seconds. I'm going to have to try benchmarking on some of my other computers to get a better feel for the variability. Mind sharing what sort of system you're getting such slow numbers on?

@mrzv
Contributor Author

mrzv commented Sep 20, 2021

I'm getting this on a 5-year-old MacBook Pro. But I wouldn't worry about these numbers. I tried the code in a real application today, where the client is on the MacBook Pro, the server is on a beefy ArchLinux workstation, and they are talking using this RPC library over SSH. I got a sustained 60MiB/s data transfer, which is as high as I can get over SSH. So it looks very good.

@mrzv
Contributor Author

mrzv commented Sep 24, 2021

Do you mind releasing what's in the async branch as a new version on crates.io?

@mrzv
Contributor Author

mrzv commented Sep 30, 2021

I don't mean to be pushy, but do you mind releasing what's currently in async branch as a simple version bump? I want to publish a crate, but it needs that as a dependency.

@Electron100
Owner

Sorry for the delay here. As you noted earlier in this thread, the json transport is broken and needs cleaning up. Trying to get that working now, as well as some additional cleanup.

@Electron100
Owner

Ok, sorry again this took so long, but I've just published 0.4. It does make some tweaks vs what was in the async branch earlier -- I took a dependency on Tokio and use tokio_util::codec::Framed in the bincode transport instead of doing my own framing. Since I was depending on Tokio anyway, I now use the AsyncRead and AsyncWrite traits from Tokio as well.

@mrzv
Contributor Author

mrzv commented Oct 1, 2021

Great. Thanks! I'll try it out and tweak my code as necessary.

@mrzv
Contributor Author

mrzv commented Oct 1, 2021

Hm, when I add essrpc to a clean project, and run cargo build, I get:

error: failed to select a version for the requirement `essrpc = "^0.4.0"`
candidate versions found which didn't match: 0.3.1, 0.3.0, 0.2.2, ...
location searched: crates.io index

Any idea what could be causing this? Can there be some internal conflict of the dependencies, or something?

@mrzv
Contributor Author

mrzv commented Oct 1, 2021

I get the same error, when I try to build what's in your repo now.

@mrzv
Contributor Author

mrzv commented Oct 1, 2021

Correction, not the same, but similar:

cargo build
    Updating crates.io index
error: failed to select a version for the requirement `essrpc_macros = "^0.3"`
candidate versions found which didn't match: 0.4.0

@Electron100
Owner

Sorry, I forgot to push some commits to the repo (pushed now), which definitely explains the second issue.

I'm a bit confused by the first issue, though, since you're referencing crates.io. I wonder if there could be a caching/propagation delay in crates.io? I just tried it now -- created a new project with cargo new and then added

[dependencies]
essrpc = "0.4"

And cargo build succeeded without issue, with the logs showing that it downloaded 0.4 of essrpc from crates.io. Could you try again?

@mrzv
Contributor Author

mrzv commented Oct 1, 2021

Hm, your repository contents compile for me now, but the original problem is still there. If I do cargo init, then cargo add essrpc, it adds:

[dependencies]
essrpc = "0.4.0"

but when I run cargo build, I get:

    Updating crates.io index
error: failed to select a version for the requirement `essrpc = "^0.4.0"`
candidate versions found which didn't match: 0.3.1, 0.3.0, 0.2.2, ...
location searched: crates.io index
required by package `my-rpc v0.1.0 (.../my-rpc)`

Changing 0.4.0 to 0.4 produces the same effect. I wonder what's going on.

@mrzv
Contributor Author

mrzv commented Oct 1, 2021

Hm, I just tried it on a different machine (Linux vs Mac), and cargo build went through without a hitch. What could be causing this?

@mrzv
Contributor Author

mrzv commented Oct 1, 2021

Removing ~/.cargo fixed the problem. Still not sure what's causing it, but it's clearly on my end. Sorry for the false alarm.

@mrzv
Contributor Author

mrzv commented Oct 1, 2021

Well, my main code is broken with the new version (after the minor adjustments to make it compile). It looks like I get the error on the server side (TransportError). I'll try to debug, but if you have pointers on what changed and where I should be looking, I'd appreciate them.

@mrzv
Contributor Author

mrzv commented Oct 1, 2021

big_buffer_* tests now fail with a sufficiently large buffer. (For me, this happens with 256*1024*1024, but I have a feeling it's architecture-dependent.)

@Electron100
Owner

I'm a bit surprised -- the main change in the bincode transport was to switch to Tokio's Framed on the async side instead of doing custom framing (although, since there isn't an async server option yet, the server side does do its own framing, which should be compatible with the Tokio framing on the async client side). An RPCError with the TransportError kind should also have a cause associated with it. Can you share the cause? It may give further insight.

Regarding big_buffer_*: Tokio's framing has a default maximum size of 8 MB, so failing with 256 MB is not unexpected (the error I see when I try this is fairly self-explanatory: "IO error in transport caused by:\n frame size too big"). Sending such large values over RPC seems somewhat unusual, but I'm happy to make the maximum configurable if you have need of it.

@mrzv
Contributor Author

mrzv commented Oct 2, 2021

It looks like that was my mistake. I must have had a mismatch between client and server code, where the server was compiled using the older version of essrpc. So it's probably that the framing didn't match. At least, when I'm trying it now, it seems to work. If I uncover more, I'll let you know.

The buffer size is an issue for me. I'm transferring entire files as function arguments, and there is no real limit on their size. It sounds like I'll have to figure out chunking myself somehow. On the other hand, it would be more user-friendly if essrpc could chunk the arguments itself.
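One possible client-side workaround can be sketched with std only. chunk_payload and CHUNK_SIZE are hypothetical names, not part of essrpc; the idea is just to keep any single RPC argument under a transport frame limit such as tokio_util's 8 MiB default.

```rust
// Hypothetical chunking workaround (these names are made up, not essrpc
// API): split a large payload into pieces no bigger than the frame limit.
const CHUNK_SIZE: usize = 8 * 1024 * 1024;

fn chunk_payload(data: &[u8]) -> Vec<Vec<u8>> {
    data.chunks(CHUNK_SIZE).map(|c| c.to_vec()).collect()
}

fn main() {
    // 20 MiB splits into two full 8 MiB chunks plus a 4 MiB remainder.
    let data = vec![0u8; 20 * 1024 * 1024];
    assert_eq!(chunk_payload(&data).len(), 3);
}
```

Each chunk would then be sent as a separate RPC call, with the server reassembling them; that adds per-call overhead but avoids ever building one giant frame.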

@mrzv
Contributor Author

mrzv commented Oct 2, 2021

Just to add: it's not just files that will exceed 8 MiB. I also have vectors (of structures) that I want to pass around that are larger than that.

@mrzv
Contributor Author

mrzv commented Oct 2, 2021

Just for my own edification: why does tokio's framing need a max size, while yours didn't?

@mrzv
Contributor Author

mrzv commented Oct 2, 2021

If I'm reading length_delimited correctly, the max_frame_len is only used as a safety check against overflow (perhaps against a malicious reader/writer?). Then perhaps the easiest solution is for you to expose set_max_frame_length, and I could set it to something arbitrarily large. That would work for my purposes, I think.
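To illustrate why the check exists at all, here is a hypothetical sketch of the kind of guard a length-delimited codec performs; the names are made up, and this is not tokio's actual code.

```rust
// Without a maximum, a corrupt or malicious 4-byte length prefix could ask
// the reader to allocate gigabytes before any payload bytes arrive.
fn read_frame_len(prefix: [u8; 4], max_frame_len: usize) -> Result<usize, String> {
    let len = u32::from_be_bytes(prefix) as usize;
    if len > max_frame_len {
        Err(format!("frame size too big: {} > {}", len, max_frame_len))
    } else {
        Ok(len)
    }
}

fn main() {
    // A sane prefix passes; a pathological one is rejected up front.
    assert_eq!(read_frame_len(1024u32.to_be_bytes(), 8 * 1024 * 1024), Ok(1024));
    assert!(read_frame_len(u32::MAX.to_be_bytes(), 8 * 1024 * 1024).is_err());
}
```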

@Electron100
Owner

My read of the Tokio code is the same. I've just published 0.4.1, which removes the frame length restriction entirely (sets it to usize::MAX). I might re-add it as something configurable in the future, but on reflection it didn't make much sense as-is, with the restriction being only on the client side and only for async clients.

@mrzv
Contributor Author

mrzv commented Oct 3, 2021

That's perfect. usize::MAX is exactly what I wanted. Much appreciated.
