Hey @flxo, thank you for gathering this detailed profiling! Really awesome to see. It's quite useful to see how much overhead kameo adds over plain channels. I've copied your benchmark into kameo and added one more test that uses plain channels but boxes the request and reply (similar to what kameo does), and got the following results:
Based on this, the important numbers are:
So it seems like besides just boxing, kameo does add some other overhead too, probably due to the use of
Nope, the bench looks correct, I think what we're seeing here is really just the overhead of everything kameo provides. I'll dig into this more since it's quite interesting seeing the overhead added. But I ran the same benchmark with the latest version of actix again and got
If you'd like to squeeze out more performance, it might be worth going with raw channels in this case, at the expense of developer experience. Sadly, boxing the messages seemed to be the only way I could get actors working without using a big message enum in Rust.
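To make the enum-versus-boxing trade-off concrete, here is a minimal sketch of both approaches. All names (`Counter`, `CounterMsg`, `Message`, `DynMessage`, `run_boxed`) are hypothetical and chosen for illustration; this is a simplified model of the general technique, not kameo's actual traits or internals.

```rust
use std::any::Any;

/// A toy actor (hypothetical, for illustration only).
struct Counter {
    total: i64,
}

// Approach 1: one closed enum listing every message the actor accepts.
// No heap allocation, but every new message means touching this enum.
enum CounterMsg {
    Add(i64),
    Get,
}

impl Counter {
    fn handle_enum(&mut self, msg: CounterMsg) -> i64 {
        match msg {
            CounterMsg::Add(n) => {
                self.total += n;
                self.total
            }
            CounterMsg::Get => self.total,
        }
    }
}

// Approach 2: an open set of message types, each with its own reply type.
trait Message<A>: Send + 'static {
    type Reply: Send + 'static;
    fn handle(self, actor: &mut A) -> Self::Reply;
}

/// Object-safe wrapper so that different message types fit in one mailbox.
/// Both the message and its reply end up behind a `Box`.
trait DynMessage<A>: Send {
    fn handle_dyn(self: Box<Self>, actor: &mut A) -> Box<dyn Any + Send>;
}

impl<A, M: Message<A>> DynMessage<A> for M {
    fn handle_dyn(self: Box<Self>, actor: &mut A) -> Box<dyn Any + Send> {
        // Unbox the message, handle it, and box the type-erased reply.
        Box::new((*self).handle(actor))
    }
}

struct Add(i64);
struct Get;

impl Message<Counter> for Add {
    type Reply = i64;
    fn handle(self, actor: &mut Counter) -> i64 {
        actor.total += self.0;
        actor.total
    }
}

impl Message<Counter> for Get {
    type Reply = i64;
    fn handle(self, actor: &mut Counter) -> i64 {
        actor.total
    }
}

/// Drains a heterogeneous mailbox and collects every reply that is an `i64`.
fn run_boxed(actor: &mut Counter, mailbox: Vec<Box<dyn DynMessage<Counter>>>) -> Vec<i64> {
    let mut replies = Vec::new();
    for msg in mailbox {
        if let Ok(reply) = msg.handle_dyn(actor).downcast::<i64>() {
            replies.push(*reply);
        }
    }
    replies
}
```

In this sketch the boxed variant pays one allocation for the message and one for the reply, plus a dynamic dispatch and a downcast, while the enum variant pays none of these but needs a single type that knows about every message.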
Hello,
Just want to share this. I was curious about the overhead of `kameo` compared to "plain" channels and a task. Let's see where kameo spends its CPU time. The following minimal program, built in release mode (plus `debug=true` in the profile) and executed on macOS on a 2.4 GHz 8-core Intel Core i9, produces this profile. Captured with samply (❤️).
You can see the two big blocks, `kameo::actor::actor_ref::ActorRef::send::{{closure}}` and `<kameo::actor_kind::SyncActor as kameo::actor_kind::ActorState>::handle_message::{{closure}}`, in the flame graph beside the usual Tokio runtime stuff. Interesting: ~5.4% overhead due to `Box::new` in the tx path, and ~8.3% (+4% `free`) for boxing on the rx side.
I used `criterion` to compare a simple echo actor that processes sync calls with a plain Tokio task that sends a reply on a received oneshot `Sender`. I think this gives an impression of the overhead of the convenience that kameo brings.
Question: Did I do something wrong here?
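For readers who want a picture of the "plain channel" side of such a comparison, here is a minimal sketch of an echo worker where each request carries its own reply channel. It is not the benchmark from this post: to stay dependency-free it assumes a `std::thread` and `std::sync::mpsc` channels in place of a Tokio task and a oneshot `Sender`, and the names `Request`, `spawn_echo` and `ask` are made up for illustration.

```rust
use std::sync::mpsc;
use std::thread;

/// A request carries its payload plus the channel on which the reply is expected.
struct Request {
    value: u64,
    reply: mpsc::Sender<u64>,
}

/// Spawns the echo worker and returns the sending half of its mailbox.
fn spawn_echo() -> mpsc::Sender<Request> {
    let (tx, rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        // The loop ends once every sender of the mailbox has been dropped.
        for req in rx {
            // Ignore the error if the caller stopped waiting for the reply.
            let _ = req.reply.send(req.value);
        }
    });
    tx
}

/// Sends one value and blocks until it is echoed back.
/// Returns `None` if the worker is gone.
fn ask(mailbox: &mpsc::Sender<Request>, value: u64) -> Option<u64> {
    let (reply_tx, reply_rx) = mpsc::channel();
    mailbox.send(Request { value, reply: reply_tx }).ok()?;
    reply_rx.recv().ok()
}
```

Calling `ask(&mailbox, 7)` on a mailbox returned by `spawn_echo()` performs one round trip. Nothing here is boxed or type-erased, which is what makes it a useful lower bound when measuring what an actor framework adds on top.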
Output:
The difference is quite huge (a factor of ~2.5). Clearly everything in between you and the raw channel costs performance, but I wouldn't have expected that much. I didn't examine `async` messages. I'm evaluating all this because we're thinking about using it in a networking application where throughput matters.
Let me know if I missed something or there's a systematic error.
cheers,
@flxo