
Understand current performance bottlenecks in signature generation #72

Open
jakmeier opened this issue Dec 19, 2024 · 6 comments

@jakmeier
Contributor

Description

Currently, when the dev network starts and generates triples and pre-signatures at full speed, we observe only around 40 messages per second on each node (see graph and comment here).

We need to understand what the limiting factors are.

In a perfect world, we are limited only by the CPU time it takes to perform the cryptographic work and by the network delay. We can achieve this if we ensure three things.

  1. Incoming messages are always immediately delivered to cait-sith through Protocol::message.
  2. We are always calling Protocol::poke until it tells us to wait.
  3. Any messages generated are immediately sent.

In theory, these three tasks can always run in parallel without blocking each other. Whenever one of them is not running, we potentially introduce overhead that could be avoided.

In practice, Protocol::poke and Protocol::message both require mutable access to the Protocol, so they cannot run in parallel on the same Protocol instance. But presumably Protocol::message does no actual work and only records the incoming message (to be checked), so the overhead of doing it serially should be minimal (see the sketch below).
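To make the interaction between the three tasks concrete, below is a minimal sketch of a single-threaded driver loop. The `Protocol`, `Action`, and `Participant` types here are local stand-ins mirroring my understanding of cait-sith's API (the real trait lives in the cait-sith crate and its poke returns a Result), so treat the exact signatures as assumptions.

```rust
use std::collections::VecDeque;

// Local stand-ins mirroring (my understanding of) cait-sith's Protocol API.
// The real types live in the cait-sith crate; exact signatures may differ.
pub struct Participant(pub u32);
pub enum Action<T> {
    Wait,                              // nothing to do until more messages arrive
    SendMany(Vec<u8>),                 // broadcast to all participants
    SendPrivate(Participant, Vec<u8>), // send to a single participant
    Return(T),                         // protocol finished with an output
}
pub trait Protocol {
    type Output;
    fn message(&mut self, from: Participant, data: Vec<u8>);
    fn poke(&mut self) -> Action<Self::Output>;
}

/// Drive a protocol instance: deliver buffered messages (task 1), poke until
/// `Wait` (task 2), and hand generated messages to `send` immediately (task 3).
pub fn drive<P: Protocol>(
    protocol: &mut P,
    incoming: &mut VecDeque<(Participant, Vec<u8>)>,
    mut send: impl FnMut(Option<Participant>, Vec<u8>),
) -> Option<P::Output> {
    // Task 1: incoming messages go straight into the protocol.
    while let Some((from, data)) = incoming.pop_front() {
        protocol.message(from, data);
    }
    // Tasks 2 and 3: keep poking and forward generated messages right away.
    loop {
        match protocol.poke() {
            Action::Wait => return None,
            Action::SendMany(data) => send(None, data),
            Action::SendPrivate(to, data) => send(Some(to), data),
            Action::Return(output) => return Some(output),
        }
    }
}
```

In this serial form, task 1 and tasks 2/3 share the same `&mut` protocol instance; running them truly in parallel would require a lock or a channel in front of it, which is exactly the constraint described above.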

In any case, we should try to find out if any of these three tasks is delayed significantly due to implementation inefficiencies.

Possible steps

  1. We can add specific metrics to help us understand how much time we spend on each task. (Created as a separate issue: Add more performance metrics #71)
  2. Look at general execution traces from tracing to see if anything looks suspicious. See here for how it's done in nearcore.
    • Note: I see we have some tracing tooling in the code base already, but I am not sure how much it is used and how well it is maintained. It might be worth investing some time here, adding appropriate spans and setting up good tooling to look at the timing of the traces (see the example after this list).
  3. Observe machines while they are under load with general tools like htop(1), iotop(8), perf(1), or more specialized tools like tokio-console if we can compile the nodes with tokio's unstable features (--cfg tokio_unstable) enabled.
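As a concrete starting point for step 2 and the note about spans, here is what adding spans with the macros already in use could look like. The function and span names are made up for illustration; only the macros themselves come from the tracing crate.

```rust
use tracing::{info_span, instrument};

// Hypothetical function names, for illustration only.
#[instrument(skip_all, level = "debug")]
fn handle_incoming_message(from: u32, data: &[u8]) {
    // ... deliver `data` from `from` to Protocol::message ...
    let _ = (from, data); // placeholder body
}

fn poke_until_wait() {
    // An explicit span around the poke loop; it stays open until the
    // guard is dropped at the end of this scope.
    let _span = info_span!("poke_until_wait").entered();
    // ... call Protocol::poke until it returns Wait, sending outputs immediately ...
}
```

Paired with a subscriber that records span timings (e.g. tracing-subscriber's fmt layer configured to log span close events, or an exporter as discussed in the comments below), this tells us how long each task takes per call.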
@jakmeier
Contributor Author

If anyone already has data to help understand the performance bottlenecks better, or knows a good way to find it out, please share it here. :)

@jakmeier
Contributor Author

Related issue with relevant analysis: #32

@volovyks
Contributor

Are you suggesting replacing Prometheus with Tracing? Or building these new metrics using it?
It looks promising, but it will require refactoring.

@jakmeier
Contributor Author

jakmeier commented Jan 2, 2025

No, I would still keep Prometheus for old and new metrics.

Tracing is an additional tool for deeper performance investigations. It can work more generally, including in places where we haven't added metrics. And it can potentially give more fine-grained information, telling you exactly how many microseconds are spent in each function. But it might require you to add more tracing instrumentation to the code to be useful.

I see that logs are already emitted with the tracing macros (e.g. tracing::warn!) and we even have some span info added with macros, too (e.g. tracing::info_span! and #[tracing::instrument]). This means you are already producing at least some tracing data.

This data is then consumed by a tracing subscriber. This code suggests you have this integrated with Google's Stackdriver. I'm not familiar with Stackdriver, but perhaps that's all you need to look at the execution traces in detail, and you could use it already today to investigate performance issues. Maybe you can find out which functions a node spends most of its time in.

In nearcore, Jaeger is used for presenting the traces. The tokio documentation chapter Next steps with Tracing has an example of how to set this up (roughly sketched below). But I wouldn't add anything new before understanding what you already have (Stackdriver).
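For reference, the setup from that tokio chapter boils down to roughly the following. The crates (tracing-subscriber, tracing-opentelemetry, opentelemetry-jaeger) are real, but their builder APIs have changed between versions and the service name is a placeholder, so this is a version-dependent sketch rather than drop-in code.

```rust
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::util::SubscriberInitExt;

fn init_tracing() -> Result<(), Box<dyn std::error::Error>> {
    // Export spans to a locally running Jaeger agent (API varies by crate version).
    let tracer = opentelemetry_jaeger::new_agent_pipeline()
        .with_service_name("mpc-node") // placeholder service name
        .install_simple()?;

    // Bridge tracing spans into OpenTelemetry and also keep console logging.
    tracing_subscriber::registry()
        .with(tracing_opentelemetry::layer().with_tracer(tracer))
        .with(tracing_subscriber::fmt::layer())
        .try_init()?;
    Ok(())
}
```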

@volovyks
Contributor

volovyks commented Jan 2, 2025

Google provides Google Cloud Profiler and Tracing functionality. However, it is not turned on for our project, and it does not appear to support Rust projects natively. But I understand what you mean. Profiling and its flame graphs should give us many insights.
@auto-mausx have you worked with it? Have you seen it working for Rust? I have not found much information about it.

@auto-mausx
Contributor

As with any tracing profiler, we will need to send traces that make the visualization useful. I have used GCP Tracing before, albeit just for a simple functionality check. We can use Google or Grafana, as they both support tracing via OpenTelemetry.
