Feat: Distributed proof generation #105

Open
mratsim opened this issue Apr 18, 2024 · 0 comments


mratsim commented Apr 18, 2024

At the moment, zkVMs are too slow for real-time proving on a single computer.
This is especially the case for Risc0, which requires dozens of GPUs.

Unfortunately, on most cloud providers it is significantly easier to scale by adding more machines, and sometimes that is the only way to scale: on AWS, for example, we cannot get more than 8 GPUs per instance.

Hence we need distributed computing support on Raiko.

zkVMs are all in the process of adding continuations (Risc0, Powdr) / sharding (SP1) / chunking (Halo2). Hence an easy way to distribute compute would be to have a master prover with a fleet of workers: the master interacts with the end-user and delegates work to the workers.
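As a rough illustration of the master/worker idea, here is a minimal sketch in which threads and channels stand in for remote machines and the network. Everything in it (`WorkItem`, `PartialProof`, `prove_chunk`, `distribute`) is a hypothetical name made up for this example, and the "proof" is a placeholder byte reversal, not a call into any zkVM.

```rust
use std::sync::mpsc;
use std::thread;

/// One unit of work (a segment, shard or chunk). Hypothetical type.
struct WorkItem {
    index: usize,
    data: Vec<u8>,
}

/// What a worker sends back. Hypothetical type.
struct PartialProof {
    index: usize,
    bytes: Vec<u8>,
}

/// Placeholder for the real per-chunk proving call.
/// Here it just reverses the bytes so the result is easy to check.
fn prove_chunk(item: &WorkItem) -> PartialProof {
    PartialProof {
        index: item.index,
        bytes: item.data.iter().rev().copied().collect(),
    }
}

/// Master: hands items to `num_workers` workers round-robin and
/// returns the partial proofs sorted by their original index.
fn distribute(items: Vec<WorkItem>, num_workers: usize) -> Vec<PartialProof> {
    assert!(num_workers > 0, "need at least one worker");
    let (result_tx, result_rx) = mpsc::channel::<PartialProof>();
    let mut work_txs = Vec::new();
    let mut handles = Vec::new();

    for _ in 0..num_workers {
        let (work_tx, work_rx) = mpsc::channel::<WorkItem>();
        let result_tx = result_tx.clone();
        handles.push(thread::spawn(move || {
            // The loop ends when the master drops its sending half.
            for item in work_rx {
                if result_tx.send(prove_chunk(&item)).is_err() {
                    break; // master went away
                }
            }
        }));
        work_txs.push(work_tx);
    }
    // Drop the master's copy so `result_rx` closes once all workers finish.
    drop(result_tx);

    for (i, item) in items.into_iter().enumerate() {
        work_txs[i % num_workers].send(item).expect("worker hung up");
    }
    drop(work_txs); // close the work queues

    // Results may arrive out of order, so collect and sort by index.
    let mut results: Vec<PartialProof> = result_rx.iter().collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    results.sort_by_key(|p| p.index);
    results
}
```

Round-robin assignment is the simplest policy. A real deployment would probably prefer a shared queue that idle workers pull from, since chunks may take different amounts of time to prove.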

GPU Cloud pricing analysis

We need 16GB or 24GB of VRAM.
The P40 has a 2016 architecture and the V100 a 2017 architecture.

EC2 P3 instances have V100s, but with only 16GB and at a higher price, so they are a non-starter.

P4d instances with A100s (2020) and 80GB of VRAM are a non-starter due to price.

EC2 G5 instances with A10G GPUs fit our needs but are limited to 8 GPUs per instance.
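To make the per-instance cap concrete, the number of machines needed is just a ceiling division. A tiny sketch follows; the function name is made up for illustration, and the figure of 24 GPUs in the usage note is only an example, not a measured requirement.

```rust
/// Number of instances needed to reach `gpus` GPUs when each
/// instance offers at most `gpus_per_instance` of them.
fn instances_needed(gpus: u32, gpus_per_instance: u32) -> u32 {
    assert!(gpus_per_instance > 0, "an instance must have at least one GPU");
    // Ceiling division without floating point.
    (gpus + gpus_per_instance - 1) / gpus_per_instance
}
```

For example, `instances_needed(24, 8)` gives 3 instances, and `instances_needed(25, 8)` gives 4.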

Risc0

Distributing Risc0 requires sending work to a remote worker, calling prove_segment on that worker, then sending the result back.
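The "send work, get a result back" step needs some wire format. Below is a minimal sketch of a length-prefixed request, assuming a segment can be serialized to an opaque byte buffer. `SegmentRequest` and its field names are hypothetical, and a real implementation would more likely rely on an existing serialization crate than on a hand-rolled layout.

```rust
/// Hypothetical request sent from the master to a worker.
#[derive(Debug, PartialEq)]
struct SegmentRequest {
    job_id: u64,
    segment_index: u32,
    /// Serialized segment, treated as opaque bytes here.
    segment: Vec<u8>,
}

impl SegmentRequest {
    const HEADER_LEN: usize = 16;

    /// Layout (little-endian):
    /// job_id (8 bytes) | segment_index (4 bytes) | length (4 bytes) | segment bytes
    /// Note: the length field limits a segment to u32::MAX bytes.
    fn encode(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(Self::HEADER_LEN + self.segment.len());
        out.extend_from_slice(&self.job_id.to_le_bytes());
        out.extend_from_slice(&self.segment_index.to_le_bytes());
        out.extend_from_slice(&(self.segment.len() as u32).to_le_bytes());
        out.extend_from_slice(&self.segment);
        out
    }

    /// Returns None if the buffer is truncated or the length field
    /// does not match the number of payload bytes.
    fn decode(buf: &[u8]) -> Option<SegmentRequest> {
        if buf.len() < Self::HEADER_LEN {
            return None;
        }
        let mut id = [0u8; 8];
        id.copy_from_slice(&buf[0..8]);
        let mut idx = [0u8; 4];
        idx.copy_from_slice(&buf[8..12]);
        let mut len = [0u8; 4];
        len.copy_from_slice(&buf[12..16]);

        let body = &buf[Self::HEADER_LEN..];
        if body.len() != u32::from_le_bytes(len) as usize {
            return None;
        }
        Some(SegmentRequest {
            job_id: u64::from_le_bytes(id),
            segment_index: u32::from_le_bytes(idx),
            segment: body.to_vec(),
        })
    }
}
```

Carrying `segment_index` in every message lets the master reassemble results in order even when workers finish out of order.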

Note: Risc0 only supports 1 GPU per machine (GPU 0).

Once the STARK proof is generated, it needs to be wrapped in Groth16 (this is done automatically when going through Bonsai).
We should be able to use their compact_proof / stark2snark for this.

SP1

Distributing SP1 requires sending work to a remote worker, calling prove_shard on that worker, then sending the result back.

Note: SP1 supports delegating to a "Succinct Network", with the protocol defined here: https://github.com/succinctlabs/sp1/blob/5db203c/sdk/src/proto/network.rs and the RPC here: https://github.com/succinctlabs/sp1/blob/5db203c55647c30618431822d33f419614f9fab6/sdk/src/lib.rs#L104-L157. However, this seems to only delegate work, not split it.

Wrapping the STARK proof in a SNARK is WIP.

Maintenance considerations

We will likely need:

  • either a new trait "DistributedProver" with respectively prove_shard and prove_segment
  • or extend the existing traits with prove_shard_on_worker and prove_segment_on_worker
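To make the first option more concrete, here is one possible shape for such a trait. This is only a sketch: it folds `prove_shard` and `prove_segment` into a single generic `prove_chunk` method, which is a design assumption of this example, and every name besides `DistributedProver` (`split`, `prove_chunk`, `aggregate`, `MockProver`, `prove_all`) is hypothetical. The mock "proof" is a byte sum so that the flow can be checked without any zkVM.

```rust
/// Hypothetical trait for provers whose work can be split up.
trait DistributedProver {
    type Chunk;
    type PartialProof;
    type Error: std::fmt::Debug;

    /// Master side: split the workload into independently provable chunks.
    fn split(&self, input: &[u8]) -> Vec<Self::Chunk>;

    /// Worker side: prove one chunk. A backend would map this to its own
    /// per-chunk call (e.g. prove_segment or prove_shard).
    fn prove_chunk(&self, chunk: &Self::Chunk) -> Result<Self::PartialProof, Self::Error>;

    /// Master side: combine the partial proofs into a final proof.
    fn aggregate(&self, parts: Vec<Self::PartialProof>) -> Result<Vec<u8>, Self::Error>;
}

/// Toy backend used only to exercise the trait.
struct MockProver {
    /// Must be greater than zero (`chunks` panics on a zero size).
    chunk_size: usize,
}

impl DistributedProver for MockProver {
    type Chunk = Vec<u8>;
    type PartialProof = u64;
    type Error = String;

    fn split(&self, input: &[u8]) -> Vec<Self::Chunk> {
        input.chunks(self.chunk_size).map(|c| c.to_vec()).collect()
    }

    fn prove_chunk(&self, chunk: &Self::Chunk) -> Result<Self::PartialProof, Self::Error> {
        if chunk.is_empty() {
            return Err("empty chunk".to_string());
        }
        // Placeholder "proof": the sum of the chunk's bytes.
        Ok(chunk.iter().map(|&b| u64::from(b)).sum())
    }

    fn aggregate(&self, parts: Vec<Self::PartialProof>) -> Result<Vec<u8>, Self::Error> {
        Ok(parts.iter().sum::<u64>().to_le_bytes().to_vec())
    }
}

/// Sequential driver, written against the trait only. A distributed
/// driver would send each chunk to a remote worker instead.
fn prove_all<P: DistributedProver>(prover: &P, input: &[u8]) -> Result<Vec<u8>, P::Error> {
    let parts = prover
        .split(input)
        .iter()
        .map(|chunk| prover.prove_chunk(chunk))
        .collect::<Result<Vec<_>, _>>()?;
    prover.aggregate(parts)
}
```

Keeping the driver generic over the trait means the zkVM-specific code stays in thin per-backend implementations, which should help keep the diff against upstream small.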

Maintenance to sync with upstream should be minimized.

cc @Champii @petarvujovic98 @CeciliaZ030
