-
Notifications
You must be signed in to change notification settings - Fork 481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Help out for batched random key generation? #705
Comments
I made a minimally viable benchmark program here in https://github.com/stevefan1999-personal/ed25519-gen-speed-test. I used the fastest random number generator possible to get rid of its effect. On my 3700X PC, this generates:
|
Take a look at the |
May I get some papers or lecture notes on how to use that? |
It takes an iterator over |
If I read that correctly, I can use multiscalar_mul to do something that means |
Okay, so I came up with this: let mut rng = Rng::new();
let scalars: Vec<ExpandedSecretKey> = (0..16)
.map(|_| {
let mut secret_key: SecretKey = SecretKey::default();
rng.fill(&mut secret_key);
ExpandedSecretKey::from(&secret_key)
})
.collect();
EdwardsPoint::multiscalar_mul(
scalars.iter().map(|x| x.scalar),
scalars.iter().map(|_| constants::ED25519_BASEPOINT_POINT),
); However, it returns an Edwards point, so how do I separate the multiplied (scalar * ed25519 base point) to their own respective points? |
Aah sorry, that's the wrong API shape for your problem |
Ahh I think it is close. I just need to generate N scalars from entropy, and get the [for some scalar in scalars (scalar * ed25519 base point)] without having it dot product summed, but do so in an optimized, data-parallel way. This way I don't have to blow up the CPU cache while doing SIMD in lockstep. Also, I tried to simplify the public key generation, is this correct? let mut secret_key = SecretKey::default();
csprng.fill_bytes(&mut secret_key);
let pub_key =
EdwardsPoint::mul_base(&Scalar::from_bytes_mod_order(clamp_integer(
Sha512::default().chain_update(secret_key).finalize()[..32]
.try_into()
.unwrap(),
)))
.compress()
.to_bytes(); |
I'm trying to generate vanity wallet addresses, which are originally in ed25519, so I start by generating random bytes from a PRNG enough to cast into a secret key and then generate key pairs, but I ran into performance issue so I want to debug it:
On my 3700X PC, and running in full 16 threads, it only runs 300K keypairs/second with a precise monotonic clock. However, the C vanity address generator on my same PC, with same parameters can generate 30M keypairs/second with random data, so I think there could be something wrong here
So I started digging, which means we start from here:
curve25519-dalek/ed25519-dalek/src/signing.rs
Lines 102 to 108 in d5ef57a
And then here:
curve25519-dalek/ed25519-dalek/src/signing.rs
Lines 788 to 794 in d5ef57a
And then here:
curve25519-dalek/ed25519-dalek/src/hazmat.rs
Lines 57 to 76 in d5ef57a
And then here:
curve25519-dalek/curve25519-dalek/src/scalar.rs
Lines 237 to 246 in d5ef57a
And eventually here:
curve25519-dalek/curve25519-dalek/src/scalar.rs
Lines 1127 to 1128 in d5ef57a
I noticed that despite I'm using AVX512, the flame graph shows that those
UnpackedScalar::mul_internal
andUnpackedScalar::montgomery_reduce
are still using generic serial implementation and took a large group of execution time, I'm not sure if this can be sped-up and improved? Or maybe I should look for something else for address generation?I noticed that the wallet vanity address generator in C uses SUPERCOP's version, and their ed25519 key generation is much faster as well.
The text was updated successfully, but these errors were encountered: