Poor performance. 8.58 - 9.45 cpb #1

Closed
WildCryptoFox opened this issue Sep 21, 2019 · 9 comments · Fixed by #2

@WildCryptoFox

Hey. I'm (gradually) implementing Sphinx and HORNET. I'll be using HCTR as my SPRP because it builds on normal primitives (BC = AES-128, AXU = POLYVAL), is simple, and is fast enough. I'm avoiding AEZ on the premise that its complex and intrusive design is unhealthy for implementations, but I decided to benchmark it since it is an SPRP and was supposed to be quick.

Recommendations:

  1. Split out the AES key expansion; it is slow and can be reused. This may not matter for your use case, as Loopix uses distinct paths per packet, but it makes AEZ look horrible to Rust consumers who find this crate.

  2. Encrypt/decrypt in place. Allocating is slow, especially if you want to handle high loads and are interested in AEZ for its high performance.
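
A minimal sketch of the API shape these two recommendations point at: pay for key setup once, then transform caller-owned buffers in place. The names `ExpandedKey`, `new` and `apply_in_place` are hypothetical and are not part of this crate or of the C code it wraps. The "cipher" is a toy XOR stream used only as a placeholder; it is not AEZ and is not secure.

```rust
/// Placeholder for an expensive, reusable key schedule.
/// NOT AEZ: the transform below is a toy XOR stream that only
/// illustrates the interface (hypothetical names throughout).
pub struct ExpandedKey {
    round_keys: [u8; 16],
}

impl ExpandedKey {
    /// Do the key setup once; stands in for the AES key expansion.
    pub fn new(key: &[u8; 48]) -> Self {
        let mut round_keys = [0u8; 16];
        for (i, k) in key.iter().enumerate() {
            round_keys[i % 16] ^= k.wrapping_add(i as u8);
        }
        ExpandedKey { round_keys }
    }

    /// Transform `buf` in place; no heap allocation per call.
    pub fn apply_in_place(&self, nonce: &[u8; 16], buf: &mut [u8]) {
        for (i, b) in buf.iter_mut().enumerate() {
            *b ^= self.round_keys[i % 16] ^ nonce[i % 16];
        }
    }
}
```

A caller would build one `ExpandedKey` and reuse it for every packet sharing that key, passing the same `&mut [u8]` buffer it already owns.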

throughput/aez-encrypt/1024
                        time:   [9642.0824 cycles 9680.5622 cycles 9718.7663 cycles]
                        thrpt:  [9.4910 cpb 9.4537 cpb 9.4161 cpb]
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
throughput/aez-decrypt/1024
                        time:   [9654.9435 cycles 9695.2769 cycles 9736.5744 cycles]
                        thrpt:  [9.5084 cpb 9.4680 cpb 9.4287 cpb]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
throughput/aez-encrypt/2048
                        time:   [18458.1809 cycles 18537.1270 cycles 18614.7876 cycles]
                        thrpt:  [9.0893 cpb 9.0513 cpb 9.0128 cpb]
throughput/aez-decrypt/2048
                        time:   [18457.0398 cycles 18541.9442 cycles 18625.2243 cycles]
                        thrpt:  [9.0943 cpb 9.0537 cpb 9.0122 cpb]
throughput/aez-encrypt/4096
                        time:   [35980.6519 cycles 36157.0494 cycles 36334.6163 cycles]
                        thrpt:  [8.8708 cpb 8.8274 cpb 8.7843 cpb]
throughput/aez-decrypt/4096
                        time:   [35602.2529 cycles 35758.1323 cycles 35921.9576 cycles]
                        thrpt:  [8.7700 cpb 8.7300 cpb 8.6920 cpb]
throughput/aez-encrypt/8192
                        time:   [70154.7788 cycles 70514.6029 cycles 70879.3800 cycles]
                        thrpt:  [8.6523 cpb 8.6077 cpb 8.5638 cpb]
throughput/aez-decrypt/8192
                        time:   [70397.5765 cycles 70727.0361 cycles 71055.9571 cycles]
                        thrpt:  [8.6738 cpb 8.6337 cpb 8.5935 cpb]
throughput/aez-encrypt/16384
                        time:   [139955.0308 cycles 140464.0862 cycles 140970.7971 cycles]
                        thrpt:  [8.6042 cpb 8.5732 cpb 8.5422 cpb]
throughput/aez-decrypt/16384
                        time:   [139951.4526 cycles 140577.4754 cycles 141218.4333 cycles]
                        thrpt:  [8.6193 cpb 8.5802 cpb 8.5420 cpb]
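
For reading the figures above: cycles per byte is the measured cycle count divided by the message size. The sketch below also does a rough two-point fit, assuming cost is linear in message size, to separate a fixed per-call cost from the marginal per-byte cost. The function names are illustrative and the linear model is an assumption, not something the benchmark itself reports.

```rust
/// Cycles per byte: measured cycles divided by bytes processed.
pub fn cycles_per_byte(cycles: f64, bytes: usize) -> f64 {
    cycles / bytes as f64
}

/// Two-point fit of `cycles = fixed + marginal * bytes`.
/// Returns `(fixed, marginal)`. Assumes a linear cost model.
pub fn fit_fixed_and_marginal(
    small: (usize, f64),
    large: (usize, f64),
) -> (f64, f64) {
    let (n0, c0) = (small.0 as f64, small.1);
    let (n1, c1) = (large.0 as f64, large.1);
    let marginal = (c1 - c0) / (n1 - n0);
    let fixed = c0 - marginal * n0;
    (fixed, marginal)
}
```

Fed the median encrypt timings for 1024 and 16384 bytes, the fit gives a marginal cost of roughly 8.5 cpb plus a fixed cost somewhat under 1000 cycles per call, which would explain why cpb falls as messages grow.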
@david415
Member

  1. pull-requests welcome.

  2. this is not a Rust implementation of AEZ! rather it is simply a wrapper around Ted Krovetz's C implementation with AES-NI and vector hardware optimizations. If it's slow, that's because the C it wraps is slow.

  3. you want to implement HORNET even though we now live in the Snowden apocalypse where it is well understood that these kinds of anonymous communication networks are easily broken by sufficiently global adversaries, and they are not theoretical but actually exist? mix networks for the win. do the people of this gray world not deserve anonymous communication networks that are not so easily broken?

  4. the original intention here was to make a Sphinx implementation in Rust that is binary compatible with the Katzenpost Sphinx. I have succeeded in achieving this goal. However, I would think that replacing AEZ with a Keccak Farfalle would be simpler and possibly faster. Why do you prefer HCTR over Farfalle?

@david415
Member

and another thing!

you must've written some small amount of code to perform those benchmarks. why do i have to ask for this code? why not just open source everything by default? academia and much of the industry still does not understand open source.

@james-darkfox did you use the criterion crate to perform these benchmarks or what? where's the code?

@WildCryptoFox
Author

WildCryptoFox commented Sep 22, 2019

(1) and (2) I may send in a PR, though I'm working on other things first. I did realize they're just bindings, but the C has the option to split out the key expansion and can probably do in-place encryption/decryption. Even if it can't, the exposed Rust interface can take a second buffer for the output without forcing the allocation.

(3) I do recognize that onion routers do not compare with mixnets with respect to global adversaries; I'm "fine" with discarding them from the threat model for a general purpose anonymity system. I do remain interested in mixnet work, especially anything based on Loopix with its bandwidth-latency balancing. Loopix is great for instant messaging and Email but not general IP traffic. HORNET is great for forwarding IP.

(4) Hm. I recall the name but do not see any evidence on my system that I've discussed or experimented with it. I'll have to take a look into this one. I prefer HCTR because its simple construction is general, thus its components may be swapped out with minimal analysis. AEZ and Farfalle are specific functions, not general constructions. The performance numbers are impressive and may add enough value to consider them, but they can always be added later.

I am publishing the code; I thought I had included the link, whoops. Locally I've hacked together a criterion Measurement implementation for CyclesPerByte and will be publishing this independently in a PR to criterion (PR submitted: #336).

My sphinx code is very early and HORNET code basically nonexistent. All will be published on GitLab/sio4. :-)

Besides the custom Measurement, the benchmark entry for aez's encrypt is trivial. I'll remove the aez benchmark from my code as it isn't a fair comparison, at least not yet. I may re-add it when either of us removes the forced allocation and splits out the key expansion. It just makes the graph too tall!

// CyclesPerByte => https://github.com/bheisler/criterion.rs/pull/336
group.bench_function(BenchmarkId::new("aez-encrypt", size), |b| {
    b.iter(|| aez::aez::encrypt(&[0u8; 48], &[0u8; 16], &buf))
});

group.bench_function(BenchmarkId::new("aez-decrypt", size), |b| {
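    // NOTE: as posted, this still calls `encrypt`; a real decrypt benchmark
    // would call the crate's decryption function here instead.
    b.iter(|| aez::aez::encrypt(&[0u8; 48], &[0u8; 16], &buf))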
});


@WildCryptoFox
Author

WildCryptoFox commented Sep 22, 2019

@david415 I'm using an Intel(R) Core(TM) M-5Y51 CPU @ 1.10GHz; dual core, which turbos up to ~2500MHz. aes+avx2. No avx512. Newer CPUs (relevant for HORNET relays) with AES-NI and AVX512 would yield much better performance. The Rust POLYVAL implementation doesn't yet support AVX512.

Edit: Just in case you clicked that mis-link. /sio4/ on GitLAB not GitHUB.

@david415
Member

@james-darkfox thank you for your contributions!

@WildCryptoFox
Author

WildCryptoFox commented Sep 22, 2019

@david415 build.rs "-O0" oh wow... "-O3" ;-)

7-9 -> 0.7 cpb

@david415
Member

does this finding produce a nicer line on that graph?

@WildCryptoFox
Author

@david415 Uh.. Very. I didn't graph it but it would look very much like AES-CTR, as expected. But now I'm rewriting your bindings. :)

@WildCryptoFox
Author

WildCryptoFox commented Sep 22, 2019

And getting invalid memory accesses... somehow. o.0

Edit: Bah. Turns out it doesn't like a zero sized nonce.
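
One defensive pattern for a binding that hits this kind of problem is to validate lengths before anything crosses the FFI boundary. In Rust, an empty slice's `as_ptr()` is non-null but must not be dereferenced, so C code that reads through it unconditionally can fault. The sketch below is hypothetical: `InputError` and `nonce_for_ffi` are illustrative names, and whether rejecting an empty nonce is the right policy for this C code (rather than handling it on the C side) is an assumption.

```rust
/// Hypothetical error type for inputs the C side is assumed to mishandle.
#[derive(Debug, PartialEq)]
pub enum InputError {
    EmptyNonce,
}

/// Check the nonce before handing a raw pointer and length to C.
/// Returns the pair a C function would typically take.
pub fn nonce_for_ffi(nonce: &[u8]) -> Result<(*const u8, usize), InputError> {
    if nonce.is_empty() {
        // An empty slice still yields a non-null pointer, but it is
        // dangling and must never be read through.
        return Err(InputError::EmptyNonce);
    }
    Ok((nonce.as_ptr(), nonce.len()))
}
```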
