AVX2 implementation of mulcache #624
base: main
@hanno-becker let me know if I opened the PR correctly or not :)
@dkostic You did :-) Set the
@dkostic While waiting for the CI, can you clean up the history and add some brief commit messages?
Arm Cortex-A76 (Raspberry Pi 5) benchmarks
Benchmark suite | Current: 1d89eb6 | Previous: e2494f3 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 29051 cycles | 29058 cycles | 1.00 |
ML-KEM-512 encaps | 35391 cycles | 35404 cycles | 1.00 |
ML-KEM-512 decaps | 45858 cycles | 45881 cycles | 1.00 |
ML-KEM-768 keypair | 49322 cycles | 49337 cycles | 1.00 |
ML-KEM-768 encaps | 55641 cycles | 55586 cycles | 1.00 |
ML-KEM-768 decaps | 70484 cycles | 70368 cycles | 1.00 |
ML-KEM-1024 keypair | 72097 cycles | 72093 cycles | 1.00 |
ML-KEM-1024 encaps | 80837 cycles | 80885 cycles | 1.00 |
ML-KEM-1024 decaps | 100604 cycles | 100773 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 4th gen (c7i)
Benchmark suite | Current: 1d89eb6 | Previous: e2494f3 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 9246 cycles | 9334 cycles | 0.99 |
ML-KEM-512 encaps | 13420 cycles | 13356 cycles | 1.00 |
ML-KEM-512 decaps | 18108 cycles | 17994 cycles | 1.01 |
ML-KEM-768 keypair | 16098 cycles | 16096 cycles | 1.00 |
ML-KEM-768 encaps | 18244 cycles | 18299 cycles | 1.00 |
ML-KEM-768 decaps | 24599 cycles | 24680 cycles | 1.00 |
ML-KEM-1024 keypair | 21651 cycles | 21709 cycles | 1.00 |
ML-KEM-1024 encaps | 25108 cycles | 25032 cycles | 1.00 |
ML-KEM-1024 decaps | 34206 cycles | 34105 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 3rd gen (c6a)
Benchmark suite | Current: 1d89eb6 | Previous: e2494f3 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 17024 cycles | 16996 cycles | 1.00 |
ML-KEM-512 encaps | 20791 cycles | 20786 cycles | 1.00 |
ML-KEM-512 decaps | 27173 cycles | 27191 cycles | 1.00 |
ML-KEM-768 keypair | 28753 cycles | 28801 cycles | 1.00 |
ML-KEM-768 encaps | 31559 cycles | 31636 cycles | 1.00 |
ML-KEM-768 decaps | 40928 cycles | 41014 cycles | 1.00 |
ML-KEM-1024 keypair | 41880 cycles | 41877 cycles | 1.00 |
ML-KEM-1024 encaps | 46658 cycles | 46744 cycles | 1.00 |
ML-KEM-1024 decaps | 59214 cycles | 59626 cycles | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 3rd gen (c6i)
Benchmark suite | Current: 1d89eb6 | Previous: e2494f3 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 15738 cycles | 15699 cycles | 1.00 |
ML-KEM-512 encaps | 22229 cycles | 22199 cycles | 1.00 |
ML-KEM-512 decaps | 29837 cycles | 29826 cycles | 1.00 |
ML-KEM-768 keypair | 27074 cycles | 27055 cycles | 1.00 |
ML-KEM-768 encaps | 30402 cycles | 30327 cycles | 1.00 |
ML-KEM-768 decaps | 41481 cycles | 41414 cycles | 1.00 |
ML-KEM-1024 keypair | 36540 cycles | 36540 cycles | 1 |
ML-KEM-1024 encaps | 42287 cycles | 42242 cycles | 1.00 |
ML-KEM-1024 decaps | 57388 cycles | 57500 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 4th gen (c7i) (no-opt)
Benchmark suite | Current: 1d89eb6 | Previous: e2494f3 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 28789 cycles | 28718 cycles | 1.00 |
ML-KEM-512 encaps | 34802 cycles | 34752 cycles | 1.00 |
ML-KEM-512 decaps | 45196 cycles | 45107 cycles | 1.00 |
ML-KEM-768 keypair | 48497 cycles | 48607 cycles | 1.00 |
ML-KEM-768 encaps | 58759 cycles | 58828 cycles | 1.00 |
ML-KEM-768 decaps | 72548 cycles | 72591 cycles | 1.00 |
ML-KEM-1024 keypair | 71912 cycles | 72117 cycles | 1.00 |
ML-KEM-1024 encaps | 86530 cycles | 86588 cycles | 1.00 |
ML-KEM-1024 decaps | 104168 cycles | 104392 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton4
Benchmark suite | Current: 1d89eb6 | Previous: e2494f3 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 18132 cycles | 18130 cycles | 1.00 |
ML-KEM-512 encaps | 22141 cycles | 22170 cycles | 1.00 |
ML-KEM-512 decaps | 28755 cycles | 28799 cycles | 1.00 |
ML-KEM-768 keypair | 30570 cycles | 30583 cycles | 1.00 |
ML-KEM-768 encaps | 33628 cycles | 33636 cycles | 1.00 |
ML-KEM-768 decaps | 43103 cycles | 43154 cycles | 1.00 |
ML-KEM-1024 keypair | 44183 cycles | 44179 cycles | 1.00 |
ML-KEM-1024 encaps | 49650 cycles | 49659 cycles | 1.00 |
ML-KEM-1024 decaps | 62536 cycles | 62635 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 3rd gen (c6a) (no-opt)
Benchmark suite | Current: 1d89eb6 | Previous: e2494f3 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 39656 cycles | 39692 cycles | 1.00 |
ML-KEM-512 encaps | 48298 cycles | 48373 cycles | 1.00 |
ML-KEM-512 decaps | 62990 cycles | 62919 cycles | 1.00 |
ML-KEM-768 keypair | 65695 cycles | 65551 cycles | 1.00 |
ML-KEM-768 encaps | 77223 cycles | 77229 cycles | 1.00 |
ML-KEM-768 decaps | 96348 cycles | 96444 cycles | 1.00 |
ML-KEM-1024 keypair | 97977 cycles | 98024 cycles | 1.00 |
ML-KEM-1024 encaps | 112370 cycles | 112502 cycles | 1.00 |
ML-KEM-1024 decaps | 136497 cycles | 136546 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton3
Benchmark suite | Current: 1d89eb6 | Previous: e2494f3 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 18948 cycles | 18947 cycles | 1.00 |
ML-KEM-512 encaps | 23519 cycles | 23518 cycles | 1.00 |
ML-KEM-512 decaps | 30607 cycles | 30639 cycles | 1.00 |
ML-KEM-768 keypair | 32301 cycles | 32293 cycles | 1.00 |
ML-KEM-768 encaps | 35875 cycles | 35878 cycles | 1.00 |
ML-KEM-768 decaps | 46036 cycles | 46024 cycles | 1.00 |
ML-KEM-1024 keypair | 46611 cycles | 46547 cycles | 1.00 |
ML-KEM-1024 encaps | 52438 cycles | 52390 cycles | 1.00 |
ML-KEM-1024 decaps | 66195 cycles | 66195 cycles | 1 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton2
Benchmark suite | Current: 1d89eb6 | Previous: e2494f3 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 29038 cycles | 29072 cycles | 1.00 |
ML-KEM-512 encaps | 35402 cycles | 35424 cycles | 1.00 |
ML-KEM-512 decaps | 45860 cycles | 45897 cycles | 1.00 |
ML-KEM-768 keypair | 49327 cycles | 49325 cycles | 1.00 |
ML-KEM-768 encaps | 55651 cycles | 55570 cycles | 1.00 |
ML-KEM-768 decaps | 70505 cycles | 70379 cycles | 1.00 |
ML-KEM-1024 keypair | 72122 cycles | 72035 cycles | 1.00 |
ML-KEM-1024 encaps | 80853 cycles | 80799 cycles | 1.00 |
ML-KEM-1024 decaps | 100623 cycles | 100705 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 4th gen (c7a)
Benchmark suite | Current: 1d89eb6 | Previous: e2494f3 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 11565 cycles | 11487 cycles | 1.01 |
ML-KEM-512 encaps | 16408 cycles | 16445 cycles | 1.00 |
ML-KEM-512 decaps | 22022 cycles | 22135 cycles | 0.99 |
ML-KEM-768 keypair | 19576 cycles | 19615 cycles | 1.00 |
ML-KEM-768 encaps | 22032 cycles | 22182 cycles | 0.99 |
ML-KEM-768 decaps | 30163 cycles | 30433 cycles | 0.99 |
ML-KEM-1024 keypair | 26475 cycles | 26368 cycles | 1.00 |
ML-KEM-1024 encaps | 30751 cycles | 30788 cycles | 1.00 |
ML-KEM-1024 decaps | 41818 cycles | 41536 cycles | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 3rd gen (c6i) (no-opt)
Benchmark suite | Current: 1d89eb6 | Previous: e2494f3 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 47351 cycles | 47320 cycles | 1.00 |
ML-KEM-512 encaps | 56429 cycles | 56470 cycles | 1.00 |
ML-KEM-512 decaps | 72910 cycles | 72983 cycles | 1.00 |
ML-KEM-768 keypair | 78273 cycles | 78420 cycles | 1.00 |
ML-KEM-768 encaps | 90346 cycles | 90458 cycles | 1.00 |
ML-KEM-768 decaps | 111651 cycles | 111585 cycles | 1.00 |
ML-KEM-1024 keypair | 115406 cycles | 115669 cycles | 1.00 |
ML-KEM-1024 encaps | 131054 cycles | 131031 cycles | 1.00 |
ML-KEM-1024 decaps | 158136 cycles | 158137 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton4 (no-opt)
Benchmark suite | Current: 1d89eb6 | Previous: e2494f3 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 37911 cycles | 37912 cycles | 1.00 |
ML-KEM-512 encaps | 43325 cycles | 43322 cycles | 1.00 |
ML-KEM-512 decaps | 55764 cycles | 55510 cycles | 1.00 |
ML-KEM-768 keypair | 63119 cycles | 63064 cycles | 1.00 |
ML-KEM-768 encaps | 70452 cycles | 70587 cycles | 1.00 |
ML-KEM-768 decaps | 86875 cycles | 86993 cycles | 1.00 |
ML-KEM-1024 keypair | 94605 cycles | 94564 cycles | 1.00 |
ML-KEM-1024 encaps | 105332 cycles | 105345 cycles | 1.00 |
ML-KEM-1024 decaps | 126811 cycles | 126633 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 4th gen (c7a) (no-opt)
Benchmark suite | Current: 1d89eb6 | Previous: e2494f3 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 37797 cycles | 37760 cycles | 1.00 |
ML-KEM-512 encaps | 44404 cycles | 44409 cycles | 1.00 |
ML-KEM-512 decaps | 58614 cycles | 58594 cycles | 1.00 |
ML-KEM-768 keypair | 61363 cycles | 61357 cycles | 1.00 |
ML-KEM-768 encaps | 69965 cycles | 69958 cycles | 1.00 |
ML-KEM-768 decaps | 88865 cycles | 88942 cycles | 1.00 |
ML-KEM-1024 keypair | 88772 cycles | 88682 cycles | 1.00 |
ML-KEM-1024 encaps | 101249 cycles | 101026 cycles | 1.00 |
ML-KEM-1024 decaps | 123485 cycles | 123373 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton3 (no-opt)
Benchmark suite | Current: 1d89eb6 | Previous: e2494f3 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 39335 cycles | 39330 cycles | 1.00 |
ML-KEM-512 encaps | 45258 cycles | 45265 cycles | 1.00 |
ML-KEM-512 decaps | 57137 cycles | 57185 cycles | 1.00 |
ML-KEM-768 keypair | 65830 cycles | 65943 cycles | 1.00 |
ML-KEM-768 encaps | 73719 cycles | 73797 cycles | 1.00 |
ML-KEM-768 decaps | 89805 cycles | 89811 cycles | 1.00 |
ML-KEM-1024 keypair | 99009 cycles | 99026 cycles | 1.00 |
ML-KEM-1024 encaps | 109998 cycles | 109978 cycles | 1.00 |
ML-KEM-1024 decaps | 130741 cycles | 130765 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Graviton3 (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
Benchmark suite | Current: 61c9f18 | Previous: 5ba851a | Ratio |
---|---|---|---|
ML-KEM-512 encaps | 56437 cycles | 54243 cycles | 1.04 |
This comment was automatically generated by workflow using github-action-benchmark.
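The ratio column and the alert condition can be reproduced from the cycle counts. The following is a minimal Python sketch, assuming the alert fires when current/previous strictly exceeds the threshold. The function names are illustrative and are not part of github-action-benchmark.

```python
def ratio(current_cycles, previous_cycles):
    """Ratio as shown in the tables: current divided by previous."""
    return current_cycles / previous_cycles


def is_regression(current_cycles, previous_cycles, threshold=1.03):
    # Assumption: a strict comparison against the threshold quoted in the alert.
    return ratio(current_cycles, previous_cycles) > threshold


# ML-KEM-512 encaps figures from the alert above (61c9f18 vs 5ba851a).
print(round(ratio(56437, 54243), 2))
print(is_regression(56437, 54243))
```

A ratio below or close to 1.00 means the change is within noise or faster. Only values above the threshold are flagged.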
Graviton2 (no-opt)
Benchmark suite | Current: 1d89eb6 | Previous: e2494f3 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 60609 cycles | 61294 cycles | 0.99 |
ML-KEM-512 encaps | 69730 cycles | 69785 cycles | 1.00 |
ML-KEM-512 decaps | 88861 cycles | 88870 cycles | 1.00 |
ML-KEM-768 keypair | 101780 cycles | 101971 cycles | 1.00 |
ML-KEM-768 encaps | 113964 cycles | 114128 cycles | 1.00 |
ML-KEM-768 decaps | 139489 cycles | 139418 cycles | 1.00 |
ML-KEM-1024 keypair | 154183 cycles | 154578 cycles | 1.00 |
ML-KEM-1024 encaps | 170179 cycles | 170453 cycles | 1.00 |
ML-KEM-1024 decaps | 202625 cycles | 202866 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Bananapi bpi-f3 benchmarks
Benchmark suite | Current: 1d89eb6 | Previous: e2494f3 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 330164 cycles | 331565 cycles | 1.00 |
ML-KEM-512 encaps | 439412 cycles | 440801 cycles | 1.00 |
ML-KEM-512 decaps | 585957 cycles | 589328 cycles | 0.99 |
ML-KEM-768 keypair | 547168 cycles | 550817 cycles | 0.99 |
ML-KEM-768 encaps | 686520 cycles | 690055 cycles | 0.99 |
ML-KEM-768 decaps | 875914 cycles | 882124 cycles | 0.99 |
ML-KEM-1024 keypair | 807824 cycles | 811469 cycles | 1.00 |
ML-KEM-1024 encaps | 980240 cycles | 983768 cycles | 1.00 |
ML-KEM-1024 decaps | 1210824 cycles | 1218161 cycles | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A72 (Raspberry Pi 4) benchmarks
Benchmark suite | Current: 1d89eb6 | Previous: e2494f3 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 51592 cycles | 51940 cycles | 0.99 |
ML-KEM-512 encaps | 58086 cycles | 58576 cycles | 0.99 |
ML-KEM-512 decaps | 74072 cycles | 74200 cycles | 1.00 |
ML-KEM-768 keypair | 88091 cycles | 88340 cycles | 1.00 |
ML-KEM-768 encaps | 96658 cycles | 96787 cycles | 1.00 |
ML-KEM-768 decaps | 119319 cycles | 119436 cycles | 1.00 |
ML-KEM-1024 keypair | 131635 cycles | 131851 cycles | 1.00 |
ML-KEM-1024 encaps | 145038 cycles | 145657 cycles | 1.00 |
ML-KEM-1024 decaps | 176171 cycles | 177347 cycles | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A55 (Snapdragon 888) benchmarks
Benchmark suite | Current: 1d89eb6 | Previous: e2494f3 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 58239 cycles | 58319 cycles | 1.00 |
ML-KEM-512 encaps | 65480 cycles | 65746 cycles | 1.00 |
ML-KEM-512 decaps | 83784 cycles | 84487 cycles | 0.99 |
ML-KEM-768 keypair | 98969 cycles | 98973 cycles | 1.00 |
ML-KEM-768 encaps | 110258 cycles | 110508 cycles | 1.00 |
ML-KEM-768 decaps | 136100 cycles | 136773 cycles | 1.00 |
ML-KEM-1024 keypair | 150320 cycles | 150172 cycles | 1.00 |
ML-KEM-1024 encaps | 166535 cycles | 166783 cycles | 1.00 |
ML-KEM-1024 decaps | 201494 cycles | 202453 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
The performance figures in the CI are a bit disappointing, but I still think this change is worth merging because (a) it makes all backends use the same approach, and (b) it enables a hopefully more impactful optimization, namely lazy reduction.
@dkostic Can you clean up the history? I'll make time to go over the code again in detail in the meantime.
@mkannwischer Please chime in.
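For readers unfamiliar with the term, lazy reduction means accumulating several products and reducing modulo q once at the end, instead of reducing after every product. The following is a minimal Python sketch of the idea only. It assumes plain `%` in place of the Montgomery or Barrett reductions a real backend would use, the function names are hypothetical, and it is not the code in this PR.

```python
Q = 3329  # ML-KEM modulus


def dot_eager(xs, ys):
    """Reduce modulo Q after every multiply-accumulate step."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = (acc + (x * y) % Q) % Q
    return acc


def dot_lazy(xs, ys):
    """Accumulate unreduced products and reduce once at the end."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y
        # Python integers do not overflow, so model a signed 32-bit
        # accumulator explicitly: the running sum has to stay in range.
        assert acc < 2**31
    return acc % Q


xs = [3328, 1234, 17, 2000]
ys = [3328, 999, 3000, 1]
print(dot_eager(xs, ys) == dot_lazy(xs, ys))
```

The saving comes from replacing one reduction per product with one reduction per accumulated sum. The price is having to bound the accumulator, which is why the number of terms matters.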
I had a closer look at the code. Thank you again @dkostic for the work, I think we're close. Two main points:
- The documentation needs improving. This is partly pre-existing, but there is some trickery regarding the sign of half of the multiplications that should be documented. In particular, it should be documented that half of the AVX2 mulcache is negated compared to what it should be conceptually (and is, in the C and AArch64 backends).
- The twisted mulcache zetas should be precomputed instead of being recomputed in every mulcache computation; this is also what the C and AArch64 backends do. While I'm happy to defer further lazy reduction optimizations to a later PR, the twisted twiddles should already be stored in this PR.
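
To make the sign point concrete, here is a minimal Python sketch of a base multiplication with and without a mulcache. It assumes the usual ML-KEM structure in which degree-1 factors come in pairs reduced modulo X^2 - zeta and X^2 + zeta. All function names are hypothetical, plain `%` stands in for Montgomery arithmetic, and this illustrates the convention rather than the AVX2 code in this PR.

```python
Q = 3329  # ML-KEM modulus


def basemul(a, b, zeta):
    """(a0 + a1*X) * (b0 + b1*X) mod (X^2 - zeta), coefficients mod Q."""
    c0 = (a[0] * b[0] + a[1] * b[1] * zeta) % Q
    c1 = (a[0] * b[1] + a[1] * b[0]) % Q
    return c0, c1


def mulcache(b, zeta):
    """Conceptual cache entry: the twisted coefficient b1 * zeta."""
    return (b[1] * zeta) % Q


def basemul_cached(a, b, cache):
    """Same product as basemul, reusing the precomputed b1 * zeta."""
    c0 = (a[0] * b[0] + a[1] * cache) % Q
    c1 = (a[0] * b[1] + a[1] * b[0]) % Q
    return c0, c1


def basemul_cached_negated(a, b, neg_cache):
    """Variant for a cache entry stored with the opposite sign:
    the twisted term is subtracted instead of added."""
    c0 = (a[0] * b[0] - a[1] * neg_cache) % Q
    c1 = (a[0] * b[1] + a[1] * b[0]) % Q
    return c0, c1


a, b, zeta = (5, 7), (11, 13), 17
minus_zeta = (-zeta) % Q

# First half of a pair: modulus X^2 - zeta, cache holds b1 * zeta.
print(basemul_cached(a, b, mulcache(b, zeta)) == basemul(a, b, zeta))
# Second half: modulus X^2 + zeta. Conceptually the cache holds b1 * (-zeta);
# storing b1 * zeta instead works only if the multiplication subtracts.
print(basemul_cached_negated(a, b, mulcache(b, zeta)) == basemul(a, b, minus_zeta))
```

Both conventions give the same product, but a cache produced under one convention cannot be consumed under the other, which is why the negation needs to be documented.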
9abea89 to 46d8bef
Slight performance improvement:

MLKEM 512 | Before | After | Improv |
---|---|---|---|
keypair | 15101 | 14835 | 1.02x |
encaps | 19664 | 18631 | 1.05x |
decaps | 25824 | 24420 | 1.05x |

MLKEM 768 | Before | After | Improv |
---|---|---|---|
keypair | 26187 | 25157 | 1.04x |
encaps | 28014 | 27248 | 1.03x |
decaps | 36989 | 35659 | 1.03x |

MLKEM 1024 | Before | After | Improv |
---|---|---|---|
keypair | 36014 | 35630 | 1.01x |
encaps | 39797 | 39347 | 1.00x |
decaps | 52139 | 51524 | 1.01x |

measured on Intel(R) Xeon(R) Platinum 8488C.
Signed-off-by: dkostic <[email protected]>

make zetas_avx2 static
Signed-off-by: dkostic <[email protected]>

auto-generate consts
Signed-off-by: dkostic <[email protected]>

format + lint
Signed-off-by: dkostic <[email protected]>

Header file for zetas
Signed-off-by: dkostic <[email protected]>

return missing symlink
Signed-off-by: dkostic <[email protected]>

mulcache_compute in asm
Signed-off-by: dkostic <[email protected]>

format + lint
Signed-off-by: dkostic <[email protected]>

run autogenerate_files
Signed-off-by: dkostic <[email protected]>

namespacing
Signed-off-by: dkostic <[email protected]>
46d8bef to aa84319
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
Benchmark suite | Current: aa84319 | Previous: e2494f3 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 12175 cycles | 11487 cycles | 1.06 |
This comment was automatically generated by workflow using github-action-benchmark.
Signed-off-by: Hanno Becker <[email protected]>
Signed-off-by: Hanno Becker <[email protected]>
42adb1c to 1d89eb6
@dkostic A lot has changed on
Summary:
AVX2 implementation of mulcache
Slight performance improvement:
measured on Intel(R) Xeon(R) Platinum 8488C.
Addresses #477.
Steps:
If your pull request consists of multiple sequential changes, please describe them here:
Performed local tests:
- lint: passing
- tests all: passing
- tests bench: passing
- tests cbmc: passing

Do you expect this change to impact performance: Yes, see above.