Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SVE support for Graviton4 #364

Merged
merged 1 commit into from
Jan 4, 2025
Merged

Conversation

gswirski
Copy link
Contributor

Graviton4 CPU reduced SVE register size from 256 to 128 bit. This PR adds support for this configuration by essentially duplicating the code. I tried doing some macro magic with variadic arguments but it ended up being more confusing than simply having 128 and 256 bit versions.

Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
13.5% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you!

@bobbinth bobbinth merged commit f825c23 into 0xPolygonMiden:next Jan 4, 2025
12 of 13 checks passed
@bobbinth
Copy link
Contributor

bobbinth commented Jan 4, 2025

I updated the benchmarks in 6da2a62 and it seems like there is a modest improvement over Graviton 3, but I guess since they reduces the register size to 128 bits, that's the best we can hope for?

Also, to run tests I did the following on a Graviton 4 machine:

RUSTFLAGS="-C target-feature=+sve -C debug-assertions -C overflow-checks -C debuginfo=2" cargo nextest run --profile default --release

Would you do anything differently?

@gswirski
Copy link
Contributor Author

gswirski commented Jan 6, 2025

Thank you for updating benchmark info, I missed it. I was using only RUSTFLAGS="-C target-feature=+sve for testing because I didn't know any better. This was enough to trigger SVE code.

As for further Graviton4 improvements, I'm not very optimistic. SVE2 added multiply-add with overflow opcodes that would be very helpful, except for the fact that then SVE registers become a single value (result + carry). I can try multiply-add without overflow in the coming days to collapse some of the instructions, but this PR was the easiest and fastest way to at least get Graviton4 on par with Graviton3.

@gswirski gswirski deleted the graviton4 branch January 6, 2025 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants