Introducing experimental gradient accumulation API #8584
Conversation
@rpsilva-aws do you plan on merging this into r2.6?
@tengyifei Ideally, yes. It's perfectly fine for the 3-layer MLP, but we're seeing a small difference for the Llama runs (relative to a previous local patch set from just before some of the code cleanup), so we're quickly identifying what it is.
Okay, please aim to sort out all critical issues by Jan 21 if you're aiming for 2.6, so that we can review and cherry-pick it by Jan 22. The 2.6 release is quickly drawing near, and I would like a few days to test all the builds.
Tests are succeeding with the default.
There was some setup issue with the CPUs.
Looks like everything passed after a retry.
In this PR, we introduce experimental.gradient_accumulation, which leverages XLA's While op to accumulate gradients.
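A rough usage sketch of the API is below. The import path, function signature, and input layout shown here are assumptions inferred from the name experimental.gradient_accumulation; the PR diff is the authoritative reference. The general idea is that the caller supplies a per-microbatch step function and inputs stacked along a leading accumulation dimension, and the helper runs the step inside an XLA While loop, accumulating gradients across iterations instead of unrolling the loop in the compiled graph.

```python
# Minimal sketch of using the experimental gradient accumulation API.
# NOTE: the module path, function name, argument order, and return value
# below are assumptions for illustration, not the confirmed interface.
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm
# Assumed import path for the new helper introduced in this PR.
from torch_xla.experimental.gradient_accumulation import gradient_accumulation

device = xm.xla_device()
model = nn.Linear(16, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Inputs stacked along a leading "accumulation steps" dimension:
# here, 4 microbatches of 8 samples each (an assumed layout).
inputs = torch.randn(4, 8, 16, device=device)
labels = torch.randint(0, 2, (4, 8), device=device)

def step_fn(x, y):
    # Runs once per microbatch inside the XLA While loop; the helper is
    # assumed to accumulate gradients into the model parameters' .grad.
    return loss_fn(model(x), y)

optimizer.zero_grad()
loss = gradient_accumulation(step_fn, (inputs, labels), model)
optimizer.step()
xm.mark_step()
```

Expressing the accumulation as a While op rather than unrolling the microbatch loop in Python keeps the traced graph size roughly constant in the number of accumulation steps, which is presumably the motivation for building this on XLA's While.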