Implementing HF Padding-Free and GraniteLM Support #257
Conversation
A few comments but it looks good so far. Just need to test it
This pull request has merge conflicts that must be resolved before it can be merged.
Generally seems like a solid drop-in replacement with some good updates. First round of review only covers code; I'll run everything on AMD hardware as well.
Signed-off-by: Mustafa Eyceoz <[email protected]>
@JamesKunstle added comments for the things you wanted
@Maxusmusti Much appreciated; testing on AMD now
Testing passes on AMD, losses between Dolomite and Llama are in parity.
Tests pass and loss curves look good between Dolomite and Llama padding-free on AMD. Feel good about merging this.
@@ -199,7 +199,7 @@ def print_masked_samples(data, tokenizer, is_pretrain, num_proc):
 def get_masked_and_orig_text(sample):
     labels = sample["labels"]
     input_ids = sample["input_ids"]
-    mask_id = get_sp_token(tokenizer, "<MASK>")[0]
+    mask_id = get_sp_token(tokenizer, "<|MASK|>")[0]
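For context on the diff above: the mask token is only used to visualize which positions are excluded from the loss when printing training samples for debugging. A minimal, hypothetical sketch of the idea (the function name and details here are illustrative, not the project's actual `get_masked_and_orig_text`):

```python
def render_masked(tokens, labels, mask_token="<|MASK|>"):
    # Illustrative only: replace tokens whose label is -100 (i.e. excluded
    # from the loss) with the mask token, so the masking pattern is visible
    # when the sample is printed. The real code works on token ids and
    # decodes with the tokenizer.
    return " ".join(
        mask_token if lab == -100 else tok
        for tok, lab in zip(tokens, labels)
    )
```

For example, `render_masked(["Hello", "world", "!"], [-100, 5, 6])` yields `"<|MASK|> world !"`, showing that only the first token is masked out of the loss.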
Will this affect existing models? Or is this purely for training-time?
Yeah, only relevant during training.
LGTM
Updating the data collator for models with HF padding-free support, adding support for the upcoming Granite HF model class, and updating flags/interface accordingly.
-Mustafa
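The padding-free collation the PR description refers to can be sketched roughly as follows. This is an illustrative example only, assuming a FlashAttention-style `cu_seqlens` boundary encoding; it is not the PR's actual collator:

```python
def collate_padding_free(batch):
    # Hypothetical sketch: instead of padding every sequence to the batch
    # max length, concatenate them into one flat stream and record the
    # cumulative sequence boundaries (cu_seqlens) plus per-sequence
    # position ids, so attention kernels can keep sequences separate
    # without any pad tokens.
    input_ids, position_ids, cu_seqlens = [], [], [0]
    for seq in batch:
        input_ids.extend(seq)
        position_ids.extend(range(len(seq)))  # positions restart per sequence
        cu_seqlens.append(cu_seqlens[-1] + len(seq))
    return {
        "input_ids": input_ids,
        "position_ids": position_ids,
        "cu_seqlens": cu_seqlens,
    }
```

For a batch of sequences `[1, 2, 3]` and `[4, 5]`, this produces a flat `input_ids` of `[1, 2, 3, 4, 5]`, `position_ids` of `[0, 1, 2, 0, 1]`, and `cu_seqlens` of `[0, 3, 5]`, with no padding anywhere.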