Raise UserWarning in RewardTraining if PEFT target_modules="all-linear" #2743

JohnGiorgi · 2025-02-02T21:13:16Z

What does this PR do?

I ran into a nasty edge case, which I don't think is documented anywhere, when using the RewardTrainer to train a reward model. If a peft_config is provided and target_modules="all-linear" (a wildcard that means: adapt all linear layers except the output layer), then the output layer of the reward model goes un-adapted and therefore un-trained. That layer is often newly initialized, and it is the one used to score the chosen/rejected completions. Performance suffers as you might expect; I saw this when comparing two runs of mine, one with target_modules="all-linear" and one with target_modules=None (the default).

This PR simply raises a UserWarning in RewardTrainer in this case. You could almost argue that raising an error is warranted, but there may be scenarios in which the output layer is already trained and a user only wants to adapt some intermediate layers with LoRA.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.


JohnGiorgi · 2025-02-02T21:30:32Z

trl/trainer/reward_trainer.py

+                target_modules = (
+                    peft_config.get("target_modules", None)
+                    if isinstance(peft_config, dict)
+                    else peft_config.target_modules
+                )


I am handling both cases: peft_config being a dict and peft_config being a PeftConfig object.

It is type-hinted as a dict in the RewardTrainer docstring, but I can see a PeftConfig object is passed to RewardTrainer here.
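For illustration, the dict-or-object handling described above can be sketched as a standalone helper. This is a minimal sketch, not the code from this PR: `StandInPeftConfig`, `get_target_modules`, `warn_if_all_linear`, and the warning text are all hypothetical names chosen for the example, and the stand-in dataclass only mimics the `target_modules` attribute so that the snippet runs without PEFT installed.

```python
import warnings
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class StandInPeftConfig:
    """Hypothetical stand-in for a config object exposing `target_modules`."""

    target_modules: Optional[Union[str, list]] = None


def get_target_modules(peft_config):
    """Read target_modules from either a dict or an attribute-style config."""
    if peft_config is None:
        return None
    if isinstance(peft_config, dict):
        return peft_config.get("target_modules", None)
    return getattr(peft_config, "target_modules", None)


def warn_if_all_linear(peft_config):
    """Emit a UserWarning when target_modules is the "all-linear" wildcard.

    Returns True if a warning was emitted, False otherwise.
    The message text is illustrative only.
    """
    if get_target_modules(peft_config) == "all-linear":
        warnings.warn(
            'target_modules="all-linear" may leave the output layer un-adapted; '
            "check that the scoring head is actually being trained.",
            UserWarning,
        )
        return True
    return False
```

Both call styles then go through the same path, e.g. `warn_if_all_linear({"target_modules": "all-linear"})` and `warn_if_all_linear(StandInPeftConfig(target_modules="all-linear"))`.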

JohnGiorgi · 2025-02-02T23:55:27Z

Closing because I wasn't able to reproduce this on closer inspection. I can also see that model.score.parameters() is updated after training, even with target_modules="all-linear".
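A check of this kind, confirming which parameters actually changed during training, can be sketched by comparing snapshots taken before and after. This is a minimal sketch under stated assumptions: `snapshot` and `changed_parameters` are hypothetical helpers, and the parameter names and numbers below are toy values for illustration, not results from this PR. Plain float lists are used so the snippet runs without PyTorch.

```python
def snapshot(named_values):
    """Copy (name, values) pairs into a plain dict of float lists."""
    return {name: [float(v) for v in values] for name, values in named_values}


def changed_parameters(before, after, tol=0.0):
    """Return the sorted names whose values differ between two snapshots."""
    changed = []
    for name, old in before.items():
        new = after[name]
        if len(old) != len(new) or any(abs(a - b) > tol for a, b in zip(old, new)):
            changed.append(name)
    return sorted(changed)


# With PyTorch, a snapshot could be built along these lines:
#   snapshot((n, p.detach().flatten().tolist()) for n, p in model.named_parameters())
# Toy values: the scoring head moves, a frozen base weight does not.
before = snapshot([("score.weight", [0.1, -0.2]), ("layers.0.q_proj.weight", [0.5, 0.5])])
after = snapshot([("score.weight", [0.3, -0.1]), ("layers.0.q_proj.weight", [0.5, 0.5])])
print(changed_parameters(before, after))
```

If the scoring head's name appears in the returned list, the head was updated during training, which is the observation that led to closing this PR.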

JohnGiorgi added 4 commits February 2, 2025 15:59

fix: warn a user if target_modules="all-linear" when reward modeling

d62d59b

tests: add test confirming user warning is raised when target_modules…

a974572

…="all-linear" when RMing

fix: tweak UserWarning

e606875

tests: make test look for specific userwarning

8550984


JohnGiorgi closed this Feb 2, 2025
