[core] Fix `use_reentrant` issues #1036
Conversation
Is this ready for review?
Almost, I'll update the PR in a bit with more description.
The documentation is not available anymore as the PR was closed or merged.
With huggingface/transformers#27020 being merged, this PR is ready for review!
Thanks for making the big fix in transformers and updating PEFT to enable it. I have only a few comments, please check them out.
src/peft/utils/other.py (outdated)
        If True, use gradient checkpointing to save memory at the expense of slower backward pass.
    gradient_checkpointing_kwargs (`dict`, *optional*, defaults to `None`):
        Keyword arguments to pass to the gradient checkpointing function, e.g. `use_reentrant=True`. Note this is
        only available in the latest transformers versions.
It would be better to specify the exact transformers version, because "latest" is relative.
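(Side note on the versioning question: support can also be detected at runtime by inspecting the method signature rather than relying on a version string. A minimal sketch, assuming a transformers model object; `enable_gc` is a hypothetical helper name, not code from this PR.)

```python
import inspect

def enable_gc(model, gradient_checkpointing_kwargs=None):
    # Older transformers releases do not accept gradient_checkpointing_kwargs,
    # so detect support from the method signature instead of a version number.
    supports_gc_kwargs = "gradient_checkpointing_kwargs" in inspect.signature(
        model.gradient_checkpointing_enable
    ).parameters

    if supports_gc_kwargs and gradient_checkpointing_kwargs is not None:
        model.gradient_checkpointing_enable(
            gradient_checkpointing_kwargs=gradient_checkpointing_kwargs
        )
    else:
        model.gradient_checkpointing_enable()
```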
src/peft/utils/other.py (outdated)
    use_gradient_checkpointing (`bool`, *optional*, defaults to `True`):
        If True, use gradient checkpointing to save memory at the expense of slower backward pass.
    gradient_checkpointing_kwargs (`dict`, *optional*, defaults to `None`):
        Keyword arguments to pass to the gradient checkpointing function, e.g. `use_reentrant=True`. Note this is
Could you explain what `use_reentrant` does?
I referred users to the PyTorch documentation; let me know if you want me to go into more detail.
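(For readers following along, a minimal PyTorch-level sketch of what the flag selects; this is not code from this PR, and the checkpointed function and tensor shapes are illustrative.)

```python
import torch
from torch.utils.checkpoint import checkpoint

def block(x):
    # Some activation-heavy computation whose intermediates we would rather recompute.
    return torch.nn.functional.gelu(x @ x.T)

x = torch.randn(64, 64, requires_grad=True)

# Legacy, reentrant implementation: re-runs the forward in a separate autograd pass.
y_reentrant = checkpoint(block, x, use_reentrant=True)

# Non-reentrant implementation based on saved-tensor hooks; it generally interacts
# better with DDP and with inputs that do not require gradients.
y_non_reentrant = checkpoint(block, x, use_reentrant=False)
```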
Thanks for adjusting the code. It now looks good to me.
Thanks, I'll merge after finishing some experiments with huggingface/transformers#27068 and huggingface/trl#912.
Hello Younes, thank you for fixing the gradient checkpointing related issues as per our discussions. Just a nit: here you meant
Ah yes, correct @pacman100!
Partially fixes: huggingface/trl#835
This PR depends on huggingface/transformers#27020. With that PR, we introduce a new argument in the `gradient_checkpointing_enable()` API so that users can pass `gradient_checkpointing_kwargs`. To fix some issues with DDP and gradient checkpointing, it is recommended to use `use_reentrant=True`, which is the fix for huggingface/trl#835. Therefore I propose to expose an optional `gradient_checkpointing_kwargs` argument in `prepare_model_for_kbit_training`.
cc @BenjaminBossan @pacman100
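A rough usage sketch of the proposed argument (the model id and the kwarg values below are illustrative, not taken from this PR):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

# Illustrative 4-bit model; any quantized causal LM would be prepared the same way.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# gradient_checkpointing_kwargs is forwarded to gradient_checkpointing_enable(),
# so this requires a transformers version that includes huggingface/transformers#27020.
model = prepare_model_for_kbit_training(
    model,
    use_gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": True},
)
```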