[PPO] feat: Add LoRA support for PPO #205

StephenXie · 2025-02-05T07:53:46Z

This PR adds LoRA (Low-Rank Adaptation) support for PPO (#159).

Changes

Add LoRA support to the actor and critic configuration (see [SFT] feat: Add LoRA support for SFT #127).
Merge the PEFT adapter before serving the model with vLLM, and unmerge it afterward.
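
The merge-then-unmerge step can be sketched as a context manager. This is only an illustration of the pattern, not the code in this PR: FakePeftModel is a hypothetical stand-in that mimics the merge_adapter / unmerge_adapter methods PEFT exposes, and merged_for_rollout is an assumed helper name.

```python
from contextlib import contextmanager


class FakePeftModel:
    """Hypothetical stand-in for a PEFT-wrapped model (not a real PeftModel)."""

    def __init__(self):
        self.merged = False
        self.calls = []

    def merge_adapter(self):
        # In PEFT this folds the LoRA update into the base weights in place.
        self.merged = True
        self.calls.append("merge")

    def unmerge_adapter(self):
        # In PEFT this subtracts the LoRA update again so training can resume.
        self.merged = False
        self.calls.append("unmerge")


@contextmanager
def merged_for_rollout(model):
    """Merge adapters on entry and always unmerge on exit, even on error."""
    model.merge_adapter()
    try:
        yield model
    finally:
        model.unmerge_adapter()


model = FakePeftModel()
with merged_for_rollout(model):
    # Weights would be synced to the vLLM engine here.
    state_during = model.merged
```

The try/finally matters: if the weight sync raises, the actor is still returned to its unmerged state before the next training step.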

Features

Configurable LoRA rank and alpha parameters
Target module specification for selective adaptation
Compatible with FSDP sharding strategy
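
As a rough illustration of what the rank, alpha, and target-module options control, here is a minimal sketch. LoraSettings and its field names are hypothetical and are not the config schema used in this PR; the suffix-matching rule in is_targeted is an assumption about how module names are selected.

```python
from dataclasses import dataclass, field


@dataclass
class LoraSettings:
    """Hypothetical settings object; field names are illustrative only."""

    rank: int = 8
    alpha: int = 16
    target_modules: list = field(default_factory=lambda: ["q_proj", "v_proj"])

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update B @ A by alpha / rank.
        return self.alpha / self.rank


def is_targeted(module_name: str, settings: LoraSettings) -> bool:
    """Assumed rule: adapt a module if its name ends with a target name."""
    return any(
        module_name == t or module_name.endswith("." + t)
        for t in settings.target_modules
    )


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one adapted linear layer: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


settings = LoraSettings()
```

For a 4096 x 4096 projection with rank 8, lora_param_count gives 65,536 trainable parameters, compared with about 16.8M in the frozen base weight.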

Some known issues:

Merging Ref and Actor when LoRA is on requires modifying the ppo_trainer logic; we need some help with this.
No thorough testing yet.
Line 80 of fsdp_vllm.py needs to be cleaned up:
params = OrderedDict((k.replace(".base_layer.", "."), v) for k, v in params.items() if not ".lora_" in k)
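
To make the effect of that line concrete, here is a small runnable sketch on a toy state dict. The key names are illustrative assumptions modeled on typical PEFT naming (base_layer, lora_A, lora_B), the string values stand in for tensors, and strip_lora_keys is a hypothetical helper name.

```python
from collections import OrderedDict


def strip_lora_keys(params):
    """Drop LoRA adapter entries and restore the base-model key names."""
    return OrderedDict(
        (k.replace(".base_layer.", "."), v)
        for k, v in params.items()
        if ".lora_" not in k
    )


# Toy state dict; values are placeholders for tensors.
params = OrderedDict(
    [
        ("model.layers.0.self_attn.q_proj.base_layer.weight", "W_q (merged)"),
        ("model.layers.0.self_attn.q_proj.lora_A.default.weight", "A"),
        ("model.layers.0.self_attn.q_proj.lora_B.default.weight", "B"),
        ("model.layers.0.mlp.up_proj.weight", "W_up"),
    ]
)

cleaned = strip_lora_keys(params)
```

After the call, only the two base weights remain, under the key names a non-PEFT model would use. Note that plain string replacement would also rewrite any unrelated key that happens to contain ".base_layer.", which is one reason the line may deserve a cleanup.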

Jiayi-Pan · 2025-02-05T08:10:01Z

Relevant threads:

#159

Jiayi-Pan/TinyZero#15

vermouth1992 · 2025-02-05T08:56:52Z

verl/workers/sharding_manager/fsdp_vllm.py

+
+        if isinstance(self.module._fsdp_wrapped_module, PeftModel):
+            # the model to sync weights to is a vLLM model (not a peft model), so we need to merge the adapters
+            with FSDP.summon_full_params(self.module):


Summoning full params may cause OOM. @PeterSH6 Is there a better approach that can merge the LoRA weights in sharded form, or at least one parameter at a time, to support large models?
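
The "one parameter at a time" idea can be sketched as a generator that materializes a single merged weight per step. This is a toy illustration on nested lists, not a proposal for the FSDP code path: it assumes the standard LoRA update W + scaling * (B @ A), ignores sharding entirely, and the names iter_merged and layers are hypothetical.

```python
def matmul(x, y):
    """Multiply two matrices given as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def iter_merged(layers, scaling):
    """Yield (name, merged_weight) one layer at a time.

    Only one merged matrix is alive per iteration, so peak memory is bounded
    by the largest single layer rather than by the whole model.
    """
    for name, (w, a, b) in layers.items():
        delta = matmul(b, a)  # (out x r) @ (r x in) -> (out x in)
        merged = [
            [w_ij + scaling * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)
        ]
        yield name, merged


# One toy layer: W is 2x2, A is 1x2 (rank 1), B is 2x1.
layers = {
    "q_proj": (
        [[1.0, 0.0], [0.0, 1.0]],  # W
        [[1.0, 2.0]],              # A
        [[1.0], [3.0]],            # B
    )
}

merged = dict(iter_merged(layers, scaling=0.5))
```

In a real setting each yielded weight would be handed to the inference engine and released before the next one is built; how to gather each parameter from its shards is the open question raised above.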

Shirley-Kokane · 2025-02-18T00:38:03Z

verl/workers/sharding_manager/fsdp_vllm.py

+        if isinstance(self.module._fsdp_wrapped_module, PeftModel):
+            # the model to sync weights to is a vLLM model (not a peft model), so we need to merge the adapters
+            with FSDP.summon_full_params(self.module):
+                self.module.merge_adapter()


After merge_adapter, the model does not have the same structure as the original base model. Is there another way to merge the adapter and get the original base model structure back?

StephenXie and others added 7 commits February 2, 2025 22:20

Add initial LoRA to ppo fsdp actor

ac31baa

Add LoRA in critic and adjust the config format

76b1875

Update peft implementation

b2a2824

Update get_fsdp_wrap_policy for the critic

50be77d

actor ref lora 2in1 wip

25fd586

minor fix

e01412a

clean up fsdp lora logic

a0be6d9

StephenXie marked this pull request as draft February 5, 2025 08:01

actor ref lora 2in1

cbc9e5d

vermouth1992 reviewed Feb 5, 2025


SUMEETRM mentioned this pull request Feb 8, 2025

Add support for PPO/GRPO with LoRA + multi-GPU finetuning Jiayi-Pan/TinyZero#51

Open

Shirley-Kokane reviewed Feb 18, 2025
