Are mini-batch updates implemented in trl's GRPO? #2903

Closed
jackfsuia opened this issue Feb 19, 2025 · 3 comments
Labels
🏋 GRPO Related to GRPO ❓ question Seeking clarification or more information

Comments

@jackfsuia

According to the GRPO algorithm from the paper https://arxiv.org/abs/2402.03300, shown in the image below:
[Image: the GRPO algorithm from the paper, with the mini-batch off-policy update step underlined in red]

There should be mini-batch off-policy updates (the part underlined in red), but I could not find them in trl's GRPO implementation. Or did I get this wrong? Thank you.
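The mini-batch off-policy updates asked about here can be illustrated with a minimal, hypothetical pure-Python sketch. It is not trl's API and not the paper's code. It assumes a toy single-prompt "policy" that is just a softmax over a few discrete outputs, omits the KL penalty against the reference model, and uses made-up names (`grpo_step`, `group_advantages`, `clipped_surrogate_grad`) and hyperparameters (`group_size`, `mu`, `minibatch_size`, `lr`, `eps`). The sketch samples a group of G outputs from a frozen π_θold and normalizes their rewards within the group. It then makes `mu` passes over shuffled mini-batches of those same samples. After the first mini-batch update, π_θ no longer equals π_θold, so the later updates are off-policy, which is why the clipped probability ratio is needed.

```python
import math
import random

# Toy sketch with hypothetical names; KL penalty omitted for brevity.


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def group_advantages(rewards):
    """Group-relative advantage: normalize rewards within the group of G samples."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + 1e-8) for r in rewards]


def clipped_surrogate_grad(logits, old_probs, action, adv, eps):
    """Gradient w.r.t. the logits of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    probs = softmax(logits)
    ratio = probs[action] / old_probs[action]
    if (adv > 0 and ratio > 1 + eps) or (adv < 0 and ratio < 1 - eps):
        return [0.0] * len(logits)  # clipped branch is active: zero gradient
    # d ratio / d logits = ratio * (onehot(action) - probs)
    return [adv * ratio * ((1.0 if k == action else 0.0) - p) for k, p in enumerate(probs)]


def grpo_step(logits, reward_fn, group_size=8, mu=4, minibatch_size=2,
              lr=0.5, eps=0.2, rng=None):
    """One outer step: sample a group from pi_old, then mu passes of mini-batch updates."""
    rng = rng or random.Random(0)
    old_probs = softmax(logits)  # pi_theta_old <- pi_theta, frozen for this step
    actions = rng.choices(range(len(logits)), weights=old_probs, k=group_size)
    advs = group_advantages([reward_fn(a) for a in actions])
    samples = list(zip(actions, advs))
    logits = list(logits)
    for _ in range(mu):  # reuse the same samples; they grow more off-policy each update
        rng.shuffle(samples)
        for start in range(0, len(samples), minibatch_size):
            batch = samples[start:start + minibatch_size]
            grad = [0.0] * len(logits)
            for a, adv in batch:
                g = clipped_surrogate_grad(logits, old_probs, a, adv, eps)
                grad = [x + y / len(batch) for x, y in zip(grad, g)]
            logits = [x + lr * g for x, g in zip(logits, grad)]  # gradient ascent
    return logits


# Usage: reward only output 2; probability mass should shift toward it.
logits = [0.0, 0.0, 0.0]
for step in range(20):
    logits = grpo_step(logits, lambda a: 1.0 if a == 2 else 0.0, rng=random.Random(step))
print(softmax(logits))
```

Setting `mu=1` with `minibatch_size=group_size` collapses this to a single on-policy update per sampled group. In that case the ratio is always 1 and clipping never triggers.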

@github-actions github-actions bot added 🏋 GRPO Related to GRPO ❓ question Seeking clarification or more information labels Feb 19, 2025
@qgallouedec
Member

Not yet. See #2608 (comment) for reference; I'm working on it in #2899.

@jackfsuia
Author

Interesting. I implemented one based on trl's PPOTrainer at https://github.com/jackfsuia/nanoRLHF/blob/main/GRPO/grpo_trainer.py. I don't know if that's similar to what you need.

@qgallouedec
Member

Closed by #2899
