Actions: huggingface/trl

Hugging Face Issue Labeler

190 workflow runs

Directory not empty error in checkpointing
Hugging Face Issue Labeler #190: Issue #2928 opened by mehdiataei
February 21, 2025 23:24 32s
Is there any problem with GRPOtrainer’s memory usage?
Hugging Face Issue Labeler #189: Issue #2927 opened by Tuziking
February 21, 2025 17:11 48s
There is a miscalculated attention mask shape in a comment in GRPO trainer.
Hugging Face Issue Labeler #188: Issue #2924 opened by linkedlist771
February 21, 2025 15:25 36s
NCCL timeout when GRPO training with vllm
Hugging Face Issue Labeler #187: Issue #2923 opened by edwardzjl
February 21, 2025 09:26 39s
How to support multi-device VLLM inference in the GRPO Trainer
Hugging Face Issue Labeler #186: Issue #2922 opened by 0x404
February 21, 2025 09:24 40s
simple question: SFTTrainer ValueError
Hugging Face Issue Labeler #185: Issue #2920 opened by jbw3016
February 21, 2025 02:11 30s
Fine tuning "thinking"/"reasoning" models
Hugging Face Issue Labeler #184: Issue #2919 opened by GhostDog98
February 21, 2025 00:47 35s
GRPO from VLM models?
Hugging Face Issue Labeler #183: Issue #2917 opened by dipta007
February 20, 2025 19:10 32s
I want to solve this issue: ValueError: Unable to create tensor
Hugging Face Issue Labeler #182: Issue #2916 opened by jbw3016
February 20, 2025 17:09 37s
SFTTrainer: Why do we always switch to chatML?
Hugging Face Issue Labeler #181: Issue #2915 opened by jbw3016
February 20, 2025 15:42 28s
Clarification on KL Divergence Computation in GRPOTrainer
Hugging Face Issue Labeler #180: Issue #2914 opened by zhaopku
February 20, 2025 14:02 34s
How to specify the GPU used by vllm
Hugging Face Issue Labeler #179: Issue #2913 opened by xiaolizh1
February 20, 2025 10:32 40s
Getting an error while using a PEFT model as a reward model in PPO training.
Hugging Face Issue Labeler #178: Issue #2911 opened by Tarak200
February 20, 2025 06:58 31s
L447 of GRPO trainer 'num_return_sequences=self.num_generations'
Hugging Face Issue Labeler #177: Issue #2910 opened by zhengqigao
February 20, 2025 01:57 26s
Cannot import name 'shard_checkpoint' (possibly deprecated in transformers)
Hugging Face Issue Labeler #176: Issue #2909 opened by anshuln2
February 20, 2025 00:20 37s
GRPO: Enable updating the reference model for KL divergence penalty calculation
Hugging Face Issue Labeler #175: Issue #2908 opened by ko-redtruck
February 19, 2025 21:03 28s
What's the GRPOTrainer's error: floating point exception (core dumped)
Hugging Face Issue Labeler #173: Issue #2906 opened by Tuziking
February 19, 2025 15:16 52s
How to use GRPOTrainer to train a LLM for code generation? What is the format of the dataset?
Hugging Face Issue Labeler #172: Issue #2905 opened by xiangxinhello
February 19, 2025 12:38 19s
Save memory when layers are shared with ref model?
Hugging Face Issue Labeler #171: Issue #2904 opened by raphael-sch
February 19, 2025 10:58 37s
Is mini-batch updates implemented in GRPO of trl ?
Hugging Face Issue Labeler #170: Issue #2903 opened by jackfsuia
February 19, 2025 09:01 34s
GRPO completions skip special tokens ?
Hugging Face Issue Labeler #169: Issue #2897 opened by MohamedAliRashad
February 18, 2025 19:51 29s
RuntimeError: q must have shape (batch_size, seqlen_q, num_heads, head_size)
Hugging Face Issue Labeler #168: Issue #2896 opened by MohamedAliRashad
February 18, 2025 16:15 32s
Stop at eos_token for sampling in online algorithms
Hugging Face Issue Labeler #167: Issue #2892 opened by haoxiongliu
February 18, 2025 13:26 35s
[Qwen2.5] LoRA with SFT seems to be stuck forever with DeepSpeed
Hugging Face Issue Labeler #166: Issue #2891 opened by sayakpaul
February 18, 2025 12:17 44s