DPOTrainer loss goes down to 0.0 while at the end it reports the train_loss is 0.15 - the loss during training and at end differs substantially #174

	name: "Hugging Face Issue Labeler"
	on:
	  issues:
	    types: opened

	jobs:
	  triage:
	    runs-on: ubuntu-latest
	    permissions:
	      issues: write
	    steps:
	      - uses: actions/checkout@v3
	      - uses: August-murr/auto-labeler@main
	        with:
	          hf-api-key: ${{ secrets.CI_HF_API_TOKEN }}
