We use LLaMA-Factory off the shelf for our reward model training. To set up the environment, please refer to their repository; at the time of writing, the following commands work for us:
```bash
conda create -n llamaFactory python=3.11
conda init
conda activate llamaFactory
# The editable install below must be run from inside the cloned LLaMA-Factory repository.
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
pip install deepspeed==0.15.4
pip install -U "huggingface_hub[cli]"
```
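If you want a quick sanity check before moving on, the following optional commands confirm that the LLaMA-Factory CLI and a CUDA-enabled PyTorch build are available (the exact version output will differ on your setup):

```bash
# Print the installed LLaMA-Factory version to confirm the editable install worked.
llamafactory-cli version
# Check that PyTorch can see your GPUs (prints True and the device count if CUDA is available).
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
```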
Please complete the following steps:

- Move the three files under `configs` into the LLaMA-Factory directory you cloned above.
- Add the following two entries to `LLaMA-Factory/data/dataset_info.json`:
"AceCodePair-300K": {
"hf_hub_url": "TIGER-Lab/AceCodePair-300K",
"ranking": true,
"columns": {
"prompt": "instruction",
"query": "input",
"chosen": "chosen",
"rejected": "rejected"
}
},
"AceCodePair-QwenCoderIns32B": {
"hf_hub_url": "TIGER-Lab/AceCodePair-QwenCoderIns32B",
"ranking": true,
"columns": {
"prompt": "instruction",
"query": "input",
"chosen": "chosen",
"rejected": "rejected"
}
}
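Optionally, you can pre-download (and at the same time verify access to) both preference datasets with the Hugging Face CLI installed above; otherwise they should be fetched automatically from the Hub when training starts:

```bash
# Optional: pre-fetch the pairwise preference data from the Hugging Face Hub.
huggingface-cli download TIGER-Lab/AceCodePair-300K --repo-type dataset
huggingface-cli download TIGER-Lab/AceCodePair-QwenCoderIns32B --repo-type dataset
```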
- Change the `output_dir` field in the YAML files that you copied to your desired model output path.
- Run:

```bash
llamafactory-cli train train_qwen_coder_ins_2.5_{7/32}b.yaml
```
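Here `{7/32}` selects the 7B or the 32B config, e.g. `llamafactory-cli train train_qwen_coder_ins_2.5_7b.yaml` trains the 7B reward model. For reference, a LLaMA-Factory pairwise reward-model YAML typically contains fields along the lines below; this is only an illustrative sketch with placeholder values (base model, hyperparameters, DeepSpeed config), not the contents of the provided config files, which remain authoritative. Only `output_dir` needs to be changed as described above:

```yaml
### model (placeholder; the provided configs specify the actual base model)
model_name_or_path: Qwen/Qwen2.5-Coder-7B-Instruct

### method: stage "rm" trains a pairwise (chosen vs. rejected) reward model
stage: rm
do_train: true
finetuning_type: full

### dataset: must match a key added to data/dataset_info.json
dataset: AceCodePair-300K
template: qwen
cutoff_len: 2048

### output: the field to point at your desired output path
output_dir: saves/acecode-rm-7b

### training hyperparameters (placeholders)
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 1.0
deepspeed: examples/deepspeed/ds_z3_config.json
```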