We use LLaMA-Factory off the shelf for our reward model training. To set up the environment, please refer to their repository; at the time of writing, the following commands work for us:
```bash
conda create -n llamaFactory python=3.11
conda init
conda activate llamaFactory
# The editable install below must be run from inside the cloned LLaMA-Factory repository.
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
pip install deepspeed==0.15.4
pip install -U "huggingface_hub[cli]"
```
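If you want a quick sanity check before moving on, the following optional commands confirm that the LLaMA-Factory CLI and a CUDA-enabled PyTorch build are available (the exact version output will differ on your setup):

```bash
# Print the installed LLaMA-Factory version to confirm the editable install worked.
llamafactory-cli version
# Check that PyTorch can see your GPUs (prints True and the device count if CUDA is available).
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
```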
Please complete the following steps:

- Move the three files under `configs` into the LLaMA-Factory directory you cloned above.
- Add the following two entries to `LLaMA-Factory/data/dataset_info.json`:
"AceCodePair-300K": {
"hf_hub_url": "TIGER-Lab/AceCodePair-300K",
"ranking": true,
"columns": {
"prompt": "instruction",
"query": "input",
"chosen": "chosen",
"rejected": "rejected"
}
},
"AceCodePair-QwenCoderIns32B": {
"hf_hub_url": "TIGER-Lab/AceCodePair-QwenCoderIns32B",
"ranking": true,
"columns": {
"prompt": "instruction",
"query": "input",
"chosen": "chosen",
"rejected": "rejected"
}
}
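Optionally, you can pre-download (and at the same time verify access to) both preference datasets with the Hugging Face CLI installed above; otherwise they should be fetched automatically from the Hub when training starts:

```bash
# Optional: pre-fetch the pairwise preference data from the Hugging Face Hub.
huggingface-cli download TIGER-Lab/AceCodePair-300K --repo-type dataset
huggingface-cli download TIGER-Lab/AceCodePair-QwenCoderIns32B --repo-type dataset
```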
- Change the `output_dir` field in the YAML files that you copied to your desired model output path.
- Run:

```bash
llamafactory-cli train train_qwen_coder_ins_2.5_{7/32}b.yaml
```
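Here `{7/32}` selects the 7B or the 32B config, e.g. `llamafactory-cli train train_qwen_coder_ins_2.5_7b.yaml` trains the 7B reward model. For reference, a LLaMA-Factory pairwise reward-model YAML typically contains fields along the lines below; this is only an illustrative sketch with placeholder values (base model, hyperparameters, DeepSpeed config), not the contents of the provided config files, which remain authoritative. Only `output_dir` needs to be changed as described above:

```yaml
### model (placeholder; the provided configs specify the actual base model)
model_name_or_path: Qwen/Qwen2.5-Coder-7B-Instruct

### method: stage "rm" trains a pairwise (chosen vs. rejected) reward model
stage: rm
do_train: true
finetuning_type: full

### dataset: must match a key added to data/dataset_info.json
dataset: AceCodePair-300K
template: qwen
cutoff_len: 2048

### output: the field to point at your desired output path
output_dir: saves/acecode-rm-7b

### training hyperparameters (placeholders)
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 1.0
deepspeed: examples/deepspeed/ds_z3_config.json
```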