This repository contains the official implementation of the paper *Thermometer: Towards Universal Calibration for Large Language Models*.
```
pip install torch==2.2.1 transformers==4.28.1 evaluate==0.4.1 tqdm pandas
```
- includes all the information on training configurations and model hyper-parameters.
- includes the dataloader that loads and processes the datasets.
- `process_mrqa.py` pre-processes the raw data of the free-form QA dataset MRQA;
- `extract_features.py` extracts labels, features, and logits from pretrained LLMs;
- `train_thermometer.py` and `eval_thermometer.py` contain the main function to train Thermometer and the functions to evaluate the calibration performance of a trained Thermometer, respectively.
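Thermometer learns an auxiliary model that predicts a temperature for temperature scaling of the LLM's logits. As background, here is a minimal, self-contained sketch of temperature scaling itself (the logits below are made up for illustration; this is not the repo's code):

```python
import math

def temperature_scale(logits, temperature):
    """Divide logits by a temperature, then apply softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                       # hypothetical answer logits
probs_flat = temperature_scale(logits, 2.0)    # T > 1 flattens (tempers overconfidence)
probs_sharp = temperature_scale(logits, 0.5)   # T < 1 sharpens the distribution
```

A temperature above 1 shrinks the gap between the top probability and the rest, which is how temperature scaling corrects an overconfident model.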
- Run `sh exract.sh` to extract labels, features, and logits from the pretrained LLM.
- Run `sh train.sh` to train Thermometer.
- Run `sh eval.sh` to evaluate the calibration performance of the trained Thermometer.
- The free-form QA task requires an additional step to pre-process the raw data, i.e., appending the LLM's response to the prompts; run `sh mrqa.sh` to do so.
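Calibration quality is commonly summarized by the expected calibration error (ECE): the weighted average gap between a model's confidence and its accuracy across confidence bins. The sketch below is a generic reference implementation of ECE, not the repo's evaluation code:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then average |confidence - accuracy| per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0 into the last bin
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece

# A model that says 0.9 and is right 90% of the time is perfectly calibrated (ECE = 0);
# one that says 0.9 but is never right has ECE = 0.9.
```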
Both decoder-only and encoder-decoder models are supported; set the flags accordingly, e.g.:

```
--model_type decoder_only --model_name Llama-2-7b-chat-hf
--model_type encoder_decoder --model_name flan-t5-xl
```
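One plausible way the `--model_type` flag maps onto Hugging Face auto-classes is sketched below; the mapping and function name are assumptions for illustration, not the repo's actual code:

```python
# Hypothetical dispatch from --model_type to a transformers auto-class name.
# The repo's scripts may organize this differently.
AUTO_CLASS_BY_MODEL_TYPE = {
    "decoder_only": "AutoModelForCausalLM",      # e.g. Llama-2-7b-chat-hf
    "encoder_decoder": "AutoModelForSeq2SeqLM",  # e.g. flan-t5-xl
}

def auto_class_for(model_type: str) -> str:
    """Return the transformers auto-class name for a given model type."""
    try:
        return AUTO_CLASS_BY_MODEL_TYPE[model_type]
    except KeyError:
        raise ValueError(f"unsupported model_type: {model_type}")
```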
@InProceedings{pmlr-v235-shen24c,
title = {Thermometer: Towards Universal Calibration for Large Language Models},
author = {Shen, Maohao and Das, Subhro and Greenewald, Kristjan and Sattigeri, Prasanna and Wornell, Gregory W. and Ghosh, Soumya},
booktitle = {Proceedings of the 41st International Conference on Machine Learning},
pages = {44687--44711},
year = {2024},
editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
volume = {235},
series = {Proceedings of Machine Learning Research},
month = {21--27 Jul},
publisher = {PMLR}
}