This repository provides toy examples demonstrating the concept of Verbalized Machine Learning (VML) introduced by the paper:
Verbalized Machine Learning: Revisiting Machine Learning with Language Models
Tim Z. Xiao, Robert Bamler, Bernhard Schölkopf, Weiyang Liu
Paper: https://arxiv.org/abs/2406.04344
VML introduces a new framework for machine learning. Unlike conventional machine learning models, which are typically optimized over a continuous parameter space, VML constrains the parameter space to human-interpretable natural language. This constraint leads to a new perspective on function approximation, where an LLM with a text prompt can be viewed as a function parameterized by that prompt.
Many classical machine learning problems can be solved under this new framework using an LLM-parameterized learner and optimizer. The major advantages of VML include:
- Easy encoding of inductive bias: prior knowledge about the problem and hypothesis class can be encoded in natural language and fed into the LLM-parameterized learner.
- Automatic model class selection: the optimizer can automatically select a concrete model class based on data and verbalized prior knowledge, and it can update the model class during training.
- Interpretable learner updates: the LLM-parameterized optimizer can provide explanations for why each learner update is performed.
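As a concrete illustration of the "LLM as a prompt-parameterized function" view, here is a minimal sketch. It is not code from this repo: `mock_llm`, `f`, and the prompt format are all illustrative assumptions, with a stub standing in for a real LLM endpoint.

```python
# Sketch of the VML view: a function f_theta(x) where the parameters theta
# are a natural-language prompt and the "forward pass" is one LLM call.
# `mock_llm` is a stand-in for a real chat endpoint; all names are illustrative.

def mock_llm(prompt: str) -> str:
    # A real implementation would call an OpenAI-compatible endpoint here.
    # This stub just "executes" the rule if the prompt describes doubling.
    if "multiply the input by 2" in prompt:
        x = float(prompt.split("Input:")[1])
        return str(2 * x)
    return "0"

def f(theta: str, x: float) -> float:
    """Learner: an LLM parameterized by the text prompt `theta`."""
    return float(mock_llm(f"{theta}\nInput: {x}"))

# The "parameters" are plain English, so they are directly interpretable,
# and an optimizer (itself an LLM) would update this string from data.
theta = "You are a regression model: multiply the input by 2."
print(f(theta, 3.0))  # -> 6.0
```

In the actual framework, a second LLM plays the optimizer: given a batch of inputs, the learner's predictions, and the targets, it rewrites `theta` in natural language and explains the update.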
- Tutorial: Colab hands-on with linear regression
- Exp: Regression examples
  - Linear
  - Polynomial
  - Sine
- Exp: Classification examples
  - 2D plane
  - Medical Image (PneumoniaMNIST)
Python 3.10
Other dependencies are listed in requirements.txt
VML uses pretrained LLMs as execution engines, so we need access to an LLM endpoint. This can be done either through the OpenAI endpoint (if you have an account) or through open-source models such as Llama.
(Of course, you can also manually copy/paste the entire prompt into the ChatGPT website for a quick tryout without setting up any endpoint.)
To use the LLM service provided by OpenAI, copy your OpenAI API key into the variable OPENAI_API_KEY.
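For example, in your shell (the key value below is a placeholder, not a real key):

```shell
# Replace the placeholder with your actual OpenAI API key.
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
```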
vLLM provides an easy and fast inference engine for many open-source LLMs, including Llama. After installing vLLM, you can start a Llama API server with the following command; vLLM exposes the same API interface as OpenAI.
```shell
python -m vllm.entrypoints.openai.api_server \
    --model <HUGGINGFACE_MODEL_DIR> \
    --dtype auto \
    --api-key token-abc123 \
    --tensor-parallel-size <NUMBER_OF_GPU>
```
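Because the server speaks the OpenAI chat-completions protocol, any OpenAI-compatible client can query it. The stdlib sketch below builds (but does not send) such a request so it stays runnable offline; the base URL, default port 8000, and function name are illustrative assumptions.

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       base_url: str = "http://localhost:8000/v1",
                       api_key: str = "token-abc123",
                       model: str = "<HUGGINGFACE_MODEL_DIR>") -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for a local vLLM server.

    The request is returned unsent; uncomment the urlopen lines to actually
    query a running server started with the command above.
    """
    payload = {"model": model,
               "messages": [{"role": "user", "content": prompt}]}
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    # with urllib.request.urlopen(req) as resp:  # requires the server to be running
    #     return json.loads(resp.read())["choices"][0]["message"]["content"]
    return req
```

In practice, most users would instead use the `openai` Python package and point its `base_url` at the local server rather than hand-rolling requests.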
For example, to train a linear-regression learner with Llama:

```shell
python regression.py \
    --model "llama" \
    --task "linear_regression" \
    --batch_size 10 \
    --eval_batch_size 100 \
    --epochs 5
```
The following is the BibTeX entry for the VML paper:

```bibtex
@article{xiao2024verbalized,
  title   = {Verbalized Machine Learning: Revisiting Machine Learning with Language Models},
  author  = {Xiao, Tim Z. and Bamler, Robert and Sch{\"o}lkopf, Bernhard and Liu, Weiyang},
  journal = {arXiv preprint arXiv:2406.04344},
  year    = {2024},
}
```
We welcome the community to submit pull requests adding new examples of VML to this repo! We hope these examples of VML prove interesting and inspire new ideas for future LLM research!