This repository provides toy examples demonstrating the concept of Verbalized Machine Learning (VML) introduced by the paper:
Verbalized Machine Learning: Revisiting Machine Learning with Language Models
Tim Z. Xiao, Robert Bamler, Bernhard Schölkopf, Weiyang Liu
Paper: https://arxiv.org/abs/2406.04344
VML introduces a new framework for machine learning. Unlike conventional machine learning models, which are typically optimized over a continuous parameter space, VML constrains the parameter space to human-interpretable natural language. This constraint leads to a new perspective on function approximation, where an LLM with a text prompt can be viewed as a function parameterized by that prompt.
Many classical machine learning problems can be solved under this new framework using an LLM-parameterized learner and optimizer. The major advantages of VML include:
- Easy encoding of inductive bias: prior knowledge about the problem and hypothesis class can be encoded in natural language and fed into the LLM-parameterized learner.
- Automatic model class selection: the optimizer can automatically select a concrete model class based on data and verbalized prior knowledge, and it can update the model class during training.
- Interpretable learner updates: the LLM-parameterized optimizer can provide explanations for why each learner update is performed.
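As a concrete illustration of the "LLM as a prompt-parameterized function" view, here is a minimal sketch. It is not code from this repo: `mock_llm`, `f`, and the prompt format are all illustrative assumptions, with a stub standing in for a real LLM endpoint.

```python
# Sketch of the VML view: a function f_theta(x) where the parameters theta
# are a natural-language prompt and the "forward pass" is one LLM call.
# `mock_llm` is a stand-in for a real chat endpoint; all names are illustrative.

def mock_llm(prompt: str) -> str:
    # A real implementation would call an OpenAI-compatible endpoint here.
    # This stub just "executes" the rule if the prompt describes doubling.
    if "multiply the input by 2" in prompt:
        x = float(prompt.split("Input:")[1])
        return str(2 * x)
    return "0"

def f(theta: str, x: float) -> float:
    """Learner: an LLM parameterized by the text prompt `theta`."""
    return float(mock_llm(f"{theta}\nInput: {x}"))

# The "parameters" are plain English, so they are directly interpretable,
# and an optimizer (itself an LLM) would update this string from data.
theta = "You are a regression model: multiply the input by 2."
print(f(theta, 3.0))  # -> 6.0
```

In the actual framework, a second LLM plays the optimizer: given a batch of inputs, the learner's predictions, and the targets, it rewrites `theta` in natural language and explains the update.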
- Tutorial: Colab hands-on with linear regression
- Exp: Regression examples
  - Linear
  - Polynomial
  - Sine
- Exp: Classification examples
  - 2D plane
  - Medical Image (PneumoniaMNIST)
Python 3.10
Other dependencies are listed in requirements.txt
VML uses pretrained LLMs as execution engines, so we need access to an LLM endpoint. This can be done either through the OpenAI endpoint (if you have an account) or through open-source models such as Llama.
(Of course, you can also manually copy/paste the entire prompt into the ChatGPT website for a quick tryout without setting up any endpoint.)
To use the LLM service provided by OpenAI, copy your OpenAI API key into the variable OPENAI_API_KEY.
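For example, in your shell (the key value below is a placeholder, not a real key):

```shell
# Replace the placeholder with your actual OpenAI API key.
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
```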
vLLM provides an easy and fast inference engine for many open-source LLMs, including Llama. After installing vLLM, you can start a Llama API server with the following command; vLLM exposes the same API interface as OpenAI.
```shell
python -m vllm.entrypoints.openai.api_server \
    --model <HUGGINGFACE_MODEL_DIR> \
    --dtype auto \
    --api-key token-abc123 \
    --tensor-parallel-size <NUMBER_OF_GPU>
```
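Because the server speaks the OpenAI chat-completions protocol, any OpenAI-compatible client can query it. The stdlib sketch below builds (but does not send) such a request so it stays runnable offline; the base URL, default port 8000, and function name are illustrative assumptions.

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       base_url: str = "http://localhost:8000/v1",
                       api_key: str = "token-abc123",
                       model: str = "<HUGGINGFACE_MODEL_DIR>") -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for a local vLLM server.

    The request is returned unsent; uncomment the urlopen lines to actually
    query a running server started with the command above.
    """
    payload = {"model": model,
               "messages": [{"role": "user", "content": prompt}]}
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    # with urllib.request.urlopen(req) as resp:  # requires the server to be running
    #     return json.loads(resp.read())["choices"][0]["message"]["content"]
    return req
```

In practice, most users would instead use the `openai` Python package and point its `base_url` at the local server rather than hand-rolling requests.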
For example, to train a linear-regression learner with Llama:

```shell
python regression.py \
    --model "llama" \
    --task "linear_regression" \
    --batch_size 10 \
    --eval_batch_size 100 \
    --epochs 5
```
The following is the BibTeX entry for the VML paper:

```bibtex
@article{xiao2024verbalized,
  title   = {Verbalized Machine Learning: Revisiting Machine Learning with Language Models},
  author  = {Xiao, Tim Z. and Bamler, Robert and Sch{\"o}lkopf, Bernhard and Liu, Weiyang},
  journal = {arXiv preprint arXiv:2406.04344},
  year    = {2024},
}
```
We welcome the community to submit pull requests adding new examples of VML to this repo! We hope these examples of VML prove interesting and inspire new ideas for future LLM research!