This is the official implementation of our work "DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning". [arXiv Version] [Download Benchmark (Google Drive)]
We select 30 representative data science tasks covering three data modalities and two fundamental ML task types. Please download the datasets and the corresponding configuration files from [Google Drive] and unzip them into the directory development/benchmarks. In addition, the human insight cases collected from Kaggle are provided in development/data.zip; please unzip it as well.
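After unzipping, a quick sanity check such as the one below can confirm the files landed where the runner expects them. This is only a sketch: it assumes data.zip extracts to development/data, so adjust the paths if the archive layout differs.

```python
# Sanity check for the unzipped benchmark and case data.
# Assumption: data.zip extracts to development/data; adjust if the archive differs.
import os

for path in ("development/benchmarks", "development/data"):
    if not os.path.isdir(path):
        raise FileNotFoundError(f"Expected directory '{path}' is missing; re-check the unzip step.")
print("Benchmark and case directories found.")
```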
This project is built on top of the MLAgentBench framework. First, install the MLAgentBench package with:
```bash
cd development
pip install -e .
```
Then, install the necessary libraries listed in requirements.txt:
```bash
pip install -r requirements.txt
```
Since DS-Agent relies on GPT-3.5 and GPT-4 for all the experiments, please fill in your OpenAI API key in development/MLAgentBench/LLM.py and deployment/generate.py.
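The exact variable names in those two files may differ, but the change amounts to an assignment of the following kind (a sketch only; reading the key from an environment variable avoids committing it to the repository by accident):

```python
# Sketch of the OpenAI key assignment expected in LLM.py / generate.py.
# The actual variable name in those files may differ; check before editing.
import os
import openai

# Prefer an environment variable over hard-coding the key in the source file.
openai.api_key = os.environ["OPENAI_API_KEY"]
```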
Run DS-Agent for the development tasks with the following commands:
```bash
cd development/MLAgentBench
python runner.py --task feedbackv2 --llm-name gpt-3.5-turbo-16k --edit-script-llm-name gpt-3.5-turbo-16k
```
During execution, logs and intermediate solution files will be saved in logs/ and workspace/.
Run DS-Agent for the deployment tasks with the following commands:
```bash
cd deployment
bash code_generation.sh
bash code_evaluation.sh
```
For the open-source LLM used in this paper, i.e., Mixtral-8x7B-Instruct-v0.1, we use the vLLM framework. First, serve the LLM with:
```bash
cd deployment
bash start_api.sh
```
Then, run the shell scripts above with the --llm argument set to mixtral.
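If you want to query the served model directly, vLLM exposes an OpenAI-compatible endpoint. The sketch below assumes the server started by start_api.sh listens on localhost:8000 and serves the model under the name mistralai/Mixtral-8x7B-Instruct-v0.1; adjust these values to match the script:

```python
# Minimal sketch of querying the vLLM OpenAI-compatible server started by start_api.sh.
# Assumptions: the server listens on http://localhost:8000/v1 and serves the model
# under the name "mistralai/Mixtral-8x7B-Instruct-v0.1"; adjust to match start_api.sh.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "Write a pandas snippet that loads train.csv."}],
)
print(response.choices[0].message.content)
```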
Please consider citing our paper if you find this work useful:
```bibtex
@article{DS-Agent,
  title={DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning},
  author={Guo, Siyuan and Deng, Cheng and Wen, Ying and Chen, Hechang and Chang, Yi and Wang, Jun},
  journal={arXiv preprint arXiv:2402.17453},
  year={2024}
}
```