Installation and Usage

The source code is written in Python 3 and tested using Python 3.7 on Mac OS and Ubuntu virtual machine (ICSE'22). It is recommended to use a virtual Python environment for this setup. Furthermore, we used bash shell scripts to automate running benchmark and Python scripts.

Environment Setup

Follow the instructions below to set up the environment and run the source code.

Follow these steps to create a virtual environment:

Install Anaconda [installation guide]. For new installation, run conda init <SHELL>. Similiarly, you can also use miniconda (https://docs.conda.io/en/latest/miniconda.html#latest-miniconda-installer-links).
Create a new Python environment. Run on the command line:

conda create --name pipeline python=3.7

conda activate pipeline

The shell should look like: (pipeline) $ . Now, continue to step 2 and install packages using this shell. When you are done with the experiments, you can exit this virtual environment by conda deactivate.

Clone this repository.

git clone https://github.com/sumonbis/DS-Pipeline.git

Navigate to the cloned repository: cd DS-Pipeline/ and install required packages:

pip install -r requirements.txt

Run the pipeline generator tool

Navigate to the benchmark directory cd src/. To run the pipeline generator run

python pipeline-generator.py

The tool takes two inputs:

The notebooks that it parses as .py files.
The API dictionary that maps each API to a pipeline stage.

The tool produces one output file pipe.txt that contains generated pipelines. The output is organized as follows: the name of the root directory is followed by the python scripts under the directory, and then it prints the pipeline. The stages are: Data Acquisition, Data Preparation, Modeling, Training, Evaluation, Prediction.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

INSTALL.md

INSTALL.md

Installation and Usage

Environment Setup

Run the pipeline generator tool

Files

INSTALL.md

Latest commit

History

INSTALL.md

File metadata and controls

Installation and Usage

Environment Setup

Run the pipeline generator tool