Benchmarking Natural Language to Visualization Models

Abstract

This project is a tool for testing the quality of Natural Language to Visualization (NL2Viz) models against existing benchmarks. Read the project report here. View the final presentation here. The project was built for 6.S079 in the Spring 2022 semester.

Project Overview

The app uses a React front-end (bootstrapped with create-react-app) and a Python backend using the Flask web framework. The benchmark is provided by nvBench [1], which gives a list of $(query, viz)$ pairs. The following NL2Viz models are supported on the server:

ncNet [2]
nl4dv [3]
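
To make the idea of scoring a model against $(query, viz)$ pairs concrete, here is a minimal sketch. It is an illustration only. The `Model` type, the string form of a visualization spec, the toy data, and the exact-match metric are all assumptions chosen for simplicity, not the project's actual evaluation code or metrics.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical type: a model maps a natural-language query to a viz spec string.
Model = Callable[[str], str]


def normalize(spec: str) -> str:
    """Collapse whitespace and case so trivially different specs compare equal."""
    return " ".join(spec.lower().split())


def exact_match_accuracy(model: Model, pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (query, viz) pairs whose predicted spec matches the expected one."""
    total = 0
    correct = 0
    for query, expected_viz in pairs:
        total += 1
        try:
            predicted = model(query)
        except Exception:
            # A model that fails on a query scores zero for that pair.
            continue
        if normalize(predicted) == normalize(expected_viz):
            correct += 1
    return correct / total if total else 0.0


# Toy stand-ins for benchmark data (not real nvBench entries).
toy_pairs = [
    ("bar chart of sales by region", "mark bar x region y sales"),
    ("line chart of price over time", "mark line x date y price"),
]


def toy_model(query: str) -> str:
    # Always returns the same spec, so it gets exactly one toy pair right.
    return "Mark bar  x region y sales"


print(exact_match_accuracy(toy_model, toy_pairs))  # 0.5
```

Exact match is the simplest possible comparison. A real evaluation would likely need a more forgiving metric, since two different specs can render the same chart.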

The tool is deployed at https://nl2viz.herokuapp.com/. Because these models are resource-intensive, however, the deployed application is likely to crash from time to time. Use at your own risk!

Structure

A high-level overview of the project structure is shown below.

📦final-project
 ┣ 📂client
 ┃ ┣ 📂public
 ┃ ┣ 📂src
 ┃ ┃ ┣ 📂components
 ┃ ┃ ┗ 📂pages
 ┣ 📂server
 ┃ ┣ 📂benchmark
 ┃ ┃ ┣ 📂data
 ┃ ┃ ┗ 📜benchmark_meta.json
 ┃ ┣ 📂models
 ┃ ┃ ┣ 📂ncNet
 ┃ ┃ ┗ 📂nl4dv
 ┃ ┣ 📂scripts
 ┃ ┃ ┣ 📜config.py
 ┃ ┃ ┣ 📜get_benchmark_meta.py
 ┃ ┃ ┗ 📜sqlite_to_csv.py
 ┃ ┣ 📜api.py
 ┃ ┗ 📜model_setup.py
 ┣ 📜app.py
 ┣ 📜nltk.txt
 ┣ 📜requirements.txt
 ┗ 📜runtime.txt

This project is separated into a client directory and a server directory. The client handles user interaction, while the server handles the actual data processing and keeps track of the NL2Viz model instances.

The app.py file is the entry point for the server, while client/src/index.js is the entry point for the client, as is standard for React apps.

Datasets

The datasets available in this tool are found here. Only datasets that have an associated benchmark exist in this directory. Some benchmarks require more than one dataset to produce the final result; these benchmarks are excluded from this tool, so any dataset used only by such benchmarks is not included either.
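
The filtering rule described above can be sketched as follows. The metadata shape (an `id` plus a `datasets` list per benchmark) and the dataset names are hypothetical and used only for illustration; they are not taken from benchmark_meta.json.

```python
from typing import Dict, List, Set

# Hypothetical metadata: each benchmark lists the datasets it needs.
toy_benchmarks: List[Dict] = [
    {"id": "b1", "datasets": ["cars"]},
    {"id": "b2", "datasets": ["cars", "dealers"]},  # needs two datasets: excluded
    {"id": "b3", "datasets": ["movies"]},
]


def single_dataset_benchmarks(benchmarks: List[Dict]) -> List[Dict]:
    """Keep only benchmarks that need exactly one dataset."""
    return [b for b in benchmarks if len(b["datasets"]) == 1]


def datasets_in_use(benchmarks: List[Dict]) -> Set[str]:
    """Datasets referenced by at least one kept benchmark."""
    return {b["datasets"][0] for b in single_dataset_benchmarks(benchmarks)}


kept_ids = [b["id"] for b in single_dataset_benchmarks(toy_benchmarks)]
print(kept_ids)                                 # ['b1', 'b3']
print(sorted(datasets_in_use(toy_benchmarks)))  # ['cars', 'movies']
```

In this toy data, `dealers` is dropped because the only benchmark that uses it also needs a second dataset.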

Local Use/Development

Follow the steps below to get started:

Clone the repository into your local workspace.
Set up a virtual environment with a Python version no greater than 3.9. This is required for the models to work, since both rely on older library versions that are not supported on newer versions of Python.
Start the virtual environment.
Install the dependencies in your virtual environment by running pip install -r requirements.txt. NOTE: requirements.txt installs the CPU version of PyTorch. This is necessary for the production environment, but may yield slower processing times in development. If you plan to do a lot of development, also install the GPU version by installing the requirements in dev-requirements.txt. That file also contains the modules required to perform evaluation, such as Dask and scikit-image. If you choose to re-run the evaluation, also install the Node modules by running npm i in the project root directory.
For the nl4dv model to work, install the following dependencies separately (see the nl4dv documentation for more details):
- python -m nltk.downloader popular
- python -m spacy download en_core_web_sm
In the root directory, run flask run to start the server. To start it in development mode, create a .flaskenv file in the root directory and add FLASK_ENV=development. The server is served at http://localhost:5000 by default.
Navigate to the client/ directory and run npm i to install the dependencies for the React frontend.
Still in client/, run npm start to start the client, served at http://localhost:3000.
For full access to all of the features, still in client/, run npm run build to build the latest version of the frontend. Then, instead of starting the client separately, simply start the server and navigate to http://localhost:5000, which serves the static build.
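
Since the Python version constraint in the steps above is easy to miss, a small pre-flight check can catch it before any dependencies are installed. This is a hedged sketch, not part of the repository. The names `MAX_SUPPORTED` and `is_supported` are hypothetical, and only the upper bound stated in this README (3.9) is checked.

```python
import sys
from typing import Tuple

# Upper bound taken from the setup steps; no lower bound is stated there.
MAX_SUPPORTED: Tuple[int, int] = (3, 9)


def is_supported(version: Tuple[int, ...],
                 max_supported: Tuple[int, int] = MAX_SUPPORTED) -> bool:
    """Return True if (major, minor) is no greater than the supported maximum."""
    return tuple(version[:2]) <= max_supported


current = tuple(sys.version_info[:2])
if is_supported(current):
    print(f"Python {current[0]}.{current[1]} is within the supported range.")
else:
    print(
        f"Python {current[0]}.{current[1]} is newer than "
        f"{MAX_SUPPORTED[0]}.{MAX_SUPPORTED[1]}; the models may fail to install or run."
    )
```

Running this inside the activated virtual environment, before pip install, shows whether the environment was created with a compatible interpreter.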

References

1. `nvBench` (Benchmark)

Authors: Yuyu Luo, Nan Tang, Guoliang Li, Chengliang Chai, Wenbo Li, and Xuedi Qin
Github
Paper

2. `ncNet` (Model)

Authors: Yuyu Luo, Nan Tang, Guoliang Li, Jiawei Tang, Chengliang Chai, and Xuedi Qin
Github
Paper

3. `nl4dv` (Model)

Authors: Arpit Narechania, Arjun Srinivasan, and John Stasko
Github
Paper
