A visualization experience of AI/ML academic papers hosted on arXiv, built as project work for the University of California, Berkeley MIDS program (W209, Data Visualization).

cbenge509/arxiv-ai-analysis

arXiv.org AI/ML Analysis

U.C. Berkeley, Master of Information and Data Science (MIDS) program - datascience@berkeley
Summer 2020, W209 - Data Visualization - Andrew Reagan, PhD - Section 4


Description

This repo contains the draft work for visualizing AI/ML research papers catalogued on arXiv.org for calendar years 1993 through 2019. The categories under consideration are limited to:

  • Computer Science: Artificial Intelligence [cs.AI]
  • Computer Science: Machine Learning [cs.LG]
  • Statistics: Machine Learning [stat.ML]
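
The scope described above (three categories, calendar years 1993 through 2019) can be expressed as a small filter. This is a minimal sketch, not the code used in this repo: it assumes each paper record carries a space-separated `categories` string using the arXiv identifiers `cs.AI`, `cs.LG`, and `stat.ML`, plus an integer `year`. The record layout and the names `in_scope` and `filter_papers` are hypothetical.

```python
from typing import Dict, Iterable, List

# arXiv identifiers for the three categories listed above.
TARGET_CATEGORIES = {"cs.AI", "cs.LG", "stat.ML"}
FIRST_YEAR, LAST_YEAR = 1993, 2019


def in_scope(categories: str, year: int) -> bool:
    """Return True if a paper is in a target category and the year range.

    Assumption: `categories` is a space-separated string of arXiv
    category identifiers, e.g. "cs.LG stat.ML".
    """
    listed = set(categories.split())
    return bool(listed & TARGET_CATEGORIES) and FIRST_YEAR <= year <= LAST_YEAR


def filter_papers(papers: Iterable[Dict]) -> List[Dict]:
    """Keep only in-scope papers from an iterable of record dicts.

    Assumption: each record has "categories" and "year" keys.
    """
    return [p for p in papers if in_scope(p["categories"], p["year"])]
```

A cross-listed paper qualifies as soon as any one of its categories matches, so a paper filed under both a physics category and `cs.LG` is kept.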

There are two external visuals for this project:


This project leverages the following visualization frameworks:


Highlight of key files included in this repository:

| File | Description |
| --- | --- |
| ArXiV AI & ML Analytics - Midterm.pptx | Midterm presentation for arXiv AI & ML Analysis solution |
| w209_assignment_2__cris_benge.pdf | Assignment 2, covering thorough review and initial hypothesis testing of arXiv data. |
| load_base_data.py | Processes the base arXiv categories data, storing the output into a single Pandas.DataFrame (HDF5 file) |
| refine_data_for_analysis.py | Processes the consolidated (but raw) arXiv categories data, generating the final analysis output data (CSV file) |
| utils/preprocessing.py | Utility class; used for loading the raw arXiv data and generating the processed analysis dataset |
| plot_utils.py | Utility class; used for generating various plots in EDA |
| Exploratory Data Analysis.ipynb | Jupyter Notebook demonstrating the basic exploratory data analysis performed |
| Clustering and Topic Modeling.ipynb | Jupyter Notebook containing the walk-through for clustering and topic modeling of the arXiv dataset |
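
The two data scripts describe a two-stage flow: consolidate the per-category data into one dataset, then refine it into a CSV for analysis. The sketch below illustrates that shape only and is not the repo's implementation. It uses the standard library instead of pandas and HDF5 so it runs without dependencies, it assumes each record has an `id` key, and it assumes cross-listed papers should collapse to one row. The function names `consolidate` and `to_csv` are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List


def consolidate(per_category: Dict[str, Iterable[dict]]) -> List[dict]:
    """Merge per-category records into one list with one row per paper id.

    A paper found under several categories appears once, with those
    categories joined by a space in the "categories" field.
    """
    merged: Dict[str, dict] = {}
    for category, records in per_category.items():
        for record in records:
            # First sighting of an id creates the row; later ones reuse it.
            row = merged.setdefault(record["id"], {**record, "categories": []})
            row["categories"].append(category)
    return [{**row, "categories": " ".join(row["categories"])} for row in merged.values()]


def to_csv(rows: List[dict], fieldnames: List[str]) -> str:
    """Serialize rows to CSV text, ignoring keys not named in fieldnames."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keying on the paper id keeps the merge to a single pass and preserves the order in which papers were first seen, which makes the CSV output deterministic for a given input order.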

Visualization Samples




References

Data was collected from the arxiv_archive repo; all credit for that work is due to:

Geiger, R. Stuart (2020). ArXiV Archive: A Tidy and Complete Archive of Metadata for Papers on arxiv.org. doi | url.


License

Licensed under the MIT License. See LICENSE file for more details.
