A list of software related to automatic Exploratory Data Analysis
My summary of R packages is available on arXiv.
- dataMaid (CRAN package) - automated checks of data validity.
- DataExplorer (CRAN package) - automated data exploration (including univariate and bivariate plots, PCA) and treatment.
- funModeling (CRAN package) - automated EDA, simple feature engineering and outlier detection.
- SmartEDA (CRAN package) - automated generation of descriptive statistics, uni- and bivariate plots, and parallel coordinate plots. Details can be found in a dedicated paper.
- autoEDA (GitHub package) - automated EDA with uni- and bivariate plots. An introductory article is available on LinkedIn.
- auto-EDA (GitHub package) - uni- and bivariate plots for data exploration in regression and classification problems. The package cleans the data automatically to improve the plots. Another version of Xander Horn's package.
- visdat (CRAN package) - 6 exploratory/diagnostic plots for initial data analysis.
- dlookr (CRAN package) - tools for data quality diagnosis, basic exploration and feature transformations.
- FactoInvestigate (CRAN package) - provides an automatic reporting module that selects the most informative plots summarising different projection techniques.
- xray (CRAN package) - a first look at the data: distributions and anomalies. More in the blog post.
- arsenal (CRAN package) - statistical summaries (models and exploration) and quick reporting.
- RtutoR (CRAN package) - learning material with an automatic reports module. More at R-Bloggers.
- exploreR (CRAN package) - exploration based on univariate linear regression.
- summarytools (CRAN package) - tables for summarising datasets and performing simple uni- and bivariate analyses.
- AEDA (GitHub package) - summary statistics, correlation analysis, cluster analysis, PCA & other projections.
- dataexpks (GitHub package) - quick reports with basic data summaries.
- automatic-data-explorer (GitHub package) - basic EDA and creating Markdown reports from multiple R scripts.
- xda (GitHub package) - basic data summaries.
- EDA - stub of a package.
- modeler (GitHub package) - tools for exploration and pre-processing.
- IEDA (GitHub package) - EDA simplified through interactive visualization.
- seda (GitHub package) - fast EDA tool in active development.
- RBioPlot (GitHub package) - automated data analysis and visualization for molecular biology. Details can be found in the paper at NCBI.
- vtreat (CRAN package) - data treatment (pre-processing), including handling of missing data and of categorical variables with many levels. Details can be found in the paper about vtreat.
- Dora (pip library) - data cleaning, feature engineering and simple modeling tools.
- statsmodels (pip library) - collection of statistical tools, including EDA.
- TPOT (pip library) - AutoML tool with a feature engineering module.
- HoloViews (pip library) - automated visualization based on short data annotations.
- lens (pip library) - fast calculation of summary statistics and correlations. Presentation about the library.
- pandas-profiling - popular library for quick data summaries and correlation analysis.
- speedML (pip library) - large library for ML with a module dedicated to fast EDA.
- edaviz - Python library for fast data exploration, currently in private beta. It will provide functions for dataset overviews, bivariate plots and finding good predictors.
- basic-auto-EDA (GitHub library) - automatic report generation.
- automated_EDA - stub of a library.
- DIVE - MIT's tool for data exploration that tries to choose the best (most informative) visualizations.
- Automatic Statistician - tool for automated EDA and modeling.
- Several Shiny apps by R Squared Computing, including visualizer and descriptr.
- auto-eda - automatic EDA with SQL.
- elycite - tools for exploration and modelling, available as a locally run web application. Designed for NLP problems.
- Interactive Data Exploration with “Big Data Tukey Plots” - automated visualization of big data.
- Foresight: Recommending Visual Insights - Foresight is a system that helps the user rapidly discover visual insights from large high-dimensional datasets.
- Augmenting Visualizations with Interactive Data Facts to Facilitate Interpretation and Communication.
- DIVE: A Mixed-Initiative System Supporting Integrated Data Exploration Workflows. The web app is available on the MIT website.
- Voyager: Exploratory Analysis via Faceted Browsing of Visualization Recommendations.
- Agency plus Automation: Designing Artificial Intelligence into Interactive Systems