A list of software and papers related to automatic/fast Exploratory Data Analysis

OmicsAcademy/autoEDA-resources

 
 

autoEDA-resources

A list of software related to automatic Exploratory Data Analysis

R packages

My summary of R packages is available on arXiv.

Complete Packages

  • dataMaid (CRAN package) - automated checks of data validity.

  • DataExplorer (CRAN package) - automated data exploration (including univariate and bivariate plots, PCA) and treatment.

  • funModeling (CRAN package) - automated EDA, simple feature engineering and outlier detection.

  • SmartEDA (CRAN package) - automated generation of descriptive statistics and uni- and bivariate plots, parallel coordinate plots. Details can be found in a dedicated paper.

  • autoEDA (GitHub package) - automated EDA with uni- and bivariate plots. An article with an introduction can be found on LinkedIn.

    • auto-EDA (GitHub package) - uni- and bivariate plots for data exploration in regression and classification problems. The package cleans data automatically to improve the plots. Another version of Xander Horn's package.

  • visdat (CRAN package) - 6 exploratory/diagnostic plots for initial data analysis.

  • dlookr (CRAN package) - tools for data quality diagnosis, basic exploration and feature transformations.

  • FactoInvestigate (CRAN package) - has an automatic reporting module that selects the best plots to summarise different projection techniques.

  • xray (CRAN package) - first look at the data - distributions and anomalies. More in the blog post.

  • arsenal (CRAN package) - statistical summaries (models and exploration) and quick reporting.

  • RtutoR (CRAN package) - learning material with an automatic reporting module. More at R-Bloggers.

  • exploreR (CRAN package) - exploration based on univariate linear regression.

  • summarytools (CRAN package) - tables that summarise datasets, plus simple uni- and bivariate analyses.

Packages in Development

  • AEDA (GitHub package) - summary statistics, correlation analysis, cluster analysis, PCA & other projections.

  • dataexpks (GitHub package) - quick reports with basic data summaries.

  • automatic-data-explorer (GitHub package) - basic EDA and creating Markdown reports from multiple R scripts.

  • xda (GitHub package) - basic data summaries.

  • EDA - stub of a package.

  • modeler (GitHub package) - tools for exploration and pre-processing.

  • IEDA (GitHub package) - EDA simplified through interactive visualization.

  • seda (GitHub package) - fast EDA tool in active development.

Domain-specific packages

Related packages

  • vtreat (CRAN package) - data treatment (pre-processing) that includes dealing with missing data and large categorical variables. Details can be found in the paper about vtreat.

Python libraries

Complete Packages

  • Dora (pip library) - data cleaning, feature engineering and simple modeling tools.

  • statsmodels (pip library) - collection of statistical tools, including EDA.

  • TPOT (pip library) - autoML tool with a feature engineering module.

  • HoloViews (pip library) - automated visualization based on short data annotations.

  • lens (pip library) - fast calculation of summary statistics and correlations. Presentation about the library.

  • pandas-profiling - popular library for quick data summaries and correlation analysis.

  • speedML (pip library) - large library for ML with a module dedicated to fast EDA.

Packages in Development

  • edaviz - Python library for fast data exploration, currently in a private beta testing phase. Will provide functions for dataset overviews, bivariate plots and finding good predictors.

  • basic-auto-EDA (GitHub library) - automatic report generation.

  • automated_EDA - stub of a library.

Web services

  • DIVE - MIT's tool for data exploration that tries to choose the best (most informative) visualizations.

  • Automatic Statistician - tool for automated EDA and modeling.

  • Several Shiny apps by R Squared Computing, including visulizer and descriptr.

Standalone software

  • auto-eda - automatic EDA with SQL.

  • elycite - tools for exploration and modelling available (locally) as a web application. Designed for NLP problems.

Papers
