Skip to content

Latest commit

 

History

History
114 lines (84 loc) · 3.74 KB

README.md

File metadata and controls

114 lines (84 loc) · 3.74 KB

plotMissing

The goal of the plotMissing library is to allow users to easily plot the number of missing values for each column in a tibble object. All of the plot-generating functions in this library are designed to interact with the tidyverse set of packages, and the functions are designed to give users a wide range of flexibility and customization.

The plotMissing::plotMissing function generates a stacked column plot depicting the number of missing (NA) and observed (non-NA) values for each selected column in a tibble. The package provides built-in customization options including changing the colours of the bars, the names shown in the legend, and selecting a subset of the columns to plot.

The plotMissing::plotMissing function is partially based on the missingno Python library (in particular, the msno.bar function) in Python, with many additional customization options for the resulting plot built directly into the function.

Installation

Installing the most recent version

You can install the most recently-published version of the plotMissing library from GitHub with the following code:

# install.packages("devtools")
devtools::install_github("stat545ubc-2023/plotMissing")

Installing from a specific release

If you would like to install a specific version of the plotMissing library (e.g. v0.1.0), you can install the specified version with the following code:

# install.packages("devtools")
devtools::install_github("stat545ubc-2023/plotMissing", ref = "v0.1.0")

In the code above, the input in the ref argument must match the tag of the plotMissing library which you would like to install.

Examples

Because the plotMissing function produces a ggplot2 object, it would be helpful to cover a few visual examples of the outputs of this function to demonstrate what the different parameters in the function represent.

library(plotMissing)

# Load in a dataset and coerce it to a tibble
airquality <- tibble::tibble(datasets::airquality)

Basic function call

plotMissing(airquality)

Changing the plot with the built-in function parameters

plotMissing(airquality, 
            cols = tidyselect::starts_with(c("O", "S", "T", "W")),
            count_names = c("NA", "Not NA"),
            bar_colours = c("#ABCDEF", "#FEDCBA"))

Adding additional graphical parameters with the ... argument

plotMissing(airquality, 
            ggplot2::theme_classic(),
            ggplot2::labs(x = "Column Name", y = "Number of Rows",
                          fill = "NA or not?"),
            cols = tidyselect::starts_with(c("O", "S", "T", "W")),
            count_names = c("NA", "Not NA"),
            bar_colours = c("#ABCDEF", "#FEDCBA"))

Future Updates

There are some future updates planned for the plotMissing package which are meant to improve the ‘customizability’ of the plots which the package generates. The future features include the following:

  • Create plots for only one “group” (e.g. only display the number of missing values per column)
  • Improving the language for the count_names and bar_colours parameters to make them more intuitive
  • Customizing the order of the bars in the plot
  • Letting the users decide whether the coordinates should be flipped
  • Letting users decide which bar (missing values or observed values) is on the left