README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
options(tibble.print_min = 5, tibble.print_max = 5)
```

# qexplore <a href="https://github.com/jarvisc1/qexplore"><img src="man/figures/logo.png" align="right" height="139" alt="qexplore website" /></a>

<!-- badges: start -->
<!-- [![R-CMD-check](https://github.com/jarvic1/qexplore/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/jarvic1/qexplore/actions/workflows/R-CMD-check.yaml) -->
<!-- [![CRAN status](https://www.r-pkg.org/badges/version/qexplore)](https://CRAN.R-project.org/package=qexplore) -->
<!-- badges: end -->

The goal of `qexplore` is to streamline the process of exploring and interrogating data. It allows users to quickly list rows of a data frame that meet certain criteria or tabulate data based on specific conditions. 

There are two functions `qlist` and `qtab` and they allow you to quickly list or tabulate your data. 

- `qlist` is tidyverse compatiable as the input and output is a `data.frame`. 
- `qtab` is also compatible with `tidyverse` as the input is a `data.frame` and the output is a `tabyl` 
- This also has the unintended consequence that it can be combined with `gt` and `flextable`. 

## Installation

You can install `qexplore` from [GitHub](https://github.com/) with:

```r
# install.packages("remotes")
remotes::install_github("jarvisc1/qexplore")
```

`qexplore` is not currently available on [CRAN](https://cran.r-project.org/).

## Usage

Here are some examples of how to use `qexplore`. First we look at `qlist` and then `qtab`.

The real power of these commands comes from three arguments. 

- `.if`: this is like filter in dplyr, or the `df[**here**, ]` in base or data.table or the `if` option in `Stata` where the command would be written as `list var1 if var2 == 1` 
- `.in`: this allows you to select the rows that you'd like to look at very similar to the `in` option in `Stata` or filtering by rows with base or `data.table`.
- `.by`: Only for `qtab` this allow for a 2x2 table to be printing several times by a third variable. Similar to doing a `table()` by three variables but I prefer this syntax as I think it's cleaner. **Note:** using `.by` will result in the output being a list of `tabyl` so I doubt it will be compatible with `gt` or `flex.table`. 

### Listing Rows
Use `qlist` to filter and explore subsets of data:
  
```{r}
library(qexplore)

# Example dataset
data <- data.frame(
  drink_type = c("Latte", "Espresso", "Cappuccino", "Americano"),
  size = c("Small", "Large", "Medium", "Large"),
  day_of_week = c("Mon", "Tue", "Wed", "Thu")
)

# List specific rows
data |>
  qlist(drink_type, size, day_of_week, .in = 1:2)

# Filter and list
data |>
  qlist(drink_type, size, .if = size == "Large")
```

### Relation to tidyverse
`qlist` takes a dataframe as input and outputs a dataframe. Therefore it can be connected to other tidyverse functions. 

```{r}
library(qexplore)
library(dplyr)

# Example dataset
data <- data.frame(
  drink_type = c("Latte", "Espresso", "Cappuccino", "Americano"),
  size = c("Small", "Large", "Medium", "Large"),
  day_of_week = c("Mon", "Tue", "Wed", "Thu")
)

# List specific rows
data |> 
  filter(size %in% c("Small", "Large")) |> 
  mutate(fav_drink = if_else(drink_type == "Latte", "Fav Drink", "Rubbish Drinks")) |> 
  qlist(drink_type, fav_drink, size, day_of_week, .in = 1:2) |> 
  arrange(fav_drink)
  
```

### Tabulating Data
Use `qtab` to create summary tables:
  
```{r}
larger_data <- data.frame(
  drink_type = sample(data$drink_type, 200, replace = TRUE),
  size = sample(data$size, 200, replace = TRUE),
  day_of_week = sample(c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"), 200, replace = TRUE),
  quantity = sample(1:5, 200, replace = TRUE),  # Add variability for quantity
  price = round(runif(200, 2.5, 6.5), 2)       # Add a random price column
)


# Tabulate a single variable
larger_data |>
  qtab(drink_type)

# Tabulate two variables
larger_data |>
  qtab(drink_type, size)

# Grouped tabulations
larger_data |> 
  filter(day_of_week %in% c("Mon", "Tues")) |> 
  qtab(drink_type, size, .by = day_of_week)
```

### Tabulating Data with different percentages
Use `qtab` with different percentages:
  
```{r}

# Tabulate two variables
larger_data |>
  qtab(drink_type, size, per = "all")

# column percentages
larger_data |>
  qtab(drink_type, size, per = "col")

# row percentages
larger_data |>
  qtab(drink_type, size, per = "row")
# no percentages
larger_data |>
  qtab(drink_type, size, per = "none")

# Grouped tabulations
data |>
  qtab(drink_type, size, .by = day_of_week, per = "none")
```

### Advanced Tabulations

```{r}
# Tabulate with a filter condition
data |>
  qtab(drink_type, size, .if = size == "Large")

# Tabulate specific rows
data |>
  qtab(drink_type, size, .in = 1:3) 

# Group and filter tabulations
data |>
  qtab(drink_type, size, .by = day_of_week, .if = day_of_week %in% c("Mon", "Tue"))
```


### Relation to tidyverse
`qtab` outputs a `tabyl` `data.frame` object. Therefore `qtab` can be piped into anything that `tabyl` can  (I think?)

```{r}
library(qexplore)
library(dplyr)
library(janitor)
library(gt)
library(flextable)

# Tabulate specific rows
data |>
  qtab(drink_type, size, .in = 1:3) |> 
   adorn_title(col_name = "Drink size", row_name = "Drink Type")
  
```


### Relation to gt and flextable
Due to `qtab`  being a `tabyl`, this has the consequence that it can be piped into `gt` and `flextable` though this is the purpose as qexplore is about quickly exploring data via the console. This is a nice feature maybe this will have some unintended consequences. 

```{r}
library(qexplore)
library(dplyr)
library(janitor)
library(gt)

# Tabulate specific rows
gt_table <- data |>
  qtab(drink_type, size, .in = 1:3) |> 
  gt() |> 
    tab_header(
    title = "Drink Types by Day of the Week",
    subtitle = "Percentage Breakdown with Totals"
  )
  
```

```{r, echo = FALSE}
gtsave(gt_table, filename = "man/figures/gt_table.png")

```

![](man/figures/gt_table.png)


```{r}
library(qexplore)
library(dplyr)
library(janitor)
library(flextable)

# Tabulate specific rows
tabyl_data <- data |>
  qtab(drink_type, size, .in = 1:3) 

# Convert tabyl to flextable
flex_table <- tabyl_data |> 
  flextable() |> 
  set_header_labels(
    drink_type = "Drink Type",
    Total = "Weekly Total"
  ) |> 
  add_header_row(
    values = c("", "Day of the Week"), colwidths = c(1, ncol(tabyl_data) - 1)
  ) |> 
  theme_vanilla() |> 
  autofit()

# View the flextable
flex_table

```


<!-- ## Documentation -->

<!-- The following resources will help you get started: -->

<!-- - [Package index](https://github.com/jarvic1/qexplore/reference)   -->
<!-- Overview of all `qexplore` functions. -->

<!-- - [Getting started guide](https://github.com/jarvic1/qexplore/articles/qexplore.html)   -->
<!-- Introductory guide to using `qexplore`. -->

<!-- - [Examples](https://github.com/jarvic1/qexplore/examples)   -->
<!-- Real-world examples of data exploration with `qexplore`. -->

## Acknowledgements

This package brings together if, in, and by arguments from Stata. It replicates something similar that you can do in `data.table`.
It also fits a tidyverse framework and uses tidyverse especially `dplyr` and `janitor`. 

I'd like to thank everyone that was involved in all of the software above. Especially `tidyplots` for the idea that you can take something as well established as `ggplot2` and still make something possibly quicker and maybe more accessible that might help enable more people to use R. 

`qexplore` leverages the following amazing packages to do the heavy lifting:
    cli,
    dplyr,
    rlang,
    tidyselect, and
    tidyr.