---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
options(tibble.print_min = 5, tibble.print_max = 5)
```
# qexplore <a href="https://github.com/jarvisc1/qexplore"><img src="man/figures/logo.png" align="right" height="139" alt="qexplore website" /></a>
<!-- badges: start -->
<!-- [![R-CMD-check](https://github.com/jarvisc1/qexplore/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/jarvisc1/qexplore/actions/workflows/R-CMD-check.yaml) -->
<!-- [![CRAN status](https://www.r-pkg.org/badges/version/qexplore)](https://CRAN.R-project.org/package=qexplore) -->
<!-- badges: end -->
The goal of `qexplore` is to streamline the process of exploring and interrogating data. It allows users to quickly list rows of a data frame that meet certain criteria or tabulate data based on specific conditions.
There are two functions, `qlist` and `qtab`, for quickly listing or tabulating your data.
- `qlist` is tidyverse compatible, as both its input and its output are a `data.frame`.
- `qtab` is also compatible with the `tidyverse`, as its input is a `data.frame` and its output is a `tabyl`.
- This also has the unintended consequence that it can be combined with `gt` and `flextable`.
## Installation
You can install `qexplore` from [GitHub](https://github.com/) with:
```r
# install.packages("remotes")
remotes::install_github("jarvisc1/qexplore")
```
`qexplore` is not currently available on [CRAN](https://cran.r-project.org/).
## Usage
Here are some examples of how to use `qexplore`. First we look at `qlist` and then `qtab`.
The real power of these commands comes from three arguments:
- `.if`: filters rows by a condition. This is like `filter()` in `dplyr`, the `df[**here**, ]` position in base R or `data.table`, or the `if` option in Stata, where the command would be written as `list var1 if var2 == 1`.
- `.in`: selects the rows that you'd like to look at by position, very similar to the `in` option in Stata or indexing rows in base R or `data.table`.
- `.by`: only for `qtab`. This allows a two-way table to be printed several times, once for each level of a third variable. Similar to doing a `table()` by three variables, but I prefer this syntax as I think it's cleaner. **Note:** using `.by` will result in the output being a list of `tabyl` objects, so I doubt it will be compatible with `gt` or `flextable`.
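To make the analogy concrete, here is a rough base R sketch of what the three arguments correspond to. It deliberately does not call `qexplore`, so it only illustrates the idea and says nothing about how the package is implemented. The `drinks` data frame and the object names are made up for illustration.

```r
# Made-up example data (not part of qexplore)
drinks <- data.frame(
  drink_type = c("Latte", "Espresso", "Cappuccino", "Americano"),
  size = c("Small", "Large", "Medium", "Large"),
  day_of_week = c("Mon", "Tue", "Wed", "Thu")
)

# Roughly what `.if = size == "Large"` expresses: keep rows matching a condition
if_rows <- drinks[drinks$size == "Large", c("drink_type", "size")]

# Roughly what `.in = 1:2` expresses: keep rows by position
in_rows <- drinks[1:2, ]

# Roughly what `.by = day_of_week` expresses: one two-way table per level
by_tables <- lapply(
  split(drinks, drinks$day_of_week),
  function(d) table(d$drink_type, d$size)
)
```

The last step returns a plain list with one table per day, which is the same shape of result that the note on `.by` describes.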
### Listing Rows
Use `qlist` to filter and explore subsets of data:
```{r}
library(qexplore)
# Example dataset
data <- data.frame(
  drink_type = c("Latte", "Espresso", "Cappuccino", "Americano"),
  size = c("Small", "Large", "Medium", "Large"),
  day_of_week = c("Mon", "Tue", "Wed", "Thu")
)

# List specific rows
data |>
  qlist(drink_type, size, day_of_week, .in = 1:2)

# Filter and list
data |>
  qlist(drink_type, size, .if = size == "Large")
```
### Relation to tidyverse
`qlist` takes a data frame as input and returns a data frame, so it can be piped to and from other tidyverse functions.
```{r}
library(qexplore)
library(dplyr)
# Example dataset
data <- data.frame(
  drink_type = c("Latte", "Espresso", "Cappuccino", "Americano"),
  size = c("Small", "Large", "Medium", "Large"),
  day_of_week = c("Mon", "Tue", "Wed", "Thu")
)

# Filter, add a column, list the first two rows, then sort
data |>
  filter(size %in% c("Small", "Large")) |>
  mutate(fav_drink = if_else(drink_type == "Latte", "Fav Drink", "Rubbish Drinks")) |>
  qlist(drink_type, fav_drink, size, day_of_week, .in = 1:2) |>
  arrange(fav_drink)
```
### Tabulating Data
Use `qtab` to create summary tables:
```{r}
larger_data <- data.frame(
  drink_type = sample(data$drink_type, 200, replace = TRUE),
  size = sample(data$size, 200, replace = TRUE),
  day_of_week = sample(c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"), 200, replace = TRUE),
  quantity = sample(1:5, 200, replace = TRUE), # Add variability for quantity
  price = round(runif(200, 2.5, 6.5), 2) # Add a random price column
)

# Tabulate a single variable
larger_data |>
  qtab(drink_type)

# Tabulate two variables
larger_data |>
  qtab(drink_type, size)

# Grouped tabulations ("Tue" matches the day labels used in larger_data)
larger_data |>
  filter(day_of_week %in% c("Mon", "Tue")) |>
  qtab(drink_type, size, .by = day_of_week)
```
### Tabulating Data with different percentages
Use `qtab` with different percentages:
```{r}
# percentages with per = "all"
larger_data |>
  qtab(drink_type, size, per = "all")

# column percentages
larger_data |>
  qtab(drink_type, size, per = "col")

# row percentages
larger_data |>
  qtab(drink_type, size, per = "row")

# no percentages
larger_data |>
  qtab(drink_type, size, per = "none")

# Grouped tabulations without percentages
data |>
  qtab(drink_type, size, .by = day_of_week, per = "none")
```
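For orientation, the `per` options presumably mirror the usual margins of a two-way table. The base R sketch below does not call `qexplore` at all. It assumes that `"all"` means a share of the grand total, `"row"` a share of each row, and `"col"` a share of each column, which is an assumption about `qtab` rather than something confirmed here. The `drinks` data frame is made up for illustration.

```r
# Made-up example data (not part of qexplore)
drinks <- data.frame(
  drink_type = c("Latte", "Latte", "Espresso", "Espresso"),
  size = c("Small", "Large", "Large", "Large")
)

# Two-way table of counts: drink type in rows, size in columns
counts <- table(drinks$drink_type, drinks$size)

# Assumed analogue of per = "all": each cell as a percentage of the grand total
all_pct <- prop.table(counts) * 100

# Assumed analogue of per = "row": each cell as a percentage of its row total
row_pct <- prop.table(counts, margin = 1) * 100

# Assumed analogue of per = "col": each cell as a percentage of its column total
col_pct <- prop.table(counts, margin = 2) * 100
```

A quick sanity check is that `all_pct` sums to 100 overall, each row of `row_pct` sums to 100, and each column of `col_pct` sums to 100.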
### Advanced Tabulations
```{r}
# Tabulate with a filter condition
data |>
  qtab(drink_type, size, .if = size == "Large")

# Tabulate specific rows
data |>
  qtab(drink_type, size, .in = 1:3)

# Group and filter tabulations
data |>
  qtab(drink_type, size, .by = day_of_week, .if = day_of_week %in% c("Mon", "Tue"))
```
### Relation to tidyverse
`qtab` returns a `tabyl`, which is also a `data.frame`. Therefore the output of `qtab` can be piped into anything that accepts a `tabyl` (I think?).
```{r}
library(qexplore)
library(dplyr)
library(janitor)
library(gt)
library(flextable)
# Tabulate specific rows, then add row and column titles with janitor
data |>
  qtab(drink_type, size, .in = 1:3) |>
  adorn_title(col_name = "Drink size", row_name = "Drink Type")
```
### Relation to gt and flextable
Because `qtab` returns a `tabyl`, its output can be piped into `gt` and `flextable`, though this is not the purpose, as `qexplore` is about quickly exploring data via the console. It is a nice feature, but it may have some unintended consequences.
```{r}
library(qexplore)
library(dplyr)
library(janitor)
library(gt)
# Tabulate specific rows and turn the result into a gt table
gt_table <- data |>
  qtab(drink_type, size, .in = 1:3) |>
  gt() |>
  tab_header(
    title = "Drink Types by Size",
    subtitle = "Percentage Breakdown with Totals"
  )
```
```{r, echo = FALSE}
gtsave(gt_table, filename = "man/figures/gt_table.png")
```
![](man/figures/gt_table.png)
```{r}
library(qexplore)
library(dplyr)
library(janitor)
library(flextable)
# Tabulate specific rows
tabyl_data <- data |>
  qtab(drink_type, size, .in = 1:3)

# Convert tabyl to flextable
flex_table <- tabyl_data |>
  flextable() |>
  set_header_labels(
    drink_type = "Drink Type",
    Total = "Weekly Total"
  ) |>
  # Spanning header over the size columns (the table is drink type by size)
  add_header_row(
    values = c("", "Drink Size"),
    colwidths = c(1, ncol(tabyl_data) - 1)
  ) |>
  theme_vanilla() |>
  autofit()

# View the flextable
flex_table
```
<!-- ## Documentation -->
<!-- The following resources will help you get started: -->
<!-- - [Package index](https://github.com/jarvisc1/qexplore/reference) -->
<!-- Overview of all `qexplore` functions. -->
<!-- - [Getting started guide](https://github.com/jarvisc1/qexplore/articles/qexplore.html) -->
<!-- Introductory guide to using `qexplore`. -->
<!-- - [Examples](https://github.com/jarvisc1/qexplore/examples) -->
<!-- Real-world examples of data exploration with `qexplore`. -->
## Acknowledgements
This package brings together the `if`, `in`, and `by` arguments from Stata, and replicates something similar to what you can do in `data.table`.
It also fits within the tidyverse framework and builds on tidyverse-style packages, especially `dplyr` and `janitor`.
I'd like to thank everyone who was involved in all of the software above, especially `tidyplots` for the idea that you can take something as well established as `ggplot2` and still make something possibly quicker and maybe more accessible, which might help enable more people to use R.
`qexplore` leverages the following amazing packages to do the heavy lifting: cli, dplyr, rlang, tidyselect, and tidyr.