forked from joachim-gassen/rdfanalysis
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
244 lines (181 loc) · 7.46 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
---
output: github_document
author: Joachim Gassen
---
```{r setup, include=FALSE}
set.seed(42)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE, cache = TRUE)
library(tidyverse)
library(rdfanalysis)
library(knitr)
```
# Researcher Degrees of Freedom Analysis
## A package to explore and document your degrees of freedom
This in-development package provides a coding infrastructure
that allows researchers to systematically document and explore their
researcher degrees of freedom when conducting analyses on observational data.
The resulting code base is self-documenting,
supports unit testing and power simulations based on simulated data. The
documented researcher degrees of freedom can be exhausted to generate a
distribution of outcome estimates.
### New: Plot a specification curve
The below displays a specification curve ([Simonsohn, Simmons and Nelson](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2694998)) based
on an systematic exploration of the Preston curve. It summarizes 11,264
regressions in one plot. See this [blog article](https://joachim-gassen.github.io/2019/04/11264-regressions-in-one-tidy-plot/)
and this [vignette](https://joachim-gassen.github.io/rdfanalysis/articles/analyzing_rdf.html)
for more detail. While the code works with the result data frame created
following the procedure below, it also works on data frames that contain
choices and coefficient estimates created by other means.
``` {r spec_curve}
# devtools::install_github("joachim-gassen/rdfanalysis")
library(rdfanalysis)
load(url("https://joachim-gassen.github.io/data/rdf_ests.RData"))
plot_rdf_spec_curve(ests, "est", "lb", "ub")
```
### Even newer: Let your researcher's degrees of freedom shine!
To explore your specification curve you can also use a shiny frontend that
is included with the package. If you use the coding infrastructure from the
package it even allows you to interactively explore your findings from a
single regression output to the full specification curve. [See for yourself](https://jgassen.shinyapps.io/shiny_rdf_spec_curve).
``` {r shiny_pec_curve, eval = FALSE}
# devtools::install_github("joachim-gassen/rdfanalysis")
library(rdfanalysis)
load(url("https://joachim-gassen.github.io/data/rdf_ests.RData"))
# The following is based on having a local fork of the repo. See the
# vignettes of the package to learn more about how to use the full
# rdfanalsis package.
design <- define_design(steps = c("read_data",
"select_idvs",
"treat_extreme_obs",
"specify_model",
"est_model"),
rel_dir = "vignettes/case_study_code")
shiny_rdf_spec_curve(ests, list("est", "lb", "ub"),
design, "vignettes/case_study_code",
"https://joachim-gassen.github.io/data/wb_new.csv")
```
![A Shiny Specification Curve](docs/articles/shiny_rdf_spec_curve.gif)
## A Package Tour
To provide a quick tour of the full package and its workflow
I will use a "research design" where an
independent variable x is confounded by a co-variate z and where the
only researcher degree of freedom is whether to control for z in the
regression setup. We will ignore the testing bit in this quick walk-through.
For a more in-depth introduction into the package, please refer to the
[vignette included in the documentation](https://joachim-gassen.github.io/rdfanalysis/articles/analyzing_rdf.html).
### Step 1: Open a new Rstudio project in a clean directory and install the `rdfanalysis` package
```{r library, eval = FALSE}
devtools::install_github("joachim-gassen/rdfanalysis")
library(rdfanalysis)
```
### Step 2: Write a function that simulates data
```{r simulate_data}
sim_data <- function(n, effect_size) {
z <- rnorm(n)
x <- rnorm(n) + z
y <- effect_size*x + z + rnorm(n)
data.frame(x = x, y = y, z = z)
}
```
### Step 3: Define your research design by a series of functions
Each research design consists of a series of steps. Each step becomes a
function. To initialize these functions, you can use the call `define_design()`.
It creates a `code` directory in your current working directory and produces
template code for each step. In this case, our design will have only one step.
```{r define_design, eval = FALSE}
design <- define_design("est_model")
```
```{r define_design_static, include = FALSE}
design <- c("est_model")
```
### Step 4: Develop your code for each step
Edit the resulting template file `est_model.R` in the code directory
until it looks like the code below.
```{r steps}
est_model <- function(input = NULL, choice = NULL) {
step_description <- c(
"## Estimate model",
"### Content",
"",
"This step estimates on OLS model based on simulated data."
)
choice_description <- c(
"### Choice",
"",
"A character value `control_for_z` that may take one of the following values:",
"",
"- `yes`: control for z",
"- `no`: do not control for z"
)
choice_type <- list(
list(name = "control_for_z",
type = "character",
valid_values = c("yes", "no"))
)
if (is.null(choice)) return(list(
step_description = step_description,
choice_description = choice_description,
choice_type = choice_type
)) else check_choice(choice, choice_type)
# ___ Analysis code starts below ___
if(choice[[1]] == "yes")
mod <- lm(y ~ x + z, data = input)
else mod <- lm(y ~ x, data = input)
return(list(
data = list(
est = summary(mod)$coefficient[2,1],
lb = confint(mod)[2,1],
ub = confint(mod)[2,2]
),
protocol = list(choice)
))
}
```
### Step 5: Source your code
```{r source_design, eval = FALSE}
source_design(design)
```
### Step 6: Test your code
```{r test_design}
test_design(design, input = sim_data(100, 0.1), reporter = "minimal")
```
### Step 7: Document your design
The below serves documentation purposes. The function `prepare_design_documentation()`
generates a PDF file in your project directory that documents your code. For it
to work you need a local `R Markdown` installation that is capable to produce PDF files.
```{r document_design, eval = FALSE}
prepare_design_documentation(design, output_file = "my_design.pdf")
```
`prepare_design_flow_chart()` produces a quick visual of the code flow.
```{r flow_chart}
prepare_design_flow_chart(design, landscape = TRUE)
```
### Step 8: Run a single protocol of choices
```{r single_protocol}
sim_data(100, 0.1) %>%
est_model("yes")
```
### Step 9: Assess the power of your analysis for a certain protocol
```{r sim_power}
power_df <- simulate_design_power(design, protocol = list("yes"),
input_sim_func = sim_data,
range_n = seq(100, 1000, 100),
effect_size = 0.1)
library(tidyverse)
power_df %>%
group_by(n) %>%
summarise(power = sum(lb > 0)/n()) %>%
ggplot(aes(x = n, y = power)) +
geom_line() +
theme_minimal()
```
### Step 10: Exhaust your researcher degrees of freedom
```{r exhaust_rdf, results = "hide"}
df <- exhaust_design(design, sim_data(1000, 0.1))
```
```{r display_rdf}
kable(df)
```
Only two researcher degrees of freedom in this setting but you will easily get into the thousands in a real research setting.
For a real-life case study on how to use `rdfanalysis`, please refer to the
[vignette included in the documentation](https://joachim-gassen.github.io/rdfanalysis/articles/analyzing_rdf.html).