
chapter corrections
nhejazi committed Mar 11, 2021
1 parent 7a5244d commit 80269a4
Showing 7 changed files with 646 additions and 442 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -1,6 +1,6 @@
branches:
  only:
    - master

env:
  global:
4 changes: 2 additions & 2 deletions 03-tlverse.Rmd
@@ -85,8 +85,8 @@ https://github.com/tlverse, not yet on [CRAN](https://CRAN.R-project.org/). You
can use the [`devtools` package](https://devtools.r-lib.org/) to install them:

```{r installation, eval=FALSE}
install.packages("usethis")
usethis::install_github("tlverse/tlverse")
install.packages("devtools")
devtools::install_github("tlverse/tlverse")
```
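
The same approach extends to the individual packages of the ecosystem. As a
small example (assuming you only want `sl3`, which lives at `tlverse/sl3` on
GitHub):

```{r installation-sl3, eval=FALSE}
devtools::install_github("tlverse/sl3")
```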

The `tlverse` depends on a large number of other packages that are also hosted
666 changes: 338 additions & 328 deletions 06-sl3.Rmd

Large diffs are not rendered by default.

114 changes: 84 additions & 30 deletions 07-tmle3.Rmd
@@ -13,60 +13,114 @@ Based on the [`tmle3` `R` package](https://github.com/tlverse/tmle3).

## Introduction

In the previous chapter on `sl3` we learned how to estimate a regression
function like $\mathbb{E}[Y \mid X]$ from data. That's an important first step
in learning from data, but how can we use this predictive model to estimate
statistical and causal effects?

Going back to the roadmap in Chapter 1, suppose we'd like to estimate the effect
of a treatment variable $A$ on an outcome $Y$. As discussed, one potential
parameter that characterizes that effect is the Average Treatment Effect (ATE),
defined as $\psi_0 = \mathbb{E}_W[\mathbb{E}[Y \mid A=1,W] - \mathbb{E}[Y \mid A=0,W]]$
and interpreted as the difference in mean outcome under treatment ($A=1$) and
control ($A=0$), averaging over the distribution of covariates $W$. We'll
illustrate several potential estimators for this parameter, and motivate the
use of TMLE, using the following example data:

```{r tmle_fig1, results="asis", echo = FALSE}
knitr::include_graphics("img/misc/tmle_sim/schematic_1_truedgd.png")
```

The small ticks on the right indicate the mean outcomes (averaging over $W$)
under $A=1$ and $A=0$, respectively, so their difference is the quantity we'd
like to estimate.

While we hope to motivate the application of TMLE in this chapter, we refer the
interested reader to the two Targeted Learning books and associated works for
full technical details.

### Substitution Estimators

We can use `sl3` to fit a Super Learner or other regression model to estimate
the function $\mathbb{E}_0[Y \mid A,W]$. We refer to this function as
$\bar{Q}_0(A,W)$ and our estimate of it as $\bar{Q}_n(A,W)$. We can then
directly "plug in" that estimate to obtain an estimate of the ATE:
$\hat{\psi}_n = \frac{1}{n}\sum_{i=1}^n \big(\bar{Q}_n(1,W_i) - \bar{Q}_n(0,W_i)\big)$.
This kind of estimator is called a plug-in or substitution estimator, as we
substitute our estimate $\bar{Q}_n(A,W)$ of the function $\bar{Q}_0(A,W)$ for
the function itself.
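
To make the plug-in idea concrete, here is a minimal sketch of a substitution
estimator of the ATE using `sl3`. It assumes a `data.table` named `data` with
binary treatment `A`, covariates `W1` and `W2`, and outcome `Y`; these names
are hypothetical placeholders, not objects defined elsewhere in this chapter:

```{r substitution-sketch, eval=FALSE}
library(data.table)
library(sl3)

# a task for the outcome regression E[Y | A, W]
covars <- c("A", "W1", "W2")
task <- sl3_Task$new(data, covariates = covars, outcome = "Y")

# fit an initial estimate of Q_0 (a Super Learner would work the same way)
Q_fit <- make_learner(Lrnr_glm)$train(task)

# predict under the counterfactual assignments A = 1 and A = 0
data_A1 <- copy(data)[, A := 1]
data_A0 <- copy(data)[, A := 0]
Q1W <- Q_fit$predict(sl3_Task$new(data_A1, covariates = covars, outcome = "Y"))
Q0W <- Q_fit$predict(sl3_Task$new(data_A0, covariates = covars, outcome = "Y"))

# plug-in (substitution) estimate of the ATE
psi_hat <- mean(Q1W - Q0W)
```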

Applying `sl3` to estimate the outcome regression in our example, we can see
that it fits the data quite well:

```{r tmle_fig2, results="asis", echo = FALSE}
knitr::include_graphics("img/misc/tmle_sim/schematic_2b_sllik.png")
```

The solid lines indicate the `sl3` estimate of the regression function, with the
dotted lines indicating the `tmle3` update described below.

While substitution estimators are intuitive, naively using this approach with a
Super Learner estimate of $\bar{Q}_0(A,W)$ has several limitations. First,
Super Learner selects learner weights to minimize risk across the entire
regression function, instead of "targeting" the ATE parameter we hope to
estimate, leading to biased estimation. That is, `sl3` is trying to do well on
the full regression curve on the left, instead of focusing on the small ticks
on the right. What's more, this estimator is not asymptotically linear, and
therefore valid inference is not possible.

We can see these limitations illustrated in the estimates generated for the
example data:

```{r tmle_fig3, results="asis", echo = FALSE}
knitr::include_graphics("img/misc/tmle_sim/schematic_3_effects.png")
```

We see that Super Learner estimates the true parameter value (indicated by the
dashed vertical line) more accurately than GLM. However, it is still less
accurate than TMLE, and valid inference is not possible. In contrast, TMLE
achieves a less biased estimator and valid inference.

## TMLE

TMLE takes an initial estimate $\bar{Q}_n(A,W)$, as well as an estimate
$g_n(A \mid W)$ of the propensity score $g_0(A \mid W) = \mathbb{P}(A \mid W)$,
and produces an updated estimate $\bar{Q}^{\star}_n(A,W)$ that is "targeted" to
the parameter of interest. TMLE keeps the benefits of substitution estimators
(it is one), but augments the original estimates to correct for bias while also
yielding an asymptotically linear (and thus normally-distributed) estimator
with consistent Wald-style confidence intervals.

There are different types of TMLE, sometimes for the same set of parameters,
but below is an example of the algorithm for estimating the ATE.
$\bar{Q}^{\star}_n(A,W)$ is the TMLE-augmented estimate
$f(\bar{Q}^{\star}_n(A,W)) = f(\bar{Q}_n(A,W)) + \epsilon_n \cdot h_n(A,W)$,
where $f(\cdot)$ is the appropriate link function (e.g., logit), $\epsilon_n$ is
an estimated coefficient, and $h_n(A,W)$ is a "clever covariate". In this case,
$h_n(A,W) = \frac{A}{g_n(1 \mid W)} - \frac{1-A}{1-g_n(1 \mid W)}$, with
$g_n(1 \mid W) = \mathbb{P}(A=1 \mid W)$ being the estimated (also by SL)
propensity score, so the estimator depends both on the initial SL fit of the
outcome regression ($\bar{Q}_n$) and on an SL fit of the propensity score
($g_n$).
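
To make the targeting step concrete, here is a minimal sketch of the TMLE
update for the ATE with a binary outcome, written in base `R` rather than
`tmle3`. It assumes vectors `Q1W`, `Q0W`, and `QAW` (initial predictions under
$A=1$, $A=0$, and the observed $A$), `gW` (the estimated
$\mathbb{P}(A=1 \mid W)$), and the observed `A` and `Y`; all of these object
names are hypothetical:

```{r targeting-sketch, eval=FALSE}
# clever covariate evaluated at the observed treatment
H_AW <- A / gW - (1 - A) / (1 - gW)

# estimate the fluctuation parameter epsilon by logistic regression,
# with the initial fit entering as an offset on the logit scale
eps <- coef(glm(Y ~ -1 + H_AW, offset = qlogis(QAW), family = binomial()))

# targeted (updated) estimates under A = 1 and A = 0
Q1W_star <- plogis(qlogis(Q1W) + eps / gW)
Q0W_star <- plogis(qlogis(Q0W) - eps / (1 - gW))

# TMLE of the ATE
psi_tmle <- mean(Q1W_star - Q0W_star)
```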

There are further robust augmentations that are used in the `tlverse`, such as
an added layer of cross-validation to avoid over-fitting bias (CV-TMLE), as
well as methods that can more robustly estimate several parameters
simultaneously (e.g., the points on a survival curve).

### Inference

Because TMLE yields an **asymptotically linear** estimator, obtaining inference
is trivial. Each TMLE is associated with an **influence function** that
describes its asymptotic distribution, and Wald-style inference can be obtained
by plugging our estimates $\bar{Q}^{\star}_n$ and $g_n$ into this function and
taking the sample standard error.
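
Continuing the hand-rolled sketch from the previous section (and reusing the
hypothetical `A`, `Y`, `H_AW`, `Q1W_star`, `Q0W_star`, and `psi_tmle` objects
defined there), a Wald-style 95% confidence interval for the ATE follows from
the estimated influence function:

```{r inference-sketch, eval=FALSE}
# targeted predictions at the observed treatment
QAW_star <- ifelse(A == 1, Q1W_star, Q0W_star)

# estimated (efficient) influence function for the ATE
IC <- H_AW * (Y - QAW_star) + (Q1W_star - Q0W_star) - psi_tmle

# Wald-style 95% confidence interval
se <- sd(IC) / sqrt(length(Y))
ci <- psi_tmle + c(-1.96, 1.96) * se
```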

The following sections describe both a simple and more detailed way of
specifying and estimating a TMLE in the `tlverse`. In designing `tmle3`, we
sought to replicate as closely as possible the very general estimation framework
of TMLE, and so each theoretical object relevant to TMLE is encoded in a
corresponding software object. First, we will present the simple application of
`tmle3` to the WASH Benefits example, and then go on to describe the underlying
objects in more detail.

## Easy-Bake Example: `tmle3` for ATE

@@ -121,7 +175,7 @@ Currently, missingness in `tmle3` is handled in a fairly simple way:

* Missing covariates are median-imputed (for continuous variables) or
  mode-imputed (for discrete variables), and additional covariates indicating
  imputation are generated (see the sketch below)
* Observations missing the treatment variable are excluded.
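
As a rough illustration of the imputation behavior (a hand-rolled sketch, not
`tmle3`'s internal code; `data` is a hypothetical `data.frame` with a
continuous covariate `W1`):

```{r imputation-sketch, eval=FALSE}
# record which observations were imputed, then median-impute the covariate
data$delta_W1 <- as.numeric(is.na(data$W1))
data$W1[is.na(data$W1)] <- median(data$W1, na.rm = TRUE)
```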

We implemented IPCW-TMLE to more efficiently handle missingness in the outcome
variable, and we plan to implement an IPCW-TMLE to handle missingness in the
@@ -140,8 +194,8 @@ node_list <- processed$node_list
`tmle3` is general, and allows most components of the TMLE procedure to be
specified in a modular way. However, most end-users will not be interested in
manually specifying all of these components. Therefore, `tmle3` implements a
`tmle3_Spec` object that bundles a set of components into a _specification_
("Spec") that, with minimal additional detail, can be run by an end-user.

We'll start by using one of the specs, and then work our way down into the
internals of `tmle3`.
@@ -164,7 +218,7 @@ to be estimated with `sl3`:
```{r tmle3-learner-list}
# choose base learners
lrnr_mean <- make_learner(Lrnr_mean)
lrnr_rf <- make_learner(Lrnr_ranger)
# define metalearners appropriate to data types
ls_metalearner <- make_learner(Lrnr_nnls)
@@ -173,11 +227,11 @@ mn_metalearner <- make_learner(
  loss_loglik_multinomial
)
sl_Y <- Lrnr_sl$new(
  learners = list(lrnr_mean, lrnr_rf),
  metalearner = ls_metalearner
)
sl_A <- Lrnr_sl$new(
  learners = list(lrnr_mean, lrnr_rf),
  metalearner = mn_metalearner
)
learner_list <- list(A = sl_A, Y = sl_Y)
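
# With the learner list assembled, a Spec can be run end-to-end. As a minimal
# sketch (assuming the `data` and `node_list` objects constructed above, and
# placeholder treatment/control levels for the hypothetical binary treatment):
ate_spec <- tmle_ATE(treatment_level = 1, control_level = 0)
tmle_fit <- tmle3(ate_spec, data, node_list, learner_list)
print(tmle_fit)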
163 changes: 88 additions & 75 deletions DESCRIPTION
@@ -1,82 +1,95 @@
Package: tlversehandbook
Title: Targeted Learning in R with the 'tlverse'
Version: 0.1.0
Authors@R:
    c(person(given = "Jeremy",
             family = "Coyle",
             role = "aut",
             email = "[email protected]",
             comment = c(ORCID = "0000-0002-9874-6649")),
      person(given = "Nima",
             family = "Hejazi",
             role = c("aut", "cre", "cph"),
             email = "[email protected]",
             comment = c(ORCID = "0000-0002-7127-2789")),
      person(given = "Ivana",
             family = "Malenica",
             role = "aut",
             email = "[email protected]",
             comment = c(ORCID = "0000-0002-7404-8088")),
      person(given = "Rachael",
             family = "Phillips",
             role = "aut",
             email = "[email protected]",
             comment = c(ORCID = "0000-0002-8474-591X")),
      person(given = "Alan",
             family = "Hubbard",
             role = c("aut", "ths"),
             email = "[email protected]",
             comment = c(ORCID = "0000-0002-3769-0127")),
      person(given = "Mark",
             family = "van der Laan",
             role = c("aut", "ths"),
             email = "[email protected]",
             comment = c(ORCID = "0000-0003-1432-5511")))
Maintainer: Nima Hejazi <[email protected]>
Description: An open source reproducible handbook for causal machine
    learning and data science with the targeted learning methodology, with
    an emphasis on practical examples and tutorials using the 'tlverse'
    ecosystem of packages.
URL: https://github.com/tlverse/tlverse-workshop,
    https://tlverse.org/tlverse-handbook
BugReports: https://github.com/tlverse/tlverse-workshop/issues
Depends:
    R (>= 3.6.0)
Imports:
    bookdown,
    bslib,
    dagitty,
    data.table,
    delayed,
    downlit,
    dplyr,
    forecast,
    ggdag,
    ggfortify,
    ggplot2,
    kableExtra,
    knitr,
    mvtnorm,
    origami,
    randomForest,
    readr,
    rmarkdown,
    skimr,
    sl3,
    stringr,
    tibble,
    tidyr,
    tmle3,
    tmle3mopttx,
    tmle3shift
Suggests:
    arm,
    e1071,
    gam,
    glmnet,
    hal9001,
    haldensify,
    nnls,
    polspline,
    ranger,
    Rsolnp,
    speedglm,
    SuperLearner,
    xgboost
Remotes:
    github::nhejazi/haldensify@f0de4b5,
    github::rstudio/bookdown,
    github::rstudio/bslib,
    github::tlverse/sl3@devel,
    github::tlverse/tmle3@master,
    github::tlverse/tmle3mopttx@5ba5f65,
    github::tlverse/tmle3shift@master
Encoding: UTF-8
RoxygenNote: 7.1.1
12 changes: 6 additions & 6 deletions index.Rmd
@@ -149,10 +149,10 @@ network) and adaptive sequential designs.
### Rachael Phillips {-}

Rachael Phillips is a PhD student in biostatistics, advised by Alan Hubbard and
Mark van der Laan. She has an MA in Biostatistics, BS in Biology, and BA in
Mathematics. A student of targeted learning and causal inference, she pursues
research integrating personalized medicine, human-computer interaction,
experimental design, and regulatory policy.

### Alan Hubbard {-}

@@ -219,8 +219,8 @@ introductory resources:

For a general introduction to causal inference, we recommend

* [Miguel A. Hernán and James M. Robins' _Causal Inference: What If_,
2021](https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/)
* [Jason A. Roy's _A Crash Course in Causality: Inferring Causal Effects from
Observational Data_ on
Coursera](https://www.coursera.org/learn/crash-course-in-causality)
