Isobolic Interaction Identification and Estimation using Data-Adaptive Stochastic Interventions Authors: David McCoy
The IsoXshift
R package offers an approach which identifies the
minimum effort intervention on two exposures which, if the population
were given these intervention levels, would result in a target outcome.
This parameter reflects the most synergistic interaction or set of
interactions in a mixed exposure. The target parameter is similar to
isobolic interactions used in toxicology studies where researchers
investigate how much simultaneous dose of two exposures results in a
target outcome, like cancer or cell death in cultures.
From a policy perspective, this parameter represents the most efficient intervention that can be done on a mixed exposure to get to a desired outcome. This is because, for a collection of possible interventions or changes to pollutants, for example, we find the exposure set that most efficiently results in an expected outcome close to our desired outcome. Efficient here means the exposure(s) need to be shifted the least to get to a desired outcome, like pre-industry levels of thyroid cancer etc.
This package first identifies the most efficient intervention policy that gets to a desired outcome using g-computation which results in two exposure levels a set of exposures should be set to. Because it’s unrealistic to set a population to this specific oracle intervention, because the likelihood of certain individuals exposed to this level may be near 0, we instead estimate the effects of the policy if we were to get everyone as close as possible to this oracle level. This is done by finding an intervention level as close to the oracle level as possible under some restrictions that the individual conditional likelihood of being exposed doesn’t move too far away from their observed levels.
We the estimate the impact of our “intention to intervene” using CV-TMLE. Using this oracle point paramater as our target we shift individuals as close as possible to this level without violating the density ratio, the intervention level exposure likelihood compared to observed level likelihood. Thus, each individuals actual intervention is different but is aimed towards the target, hence intention to intervene.
We define interaction as the counterfactual mean of the outcome under stochastic interventions of two exposures compared to the additive counterfactual mean of the two exposures intervened on independently. These interventions or exposure changes depend on naturally observed values, as described in past literature (Dı́az and van der Laan 2012; Haneuse and Rotnitzky 2013), but with our new parameter in mind. Thus, what is estimated is like asking, what the expected outcome is if we were to enforce the most efficient policy intervention in a realistic setting where not everyone can actually receive that exact exposure level or levels.
To utilize the package, users need to provide vectors for exposures, covariates, and outcomes. They also specify the target_outcome_lvl for the outcome, epsilon, which is some allowed closeness to the target. For example, if the target outcome level is 15, and epsilon is 0.5, then interventions that lead to 15.5 are considered. The restriction limit is hn_trunc_thresh which is the allowed distance from the original exposure likelihood. 10 for example indicates that the likelihood should not be more than x10 difference from the original exposure level likelihood. That is, if an individual’s likelihood is originally 0.1 given their covariate history and the likelihood of exposure to the intervened level is 0.01, this is 10 times different and would be the limit intervention.
A detailed guide is provided in the vignette. With these inputs,
IsoXshift
processes the data and delivers tables showcasing
fold-specific results and aggregated outcomes, allowing users to glean
insights effectively.
IsoXshift
also incorporates features from the sl3
package (Coyle,
Hejazi, Malenica, et al. 2022), facilitating ensemble machine learning
in the estimation process. If the user does not specify any stack
parameters, IsoXshift
will automatically create an ensemble of machine
learning algorithms that strike a balance between flexibility and
computational efficiency.
Note: Because the IsoXshift
package (currently) depends on sl3
that allows ensemble machine learning to be used for nuisance parameter
estimation and sl3
is not on CRAN the IsoXshift
package is not
available on CRAN and must be downloaded here.
IsoXshift
uses the sl3
package to build ensemble machine learners
for each nuisance parameter. We have to install off the development
branch, first download these two packages for sl3
remotes::install_github("tlverse/sl3@devel")
Make sure sl3
installs correctly then install IsoXshift
remotes::install_github("blind-contours/IsoXshift@main")
To illustrate how IsoXshift
may be used to ascertain the effect of a
mixed exposure, we will use synthetic data from the National Institute
of Environmental Health. Let’s first load the relevant packages:
library(IsoXshift)
library(devtools)
#> Loading required package: usethis
library(kableExtra)
library(sl3)
seed <- 429153
set.seed(seed)
We will directly use synthetic data from the NIEHS used to test new mixture methods. This data has built in strong positive and negative marginal effects and certain interactions. Found here: https://github.com/niehs-prime/2015-NIEHS-MIxtures-Workshop
data("NIEHS_data_1", package = "IsoXshift")
NIEHS_data_1$W <- rnorm(nrow(NIEHS_data_1), mean = 0, sd = 0.1)
w <- NIEHS_data_1[, c("W", "Z")]
a <- NIEHS_data_1[, c("X1", "X2", "X3", "X4", "X5", "X6", "X7")]
y <- NIEHS_data_1$Y
head(NIEHS_data_1) %>%
kbl(caption = "NIEHS Data") %>%
kable_classic(full_width = F, html_font = "Cambria")
obs | Y | X1 | X2 | X3 | X4 | X5 | X6 | X7 | Z | W |
---|---|---|---|---|---|---|---|---|---|---|
1 | 7.534686 | 0.4157066 | 0.5308077 | 0.2223965 | 1.1592634 | 2.4577556 | 0.9438601 | 1.8714406 | 0 | 0.1335790 |
2 | 19.611934 | 0.5293572 | 0.9339570 | 1.1210595 | 1.3350074 | 0.3096883 | 0.5190970 | 0.2418065 | 0 | 0.0585291 |
3 | 12.664050 | 0.4849759 | 0.7210988 | 0.4629027 | 1.0334138 | 0.9492810 | 0.3664090 | 0.3502445 | 0 | 0.1342057 |
4 | 15.600288 | 0.8275456 | 1.0457137 | 0.9699040 | 0.9045099 | 0.9107914 | 0.4299847 | 1.0007901 | 0 | 0.0734320 |
5 | 18.606498 | 0.5190363 | 0.7802400 | 0.6142188 | 0.3729743 | 0.5038126 | 0.3575472 | 0.5906156 | 0 | -0.0148427 |
6 | 18.525890 | 0.4009491 | 0.8639886 | 0.5501847 | 0.9011016 | 1.2907615 | 0.7990418 | 1.5097039 | 0 | 0.1749775 |
This data has X1 and X7 has the most synergy or super-additive effect so we might expect to find this relationship as the most synergistic exposure relationship based on our definition. It is also possible that the most efficient intervention is one that intervenes on an antagonistic pair, shifting positive associations higher and negative lower in the antagonistic interaction.
ptm <- proc.time()
sim_results <- IsoXshift(
w = w,
a = a,
y = y,
n_folds = 6,
num_cores = 6,
outcome_type = "continuous",
seed = seed,
target_outcome_lvl = 12,
epsilon = 0.5
)
#>
#> Iter: 1 fn: 222.8222 Pars: 0.02601 0.97399
#> Iter: 2 fn: 222.8222 Pars: 0.02601 0.97399
#> solnp--> Completed in 2 iterations
#>
#> Iter: 1 fn: 494.9785 Pars: 0.000007745 0.999992255
#> Iter: 2 fn: 494.9785 Pars: 0.00000003548 0.99999996452
#> solnp--> Completed in 2 iterations
#>
#> Iter: 1 fn: 487.6227 Pars: 0.999994298 0.000005701
#> Iter: 2 fn: 487.6227 Pars: 0.9999991589 0.0000008411
#> Iter: 3 fn: 487.6227 Pars: 0.9999994859 0.0000005141
#> solnp--> Completed in 3 iterations
#>
#> Iter: 1 fn: 226.2807 Pars: 0.05745 0.94255
#> Iter: 2 fn: 226.2807 Pars: 0.05745 0.94255
#> solnp--> Completed in 2 iterations
#>
#> Iter: 1 fn: 484.0375 Pars: 0.0000005784 0.9999994218
#> Iter: 2 fn: 484.0375 Pars: 0.0000001083 0.9999998917
#> solnp--> Completed in 2 iterations
#>
#> Iter: 1 fn: 479.8223 Pars: 0.37822 0.62178
#> Iter: 2 fn: 479.8223 Pars: 0.37820 0.62180
#> solnp--> Completed in 2 iterations
#>
#> Iter: 1 fn: 221.0642 Pars: 0.00000005466 0.99999994546
#> Iter: 2 fn: 221.0642 Pars: 0.00000003637 0.99999996363
#> solnp--> Completed in 2 iterations
#>
#> Iter: 1 fn: 496.3027 Pars: 0.21429 0.78571
#> Iter: 2 fn: 496.3027 Pars: 0.21398 0.78602
#> solnp--> Completed in 2 iterations
#>
#> Iter: 1 fn: 491.9260 Pars: 0.0000125 0.9999875
#> Iter: 2 fn: 491.9260 Pars: 0.000007487 0.999992513
#> solnp--> Completed in 2 iterations
#>
#> Iter: 1 fn: 104.6614 Pars: 0.08134 0.91866
#> Iter: 2 fn: 104.6614 Pars: 0.08134 0.91866
#> solnp--> Completed in 2 iterations
#>
#> Iter: 1 fn: 495.9732 Pars: 0.41785 0.58215
#> Iter: 2 fn: 495.9732 Pars: 0.41785 0.58215
#> solnp--> Completed in 2 iterations
#>
#> Iter: 1 fn: 495.9761 Pars: 0.9999993936 0.0000006071
#> Iter: 2 fn: 495.9761 Pars: 0.9999996437 0.0000003563
#> solnp--> Completed in 2 iterations
#>
#> Iter: 1 fn: 220.9891 Pars: 0.00000000428 0.99999999571
#> Iter: 2 fn: 220.9891 Pars: 0.00000000105 0.99999999895
#> solnp--> Completed in 2 iterations
#>
#> Iter: 1 fn: 496.7959 Pars: 0.32776 0.67224
#> Iter: 2 fn: 496.7959 Pars: 0.32777 0.67223
#> solnp--> Completed in 2 iterations
#>
#> Iter: 1 fn: 498.7841 Pars: 0.99999888 0.00000112
#> Iter: 2 fn: 498.7841 Pars: 0.9999997064 0.0000002936
#> solnp--> Completed in 2 iterations
#>
#> Iter: 1 fn: 235.0054 Pars: 0.03661 0.96339
#> Iter: 2 fn: 235.0054 Pars: 0.03661 0.96339
#> solnp--> Completed in 2 iterations
#>
#> Iter: 1 fn: 492.8295 Pars: 0.999995555 0.000004445
#> Iter: 2 fn: 492.8295 Pars: 0.999998947 0.000001053
#> Iter: 3 fn: 492.8295 Pars: 0.9999995304 0.0000004696
#> solnp--> Completed in 3 iterations
#>
#> Iter: 1 fn: 491.7396 Pars: 0.9999963 0.0000037
#> Iter: 2 fn: 491.7396 Pars: 0.999997994 0.000002006
#> solnp--> Completed in 2 iterations
proc.time() - ptm
#> user system elapsed
#> 74.595 3.610 976.269
oracle_parameter <- sim_results$`Oracle Pooled Results`
k_fold_results <- sim_results$`K-fold Results`
oracle_targets <- sim_results$`K Fold Oracle Targets`
Of note: these results will be more consistent with higher folds but here we use 6 so readme builds more quickly for users.
k_fold_results <- do.call(rbind, k_fold_results)
rownames(k_fold_results) <- NULL
k_fold_results %>%
kbl(caption = "List of K Fold Results") %>%
kable_classic(full_width = F, html_font = "Cambria")
Psi | Variance | SE | Lower CI | Upper CI | P-value | N | Type | Fold | Average Delta |
---|---|---|---|---|---|---|---|---|---|
-13.5086705 | 0.0273477 | 0.1653715 | -13.8328 | -13.1845 | 0.0000000 | 84 | X1 | 1 | -0.902 |
-12.1225847 | 0.2853685 | 0.5341989 | -13.1696 | -11.0756 | 0.0000000 | 84 | X5 | 1 | 2.222 |
-14.7686203 | 0.0086283 | 0.0928884 | -14.9507 | -14.5866 | 0.0000000 | 84 | X1-X5 | 1 | -0.902-2.222 |
10.8626348 | 0.1947857 | 0.4413453 | 9.9976 | 11.7277 | 0.0000000 | 84 | Interaction | 1 | -0.902-2.222 |
0.7719493 | 0.9860012 | 0.9929759 | -1.1742 | 2.7181 | 0.4385318 | 84 | X1 | 2 | -0.871 |
0.9442252 | 0.6999801 | 0.8366482 | -0.6956 | 2.5840 | 0.3019336 | 84 | X5 | 2 | 2.023 |
10.2589986 | 1.5165367 | 1.2314774 | 7.8453 | 12.6726 | 0.0000000 | 84 | X1-X5 | 2 | -0.871-2.023 |
8.5428240 | 1.0429063 | 1.0212279 | 6.5413 | 10.5444 | 0.0000000 | 84 | Interaction | 2 | -0.871-2.023 |
-0.6416571 | 0.6609212 | 0.8129706 | -2.2351 | 0.9517 | 0.4766824 | 83 | X1 | 3 | -0.893 |
-2.2660756 | 0.6944661 | 0.8333463 | -3.8994 | -0.6327 | 0.0130522 | 83 | X5 | 3 | 1.942 |
2.7737158 | 1.7005188 | 1.3040394 | 0.2178 | 5.3296 | 0.0151431 | 83 | X1-X5 | 3 | -0.893-1.942 |
5.6814485 | 0.4304117 | 0.6560577 | 4.3956 | 6.9673 | 0.0000000 | 83 | Interaction | 3 | -0.893-1.942 |
-0.9763588 | 1.0797738 | 1.0391217 | -3.0130 | 1.0603 | 0.3381620 | 83 | X1 | 4 | -0.899 |
8.4683426 | 1.0442643 | 1.0218925 | 6.4655 | 10.4712 | 0.0000000 | 83 | X5 | 4 | 1.941 |
5.4460469 | 1.9400116 | 1.3928430 | 2.7161 | 8.1760 | 0.0000039 | 83 | X1-X5 | 4 | -0.899-1.941 |
-2.0459369 | 0.7792955 | 0.8827772 | -3.7761 | -0.3157 | 0.0294401 | 83 | Interaction | 4 | -0.899-1.941 |
-9.3113201 | 0.1582846 | 0.3978500 | -10.0911 | -8.5315 | 0.0000000 | 83 | X1 | 5 | -0.939 |
-6.3934531 | 1.0026758 | 1.0013370 | -8.3560 | -4.4309 | 0.0000000 | 83 | X5 | 5 | 1.589 |
-16.1190466 | 0.1048507 | 0.3238066 | -16.7537 | -15.4844 | 0.0000000 | 83 | X1-X5 | 5 | -0.939-1.589 |
-0.4142735 | 1.3869984 | 1.1777090 | -2.7225 | 1.8940 | 0.7026539 | 83 | Interaction | 5 | -0.939-1.589 |
-6.2074296 | 0.1879825 | 0.4335695 | -7.0572 | -5.3576 | 0.0000000 | 83 | X1 | 6 | -0.858 |
-5.4070107 | 0.5718131 | 0.7561833 | -6.8891 | -3.9249 | 0.0000000 | 83 | X5 | 6 | 1.687 |
-13.2608365 | 0.1036790 | 0.3219922 | -13.8919 | -12.6297 | 0.0000000 | 83 | X1-X5 | 6 | -0.858-1.687 |
-1.6463962 | 1.2525969 | 1.1191948 | -3.8400 | 0.5472 | 0.1196468 | 83 | Interaction | 6 | -0.858-1.687 |
Here we see that X1-X5 are found in all the folds. This means that, to get to our target outcome of 15, with precision up to 0.5, these two exposures are found to most efficiently get to our target outcome under minimal intervention.
The column Psi shows the expected change in outcome under shift compared to no shift. Type indicates which variable was shifted, X1, X5, X1 and X5 and then interaction which compares X1-X5 to X1 + X5. So for example a Psi of -13.5 for X1 indicates that the outcome reduces by 13.5 when we attempt to shift X1 towards the oracle point parameter. Which in this fold is 0.05. So under a policy where we try and shift X1 towards the value 0.05 under restrictions of not violating positivity support the outcome goes down by 13.5. The deltas in Average Delta columns shows the average delta in the fold for each variable. This is the average shift we get under our support to get to the oracle parameter. For example -13.5 means that the average outcome goes down by 13.5 when shifting X1 by -0.902 on average, towards our target.
Overall, average delta column indicates the average shift away from each individuals observed exposure level in order to reach the target under restrictions.
In reality, when looking at the Average Delta column we can only reduce X1 by about 1 and increase X5 by about 2 with support from our data.
These interventions are:
oracle_targets <- do.call(rbind, oracle_targets)
oracle_targets %>%
kbl(caption = "Oracle Targets") %>%
kable_classic(full_width = F, html_font = "Cambria")
Variable1 | Variable2 | ObservedLevel1 | ObservedLevel2 | IntervenedLevel1 | IntervenedLevel2 | AvgDiff | Difference | |
---|---|---|---|---|---|---|---|---|
1 | X1 | X5 | 1.155827 | 1.242123 | 0.0566475 | 3.414815 | 1.635936 | 0.0668849 |
2 | X1 | X5 | 1.147629 | 1.218358 | 0.0481946 | 3.414815 | 1.647946 | 0.3846076 |
3 | X1 | X5 | 1.143893 | 1.235131 | 0.0481946 | 3.155947 | 1.508257 | 0.0206467 |
21 | X1 | X5 | 1.124229 | 1.236977 | 0.2923519 | 3.155947 | 1.375423 | 0.2560524 |
11 | X1 | X5 | 1.140465 | 1.249656 | 0.0481946 | 2.751177 | 1.296896 | 0.4992674 |
12 | X1 | X5 | 1.140439 | 1.240693 | 0.0481946 | 2.897264 | 1.374408 | 0.1197868 |
Here this table shows the average exposed level for each exposure, the intervened level for both exposures, this is the level the exposures are set to which gets to the target outcome most efficiently, Avg Difference, is the average difference between the intervention and observed outcome (the “effort”), and Difference is the difference between the expected outcome under intervention and the target outcome.
What we see here is that to get to the target outcome 12, where the observed average is 53, so a significant reduction, the most efficient intervention is to reduce X1 to around 0.05 and to increase X5 (due to its antagonistic relationship) to about 3.
To get more power, we do a pooled TMLE over our findings for the intervention with minimal effort that gets to our target outcome:
oracle_parameter %>%
kbl(caption = "Pooled Oracle Parameter") %>%
kable_classic(full_width = F, html_font = "Cambria")
Psi | Variance | SE | Lower CI | Upper CI | P-value | N | Type | Fold |
---|---|---|---|---|---|---|---|---|
-4.500727 | 0.0942057 | 0.3069295 | -5.1023 | -3.8992 | 0.0000000 | 500 | Var 1 | Pooled TMLE |
-5.274783 | 0.1789838 | 0.4230647 | -6.1040 | -4.4456 | 0.0000000 | 500 | Var 2 | Pooled TMLE |
-8.023789 | 0.0973612 | 0.3120275 | -8.6354 | -7.4122 | 0.0000000 | 500 | Joint | Pooled TMLE |
1.751721 | 0.1782565 | 0.4222043 | 0.9242 | 2.5792 | 0.0070199 | 500 | Interaction | Pooled TMLE |
This gives pooled estimates for the shift of each variable in the relationship individually, joint and our definition of interaction comparing the expectation of the outcome under joint shift compared to the expectations under the sum of individual shifts.
Overall, this package finds the intervention that, with minimal effort, gets to a desired outcome in a mixed exposure. It then estimates, using CV-TMLE, a policy intervention that attempts to get a population’s exposure as close as possible to this oracle intervention level without violating positivity.
In this NIEHS data set we correctly identify the most synergistic relationship built into the data.
More discussion is found in the vignette.
If you encounter any bugs or have any specific feature requests, please file an issue. Further details on filing issues are provided in our contribution guidelines.
Contributions are very welcome. Interested contributors should consult our contribution guidelines prior to submitting a pull request.
After using the IsoXshift
R package, please cite the following:
-
R/
tmle3shift
- An R package providing an independent implementation of the same core routines for the TML estimation procedure and statistical methodology as is made available here, through reliance on a unified interface for Targeted Learning provided by thetmle3
engine of thetlverse
ecosystem. -
R/
medshift
- An R package providing facilities to estimate the causal effect of stochastic treatment regimes in the mediation setting, including classical (IPW) and augmented double robust (one-step) estimators. This is an implementation of the methodology explored by Dı́az and Hejazi (2020). -
R/
haldensify
- A minimal package for estimating the conditional density treatment mechanism component of this parameter based on using the highly adaptive lasso (Coyle, Hejazi, Phillips, et al. 2022; Hejazi, Coyle, and van der Laan 2020) in combination with a pooled hazard regression. This package implements a variant of the approach advocated by Dı́az and van der Laan (2011).
The development of this software was supported in part through NIH grant P42ES004705 from NIEHS
© 2020-2022 David B. McCoy
The contents of this repository are distributed under the MIT license. See below for details:
MIT License
Copyright (c) 2020-2022 David B. McCoy
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Coyle, Jeremy R, Nima S Hejazi, Ivana Malenica, Rachael V Phillips, and Oleg Sofrygin. 2022. “sl3: Modern Machine Learning Pipelines for Super Learning.” https://doi.org/10.5281/zenodo.1342293.
Coyle, Jeremy R, Nima S Hejazi, Rachael V Phillips, Lars W van der Laan, and Mark J van der Laan. 2022. “hal9001: The Scalable Highly Adaptive Lasso.” https://doi.org/10.5281/zenodo.3558313.
Dı́az, Iván, and Nima S Hejazi. 2020. “Causal Mediation Analysis for Stochastic Interventions.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82 (3): 661–83. https://doi.org/10.1111/rssb.12362.
Dı́az, Iván, and Mark J van der Laan. 2011. “Super Learner Based Conditional Density Estimation with Application to Marginal Structural Models.” The International Journal of Biostatistics 7 (1): 1–20.
———. 2012. “Population Intervention Causal Effects Based on Stochastic Interventions.” Biometrics 68 (2): 541–49.
Haneuse, Sebastian, and Andrea Rotnitzky. 2013. “Estimation of the Effect of Interventions That Modify the Received Treatment.” Statistics in Medicine 32 (30): 5260–77.
Hejazi, Nima S, Jeremy R Coyle, and Mark J van der Laan. 2020. “hal9001: Scalable Highly Adaptive Lasso Regression in R.” Journal of Open Source Software 5 (53): 2526. https://doi.org/10.21105/joss.02526.