Skip to content

Commit

Permalink
readme update, quick fixes, pckgdown
Browse files Browse the repository at this point in the history
  • Loading branch information
blind-contours committed Mar 8, 2024
1 parent 0919aa7 commit d0e5539
Show file tree
Hide file tree
Showing 84 changed files with 12,110 additions and 703 deletions.
132 changes: 73 additions & 59 deletions R/InterXshift.R
Original file line number Diff line number Diff line change
@@ -1,18 +1,9 @@
#' @title Data-adaptive estimation of interactions, effect modification, and
#' mediation using stochastic shift intervention target parameters. In many mixed exposure settings,
#' interactions in the mixture, effect modifiers in the covariates that modify the
#' impact of an exposure and mediating pathways from exposure to outcome are generally unknown. SuperNOVA finds these variable sets
#' on one part of the data and estimates counterfactual outcome changes given shifts to exposure on an estimation part of the data.
#' Using cross-validation and targeted learning, estimators are created that utlize machine learning that are unbiased and have the
#' minimum variance.
#' @title InterXshift
#'
#' @description The SuperNOVA function provides an efficient approach to estimate
#' interactions, effect modification, and mediation using targeted minimum loss
#' estimators for counterfactual mean differences under various target parameters.
#' The procedure employs data-adaptive ensemble b-spline models and stochastic interventions,
#' leveraging the \pkg{sl3} package for ensemble machine learning. The data is split into V folds, in each fold
#' the training data is used to find variable sets using flexible basis function estimators. Given the different variable sets,
#' stochastic intervention target parameters are applied with cross-validated targeted learning.
#' @description Under a fixed shift to exposures identify using g-computation the joint shift of
#' pairwise exposures in a mixed exposure compared to the additive individual shifts. Positive values indicate
#' synergy and negative antagonism, get the top synergy and antagonism results and use CV-TMLE to efficiently
#' estimate the interaction target parameter.
#'
#' @param w A \code{matrix}, \code{data.frame}, or similar containing a set of
#' baseline covariates. These variables are measured before exposures.
Expand All @@ -24,7 +15,7 @@
#' @param deltas A \code{numeric} value indicating the shift in exposures to
#' define the target parameter, with respect to the scale of the exposures (A). If adaptive_delta
#' is true, these values will be reduced.
#' @param var_sets A list specifying variable sets for deterministic SuperNOVA usage.
#' @param var_sets A list specifying variable sets for deterministic InterXshift usage.
#' Example: var_sets <- c("A_1", "A_1-Z_2") where the analyst provides variable sets
#' for exposures, exposure-mediator, or exposure-covariate relationships.
#' @param estimator The type of estimator to fit: \code{"tmle"} for targeted
Expand All @@ -40,10 +31,6 @@
#' @param zeta_learner Learners for fitting Super Learner ensembles to the outcome model via \pkg{sl3}..
#' @param n_folds Number of folds to use in cross-validation, default is 2.
#' @param outcome_type Data type of the outcome, default is "continuous".
#' @param mediator_type Data type of the mediator, default is "continuous".
#' @param quantile_thresh Threshold based on quantiles of the F-statistic, used to
#' identify "important" basis functions in the data-adaptive procedure.
#' @param verbose Whether to run verbosely (default: FALSE).
#' @param parallel Whether to parallelize across cores (default: TRUE).
#' @param parallel_type Type of parallelization to use if parallel is TRUE:
#' "multi_session" (default), "multicore", or "sequential".
Expand All @@ -53,24 +40,6 @@
#' @param adaptive_delta If TRUE, reduces the user-specified delta until
#' the Hn calculated for a shift does not have any observation greater
#' than hn_trunc_thresh (default: FALSE).
#' @param n_mc_sample Number of iterations to be used for the Monte Carlo integration
#' procedure when using continuous exposures (default: 1000).
#' @param exposure_quantized Whether the exposure has been discretized into bins,
#' in which case the integration procedure is skipped and weighted sums are used instead (default: FALSE).
#' @param mediator_quantized If the mediator is discretized, a multinomial ML function
#' is used in this regression to avoid density estimation (default: FALSE).
#' @param density_type Type of density estimation to be used: "sl" for Super Learner
#' (default) or "hal" for highly adaptive lasso.
#' @param n_bins Number of bins for quantizing the exposure if mediation is detected (default: 10).
#' @param max_degree Maximum degree of interactions used in the highly adaptive lasso
#' density estimator if used (default: 1).
#' @param integration_method Type of integration to be used in the continuous exposure
#' case: "MC" for Monte Carlo integration (default) or "AQ" for adaptive quadrature.
#' @param use_multinomial Whether to use multinomial regression for binned exposures
#' (default: FALSE).
#' @param discover_only TRUE/FALSE. If TRUE, only the data-adaptive path discovery
#' is done. No estimates are delivered only exposure mediator sets. If FALSE paths
#' are both discovered and estimated.
#'
#' @return An S3 object of class \code{SuperNOVA} containing the results of the
#' procedure to compute a TML or one-step estimate of the counterfactual mean
Expand Down Expand Up @@ -102,7 +71,6 @@ InterXshift <- function(w,
zeta_learner = NULL,
n_folds = 2,
outcome_type = "continuous",
verbose = FALSE,
parallel = TRUE,
parallel_type = "multi_session",
num_cores = 2,
Expand Down Expand Up @@ -208,10 +176,6 @@ InterXshift <- function(w,
.options = furrr::furrr_options(seed = seed, packages = "InterXshift")
)

if (discover_only == TRUE) {
return(fold_basis_results)
}

pos_rank_fold_results <- list()
neg_rank_fold_results <- list()
synergy_rank_fold_results <- list()
Expand Down Expand Up @@ -498,14 +462,14 @@ InterXshift <- function(w,
at <- data_internal[data_internal$folds != fold_k, ]
av <- data_internal[data_internal$folds == fold_k, ]

exposure_1 <- fold_synergy_effects$Variable1[[i]]
exposure_2 <- fold_synergy_effects$Variable2[[i]]
exposure_1 <- fold_antagonism_effects$Variable1[[i]]
exposure_2 <- fold_antagonism_effects$Variable2[[i]]

exposures <- as.list(c(exposure_1, exposure_2))
delta <- deltas[unlist(exposures)]
exposures[[3]] <- unlist(exposures)

rank <- fold_positive_effects$Rank[i]
rank <- fold_antagonism_effects$Rank[i]

covars <- c(w_names)

Expand Down Expand Up @@ -620,48 +584,98 @@ InterXshift <- function(w,
top_antagonism_results <- unlist(top_antagonism_results, recursive = FALSE)


pooled_pos_shift_results <- calc_pooled_indiv_shifts(
## calculate pooled results based on rank
pooled_pos_shift_results_rank <- calc_pooled_indiv_shifts(
indiv_shift_results = top_positive_results,
estimator = estimator,
fluctuation = fluctuation,
n_folds = n_folds
n_folds = n_folds,
rank = TRUE
)

pooled_neg_shift_results <- calc_pooled_indiv_shifts(
pooled_neg_shift_results_rank <- calc_pooled_indiv_shifts(
indiv_shift_results = top_negative_results,
estimator = estimator,
fluctuation = fluctuation,
n_folds = n_folds
n_folds = n_folds,
rank = TRUE
)

pooled_synergy_shift_results <- calc_pooled_intxn_shifts(
pooled_synergy_shift_results_rank <- calc_pooled_intxn_shifts(
intxn_shift_results = top_synergy_results,
estimator = estimator,
a_names = a_names,
w_names = w_names,
z_names = z_names,
fluctuation = fluctuation,
n_folds = n_folds
n_folds = n_folds,
rank = TRUE
)

pooled_antagonism_shift_results <- calc_pooled_intxn_shifts(
pooled_antagonism_shift_results_rank <- calc_pooled_intxn_shifts(
intxn_shift_results = top_antagonism_results,
estimator = estimator,
a_names = a_names,
w_names = w_names,
z_names = z_names,
fluctuation = fluctuation,
n_folds = n_folds
n_folds = n_folds,
rank = TRUE
)


## calculate pooled results based on variable set
pooled_pos_shift_results_varset <- calc_pooled_indiv_shifts(
indiv_shift_results = top_positive_results,
estimator = estimator,
fluctuation = fluctuation,
n_folds = n_folds,
rank = FALSE
)

pooled_neg_shift_results_varset <- calc_pooled_indiv_shifts(
indiv_shift_results = top_negative_results,
estimator = estimator,
fluctuation = fluctuation,
n_folds = n_folds,
rank = FALSE
)

pooled_synergy_shift_results_varset <- calc_pooled_intxn_shifts(
intxn_shift_results = top_synergy_results,
estimator = estimator,
a_names = a_names,
w_names = w_names,
z_names = z_names,
fluctuation = fluctuation,
n_folds = n_folds,
rank = FALSE
)

pooled_antagonism_shift_results_varset <- calc_pooled_intxn_shifts(
intxn_shift_results = top_antagonism_results,
estimator = estimator,
a_names = a_names,
w_names = w_names,
z_names = z_names,
fluctuation = fluctuation,
n_folds = n_folds,
rank = FALSE
)



results_list <- list(
"Pos Shift Results" = pooled_pos_shift_results,
"Neg Shift Results" = pooled_neg_shift_results,
"K Fold Synergy Results" = pooled_synergy_shift_results$k_fold_results,
"Pooled Synergy Results" = pooled_synergy_shift_results$pooled_results,
"K Fold Antagonism Results" = pooled_antagonism_shift_results$k_fold_results,
"Pooled Antagonism Results" = pooled_antagonism_shift_results$pooled_results
"Pos Shift Results by Rank" = pooled_pos_shift_results_rank,
"Neg Shift Results by Rank" = pooled_neg_shift_results_rank,
"Pos Shift Results by Exposure Set" = pooled_pos_shift_results_varset,
"Neg Shift Results by Exposure Set" = pooled_neg_shift_results_varset,
"K Fold Synergy Results" = pooled_synergy_shift_results_rank$k_fold_results,
"Pooled Synergy Results by Rank" = pooled_synergy_shift_results_rank$pooled_results,
"K Fold Antagonism Results" = pooled_antagonism_shift_results_rank$k_fold_results,
"Pooled Antagonism Results by Rank" = pooled_antagonism_shift_results_rank$pooled_results,
"Pooled Synergy Results by Exposure Set" = pooled_synergy_shift_results_varset$pooled_results,
"Pooled Antagonism Results by Exposure Set" = pooled_antagonism_shift_results_varset$pooled_results
)

return(results_list)
Expand Down
10 changes: 8 additions & 2 deletions R/calc_pooled_indiv_shifts.R
Original file line number Diff line number Diff line change
Expand Up @@ -26,14 +26,20 @@ calc_pooled_indiv_shifts <- function(indiv_shift_results,
estimator = c("tmle", "onestep"),
fluc_mod_out = NULL,
fluctuation,
n_folds) {
n_folds,
rank = TRUE) {
# set TMLE as default estimator type
estimator <- match.arg(estimator)

results_list <- list()

names <- names(indiv_shift_results)
names <- gsub("^(Rank [0-9]+) :.*", "\\1", names)

if (rank == TRUE) {
names <- gsub("^(Rank [0-9]+) :.*", "\\1", names)
}else{
names <- gsub("^.+?:", "", names)
}


for (var_set in unique(names)) {
Expand Down
9 changes: 7 additions & 2 deletions R/calc_pooled_intxn_shifts.R
Original file line number Diff line number Diff line change
Expand Up @@ -32,13 +32,18 @@ calc_pooled_intxn_shifts <- function(intxn_shift_results,
w_names,
z_names,
fluctuation,
n_folds) {
n_folds,
rank = TRUE) {
# set TMLE as default estimator type
estimator <- match.arg(estimator)

names <- names(intxn_shift_results)
names <- gsub("^(Rank [0-9]+) :.*", "\\1", names)

if (rank == TRUE) {
names <- gsub("^(Rank [0-9]+) :.*", "\\1", names)
}else{
names <- gsub("^.+?:", "", names)
}
k_fold_results_list <- list()
pooled_results_list <- list()

Expand Down
Loading

0 comments on commit d0e5539

Please sign in to comment.