Skip to content

smweinst/PeDecURe

Repository files navigation

Penalized Decomposition Using Residuals (PeDecURe)

PeDecURe provides feature extraction with built-in adjustment for nuisance variables. Our method identifies sources of variation in that data that are shared between features (e.g., measurements derived from neuroimaging scans) and an outcome of interest (e.g., diagnosis), while substantially reducing overlap with information about nuisance variables (e.g., age or sex).

In our paper (now published in Biostatistics), we introduce the intuition behind our method and illustrate that features extracted using PeDecURe are predictive of an outcome of interest, have low correlations with nuisance variables, and show promise for out-of-sample generalizability.

Installation

# install.packages("devtools")
devtools::install_github("smweinst/PeDecURe")

Example

Notation:

  • X : feature matrix (n \times p)
  • A : matrix of nuisance variables (n \times q)
  • Y : vector (length n) with outcome labels (e.g., disease group)

Implementation using PeDecURe R package:

library(PeDecURe)

# get residuals:
resid.dat = get.resid(X,Y,A)
X.star = resid.dat$X.star
X.tilde = resid.dat$X.tilde

# tune lambda:
lambda.tune = pedecure.tune(X.orig = X,
                            X.max = X.star,
                            X.penalize = X.tilde,
                            lambdas = seq(0,10,by=0.1),
                            A = A,
                            Y = Y,
                            nPC = 3)
best.lambda = lambda.tune$lambda_tune

# run pedecure:
pedecure.out = pedecure(X = X.star,
                        X.penalize = X.tilde,
                        A = A,
                        Y = Y,
                        lambda = best.lambda,
                        nPC = 3)

# PC scores - these are our new features that can be used for an association study, predictive model, etc.
## note: X should be centered by column
PC.scores = X%*%pedecure.out$vectors

# Look at correlations between the first few PC scores and the nuisance variables (A1, A2, Y)
cor.scores = partial.cor(PC.scores, A, Y)
scores.partial.cor = cor.scores$partial$estimates
scores.marginal.cor = cor.scores$marginal$estimates

Applying PeDecURe output in a new sample

PC scores in new sample: multiply new feature matrix by PC loadings from above.

  • X.test : feature matrix in a different sample (m \times p)
PC.scores.test = X.test%*%pedecure.out$vectors
# note: pedecure.out$vectors was the output from applying PeDecURe in training sample above

If A and Y are observed in the new sample, we can also look at their correlations with the PC scores in the test sample:

  • A.test : matrix of nuisance variables in the new sample (m \times q) (if observed)
  • Y.test : vector (length m) with outcome labels (if observed)
cor.scores.test = partial.cor(PC.scores.test, A.test, Y.test)
scores.partial.cor.test = cor.scores.test$partial$estimates
scores.marginal.cor.test = cor.scores.test$marginal$estimates

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages