Quick Start
Once you have successfully installed the CETYGO package, you can recalculate your cell composition variables and the associated CETYGO score.
Within the CETYGO package we have provided functions and a pre-trained model, modelBloodCoef, to estimate the composition of the major blood cell types, as well as the CETYGO score, from a matrix of (normalised) beta values. We have also provided 10 exemplar whole blood profiles generated with the 450K array in the R object bulkdata. Using these together we can quickly recalculate both the cellular proportions for six blood cell types and the CETYGO score for each sample. This can be done with the following code.
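library(CETYGO)
# Keep only the probes shared by the bulk data and the pre-trained model
rowIndex <- rownames(bulkdata)[rownames(bulkdata) %in% rownames(modelBloodCoef)]
# Estimate cell type proportions and the CETYGO score for each sample
predProp <- projectCellTypeWithError(bulkdata, modelBloodCoef[rowIndex, ])
# Inspect the first few samples
head(predProp)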
For more details, and examples demonstrating how to implement the standard workflow including normalisation of the reference data with the bulk tissue (test) data, please see the vignette included with the package.
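The probe-matching step in the code above can be tried in isolation before running it on real data. The sketch below uses small toy matrices (toyBulk and toyCoef, hypothetical stand-ins for bulkdata and modelBloodCoef with made-up probe names and random values) and does not call any CETYGO function. It also lists the model probes that are absent from the bulk data, which may be a useful check before estimating proportions.

```r
# Toy stand-ins for bulkdata and modelBloodCoef (hypothetical values, not real data)
set.seed(1)
toyBulk <- matrix(runif(12), nrow = 4,
                  dimnames = list(c("cg01", "cg02", "cg03", "cg04"),
                                  c("s1", "s2", "s3")))
toyCoef <- matrix(runif(6), nrow = 3,
                  dimnames = list(c("cg02", "cg04", "cg05"),
                                  c("typeA", "typeB")))

# Probes present in both objects, in the order they appear in the bulk data
rowIndex <- rownames(toyBulk)[rownames(toyBulk) %in% rownames(toyCoef)]

# Probes the model expects but the bulk data lacks
missingProbes <- setdiff(rownames(toyCoef), rownames(toyBulk))

rowIndex        # "cg02" "cg04"
missingProbes   # "cg05"
```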
We have profiled the behaviour of the CETYGO score when applied to whole blood across a large number of empirical datasets and provide the following guidance for its interpretation.
- CETYGO > 0.1 indicates that the sample is not composed of the reference cell types and is potentially the wrong tissue.
- Elevated CETYGO can be indicative of a technically poor DNA methylation profile.
- Purified cell types have higher CETYGO scores than bulk tissues.
- Profiles generated with the EPIC array are associated with higher CETYGO scores than those generated with the 450K array.
- Using our pre-trained model modelBloodCoef, across 3001 whole blood samples profiled with the 450K array, the median CETYGO score was 0.045 and the 95% "inter-quartile" range was 0.040-0.061. Across 3350 whole blood samples profiled with the EPIC array, the median CETYGO score was 0.057 and the 95% "inter-quartile" range was 0.050-0.069. We can use these results to propose an acceptable range of values. However, these ranges are specific to the technology and reference panel, so the boundaries may not be well calibrated for all applications.
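As a rough illustration of how the guidance above might be applied, the sketch below labels a vector of CETYGO scores. The helper flagCetygo, its three labels, and the example scores are hypothetical and not part of the CETYGO package; the cut-offs are the 0.1 threshold and the upper ends of the whole blood ranges quoted above, so they are only meaningful for whole blood analysed with modelBloodCoef.

```r
# Hypothetical helper (not part of CETYGO): label scores using the guidance above
flagCetygo <- function(score, array = c("450K", "EPIC")) {
  array <- match.arg(array)
  # Upper end of the ranges reported above for whole blood, per array type
  upper <- c("450K" = 0.061, "EPIC" = 0.069)[[array]]
  ifelse(score > 0.1, "fail",                 # likely not the reference cell types
         ifelse(score > upper, "check", "ok")) # above the reported range, below 0.1
}

# Made-up scores for three samples
scores <- c(s1 = 0.045, s2 = 0.065, s3 = 0.12)
flagCetygo(scores, array = "450K")   # s1 "ok", s2 "check", s3 "fail"
flagCetygo(scores, array = "EPIC")   # s1 "ok", s2 "ok",    s3 "fail"
```

In practice the scores would be taken from the output of projectCellTypeWithError; the vignette describes the exact layout of that output.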