Quick Start

ejh243 edited this page May 24, 2023 · 1 revision

Once you have successfully installed the CETYGO package you can recalculate your cell composition variables and associated CETYGO score.

Within the CETYGO package we provide functions and a pre-trained model, modelBloodCoef, for estimating the proportions of the major blood cell types, along with the CETYGO score, from a matrix of (normalised) beta values. We also provide 10 exemplar whole blood profiles generated with the 450K array in the R object bulkdata. Using these together, we can quickly recalculate both the cellular proportions for six blood cell types and the CETYGO score for each sample with the following code.


library(CETYGO)

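# Keep only the probes present in both the bulk data and the pre-trained model
rowIndex <- rownames(bulkdata)[rownames(bulkdata) %in% rownames(modelBloodCoef)]
# Estimate the cell proportions and the CETYGO score for each sample
predProp <- projectCellTypeWithError(bulkdata, modelBloodCoef[rowIndex, ])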

head(predProp)
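
The probe-matching step above can be hard to picture without data in hand, so here is a minimal sketch of it using toy matrices. The objects bulk_toy and coef_toy, their probe IDs and their values are hypothetical stand-ins for bulkdata and modelBloodCoef, not data from the package. The sketch also lists model probes that are absent from the bulk data, which may be worth checking before trusting the estimates.

```r
# Toy stand-ins (hypothetical values) for bulkdata and modelBloodCoef
set.seed(1)
bulk_toy <- matrix(runif(12), nrow = 4,
                   dimnames = list(c("cg01", "cg02", "cg03", "cg04"),
                                   c("S1", "S2", "S3")))
coef_toy <- matrix(runif(6), nrow = 3,
                   dimnames = list(c("cg02", "cg04", "cg05"),
                                   c("Bcell", "Gran")))

# Same matching logic as the quick-start code: probes found in both objects
shared <- rownames(bulk_toy)[rownames(bulk_toy) %in% rownames(coef_toy)]

# Model probes that the bulk data does not contain
missing_from_bulk <- setdiff(rownames(coef_toy), rownames(bulk_toy))

# Coefficient matrix restricted to the shared probes
coef_shared <- coef_toy[shared, , drop = FALSE]

print(shared)
print(missing_from_bulk)
```

With real data, the same two checks can be run on bulkdata and modelBloodCoef before calling projectCellTypeWithError.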

For more details, and examples demonstrating how to implement the standard workflow including normalisation of the reference data with the bulk tissue (test) data, please see the vignette included with the package.

Interpretation of the CETYGO score

We have profiled the behaviour of the CETYGO score when applied to whole blood across a large number of empirical datasets and provide the following guidance for its interpretation.

  1. CETYGO > 0.1 indicates the sample is not composed of the reference cell types, and is potentially the wrong tissue.

  2. Elevated CETYGO can be indicative of a technically poor DNA methylation profile.

  3. Purified cell types have higher CETYGO scores than bulk tissues.

  4. Profiles generated with the EPIC array are associated with higher CETYGO scores than the 450K array.

  5. Using our pre-trained model modelBloodCoef, across 3001 whole blood samples profiled with the 450K array, the median CETYGO score was 0.045 and the 95% "inter-quartile" range was 0.040-0.061. Across 3350 whole blood samples profiled with the EPIC array, the median CETYGO score was 0.057 and the 95% "inter-quartile" range was 0.050-0.069. We can use these results to propose an acceptable range of values. However, these values are evidently specific to the technology and reference panel, so the boundaries may not be well calibrated for all applications.
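
The guidance above can be turned into a simple screening step. The sketch below is one possible reading of it, not a rule from the package: flag_cetygo is a hypothetical helper, the labels are illustrative, and using the upper end of the reported range (0.061 for 450K, 0.069 for EPIC) as a soft cutoff is an assumption that applies only to whole blood analysed with modelBloodCoef.

```r
# Hypothetical helper: label samples by comparing CETYGO scores to the guidance above.
# Assumptions: scores above 0.1 are treated as failures; scores above the upper end
# of the reported whole blood range for the array are marked for a closer look.
flag_cetygo <- function(scores, array = c("450K", "EPIC")) {
  array <- match.arg(array)
  upper <- c("450K" = 0.061, "EPIC" = 0.069)[[array]]
  ifelse(scores > 0.1, "fail",
         ifelse(scores > upper, "check", "ok"))
}

# Toy scores (hypothetical values) for three samples
scores <- c(S1 = 0.045, S2 = 0.065, S3 = 0.12)

flags_450k <- flag_cetygo(scores, "450K")
flags_epic <- flag_cetygo(scores, "EPIC")
print(flags_450k)
print(flags_epic)
```

To apply this to the quick-start output, something like flag_cetygo(predProp[, "CETYGO"], "450K") should work, assuming predProp holds the score in a column named CETYGO. For other tissues or reference panels the cutoffs would need to be recalibrated.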