DoRothEA is a gene set resource containing signed transcription factor (TF)-target interactions, developed by Garcia-Alonso et al., 2019. A TF together with its transcriptional targets is defined as a regulon. DoRothEA regulons were curated and collected from different types of evidence, such as literature-curated resources, ChIP-seq peaks, TF binding site motifs, and interactions inferred directly from gene expression.
For each TF-target interaction we assigned a confidence level based on the amount of supporting evidence. The confidence assignment comprises five levels, ranging from A (highest confidence) to E (lowest confidence). Interactions that are supported by all four lines of evidence, manually curated by experts in specific reviews, or supported by at least two curated resources are considered highly reliable and were assigned level A. Levels B-D are reserved for curated and/or ChIP-seq interactions with different levels of additional evidence. Finally, level E is used for interactions that are supported only by computational predictions. To provide the most confident regulon for each TF, we aggregated the TF-target interactions with the highest possible confidence score that resulted in a regulon size of at least ten targets. The final confidence level assigned to the TF regulon is the lowest confidence score of its component targets.
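The aggregation rule described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code used to build DoRothEA: the helper `build_regulon()`, the toy interaction table, and the choice to return `NULL` when the size threshold is never reached are all hypothetical.

```r
# Toy TF-target interactions for one hypothetical TF (all names are made up).
interactions <- data.frame(
  tf = "TF1",
  target = paste0("GENE", 1:14),
  confidence = c(rep("A", 4), rep("B", 3), rep("C", 5), rep("D", 2)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: add confidence levels from A downwards until the
# regulon holds at least `min_size` targets.
build_regulon <- function(interactions, min_size = 10) {
  levels_ordered <- c("A", "B", "C", "D", "E")
  for (i in seq_along(levels_ordered)) {
    keep <- interactions$confidence %in% levels_ordered[seq_len(i)]
    if (sum(keep) >= min_size) {
      regulon <- interactions[keep, ]
      # The regulon is labelled with the lowest level that had to be included.
      regulon$regulon_confidence <- levels_ordered[i]
      return(regulon)
    }
  }
  NULL  # assumption: no regulon is reported if the threshold is never met
}

regulon <- build_regulon(interactions)
```

In this toy case levels A and B together give only seven targets, so level C is added as well and the D-level interactions are left out.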
DoRothEA regulons can be coupled with several statistical methods, yielding a functional analysis tool to infer TF activity from gene expression data. The activity is computed not from the gene expression of the TF itself but from the mRNA levels of its direct transcriptional targets. We refer to these transcriptional targets as the footprint of a TF on gene expression. A more detailed description of the concept of footprint-based analysis is available in the review by Dugourd et al., 2019.
Typically, DoRothEA is coupled with the statistical method VIPER, as it incorporates the mode of regulation of each TF-target interaction. However, VIPER can be replaced by any other statistical method that aims to analyse gene sets, e.g. GSEA.
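To make the footprint idea concrete, the following base R sketch scores a hypothetical TF from the expression of its targets while taking the sign of each interaction into account. It is deliberately simplified: a signed mean stands in for the actual statistic, and VIPER itself uses a more elaborate rank-based approach. The function `signed_mean_activity()`, the gene names, and all values are assumptions made for illustration.

```r
# Toy expression matrix: genes in rows, samples in columns (values are made up).
expr <- matrix(
  c(2.0, 0.5,
    1.5, 0.2,
    0.1, 1.8,
    1.0, 1.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("G1", "G2", "G3", "G4"), c("s1", "s2"))
)

# Toy signed regulon of a hypothetical TF: +1 = activation, -1 = repression.
regulon <- data.frame(
  target = c("G1", "G2", "G3"),
  mor = c(1, 1, -1),
  stringsAsFactors = FALSE
)

# Simplified score: mean target expression per sample, with the sign flipped
# for repressed targets. The expression of the TF itself is never used.
signed_mean_activity <- function(expr, regulon) {
  targets <- intersect(regulon$target, rownames(expr))
  mor <- regulon$mor[match(targets, regulon$target)]
  # Row-wise multiplication: each target row is scaled by its mode of regulation.
  colMeans(expr[targets, , drop = FALSE] * mor)
}

activity <- signed_mean_activity(expr, regulon)
```

Sample `s1` has high expression of the activated targets and low expression of the repressed one, so its score is positive, whereas `s2` shows the opposite pattern and scores negative.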
DoRothEA as it stands is intended for academic use only, as it contains resources whose licenses do not permit commercial use. However, we developed a non-academic version of DoRothEA by removing the critical resources (namely KEGG and TRED from the curated databases). You can find the non-academic package with the regulons here
# install from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("dorothea")

# install the development version from GitHub
# install.packages("devtools")
devtools::install_github("saezlab/dorothea")
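Once the package is installed, the regulons can be loaded and filtered by confidence level. The snippet below is a small sketch that assumes the human regulons are shipped as the data set `dorothea_hs` with a `confidence` column, as described in the package documentation; the choice of levels A to C is only an example. The call is guarded so that the code also runs when the package is not available.

```r
# Check whether the dorothea package is installed before touching its data.
pkg_available <- requireNamespace("dorothea", quietly = TRUE)

high_conf <- NULL
if (pkg_available) {
  # Assumption: the human regulons are provided as `dorothea_hs`.
  data(dorothea_hs, package = "dorothea")
  # Keep only interactions with confidence level A, B or C (example choice).
  high_conf <- dorothea_hs[dorothea_hs$confidence %in% c("A", "B", "C"), ]
}
```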
Originally DoRothEA contained only human regulons. In a benchmark study we showed that DoRothEA in combination with VIPER is also applicable to mouse data, as described in Holland et al., 2019. Accordingly, we developed a mouse version of DoRothEA by transforming the human genes to their mouse orthologs.
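The idea of transferring a regulon via orthologs can be illustrated with a small base R sketch. It assumes a one-to-one ortholog table and drops interactions without a mouse ortholog; the actual procedure used for the mouse regulons may differ. The ortholog table is hand-written for this example, `FAKEGENE` is a made-up symbol, and `map_symbol()` is a hypothetical helper.

```r
# Toy human regulon; FAKEGENE is a made-up symbol without a mouse ortholog.
human_regulon <- data.frame(
  tf = c("TP53", "TP53", "TP53"),
  target = c("CDKN1A", "MDM2", "FAKEGENE"),
  mor = c(1, 1, -1),
  stringsAsFactors = FALSE
)

# Hand-written one-to-one ortholog table (assumption for this example).
orthologs <- data.frame(
  human = c("TP53", "CDKN1A", "MDM2"),
  mouse = c("Trp53", "Cdkn1a", "Mdm2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up the mouse symbol, NA if no ortholog is listed.
map_symbol <- function(x) orthologs$mouse[match(x, orthologs$human)]

mouse_regulon <- human_regulon
mouse_regulon$tf <- map_symbol(human_regulon$tf)
mouse_regulon$target <- map_symbol(human_regulon$target)

# Drop interactions where either the TF or the target could not be mapped.
mapped <- !is.na(mouse_regulon$tf) & !is.na(mouse_regulon$target)
mouse_regulon <- mouse_regulon[mapped, ]
```

The mode of regulation is carried over unchanged, so the mouse regulon keeps the signs of the human interactions it was derived from.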
Recent technological advances in single-cell RNA-seq enable the profiling of gene expression at the individual cell level. We showed that DoRothEA in combination with VIPER can be applied to scRNA-seq data, as described in Holland et al., 2020.
Besides the original paper, two additional papers expand the usage of DoRothEA regulons.
- If you use DoRothEA for your research please cite the original publication:
Garcia-Alonso L, Holland CH, Ibrahim MM, Turei D, Saez-Rodriguez J. "Benchmark and integration of resources for the estimation of human transcription factor activities." Genome Research. 2019. DOI: 10.1101/gr.240663.118.
- If you use the mouse version of DoRothEA please cite additionally:
Holland CH, Szalai B, Saez-Rodriguez J. "Transfer of regulatory knowledge from human to mouse for functional genomics analysis." Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms. 2019. DOI: 10.1016/j.bbagrm.2019.194431.
- If you apply DoRothEA's regulons on single-cell RNA-seq data please cite additionally:
Holland CH, Tanevski J, Perales-Patón J, Gleixner J, Kumar MP, Mereu E, Joughin BA, Stegle O, Lauffenburger DA, Heyn H, Szalai B, Saez-Rodriguez J. "Robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data." Genome Biology. 2020. DOI: 10.1186/s13059-020-1949-z.