-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
3b76a2d
commit c946ac0
Showing
1 changed file
with
84 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
# boostDM-CH pipeline | ||
|
||
**BoostDM-CH** is a method to score all possible single base substitutions in clonal hematopoiesis (CH) driver genes for their potential to drive CH. | ||
|
||
The method's rationale and analysis results thereof are described in this study: | ||
|
||
**Identification of Clonal Hematopoiesis Driver Mutations through In Silico Saturation Mutagenesis** | ||
Santiago Demajo, Joan Enric Ramis-Zaldivar, Ferran Muiños, Miguel L Grau, Maria Andrianova, Nuria Lopez-Bigas, Abel Gonzalez-Perez | ||
medRxiv 2023.12.13.23299893 | ||
URL: [https://doi.org/10.1101/2023.12.13.23299893](https://doi.org/10.1101/2023.12.13.23299893) | ||
|
||
The method is based on two previous projects: **IntOGen-CH** and **boostDM** | ||
|
||
**IntOGen-CH** pipeline provides a compendium of CH driver genes and mutational features upon using the **IntOGen** pipeline with a catalog of high VAF somatic mutations called in normal blood from tumor samples by reverse variant calling, which has been described in this study: | ||
|
||
**Discovering the drivers of clonal hematopoiesis** | ||
Oriol Pich, Iker Reyes-Salazar, Abel Gonzalez-Perez, Nuria Lopez-Bigas | ||
URL: [https://doi.org/10.1038/s41467-022-31878-0](https://doi.org/10.1038/s41467-022-31878-0) | ||
|
||
**boostDM** is a methodology to annotate mutations in cancer genes for their potential to drive tumorigenesis, which has been described in another study: | ||
|
||
**In silico saturation mutagenesis of cancer genes** | ||
Ferran Muiños, Francisco Martinez-Jimenez, Oriol Pich, Abel Gonzalez-Perez, Nuria Lopez-Bigas | ||
URL: [https://www.nature.com/articles/s41586-021-03771-1](https://www.nature.com/articles/s41586-021-03771-1) | ||
|
||
## Content | ||
This repo contains the source code to reproduce the training and prediction steps of the boostDM-CH pipeline, | ||
starting from the output data coming after the pipeline of IntOGen-CH | ||
|
||
## Prerequisites | ||
|
||
#### HPC | ||
|
||
It is strongly recommended to run this pipeline in an HPC environment. | ||
|
||
#### Singularity | ||
|
||
The pipeline makes use of Singularity to handle software dependencies. | ||
|
||
* Create a singularity image using the [Singularity](https://github.com/bbglab/boostdm-pipeline/blob/ch/containers_build/boostdm/Singularity) recipe: | ||
``` | ||
$ cd containers_build/boostdm/ | ||
$ sudo singularity build boostdm.sif Singularity | ||
``` | ||
* Copy the singularity image in the root | ||
``` | ||
$ cp containers_build/boostdm/boostdm.sif . | ||
``` | ||
|
||
#### Nextflow | ||
|
||
The pipeline runs with the [nextflow](https://www.nextflow.io/) workflow orchestrator. | ||
The boostDM-CH pipeline has been tested with nextflow version 20.07.1 which you can install e.g. with conda: | ||
``` | ||
conda install -c bioconda nextflow=20.07.1 | ||
``` | ||
|
||
## Running the pipeline | ||
|
||
#### Configuration | ||
|
||
Consider the following adjustments before running the workflow: | ||
|
||
* Adjust the values in the **nextflow.config** file. In particular, check the following: | ||
|
||
* env: **INTOGEN_DATASETS**, **DRIVERS_PATH** and **COHORTS_PATH** paths | ||
* singularity: **cacheDir** and **runOptions** paths. | ||
* process: make sure **container** matches the name of the singularity image file. | ||
* profile: include any custom execution profiles in the **config/** folder | ||
|
||
* Adjust the **OUTPUT** parameter to match the **output/** folder in the **boostdm.nf** nextflow script. | ||
|
||
* Execution profiles are defined in the **config/** folder. Make sure that your custom profile matches | ||
the computational resources available. | ||
|
||
#### Run with nextflow | ||
|
||
``` | ||
$ nextflow run boostdm.nf -profile <execution-profile> -resume | ||
``` | ||
|
||
#### Run details | ||
|
||
When the pipeline finishes, nextflow provides a summary of all the processes run in the file **trace.txt**. |