Merged in xengsort (pull request #192)
Xengsort
MikeWLloyd committed May 7, 2024
2 parents 07d4c11 + d52cb0b commit 382b138
Showing 39 changed files with 299 additions and 98 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -13,4 +13,4 @@ test.csv
test2.csv
.nf-test
.nf-test.log
nf-test-report.tap
nf-test-report*
6 changes: 5 additions & 1 deletion .zenodo.json
@@ -1,6 +1,6 @@
{
"upload_type": "software",
"description": "v0.6.1 Release. See https://github.com/TheJacksonLaboratory/cs-nf-pipelines/wiki",
"description": "See https://github.com/TheJacksonLaboratory/cs-nf-pipelines/wiki",
"title": "cs-nf-pipelines",
"creators": [
{
@@ -36,6 +36,10 @@
"affiliation": "The Jackson Laboratory",
"name": "Gabriel Rech"
},
{
"affiliation": "The Jackson Laboratory",
"name": "Ardian Ferraj"
},
{
"affiliation": "The Jackson Laboratory",
"name": "Anuj Srivastava"
2 changes: 1 addition & 1 deletion README.md
@@ -1,4 +1,4 @@
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.11068737.svg)](https://doi.org/10.5281/zenodo.11068737)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.11068736.svg)](https://doi.org/10.5281/zenodo.11068736)

# JAX NGS Operations Nextflow DSL2 Pipelines

25 changes: 25 additions & 0 deletions ReleaseNotes.md
@@ -1,5 +1,30 @@
# RELEASE NOTES

## Release 0.6.3

In this release we replace the read disambiguation tool Xenome with Xengsort. Extensive benchmarking shows high concordance between the results obtained from the two tools.

Additionally, we correct an issue with the human PTA workflow when running the combination of the `--pdx` and `--split_fastq` options. Data run with this combination of options under versions 0.6.0 through 0.6.2 should be re-run.

### Pipelines Added:

None

### Modules Added:

1. xengsort/xengsort_classify.nf
1. xengsort/xengsort_index.nf

### Pipeline Changes:

1. Xengsort replaces Xenome for all PDX-based workflows (RNAseq, RNA fusion, Hs PTA, Somatic WES, Somatic WES PTA).
1. Corrected the human PTA workflow when running the combination of the `--pdx` and `--split_fastq` options.

### Module Changes:

None


## Release 0.6.2

In this minor release we adjust memory and wall clock statements, and modify `bin/pta/merge-caller-vcfs.r` to correct an edge-case bug.
6 changes: 4 additions & 2 deletions bin/help/pta.nf
@@ -12,8 +12,10 @@ The following are human specific parameters. To see help for mouse, add `--gen_o
--csv_input | /<FILE_PATH> | CSV delimited sample sheet that controls how samples are processed. The required input header is: patient,sex,status,sampleID,lane,fastq_1,fastq_2. See the repository wiki (https://github.com/TheJacksonLaboratory/cs-nf-pipelines/wiki) for additional information.
--xenome_prefix | /projects/compsci/omics_share/human/GRCh38/supporting_files/xenome/trans_human_GRCh38_84_NOD_based_on_mm10_k25| Xenome index for deconvolution of human and mouse reads. Used when `--pdx` is run.
--pdx | false | Options: false, true. If specified, 'Xenome' is run on reads to deconvolute human and mouse reads. Human only reads are used in analysis.
--pdx | false | Options: false, true. If specified, 'Xengsort' is run on reads to deconvolute human and mouse reads. Human only reads are used in analysis.
--xengsort_host_fasta | '/projects/compsci/omics_share/mouse/GRCm39/genome/sequence/imputed/rel_2112_v8/NOD_ShiLtJ.39.fa' | Xengsort host fasta file. Used by Xengsort Index when `--pdx` is run, and xengsort_idx_path is `null` or false.
--xengsort_idx_path | '/projects/compsci/omics_share/human/GRCh38/supporting_files/xengsort' | Xengsort index for deconvolution of human and mouse reads. Used when `--pdx` is run. If `null`, Xengsort Index is run using ref_fa and host_fa.
--xengsort_idx_name | 'hg38_GRCm39-NOD_ShiLtJ' | Xengsort index name associated with files located in `xengsort_idx_path` or name given to outputs produced by Xengsort Index
--deduplicate_reads | false | Options: false, true. If specified, run bbmap clumpify on input reads. Clumpify will deduplicate reads prior to trimming. This can help with mapping and downstream steps when analyzing high coverage WGS data.
7 changes: 5 additions & 2 deletions bin/help/rna_fusion.nf
@@ -17,7 +17,6 @@ Parameter | Default | Description
--gen_org | mouse | Options: mouse and human.
--xenome_prefix | /projects/compsci/omics_share/human/GRCh38/supporting_files/xenome/trans_human_GRCh38_84_NOD_based_on_mm10_k25| Xenome index for deconvolution of human and mouse reads. Used when `--pdx` is run.
--read_length | 150 | Options: 75, 100, 150. Changed relative to sample read length.
--star_index | /projects/omics_share/human/GRCh38/transcriptome/indices/rna_fusion/star/star-2.7.4a-150bp | STAR index used by several tools. Change the index relative to sample read length. Read length options: 75, 100, 150.
--star_fusion_star_index | /projects/omics_share/human/GRCh38/transcriptome/indices/rna_fusion/starfusion/star-150 | STAR-fusion index. Change the index relative to sample read length. Read length options: 75, 100, 150.
@@ -47,7 +46,11 @@ Parameter | Default | Description
--fusion_report_opt | null | Additional fusion-report options can be provided.
--databases | /projects/compsci/omics_share/human/GRCh38/supporting_files/rna_fusion_dbs | Fusion-report databases of known fusion events. Used in report generation only.
--pdx | false | Options: false, true. If specified, 'Xenome' is run on reads to deconvolute human and mouse reads. Human only reads are used in analysis.
--pdx | false | Options: false, true. If specified, 'Xengsort' is run on reads to deconvolute human and mouse reads. Human only reads are used in analysis.
--ref_fa | '/projects/compsci/omics_share/human/GRCh38/genome/sequence/gatk/Homo_sapiens_assembly38.fasta'| Xengsort graft fasta file. Used by Xengsort Index when `--pdx` is run, and xengsort_idx_path is `null` or false.
--xengsort_host_fasta | '/projects/compsci/omics_share/mouse/GRCm39/genome/sequence/imputed/rel_2112_v8/NOD_ShiLtJ.39.fa' | Xengsort host fasta file. Used by Xengsort Index when `--pdx` is run, and xengsort_idx_path is `null` or false.
--xengsort_idx_path | '/projects/compsci/omics_share/human/GRCh38/supporting_files/xengsort' | Xengsort index for deconvolution of human and mouse reads. Used when `--pdx` is run. If `null`, Xengsort Index is run using ref_fa and host_fa.
--xengsort_idx_name | 'hg38_GRCm39-NOD_ShiLtJ' | Xengsort index name associated with files located in `xengsort_idx_path` or name given to outputs produced by Xengsort Index
'''
}
10 changes: 6 additions & 4 deletions bin/help/rnaseq.nf
@@ -18,9 +18,6 @@ Parameter | Default | Description
--gen_org | mouse | Options: mouse and human.
--genome_build | 'GRCm38' | Mouse specific. Options: GRCm38 or GRCm39. If gen_org == human, build defaults to GRCh38.
--pdx | false | Options: true or false. If 'true' Xenome is run to remove mouse reads from samples.
--xenome_prefix | '/projects/compsci/omics_share/human/GRCh38/supporting_files/xenome/trans_human_GRCh38_84_NOD_based_on_mm10_k25' | Pre-compiled Xenome classification index files. Used if PDX analysis is specified.
--quality_phred | 15 | The quality value that is required for a base to pass. Default: 15 which is a phred quality score of >=Q15.
--unqualified_perc | 40 | Percent of bases that are allowed to be unqualified (0~100). Default: 40 which is 40%.
--detect_adapter_for_pe | false | If true, adapter auto-detection is used for paired end data. By default, paired-end data adapter sequence auto-detection is disabled as the adapters can be trimmed by overlap analysis. However, --detect_adapter_for_pe will enable it. Fastp will run a little slower if you specify the sequence adapters or enable adapter auto-detection, but usually result in a slightly cleaner output, since the overlap analysis may fail due to sequencing errors or adapter dimers.
@@ -50,8 +47,13 @@ Parameter | Default | Description
| Human: '/projects/omics_share/human/GRCh38/transcriptome/annotation/ensembl/v104/Homo_sapiens.GRCh38.104.chr_patch_hapl_scaff.rRNA.interval_list'
| The coverage metric calculation step requires this file. Refers to human assembly when --gen_org human. JAX users should not change this parameter.
--pdx | false | Options: false, true. If specified, 'Xenome' is run on reads to deconvolute human and mouse reads. Human only reads are used in analysis.
--pdx | false | Options: false, true. If specified, 'Xengsort' is run on reads to deconvolute human and mouse reads. Human only reads are used in analysis.
--classifier_table | '/projects/compsci/omics_share/human/GRCh38/supporting_files/rna_ebv_classifier/EBVlym_classifier_table_48.txt' | EBV expected gene signatures used in EBV classifier. Only used when '--pdx' is run.
--ref_fa | '/projects/compsci/omics_share/human/GRCh38/genome/sequence/gatk/Homo_sapiens_assembly38.fasta'| Xengsort graft fasta file. Used by Xengsort Index when `--pdx` is run, and xengsort_idx_path is `null` or false.
--xengsort_host_fasta | '/projects/compsci/omics_share/mouse/GRCm39/genome/sequence/imputed/rel_2112_v8/NOD_ShiLtJ.39.fa' | Xengsort host fasta file. Used by Xengsort Index when `--pdx` is run, and xengsort_idx_path is `null` or false.
--xengsort_idx_path | '/projects/compsci/omics_share/human/GRCh38/supporting_files/xengsort' | Xengsort index for deconvolution of human and mouse reads. Used when `--pdx` is run. If `null`, Xengsort Index is run using ref_fa and host_fa.
--xengsort_idx_name | 'hg38_GRCm39-NOD_ShiLtJ' | Xengsort index name associated with files located in `xengsort_idx_path` or name given to outputs produced by Xengsort Index
There are two additional parameters that are human specific. They are:
6 changes: 4 additions & 2 deletions bin/help/somatic_wes.nf
@@ -21,8 +21,10 @@ Parameter | Default | Description
--unqualified_perc | 40 | Percent of bases that are allowed to be unqualified (0~100). Default: 40 which is 40%.
--detect_adapter_for_pe | false | If true, adapter auto-detection is used for paired end data. By default, paired-end data adapter sequence auto-detection is disabled as the adapters can be trimmed by overlap analysis. However, --detect_adapter_for_pe will enable it. Fastp will run a little slower if you specify the sequence adapters or enable adapter auto-detection, but usually result in a slightly cleaner output, since the overlap analysis may fail due to sequencing errors or adapter dimers.
--pdx | false | Options: false, true. If specified, 'Xenome' is run on reads to deconvolute human and mouse reads. Human only reads are used in analysis.
--xenome_prefix | /projects/compsci/omics_share/human/GRCh38/supporting_files/xenome/trans_human_GRCh38_84_NOD_based_on_mm10_k25| Xenome index for deconvolution of human and mouse reads. Used when `--pdx` is run.
--pdx | false | Options: false, true. If specified, 'Xengsort' is run on reads to deconvolute human and mouse reads. Human only reads are used in analysis.
--xengsort_host_fasta | '/projects/compsci/omics_share/mouse/GRCm39/genome/sequence/imputed/rel_2112_v8/NOD_ShiLtJ.39.fa' | Xengsort host fasta file. Used by Xengsort Index when `--pdx` is run, and xengsort_idx_path is `null` or false.
--xengsort_idx_path | '/projects/compsci/omics_share/human/GRCh38/supporting_files/xengsort' | Xengsort index for deconvolution of human and mouse reads. Used when `--pdx` is run. If `null`, Xengsort Index is run using ref_fa and host_fa.
--xengsort_idx_name | 'hg38_GRCm39-NOD_ShiLtJ' | Xengsort index name associated with files located in `xengsort_idx_path` or name given to outputs produced by Xengsort Index
--genotype_targets | '/projects/compsci/omics_share/human/GRCh38/supporting_files/ancestry_panel/snp_panel_v2_targets_annotations.snpwt.bed.gz' | Target SNP bed file for the ancestry panel. Can contain annotation information.
--snpID_list | '/projects/compsci/omics_share/human/GRCh38/supporting_files/ancestry_panel/snp_panel_v2.list' | Target SNPs in list used in BCFtools filtering step
6 changes: 4 additions & 2 deletions bin/help/somatic_wes_pta.nf
@@ -21,8 +21,10 @@ Parameter | Default | Description
--unqualified_perc | 40 | Percent of bases that are allowed to be unqualified (0~100). Default: 40 which is 40%.
--detect_adapter_for_pe | false | If true, adapter auto-detection is used for paired end data. By default, paired-end data adapter sequence auto-detection is disabled as the adapters can be trimmed by overlap analysis. However, --detect_adapter_for_pe will enable it. Fastp will run a little slower if you specify the sequence adapters or enable adapter auto-detection, but usually result in a slightly cleaner output, since the overlap analysis may fail due to sequencing errors or adapter dimers.
--pdx | false | Options: false, true. If specified, 'Xenome' is run on reads to deconvolute human and mouse reads. Human only reads are used in analysis.
--xenome_prefix | /projects/compsci/omics_share/human/GRCh38/supporting_files/xenome/trans_human_GRCh38_84_NOD_based_on_mm10_k25| Xenome index for deconvolution of human and mouse reads. Used when `--pdx` is run.
--pdx | false | Options: false, true. If specified, 'Xengsort' is run on reads to deconvolute human and mouse reads. Human only reads are used in analysis.
--xengsort_host_fasta | '/projects/compsci/omics_share/mouse/GRCm39/genome/sequence/imputed/rel_2112_v8/NOD_ShiLtJ.39.fa' | Xengsort host fasta file. Used by Xengsort Index when `--pdx` is run, and xengsort_idx_path is `null` or false.
--xengsort_idx_path | '/projects/compsci/omics_share/human/GRCh38/supporting_files/xengsort' | Xengsort index for deconvolution of human and mouse reads. Used when `--pdx` is run. If `null`, Xengsort Index is run using ref_fa and host_fa.
--xengsort_idx_name | 'hg38_GRCm39-NOD_ShiLtJ' | Xengsort index name associated with files located in `xengsort_idx_path` or name given to outputs produced by Xengsort Index
--genotype_targets | '/projects/compsci/omics_share/human/GRCh38/supporting_files/ancestry_panel/snp_panel_v2_targets_annotations.snpwt.bed.gz' | Target SNP bed file for the ancestry panel. Can contain annotation information.
--snpID_list | '/projects/compsci/omics_share/human/GRCh38/supporting_files/ancestry_panel/snp_panel_v2.list' | Target SNPs in list used in BCFtools filtering step
4 changes: 3 additions & 1 deletion bin/log/pta.nf
@@ -34,7 +34,9 @@ ______________________________________________________
--quality_phred ${params.quality_phred}
--unqualified_perc ${params.unqualified_perc}
--detect_adapter_for_pe ${params.detect_adapter_for_pe}
--xenome_prefix ${params.xenome_prefix}
--xengsort_host_fasta ${params.xengsort_host_fasta}
--xengsort_idx_path ${params.xengsort_idx_path}
--xengsort_idx_name ${params.xengsort_idx_name}
--ref_fa ${params.ref_fa}
--ref_fa_indices ${params.ref_fa_indices}
--ref_fa_dict ${params.ref_fa_dict}
5 changes: 4 additions & 1 deletion bin/log/rna_fusion.nf
@@ -27,7 +27,10 @@ ______________________________________________________
--keep_intermediate ${params.keep_intermediate}
-c ${params.config}
--multiqc_config ${params.multiqc_config}
--xenome_prefix ${params.xenome_prefix}
--ref_fa ${params.ref_fa}
--xengsort_host_fasta ${params.xengsort_host_fasta}
--xengsort_idx_path ${params.xengsort_idx_path}
--xengsort_idx_name ${params.xengsort_idx_name}
--read_length ${params.read_length}
--star_index ${params.star_index}
--star_fusion_star_index ${params.star_fusion_star_index}
5 changes: 4 additions & 1 deletion bin/log/rnaseq.nf
@@ -47,7 +47,10 @@ ______________________________________________________
--detect_adapter_for_pe ${params.detect_adapter_for_pe}
--pdx ${params.pdx}
--xenome_prefix ${params.xenome_prefix}
--ref_fa ${params.ref_fa}
--xengsort_host_fasta ${params.xengsort_host_fasta}
--xengsort_idx_path ${params.xengsort_idx_path}
--xengsort_idx_name ${params.xengsort_idx_name}
--strandedness_ref ${params.strandedness_ref}
--strandedness_gtf ${params.strandedness_gtf}
4 changes: 3 additions & 1 deletion bin/log/somatic_wes.nf
@@ -27,7 +27,9 @@ ______________________________________________________
--pubdir ${params.pubdir}
--organize_by ${params.organize_by}
--pdx ${params.pdx}
--xenome_index ${params.xenome_prefix}
--xengsort_host_fasta ${params.xengsort_host_fasta}
--xengsort_idx_path ${params.xengsort_idx_path}
--xengsort_idx_name ${params.xengsort_idx_name}
--ref_fa ${params.ref_fa}
--ref_fa_indices ${params.ref_fa_indices}
--quality_phred ${params.quality_phred}
4 changes: 3 additions & 1 deletion bin/log/somatic_wes_pta.nf
@@ -22,7 +22,9 @@ ______________________________________________________
--pubdir ${params.pubdir}
--organize_by ${params.organize_by}
--pdx ${params.pdx}
--xenome_index ${params.xenome_prefix}
--xengsort_host_fasta ${params.xengsort_host_fasta}
--xengsort_idx_path ${params.xengsort_idx_path}
--xengsort_idx_name ${params.xengsort_idx_name}
--ref_fa ${params.ref_fa}
--ref_fa_indices ${params.ref_fa_indices}
--quality_phred ${params.quality_phred}
2 changes: 1 addition & 1 deletion bin/shared/multiqc/pta_multiqc.yaml
@@ -8,7 +8,7 @@ export_plots: true
module_order:
- fastp
- fastqc
- xenome
- xengsort
- conpair
- gatk
- picard
2 changes: 1 addition & 1 deletion bin/shared/multiqc/rna_fusion_multiqc.yaml
@@ -7,7 +7,7 @@ export_plots: true

module_order:
- fastqc
- xenome
- xengsort
- custom_content

table_columns_visible:
2 changes: 1 addition & 1 deletion bin/shared/multiqc/rnaseq_multiqc.yaml
@@ -8,7 +8,7 @@ export_plots: true
module_order:
- fastp
- fastqc
- xenome
- xengsort
- star
- rsem
- picard
2 changes: 1 addition & 1 deletion bin/shared/multiqc/somatic_wes_multiqc.yaml
@@ -8,7 +8,7 @@ export_plots: true
module_order:
- fastp
- fastqc
- xenome
- xengsort
- gatk
- picard

2 changes: 1 addition & 1 deletion bin/shared/multiqc/somatic_wes_pta_multiqc.yaml
@@ -8,7 +8,7 @@ export_plots: true
module_order:
- fastp
- fastqc
- xenome
- xengsort
- gatk
- picard

6 changes: 4 additions & 2 deletions config/pta.config
@@ -29,8 +29,10 @@ params {
// NOTE: For PE data, the adapter sequence auto-detection is disabled by default since the adapters can be trimmed by overlap analysis. However, you can specify --detect_adapter_for_pe to enable it.
// For PE data, fastp will run a little slower if you specify the sequence adapters or enable adapter auto-detection, but usually result in a slightly cleaner output, since the overlap analysis may fail due to sequencing errors or adapter dimers.

// Xenome index
xenome_prefix=params.reference_cache+'/human/GRCh38/supporting_files/xenome/hg38_broad_NOD_based_on_mm10_k25'
// xengsort index
xengsort_host_fasta = params.reference_cache+'/mouse/GRCm39/genome/sequence/imputed/rel_2112_v8/NOD_ShiLtJ.39.fa'
xengsort_idx_path = params.reference_cache+'/human/GRCh38/supporting_files/xengsort'
xengsort_idx_name = 'hg38_GRCm39-NOD_ShiLtJ'

// Reference fasta
ref_fa = params.reference_cache+'/human/GRCh38/genome/sequence/gatk/Homo_sapiens_assembly38.fasta'
9 changes: 6 additions & 3 deletions config/rna_fusion.config
@@ -2,7 +2,7 @@

manifest {
name = "rna_fusion"
description = 'Pipeline for processing of PDX RNASeq samples to call RNA Fusions, contains xenome step for processing PDX samples'
description = 'Pipeline for processing of PDX RNASeq samples to call RNA Fusions, contains xengsort step for processing PDX samples'
}

params {
@@ -21,8 +21,11 @@ params {
// PDX
pdx = false

// Xenome index
xenome_prefix=params.reference_cache+'/human/GRCh38/supporting_files/xenome/hg38_broad_NOD_based_on_mm10_k25'
// xengsort index
ref_fa = params.reference_cache+'/human/GRCh38/genome/sequence/gatk/Homo_sapiens_assembly38.fasta'
xengsort_host_fasta = params.reference_cache+'/mouse/GRCm39/genome/sequence/imputed/rel_2112_v8/NOD_ShiLtJ.39.fa'
xengsort_idx_path = params.reference_cache+'/human/GRCh38/supporting_files/xengsort'
xengsort_idx_name = 'hg38_GRCm39-NOD_ShiLtJ'

// READ LENGTH ADJUSTMENTS:
read_length = 150 // change relative to sample being processed. 75, 100, 125, and 150 are supported.
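
The help text in this commit states that Xengsort Index is run when `--pdx` is set and `xengsort_idx_path` is `null`. A minimal sketch of a user-side override relying on that behaviour follows. The file name `custom_xengsort.config`, the FASTA paths, and the index name are placeholders, not values from the repository, and the sketch assumes the file is supplied through Nextflow's `-c` option so that its `params` take precedence over the pipeline defaults.

```groovy
// custom_xengsort.config (hypothetical file name, for illustration only)
params {
    // Enable PDX handling so Xengsort is run on the reads.
    pdx = true

    // Assumption from the help text: a null index path causes
    // Xengsort Index to build a new index instead of using a prebuilt one.
    xengsort_idx_path = null

    // Graft (human) and host (mouse) FASTA files used to build the index.
    // Both paths are placeholders; substitute real locations.
    ref_fa              = '/path/to/graft/Homo_sapiens_assembly38.fasta'
    xengsort_host_fasta = '/path/to/host/NOD_ShiLtJ.39.fa'

    // Name given to the outputs produced by Xengsort Index (placeholder).
    xengsort_idx_name = 'custom_graft_host'
}
```

With an override like this, the index would presumably be rebuilt from `ref_fa` and `xengsort_host_fasta` rather than read from the shared `xengsort` directory. Leaving `xengsort_idx_path` at its default should reuse the prebuilt `hg38_GRCm39-NOD_ShiLtJ` index instead.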