
PacBio ReadMe

MikeWLloyd edited this page Apr 22, 2024 · 4 revisions

MMRSVD Germline SV Pacific Biosciences (PacBio) Documentation

SV Analysis Pipeline: PacBio Data

(--workflow germline_sv, --data_type pacbio)
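
As a rough illustration of how the two flags above combine with the parameters documented further down this page, the following Python sketch assembles a launch command. The `nextflow run main.nf` entry point, the helper name `build_command`, the sample ID, and every path are assumptions made for the example; only the flag names and their allowed values are taken from this page.

```python
import shlex


def build_command(sample_id, fastq, pubdir, workdir,
                  pbmode="CCS", genome_build="GRCm38"):
    """Assemble an argument list for the PacBio germline SV workflow.

    Hypothetical helper: the flag names come from the parameter list on
    this page; the entry point 'main.nf' is an assumption.
    """
    if pbmode not in ("CCS", "CLR"):
        raise ValueError("pbmode must be 'CCS' or 'CLR'")
    return [
        "nextflow", "run", "main.nf",   # assumed entry point
        "--workflow", "germline_sv",
        "--data_type", "pacbio",
        "--pbmode", pbmode,
        "--sampleID", sample_id,
        "--fastq1", fastq,
        "--genome_build", genome_build,
        "--pubdir", pubdir,
        "-w", workdir,                  # Nextflow work directory
    ]


# Placeholder values, for illustration only.
cmd = build_command(
    sample_id="sample01",
    fastq="/path/to/reads.fastq.gz",
    pubdir="/path/to/results",
    workdir="/path/to/work",
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each value as one argument, so paths containing spaces are quoted correctly when the command is printed or passed to `subprocess.run`.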

For input sample:

•   PBMM2 Mapping to reference genome
•   PBSV SV calling   
•   SNIFFLES SV calling   
•   SURVIVOR SV merging  
•   SURVIVOR Annotation of results based on intersection with previously identified mouse SVs, genic and exonic regions  

PacBio Flowchart

flowchart TD
    p00([PACBIO READS\nFASTQ])
    p001([REFERENCE_GENOME\nGRCm39])
    p002[PBMM2_INDEX]
    p003[PRE-ALIGNED BAM]
    p01[PBMM2_CALL]
    p02[PBSV_DISCOVER]
    p03[PBSV_CALL]
    p04[SNIFFLES]
    p05[SURVIVOR_MERGE]
    p06[SURVIVOR_SUMMARY]
    p07[SURVIVOR_VCF_TO_TABLE]
    p08[SURVIVOR_TO_BED]
    p09[SURVIVOR_BED_INTERSECT]
    p10[SURVIVOR_ANNOTATION]
    p11[SURVIVOR_ANNOTATION_WITH_EXONS]
    o1([Genomic BAM]):::output
    o2([PB SV Calls]):::output
    o3([SNIFFLES SV Calls]):::output
    o4([Merged VCF]):::output
    o5([Annotated SV Calls]):::output
    o6([SV Joined Results]):::output
    o7([Intersect BEDS]):::output
    p00 --> p01
    p001 -..-> |Generate Reference Index if Necessary| p002
    p002 --> p01
    p01 -->o1
    o1 --> p02
    p02 --> p03
    p001 --> p03
    o1 --> p04
    p003 -..-> |If Pre-Aligned Bam Provided| p02
    p003 -..-> |If Pre-Aligned Bam Provided| p04
    p03 --> o2
    o2 --> p05
    p04 --> o3
    o3 --> p05
    p05 --> o4
    o4 --> p06
    o4 --> p07
    p06 --> p08
    p06 --> p10
    p07 --> p10
    p07 --> p08
    p08 --> p09
    p08 --> p10
    p09 --> o7
    o7 --> p10
    o4 --> p11
    o7 --> p11
    p10 --> o6
    p11 --> o5
    classDef output fill:#90aaff,stroke:#6c8eff,stroke-width:2px,color:#000000
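
The dependencies drawn in the flowchart can also be written down as a small graph and sorted, which gives one valid order in which the steps could run. The sketch below is only a reading aid and is not part of the pipeline: it follows the FASTQ route (no pre-aligned BAM), folds each output node into the step that produces it, and the dictionary name `DEPENDS_ON` is made up for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the steps whose results it consumes, as read off the
# flowchart edges (output nodes folded into their producing step).
DEPENDS_ON = {
    "PBMM2_INDEX": [],
    "PBMM2_CALL": ["PBMM2_INDEX"],
    "PBSV_DISCOVER": ["PBMM2_CALL"],
    "PBSV_CALL": ["PBSV_DISCOVER"],
    "SNIFFLES": ["PBMM2_CALL"],
    "SURVIVOR_MERGE": ["PBSV_CALL", "SNIFFLES"],
    "SURVIVOR_SUMMARY": ["SURVIVOR_MERGE"],
    "SURVIVOR_VCF_TO_TABLE": ["SURVIVOR_MERGE"],
    "SURVIVOR_TO_BED": ["SURVIVOR_SUMMARY", "SURVIVOR_VCF_TO_TABLE"],
    "SURVIVOR_BED_INTERSECT": ["SURVIVOR_TO_BED"],
    "SURVIVOR_ANNOTATION": [
        "SURVIVOR_SUMMARY",
        "SURVIVOR_VCF_TO_TABLE",
        "SURVIVOR_TO_BED",
        "SURVIVOR_BED_INTERSECT",
    ],
    "SURVIVOR_ANNOTATION_WITH_EXONS": [
        "SURVIVOR_MERGE",
        "SURVIVOR_BED_INTERSECT",
    ],
}

# static_order() yields every step after all of the steps it depends on.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
for step in order:
    print(step)
```

Several orders satisfy the same constraints (for example, SNIFFLES may come before or after the PBSV steps), so the printed sequence is one possibility rather than the order Nextflow will necessarily use.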

Parameters for MMRSVD Germline SV Pipeline (pacbio)

  • --sampleID

    • Default: <STRING>
    • Comment: The sample ID for the input data (required).
  • --pubdir

    • Default: /<PATH>
    • Comment: The directory where saved outputs will be stored.
  • --organize_by

    • Default: sample
    • Comment: How to organize the output folder structure. Options: sample or analysis.
  • --cacheDir

    • Default: '/projects/omics_share/meta/containers'
    • Comment: This is the directory that contains cached Singularity containers. JAX users should not change this parameter.
  • -w

    • Default: /<PATH>
    • Comment: The directory used for all intermediate files and Nextflow processes. This directory can become quite large. It should be a location on /fastscratch or another directory with ample storage.
  • --data_type

    • Selected: pacbio
    • Comment: The germline sv workflow will run in pacbio mode with this option selected.
  • --pbmode

    • Selected: null
    • Comment: Options: CCS or CLR. Specify whether the input reads are PacBio CCS or CLR data.
  • --fastq1

    • Default: null
    • Comment: The path to a single FASTQ file, or one of a pair of FASTQs for paired-end data.
  • --fastq2

    • Default: null
    • Comment: The path to the second of a pair of FASTQs for paired-end data.
  • --bam

    • Default: null
    • Comment: The path to BAM input data if alignment has already been performed outside this pipeline.
  • --fasta

    • Default: /<PATH>
    • Comment: The path to the reference genome in FASTA format.
  • --fasta_index

    • Default: /<PATH>
    • Comment: Optional parameter to specify the index for the reference genome. If not provided, the pipeline will generate an index.
  • --genome_build

    • Default: GRCm38
    • Comment: Mouse specific. Options: GRCm38 or GRCm39. Parameter that controls reference data used for alignment and annotation.
  • --tandem_repeats

    • Default: '/ref_data/ucsc_mm10_trf_chr_sorted.bed'
    • Comment: BED file that lists the coordinates of centromeres and telomeres to exclude as alignment targets. Note: default path refers to a location within the containers quay.io/jaxcompsci/pbsv-td_refs:2.8.0--refv0.2.0 and quay.io/jaxcompsci/sniffles-td_refs:2.0.7--refv0.2.0, which require this file.
  • --sv_ins_ref

    • Default: '/ref_data/variants_freeze5_sv_INS_mm39_to_mm10_sorted.bed.gz'
    • Comment: BED file that lists previously identified insertion SVs. Note: default path refers to a location within the container quay.io/jaxcompsci/bedtools-sv_refs:2.30.0--refv0.2.0, which requires this file.
  • --sv_del_ref

    • Default: '/ref_data/variants_freeze5_sv_DEL_mm39_to_mm10_sorted.bed.gz'
    • Comment: BED file that lists previously identified deletion SVs. Note: default path refers to a location within the container quay.io/jaxcompsci/bedtools-sv_refs:2.30.0--refv0.2.0, which requires this file.
  • --sv_inv_ref

    • Default: '/ref_data/variants_freeze5_sv_INV_mm39_to_mm10_sorted.bed.gz'
    • Comment: BED file that lists previously identified inversion SVs. Note: default path refers to a location within the container quay.io/jaxcompsci/bedtools-sv_refs:2.30.0--refv0.2.0, which requires this file.
  • --reg_ref

    • Default: '/ref_data/mus_musculus.GRCm38.Regulatory_Build.regulatory_features.20180516.gff.gz'
    • Comment: BED file that lists regulatory features. Note: default path refers to a location within the container quay.io/jaxcompsci/bedtools-sv_refs:2.30.0--refv0.2.0, which requires this file.
  • --genes_bed

    • Default: '/ref_data/Mus_musculus.GRCm38.102.gene_symbol.bed'
    • Comment: BED file that lists gene symbol IDs and coordinates. Note: default path refers to a location within the container quay.io/jaxcompsci/bedtools-sv_refs:2.30.0--refv0.2.0, which requires this file.
  • --exons_bed

    • Default: '/ref_data/Mus_musculus.GRCm38.102.exons.bed'
    • Comment: BED file that lists exons and coordinates. Note: default path refers to a location within the container quay.io/jaxcompsci/bedtools-sv_refs:2.30.0--refv0.2.0, which requires this file.
  • --surv_dist

    • Default: 1000
    • Comment: Maximum distance between breakpoints for merging SVs.
  • --surv_supp

    • Default: 1
    • Comment: The number of callers (out of 2 in this workflow: PBSV and Sniffles) required to support an SV.
  • --surv_type

    • Default: 1
    • Comment: Boolean (0/1) that requires SVs to be the same type for merging.
  • --surv_strand

    • Default: 1
    • Comment: Boolean (0/1) that requires SVs to be on the same strand for merging.
  • --surv_min

    • Default: 30
    • Comment: Minimum length (bp) to output SVs.
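
To make the five `--surv_*` settings more concrete, the Python sketch below applies them to a pair of toy calls. It is a simplified illustration of the criteria as described in this list, not SURVIVOR's actual algorithm: the `SV` record and the `may_merge` and `keep` helpers are hypothetical, and SV length is approximated as `end - start`, which would not be adequate for insertions.

```python
from dataclasses import dataclass


@dataclass
class SV:
    """Toy SV record (hypothetical; not a SURVIVOR or VCF data structure)."""
    chrom: str
    start: int
    end: int
    svtype: str   # e.g. "DEL", "INS", "INV"
    strands: str  # e.g. "+-"


def may_merge(a, b, surv_dist=1000, surv_type=1, surv_strand=1):
    """Return True if two calls satisfy the merge criteria listed above."""
    if a.chrom != b.chrom:
        return False
    # Both breakpoints must lie within surv_dist of each other.
    if abs(a.start - b.start) > surv_dist or abs(a.end - b.end) > surv_dist:
        return False
    if surv_type and a.svtype != b.svtype:
        return False
    if surv_strand and a.strands != b.strands:
        return False
    return True


def keep(sv, n_supporting_callers, surv_supp=1, surv_min=30):
    """Return True if a merged SV has enough support and is long enough."""
    return n_supporting_callers >= surv_supp and (sv.end - sv.start) >= surv_min


pbsv_call = SV("chr1", 10_000, 10_500, "DEL", "+-")
sniffles_call = SV("chr1", 10_120, 10_480, "DEL", "+-")

print(may_merge(pbsv_call, sniffles_call))                 # defaults
print(may_merge(pbsv_call, sniffles_call, surv_dist=100))  # stricter distance
print(keep(pbsv_call, n_supporting_callers=2))
```

With the defaults the two deletions are close enough to merge, while tightening `surv_dist` to 100 separates them because their start coordinates differ by 120 bp.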

Pipeline Default Outputs

| Naming Convention | Description |
| --- | --- |
| germline_sv_report.html | Nextflow autogenerated report |
| trace/trace.txt | Nextflow trace of processes |
| ${sampleID}/${sampleID}_PACBIO_PS_struct_var.vcf | VCF output combining merged PBSV and Sniffles calls annotated for overlap with exonic regions |
| ${sampleID}/${sampleID}_survivor_joined_results.csv | Table of SVs annotated with overlaps of previously identified SVs (beck), genes, exons, regulatory regions |
| ${sampleID}/alignments/${sampleID}.pbmm2.aligned.bam | Analysis-ready alignment of reads |
| ${sampleID}/alignments/${sampleID}.pbmm2.aligned.bam.bai | Index for analysis-ready alignment of reads |
| ${sampleID}/unmerged_calls/${sampleID}.pbsv_calls.vcf | SV calls from PBSV |
| ${sampleID}/unmerged_calls/${sampleID}.sniffles_sorted_prefix.vcf | SV calls from Sniffles |
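
The naming convention above can be expanded for a concrete sample with a few lines of Python, for example to check that a run produced everything expected. This is a sketch under assumptions: it treats all entries as relative to the `--pubdir` directory and assumes the default `--organize_by sample` layout; the helper name `expected_outputs`, the dictionary keys, and the example paths are invented for illustration.

```python
from pathlib import PurePosixPath


def expected_outputs(pubdir, sample_id):
    """Expand the documented naming convention for one sample.

    Assumes outputs are written relative to --pubdir and organized by
    sample; the dictionary keys are hypothetical labels.
    """
    root = PurePosixPath(pubdir)
    sample = root / sample_id
    bam = sample / "alignments" / f"{sample_id}.pbmm2.aligned.bam"
    return {
        "report": root / "germline_sv_report.html",
        "trace": root / "trace" / "trace.txt",
        "merged_vcf": sample / f"{sample_id}_PACBIO_PS_struct_var.vcf",
        "joined_table": sample / f"{sample_id}_survivor_joined_results.csv",
        "bam": bam,
        "bam_index": bam.with_name(bam.name + ".bai"),
        "pbsv_vcf": sample / "unmerged_calls" / f"{sample_id}.pbsv_calls.vcf",
        "sniffles_vcf": sample / "unmerged_calls"
        / f"{sample_id}.sniffles_sorted_prefix.vcf",
    }


# Placeholder publish directory and sample ID.
outputs = expected_outputs("/path/to/results", "sample01")
for label, path in outputs.items():
    print(f"{label}: {path}")
```

Swapping `PurePosixPath` for `pathlib.Path` would allow calling `.exists()` on each entry to verify a finished run.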