GBRS Pipeline ReadMe

GBRS Documentation

NOTE: EMASE and GBRS are part of the same software package, and share functions between them. This workflow outputs from EMASE and GBRS, but they are derived from the same software package.

GBRS Pipeline (--workflow gbrs)

•   Step 1: Map reads with BOWTIE, and convert SAM to BAM. Note, R1 and R2 are mapped separately for paired-end data.  
•   Step 2: Convert BAM to EMASE. Note: R1 and R2 are converted separately for paired-end data.  
•   Step 3: Paired-end data: Find common alignments and compress EMASE file.  
•   Step 3: Single-end data: Compress EMASE file.  
•   Step 4: Quantify multiway expression  
•   Step 5: Genotype reconstruction   
•   Step 6: Quantify diploid expression with GBRS   
•   Step 7: Interpolate genotypes and genotype probabilities    
•   Step 8: Plot inferred genotypes  
•   Step 9: Export genotype probabilities

Parameters for GBRS Pipeline

--pubdir
- Default: /<PATH>
- Comment: The directory that the saved outputs will be stored.
--sample_folder
- Default: /<PATH>
- Comment: The path to the folder that contains all the samples to be run by the pipeline. The files in this path can also be symbolic links.
--extension
- Default: .fastq.gz
- Comment: The expected extension for the input read files.
--pattern
- Default: "*_R{1,2}*"
- Comment: The expected R1 / R2 matching pattern. The default value will match reads with names like this READ_NAME_R1_MoreText.fastq.gz or READ_NAME_R1.fastq.gz
--read_type
- Default: PE
- Comment: Options: PE and SE. Type of reads: paired end (PE) or single end (SE).
--concat_lanes
- Default: false
- Comment: Options: false and true. If this boolean is specified, FASTQ files will be concatenated by sample. Used in cases where samples are divided across individual sequencing lanes.
--concat_sampleID_delim
- Default: "_"
- Comment: The delimited to split FASTQ file names.
--concat_sampleID_positions
- Default: "1"
- Comment: The number of elements to keep after splitting on the chosen delimiter in the sample name.
--bowtie_index
- Default: /projects/compsci/omics_share/mouse/GRCm39/transcriptome/indices/imputed/rel_2112_v8/bowtie/bowtie.transcripts
- Comment: Path to the bowtie index. Include the bowtie prefix in this path (e.g., /path/to/bowtie.transcripts where bowtie.transcripts.* are the full set of index files in the directory.
--transcripts_info
- Default: /projects/compsci/omics_share/mouse/GRCm39/supporting_files/emase_gbrs/rel_2112_v8/emase.fullTranscripts.info
- Comment: A file containing all transcript IDs. NOTE: These IDs must not contain haplotype IDs. This file must also have a 'length' column. Note that 'length' is not used in this context. ONLY IDs are used from this file. Can be obtained from prepare_emase workflow (emase.fullTranscripts.info)
--gbrs_strain_list
- Default: "A,B,C,D,E,F,G,H"
- Comment: A list of haplotype names corresponding to genomes used in hybrid genome construction (e.g., 'A,B,C,D,E,F,G,H').
--gene2transcript_csv
- Default: /projects/compsci/omics_share/mouse/GRCm39/supporting_files/emase_gbrs/rel_2112_v8/emase.gene2transcripts.tsv
- Comment: A file containing all gene to transcript ID translations. NOTE: These IDs must not contain haplotype IDs. Can be obtained from prepare_emase workflow (emase.gene2transcripts.tsv)
--full_transcript_info
- Default: /projects/compsci/omics_share/mouse/GRCm39/supporting_files/emase_gbrs/rel_2112_v8/emase.pooled.fullTranscripts.info
- Comment: A file containing all transcript IDs with transcript lengths. NOTE: These IDs must contain haplotype IDs. Can be obtained from prepare_emase workflow (emase.pooled.fullTranscripts.info)
--emase_model
- Default: '4'
- Comment: Options:(1, 2, 3, 4). 1: reads are apportioned among genes first, then between alleles, and then among isoforms. 2: reads are apportioned among genes first, then among isoforms, and then between alleles. 3: reads are apportioned among genes first, then among each isoform-allele combination.
--emission_prob_avecs
- Default: /projects/compsci/omics_share/mouse/GRCm39/supporting_files/emase_gbrs/rel_2112_v8/gbrs_emissions_all_tissues.avecs.npz
- Comment: The emission probability vector file.
--trans_prob_dir
- Default: /projects/compsci/omics_share/mouse/GRCm39/supporting_files/emase_gbrs/rel_2112_v8/transition_probabilities
- Comment: A directory containing all transition probability files.
--gene_position_file
- Default: /projects/compsci/omics_share/mouse/GRCm39/supporting_files/emase_gbrs/rel_2112_v8/ref.gene_pos.ordered_ensBuild_105.npz
- Comment: A python compressed NPZ file containing gene position in primary reference coordinates.
--genotype_grid
- Default: /projects/compsci/omics_share/mouse/GRCm39/supporting_files/emase_gbrs/rel_2112_v8/ref.genome_grid.GRCm39.tsv
- Comment: Simulated marker grid used in genotype inference.
--founder_hex_colors
- Default: /projects/compsci/omics_share/mouse/GRCm39/supporting_files/emase_gbrs/rel_2112_v8/founder.hexcolor.info
- Comment: A file containing hexcode colors for plotting genotypes.
--gbrs_expression_threshold
- Default: 1.5
- Comment: GBRS expression threshold limit required to quantify a gene.
--gbrs_sigma
- Default: 0.12
- Comment: GBRS scaling factor.

Pipeline Default Outputs

NOTE: * Represents a wild card that is a placeholder for values that will be filled by input file names and/or parameters when the pipeline is run.

Naming Convention	Description
`emase_report.html`	Nextflow autogenerated report
`trace.txt`	Nextflow trace of processes
`/gbrs/.emase.h5`	EMASE/GBRS compressed BAM h5 file
`/emase/.isoforms.alignment_counts`	Multiway raw alignment counts for all transcript isoforms
`/emase/.isoforms.expected_read_counts`	Multiway expected read counts for all transcript isoforms
`/emase/.isoforms.tpm`	Multiway transcript per million (tpm) for all transcript isoforms
`/emase/.genes.alignment_counts`	Multiway raw alignment counts for all genes
`/emase/.genes.expected_read_counts`	Multiway expected read counts for all genes
`/emase/.genes.tpm`	Multiway transcript per million (tpm) for all genes
`/gbrs/.isoforms.alignment_counts`	Diploid raw alignment counts for all transcript isoforms
`/gbrs/.isoforms.expected_read_counts`	Diploid expected read counts for all transcript isoforms
`/gbrs/.isoforms.tpm`	Diploid transcript per million (tpm) for all transcript isoforms
`/gbrs/.genes.alignment_counts`	Diploid raw alignment counts for all genes
`/gbrs/.genes.expected_read_counts`	Diploid expected read counts for all genes
`/gbrs/.genes.tpm`	Diploid transcript per million (tpm) for all genes
`/gbrs/.interpolated.genoprobs.npz`	Interpolated genotype probabilities in python NPZ format
`/gbrs/.interpolated.genoprobs.tsv`	Interpolated genotype probabilities in tab delimited format
`/gbrs/.plotted.genome.pdf`	Plotted interpolated genotypes
`/gbrs/.genotypes.tsv`	Inferred genotypes for all genes
`/stats/.bowtie_R1.log`	Statistics output from read1 mapping from Bowtie
`/stats/.bowtie_R2.log`	Statistics output from read2 mapping from Bowtie, if data are paired end

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GBRS Pipeline ReadMe

GBRS Documentation

GBRS Pipeline (--workflow gbrs)

Parameters for GBRS Pipeline

Pipeline Default Outputs

Pipeline Options Outputs

Home

Pipeline Documentation

Benchmarking Documentation

Pipeline development and Release Documentation

Clone this wiki locally