Generating master files

Mark Maienschein-Cline edited this page Feb 20, 2024 · 3 revisions

Basic setup

Use the generate_masterfile_job.sh script to run demultiplexing and data processing steps for ribosomal profiling of individual samples.

Change the parameter values between the EDIT BELOW HERE and STOP EDITING lines to match the requirements of your job. Change only the value that follows each parameter name, not the name itself, and make sure every value is enclosed in quotes.

Outputs

The script creates one output: the master file for the selected comparison.

Required settings

Parameter names to edit:

CONTROL_MANIFEST and TREATMENT_MANIFEST These are the manifest output files for the primary outputs of the 'Riboseq Sample Processing' tool. Each manifest should list the following file types in its third column:

Gene_counts
Gene_posRPM
Gene_negRPM

CONTROL_MANIFEST and TREATMENT_MANIFEST can point to the same file if you are comparing two samples that were processed in the same run.

CONTROL_SAMPLE and TREATMENT_SAMPLE The names of the control and treatment samples. These should match the sample names in the second column of the manifests.
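
Before submitting a job, it can help to confirm that each sample name appears in its manifest with all three required file types. The following is a minimal sketch, not part of the pipeline. It assumes the manifest is tab-delimited, with the sample name in the second column and the file type in the third; the function name, the paths in the first column, and the sample names are hypothetical.

```python
import csv

# File types the manifest must list for each sample (from the list above).
REQUIRED_TYPES = {"Gene_counts", "Gene_posRPM", "Gene_negRPM"}


def missing_file_types(manifest_lines, sample):
    """Return the required file types that `sample` lacks in a manifest.

    Assumption: rows are tab-delimited, column 2 holds the sample name
    and column 3 holds the file type.
    """
    found = set()
    for row in csv.reader(manifest_lines, delimiter="\t"):
        if len(row) >= 3 and row[1] == sample:
            found.add(row[2])
    return REQUIRED_TYPES - found


# Hypothetical manifest content; the first column is a placeholder.
MANIFEST_TEXT = (
    "file_a\tcontrol_1\tGene_counts\n"
    "file_b\tcontrol_1\tGene_posRPM\n"
    "file_c\tcontrol_1\tGene_negRPM\n"
    "file_d\ttreatment_1\tGene_counts\n"
    "file_e\ttreatment_1\tGene_posRPM\n"
)

if __name__ == "__main__":
    lines = MANIFEST_TEXT.splitlines()
    print(sorted(missing_file_types(lines, "control_1")))    # []
    print(sorted(missing_file_types(lines, "treatment_1")))  # ['Gene_negRPM']
```

An empty result means the sample name matched and all three file types were found. A result containing all three types usually means the sample name does not match the second column of the manifest.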

Fold-changes in the output master file are calculated as treatment/control.
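
To make the direction of the ratio concrete, here is a small illustration of a treatment/control fold-change. It is only a sketch: the function name is hypothetical, and returning None when the control value is zero is an assumption, since this page does not say how the script handles that case.

```python
def fold_change(treatment, control):
    """Return treatment / control.

    Assumption: a zero control value yields None; the actual script
    may handle this case differently.
    """
    if control == 0:
        return None
    return treatment / control


if __name__ == "__main__":
    # A value above 1 means the signal is higher in the treatment sample.
    print(fold_change(30.0, 12.0))  # 2.5
    print(fold_change(5.0, 0.0))    # None
```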

REFERENCE Reference genome in FASTA format.

ANNOTATION Gene annotations for your genome. This is a 5-column, tab-delimited file, formatted as:

[ID]  [start]  [end]  [strand]  [name/description]

For example:

ACT41903.1;thrL      190     255     +       thr operon leader peptide
ACT41904.1;thrA      336     2798    +       Bifunctional aspartokinase/homoserine dehydrogenase 1
ACT41905.1;thrB      2800    3732    +       homoserine kinase
ACT41906.1;thrC      3733    5019    +       L-threonine synthase
ACT41907.1;yaaX      5232    5528    +       DUF2502 family putative periplasmic protein
ACT41908.1;yaaA      5681    6457    -       peroxide resistance protein%2C lowers intracellular iron
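
A quick format check on the annotation file can catch problems before the job runs. The sketch below parses one line of the 5-column, tab-delimited layout described above. The function name is hypothetical, and the checks that the strand is + or - and that start does not exceed end are assumptions based on the example rows, not documented requirements.

```python
def parse_annotation_line(line):
    """Parse one annotation line into (id, start, end, strand, description).

    Raises ValueError if the line does not have exactly five
    tab-separated fields or if a field looks malformed.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated fields, got {len(fields)}")
    gene_id, start, end, strand, description = fields
    start, end = int(start), int(end)  # int() raises ValueError on non-numeric text
    # Assumption: strand is '+' or '-', as in the example rows.
    if strand not in {"+", "-"}:
        raise ValueError(f"unexpected strand: {strand!r}")
    # Assumption: start <= end on both strands, as in the example rows.
    if start > end:
        raise ValueError(f"start {start} is greater than end {end}")
    return gene_id, start, end, strand, description


if __name__ == "__main__":
    example = "ACT41905.1;thrB\t2800\t3732\t+\thomoserine kinase\n"
    print(parse_annotation_line(example))
```

A common failure is a file whose columns are separated by spaces rather than tabs; such a line would fail the field-count check here.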

OUTPUT_NAME Name for output file.

Optional settings

GENE_THRESHOLD Minimum coverage threshold to include a gene in the master file, across both samples.

CODON_THRESHOLD_SAMPLE and CODON_THRESHOLD_TREATMENT Minimum codon coverage threshold (counts per codon) to include a codon in the master file, for the control and treatment samples, respectively.

RPM_SEGMENT Set to segment or all to perform codon normalization based on:

  • segment: Segmentation of the gene, normalizing separately within the first 10 codons, last 10 codons, and the rest of the gene (default, standard behavior).
  • all: Over the entire gene.
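
The segment option can be pictured as splitting each gene's codons into three groups. The sketch below shows only that split, not the normalization itself. The function name is hypothetical, and raising an error for genes shorter than 20 codons is an assumption, since this page does not describe how short genes are handled.

```python
def split_codons(n_codons, edge=10):
    """Return (first, middle, last) codon index ranges for one gene.

    `edge` is the number of codons in the first and last segments
    (10 in the description above). Indices are 0-based.
    """
    # Assumption: genes too short to hold both edge segments are rejected;
    # the actual script may treat them differently.
    if n_codons < 2 * edge:
        raise ValueError("gene is too short to segment")
    first = range(0, edge)
    middle = range(edge, n_codons - edge)
    last = range(n_codons - edge, n_codons)
    return first, middle, last


if __name__ == "__main__":
    first, middle, last = split_codons(25)
    print(len(first), len(middle), len(last))  # 10 5 10
```

With segment, each of the three groups would be normalized separately; with all, the whole gene is treated as a single group.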