Releases: nedialkova-lab/mim-tRNAseq
Updated differential expression heatmaps, logOR filtering, and bugfixes
New features and bugfixes
Differential expression heatmaps
- New heatmaps in DESeq2 folders which show scaled expression (normalised DESeq2 read counts), and log2 fold change values for all conditions vs control condition for significantly differentially expressed genes (padj <= 0.05).
- Line plot of basemean expression included on far right.
- See DE_isodecodersScaled_hm.pdf and DE_anticodonScaled_hm.pdf in /DESeq2/isodecoder and /DESeq2/anticodon, respectively.
Normalised counts outputs
- Tables of DESeq2 normalised counts for isodecoders and anticodons now output to /counts folder.
- These are the same values present in the last columns of all DESeq2 diffexpr-results.csv files, which means they also exclude undeconvoluted clusters.
Bugfixes
Major update v0.3 - new deconvolution and differential mods analysis
Major update to mim-seq core algorithms
New deconvolution
- Deconvolution now assesses all mismatches between parent and children instead of individual mismatches one at a time. This makes all unique tRNA sequences distinguishable from each other, offering full resolution within the tRNA transcriptome.
- From the full set of mismatches, a minimal set is defined that is sufficient for resolution of each transcript.
- The minimal set is chosen at the most 3' position possible to account for coverage drop-off close to the 5' end due to modification-induced stops to RT.
- A new parameter, --deconv-cov-ratio, allows the user to set a threshold for the drop in coverage due to stops that render some reads difficult to assign to their correct transcript. For example, if many reads (say 60%) end at position 26 in a reference tRNA due to m2,2G modification here, but a mismatch at position 13 is needed to assign this read to a child transcript within the cluster, then many reads will not be able to be assigned and will stay assigned to the parent.
  - In this case, --deconv-cov-ratio can be set to 0.5 (i.e. a drop in coverage of 50% from the 3' end to the required mismatch in question). This will mark the parent and the specific sequence as not deconvoluted (as 60% of the reads end before the mismatch), and these will be excluded from modification analysis (and table outputs) as well as DESeq2 analysis of differential expression.
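The coverage check described above can be illustrated with a minimal sketch. The function name, its arguments, and the exact comparison (`>=`) are assumptions made for illustration and are not taken from the mim-tRNAseq source; only the idea of comparing coverage at the required mismatch with coverage at the 3' end comes from the notes above.

```python
def is_deconvolutable(cov_at_mismatch, cov_at_3prime, deconv_cov_ratio):
    """Hypothetical helper: decide whether a child sequence can be resolved.

    cov_at_mismatch  -- reads still covering the position of the required mismatch
    cov_at_3prime    -- reads covering the 3' end of the reference
    deconv_cov_ratio -- minimum fraction of 3' coverage that must remain
                        at the mismatch (the value given to --deconv-cov-ratio)
    """
    if cov_at_3prime == 0:
        # No coverage at all: nothing can be assigned.
        return False
    remaining = cov_at_mismatch / cov_at_3prime
    return remaining >= deconv_cov_ratio


# Example from the notes: 60% of reads end at position 26,
# so only 40% reach the mismatch at position 13.
reads_at_3prime = 1000
reads_at_pos13 = 400
resolved = is_deconvolutable(reads_at_pos13, reads_at_3prime, 0.5)
```

With a ratio of 0.5, `resolved` is `False` here, so the parent and this child would be flagged as not deconvoluted under the assumptions of this sketch.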
Differential modification analysis
- In experiments with more than 1 condition, conditions will be compared pairwise to assess significant differences in modification status.
- To achieve this, proportions of modified and unmodified nucleotides at each position for each tRNA are used for the calculation of log Odds Ratios (logOR). These are then tested for significance with a Chi-Squared test, and corrected for multiple testing using FDR.
- See the original paper Methods section for detailed explanation.
Behrens et al. (2021) High-resolution quantitative profiling of tRNA abundance and modification status in eukaryotes by mim-tRNAseq. Mol. Cell.
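As a rough illustration of the statistics named above, the following sketch computes a log odds ratio, a chi-squared p-value for a 2x2 table, and Benjamini-Hochberg adjusted p-values using only the standard library. All function names, the pseudocount, the use of the natural logarithm, and the choice of Benjamini-Hochberg as the FDR procedure are assumptions; the paper's Methods section remains the authority on what mim-tRNAseq actually does.

```python
import math


def log_odds_ratio(mod_a, unmod_a, mod_b, unmod_b, pseudocount=0.5):
    """Natural-log odds ratio of modification in condition A vs. condition B.
    The pseudocount is an assumption added here to avoid division by zero."""
    odds_a = (mod_a + pseudocount) / (unmod_a + pseudocount)
    odds_b = (mod_b + pseudocount) / (unmod_b + pseudocount)
    return math.log(odds_a / odds_b)


def chi2_pvalue_2x2(mod_a, unmod_a, mod_b, unmod_b):
    """Pearson chi-squared test (1 degree of freedom, no continuity correction)."""
    table = [[mod_a, unmod_a], [mod_b, unmod_b]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            if expected == 0:
                return 1.0  # degenerate table: no evidence of a difference
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-squared variable with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted
```

In a pairwise comparison, one p-value per tRNA position would be collected and the whole list passed to `benjamini_hochberg` before applying a significance cut-off.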
Small bugfixes and changes
- --min-cov now accepts an integer for filtering low coverage tRNAs (as before, where the integer represents a total read coverage threshold for the tRNA), as well as a fraction between 0 and 1 representing the proportion of reads mapped to that tRNA relative to all tRNA mapped reads (a recommended starting value for testing is 0.0005).
- Automatic connection and parsing of data from Modomics API. A local Modomics file can be used with --local-modomics
- --double-cca mode enables assessment of tRNA ends with 3'CCACCA. This end addition is common in marking tRNAs for degradation. In this mode, CCA analysis files will represent proportions of the second CCA end addition, not the first. I.e. the proportion of 3'-CCA is representative of 3'-CCACCA, 3'-CC represents 3'CCACC, and so on. "Absent" indicates all those reads without a second CCA end (which may include those with full single 3'-CCA down to absent single CCA).
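Two of the behaviours above lend themselves to a short sketch: the dual meaning of the --min-cov value and the relabelling of 3' ends in --double-cca mode. Both functions are hypothetical and assume details the notes do not state: that a value strictly between 0 and 1 is treated as a fraction, that the threshold is inclusive, that "and so on" extends to 3'-C representing 3'CCAC, and that the 3' end can be classified by plain suffix matching (the real tool works from alignments, so a tRNA body that itself ends in CCA would be misclassified by this simplification).

```python
def passes_min_cov(trna_reads, total_trna_reads, min_cov):
    """Hypothetical filter: keep a tRNA if it meets the coverage threshold."""
    if 0 < min_cov < 1:
        # Fractional threshold: share of all tRNA-mapped reads.
        return trna_reads / total_trna_reads >= min_cov
    # Integer threshold: absolute read count for this tRNA.
    return trna_reads >= min_cov


def second_cca_category(read_3prime_end):
    """Hypothetical labelling of a read's 3' end in --double-cca mode."""
    for suffix, label in (("CCACCA", "CCA"), ("CCACC", "CC"), ("CCAC", "C")):
        if read_3prime_end.endswith(suffix):
            return label
    # Includes reads with a single full or partial CCA, or none at all.
    return "Absent"
```

For instance, with 1,000,000 tRNA-mapped reads in total, a threshold of 0.0005 corresponds to 500 reads under these assumptions.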
additionalMods reduction, improved predicted mods discovery and isodecoder name shortening
Updates
- Reduced additional mods file to contain only inosine 34. This allows mim-tRNAseq to discover and annotate other mods on its own and reduces misalignment, e.g. human Asp-GTC-3
- New mods discovered after round 1 of alignment are then subtracted from predictedMods so that their validity can be rechecked after alignment improvements in round 2. Only these are then added to predictedMods
- Isodecoder names now exclude the 2nd number in the name (except for names containing "chr") in all outputs and plots
- Stops and misinc heatmaps no longer plot gap information ("-")
- knownMods.csv renamed to allMods.csv as it includes Modomics, additionalMods, and predictedMods sites
Bugfixes
- allMods.csv positions now 1-based
- --misinc-thresh now used for heatmap row annotations (i.e. stop and misinc site counts)
- Duplicates in countTable no longer removed, as a pandas issue removes non-duplicated rows
Minor bugfixes and updates
Updates
- Additional mitochondrial modifications from Clark 2016 added to additionalMods.txt
- GSNAP index names now match input files, not the experiment name flag
- DESeq2 threshold for significance now adjusted p < 0.01
- readthroughTable proportion now 1 - stop proportion at position, reflecting actual readthrough
- DESeq2 single replicates now get normalised count outputs using estimateSizeFactors()
Bugfixes
- Strict handling of additionalMods location type (i.e. mito or cytosolic) when adding to modified position index
- Correct handling of insertions between cluster parent and child: modified sites after insertions in child need position correction by number of insertions (i.e. all mods after insertions are decreased to adjust for parent insertions)
- Indel handling for discrepancy between GSNAP and usearch. In some cases the two algorithms chose different, closely spaced positions to introduce an insertion, affecting downstream cluster splitting and mod position analysis
- Fixed isodecoders added to unique_isodecoderMMs based on insertions that weren't unique to one isodecoder only
- Adjust position of mismatch between parent and child if there are preceding insertions so that the correct position in the child is used for storing the identity of the mismatch
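The position correction for insertions mentioned in these fixes can be pictured with a small sketch. The function name and the coordinate convention (1-based positions, each insertion occupying one position that precedes the affected sites) are assumptions for illustration only.

```python
import bisect


def correct_mod_positions(mod_positions, insertion_positions):
    """Hypothetical helper: shift each modified position down by the number
    of insertions located strictly before it."""
    insertions = sorted(insertion_positions)
    corrected = []
    for pos in mod_positions:
        # bisect_left counts the insertions with a position smaller than pos.
        n_before = bisect.bisect_left(insertions, pos)
        corrected.append(pos - n_before)
    return corrected


# Two insertions at positions 20 and 45; modified sites at 9, 26, 37 and 58.
shifted = correct_mod_positions([9, 26, 37, 58], [20, 45])
```

Sites before the first insertion are left unchanged, sites between the two insertions move by one, and sites after both move by two.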
Bugfixes to dependencies and new PyPI package
Bugfixes
- Changes to dependency versions
- Needed new version to create new distribution for PyPI
PyPI package and automated species data
New features
- --species (or -s) flag to specify one of a few built-in species datasets. This removes the need to specify tRNA references, tRNAscan out files, and mitochondrial tRNA sequences
- Species files for which there is no built-in data can still be specified using -t, -o and -m
- Package now available on PyPI and installable using pip
- mods/readthroughTable.csv is now an additional output which gives information on the proportion of reads that stop at a given site relative to the total reads at that site (as opposed to RTstopTable.csv which gives the proportion of total reads for the tRNA that stop at a site)
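The difference between the two tables comes down to the denominator, which a short sketch can make concrete. The function and argument names are hypothetical. The readthrough value is computed as 1 minus the per-site stop proportion, following the later release note on readthroughTable.

```python
def stop_and_readthrough(stops_at_site, reads_at_site, total_trna_reads):
    """Hypothetical helper returning (RTstop proportion, readthrough proportion).

    stops_at_site    -- reads terminating at this position
    reads_at_site    -- reads covering (reaching) this position
    total_trna_reads -- all reads assigned to the tRNA
    """
    # RTstopTable-style value: stops relative to all reads for the tRNA.
    rtstop = stops_at_site / total_trna_reads
    # readthroughTable-style value: share of reads at the site that continue.
    readthrough = 1 - stops_at_site / reads_at_site
    return rtstop, readthrough


# 1000 reads for the tRNA, 400 of which reach position 26, and 300 stop there.
rtstop, readthrough = stop_and_readthrough(300, 400, 1000)
```

Here the stop looks moderate relative to the whole tRNA (0.3) but severe at the site itself, since only a quarter of the reads that arrive there read through.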
Bugfixes to v0.2
Bug fixes
- --cluster-id 1 now functions correctly in terms of producing Isodecoder_counts.txt
- Disabled clustering also functions correctly to produce both counts outputs
New features
- Isodecoder and anticodon level heatmaps of vst normalised tRNA expression
Stable v0.2 release
Major update to modification analysis
mim-tRNAseq now performs modification analysis per unique tRNA sequence instead of per cluster. This is achieved by analysing mismatches between cluster members and using unique mismatches to characterise unique tRNA sequences that can be split from the cluster member. This was previously done to split overall read counts, but now this is performed before modification analysis so that each read can be assigned to a unique tRNA sequence group. Each read is then assessed for stops and modifications after assignment to its new group.
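The idea of assigning each read to a unique sequence before modification analysis can be sketched as follows. This is a simplified illustration, not the project's implementation: the data layout (dictionaries keyed by position), the function name, and the rule that a read stays with the parent unless it shows every distinguishing base of a child are all assumptions.

```python
def assign_read(read_bases, children):
    """Hypothetical assignment of a read within a cluster.

    read_bases -- dict of reference position -> base observed in the read
                  (only positions the read actually covers)
    children   -- dict of child name -> dict of position -> base that
                  distinguishes that child from the cluster parent
    """
    for name, mismatches in children.items():
        # A position the read does not cover yields None and never matches.
        if all(read_bases.get(pos) == base for pos, base in mismatches.items()):
            return name
    return "parent"


children = {
    "child-1": {13: "A"},
    "child-2": {40: "T", 52: "G"},
}
full_read = {13: "G", 40: "T", 52: "G"}   # covers every informative position
short_read = {40: "C", 52: "G"}           # stops before reaching position 13
```

In this toy cluster `full_read` would go to `child-2`, while `short_read` stays with the parent because it matches neither child's full set of distinguishing bases.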
Other new features
- New predicted modifications and inosines output to mods/predictedMods.csv. This contains predicted sites for each sample run, with canonical position numbering and proportions of each nucleotide misincorporated for easier annotation of newly detected mods.
Minor bug-fixes
- S. pombe reference tRNA names altered for consistent naming in output plots
- Mitochondrial coverage plots now have legends with two columns so that all items are visible in output PDF
First stable release
v0.1 first release version 0.1