Translate & Align V-Pipe Reads #53

gordonkoehn · 2024-11-29T16:35:09Z

Currently, we use Nextclade to translate from nucleotides to amino acids.

We also use it to realign the nucleotides and amino acids, even though our nucleotides are already aligned.

This was fine for small test data, but for full-size data it may be infeasible, or at least a waste of resources, to realign the reads.

So consider rewriting the translate functions.


gordonkoehn · 2024-11-29T16:44:45Z

Potential solution: call the Nextclade functions that run after its alignment step.

Reach out to the Nextclade devs.

gordonkoehn · 2024-12-02T12:34:32Z

See their response:

`.sam` to amino acids – can I just translate ? nextstrain/nextclade#1556

In particular, from rneher:

A simple script could translate those already aligned reads. You'd need to figure out which ORFs your read falls into and its reading frame, and then use something like Bio.Seq.translate from the biopython package to translate the sequence.
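rneher's three steps (find the overlapping ORFs, fix the reading frame, translate) can be sketched in plain Python. This is a minimal sketch, not an existing tool: it assumes each read has already been projected onto reference coordinates (deletions as `-`, insertions removed) and that ORF coordinates are 0-based, half-open intervals. The ORF table and function names are hypothetical. A built-in copy of the standard genetic code keeps it dependency-free; biopython's `Bio.Seq.translate` could take over the codon lookup.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases in T, C, A, G order, first codon position varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon: str) -> str:
    """Translate one codon; '---' is a full deletion, anything unknown is 'X'."""
    if codon == "---":
        return "-"
    return CODON_TABLE.get(codon.upper(), "X")


def translate_aligned_read(read_start: int, aligned_seq: str, orfs: dict) -> dict:
    """Translate a read that is already aligned to the reference (hypothetical helper).

    read_start:  0-based reference position of aligned_seq[0]
    aligned_seq: read in reference coordinates ('-' = deletion, insertions removed)
    orfs:        {name: (start, end)}, 0-based half-open reference coordinates
    Returns {name: (first_aa_index, amino_acids)} for each ORF in which the read
    covers at least one complete codon; first_aa_index is 0-based.
    """
    read_end = read_start + len(aligned_seq)
    result = {}
    for name, (orf_start, orf_end) in orfs.items():
        # 1. Which ORFs does the read fall into? Clip the read to this ORF.
        start = max(read_start, orf_start)
        end = min(read_end, orf_end)
        # 2. Reading frame: move forward to the next codon boundary of the ORF.
        start += -(start - orf_start) % 3
        if end - start < 3:
            continue  # no complete codon of this ORF is covered
        # 3. Translate every complete codon; partial codons at the ends are dropped.
        offset = start - read_start
        n_codons = (end - start) // 3
        codons = [aligned_seq[offset + 3 * i: offset + 3 * i + 3] for i in range(n_codons)]
        result[name] = ((start - orf_start) // 3, "".join(translate_codon(c) for c in codons))
    return result
```

For example, with a hypothetical ORF `{"geneA": (2, 14)}` on the reference `GGATGAAATTTGGGCC`, the read `GAAATTTGG` starting at position 4 begins mid-codon, so its first complete codon is the second one: `translate_aligned_read(4, "GAAATTTGG", {"geneA": (2, 14)})` returns `{"geneA": (1, "KF")}`. Genes with a programmed ribosomal frameshift (such as SARS-CoV-2 ORF1ab) would need special handling that this sketch does not attempt.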

gordonkoehn · 2024-12-02T12:46:25Z

So, we probably need a custom tool, yet I am unsure how to get the new amino acid positions.
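Once reads are in reference coordinates, amino acid positions follow from arithmetic alone. For an ORF whose first nucleotide is at reference position s (1-based, inclusive, as in GenBank/GFF annotations), a nucleotide at position p lies in codon ⌊(p − s)/3⌋ + 1, at codon position (p − s) mod 3 + 1. A minimal sketch with a hypothetical helper name; for instance, if the spike ORF in the Wuhan-Hu-1 reference spans 21563..25384, nucleotide 23063 is the first base of codon 501 (the A23063T change behind S:N501Y).

```python
def nt_to_aa_position(ref_pos: int, orf_start: int, orf_end: int):
    """Map a reference nucleotide position to its amino acid position (hypothetical helper).

    All coordinates are 1-based and inclusive (GenBank/GFF style).
    Returns (aa_position, codon_position), both 1-based, or None when
    ref_pos lies outside the ORF.
    """
    if not orf_start <= ref_pos <= orf_end:
        return None
    offset = ref_pos - orf_start
    return offset // 3 + 1, offset % 3 + 1
```

Keeping the mapping in one small function makes it easy to switch between the 0-based convention used in SAM processing and the 1-based convention used in mutation names.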

Plus, in general, I don't know how to deal with insertions.

Read Pairing & Merge #54

Can I just translate the nucleotide insertion to amino-acid insertions?
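Whether a nucleotide insertion maps cleanly onto an amino acid insertion depends on its length and on where it lands relative to codon boundaries. A minimal sketch of that decision, assuming 0-based, half-open ORF coordinates and that the inserted bases sit right after reference position `ref_pos`; `classify_insertion` is a hypothetical helper, and reads with several interacting indels (e.g., two frameshifts that compensate each other) are not considered.

```python
def classify_insertion(ref_pos: int, inserted: str, orf_start: int, orf_end: int):
    """Decide whether a nucleotide insertion can be translated directly (hypothetical helper).

    ref_pos:   0-based reference position the insertion follows
               (the inserted bases sit between ref_pos and ref_pos + 1)
    orf_start, orf_end: 0-based, half-open ORF coordinates
    Returns (kind, codon_number), where codon_number is the 1-based codon
    containing ref_pos, or ("outside_orf", None).
    """
    # An insertion after the ORF's last base is no longer inside the ORF.
    if not orf_start <= ref_pos < orf_end - 1:
        return "outside_orf", None
    offset = ref_pos - orf_start
    codon_number = offset // 3 + 1
    if len(inserted) % 3 != 0:
        # Shifts the reading frame of everything downstream.
        return "frameshift", codon_number
    if offset % 3 == 2:
        # Falls between two codons: the inserted bases form whole codons
        # and can be translated on their own.
        return "codon_boundary", codon_number
    # In-frame length, but it splits a codon: the reference codon must be
    # re-translated together with the inserted bases.
    return "mid_codon", codon_number
```

Only the `codon_boundary` case translates directly, giving `len(inserted) // 3` inserted amino acids after codon `codon_number`; `mid_codon` insertions need the split codon re-translated, and frameshifts alter every downstream codon.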

gordonkoehn · 2024-12-02T13:06:11Z

@DrYak, for now, I am ignoring insertions completely. There are too many biological unknowns for me here. I'd appreciate your advice on this at some point.

gordonkoehn · 2024-12-03T13:38:47Z

The issue:

gordonkoehn · 2024-12-03T14:55:57Z

Need some input before I can continue.

gordonkoehn · 2024-12-09T13:30:25Z

Take-aways from chat with @LaraFuhrmann

BIG CAVEAT: uploading the .bam reads to SILO discards all of V-Pipe's mutation-calling effort, so the database will contain sequencing artefacts.
Still, anyone will be able to query the reads quickly, which is useful in urgent situations.
As a rough estimate, V-Pipe's computational effort splits into 15 % preprocessing, 25 % nucleotide alignment, and 60 % mutation calling.
The current plan is therefore to take the single reads from the .bam files, align them with Nextclade, and, once that is done, merge the read pairs.
Basically, we use only V-Pipe's preprocessing plus Nextclade's alignment, and at the end we again rely on V-Pipe's post-processing.

So, actionable:

Implement running Nextclade on batches of reads, so that it still runs in a small Docker container, just for longer.
This makes sr2silo a computationally expensive step.
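A minimal sketch of the batching, assuming reads arrive as `(read_id, sequence)` pairs streamed from the .bam file; `batch_reads` and `to_fasta` are hypothetical helpers, and the Nextclade invocation itself (e.g. `nextclade run` on each batch file) is left out because its dataset and output options depend on the setup. Streaming keeps at most one batch in memory, which is what lets the step fit in a small container.

```python
from itertools import islice
from typing import Iterable, Iterator


def batch_reads(
    reads: Iterable[tuple[str, str]], batch_size: int
) -> Iterator[list[tuple[str, str]]]:
    """Yield consecutive lists of at most batch_size (read_id, sequence) pairs."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(reads)
    # islice pulls only the next batch from the stream, so memory use is bounded.
    while batch := list(islice(it, batch_size)):
        yield batch


def to_fasta(batch: list[tuple[str, str]]) -> str:
    """Render one batch as FASTA text, ready to be written to a file for Nextclade."""
    return "".join(f">{read_id}\n{seq}\n" for read_id, seq in batch)
```

With pysam, the input stream could come from iterating a `pysam.AlignmentFile` and yielding `(read.query_name, read.query_sequence)` for each record; the batch size then trades container memory against the number of Nextclade invocations.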

@DrYak FYI. Let's also chat about this before I do it.

gordonkoehn · 2024-12-11T14:28:19Z

Take-aways from chat with @DrYak:

Indeed, no code exists for this as of now. This is because most of the time, people will use mutations called on nucleotides and only translate these mutations, not entire alignments.

There are two options:

1. build it from the ground up (with pysam and proper logic to handle all coding frames and corner cases)
1. hack Nextclade and hook into its pipeline after its alignment step
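For option 1, the first building block is projecting each read onto reference coordinates from its SAM `POS` and CIGAR string. A minimal sketch, assuming the CIGAR semantics of the SAM specification; `project_read` is a hypothetical helper that parses the CIGAR text itself, whereas with pysam one would more likely use `AlignedSegment.reference_start` and `AlignedSegment.cigartuples`. Deletions become `-`, and insertions are set aside keyed by the reference position they follow.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def project_read(pos: int, cigar: str, query: str):
    """Project a read onto reference coordinates (hypothetical helper).

    pos:   0-based leftmost reference position (SAM POS minus 1)
    cigar: CIGAR string, e.g. "2S3M1I2M2D3M"
    query: read sequence as stored in SAM (soft-clipped bases included)
    Returns (aligned, insertions): aligned has one character per reference
    position covered; insertions maps the 0-based reference position an
    insertion follows to the inserted bases.
    """
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"unsupported CIGAR string: {cigar!r}")
    aligned = []
    insertions = {}
    q = 0  # index into the query sequence
    r = pos  # current reference position
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":  # consumes query and reference
            aligned.append(query[q:q + n])
            q += n
            r += n
        elif op == "I":  # consumes query only
            insertions[r - 1] = query[q:q + n]
            q += n
        elif op in "DN":  # consumes reference only
            aligned.append("-" * n)
            r += n
        elif op == "S":  # soft clip: in the query, but not aligned
            q += n
        # H and P consume neither query nor reference.
    return "".join(aligned), insertions
```

Keeping insertions out of the gapped sequence means the aligned string stays in plain reference coordinates, so codon positions can be computed directly, while the insertions can be handled separately.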

In both cases, the steps would be to take a .sam of the single reads and make a .sam with paired reads, i.e., using Micha's tool.
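Micha's tool is not described here, so the following is only a naive illustration of what pairing involves once both mates are in reference coordinates: positions covered by both mates are kept when they agree, while disagreements and any uncovered stretch between the mates become `N`. `merge_mates` is a hypothetical name; it ignores base qualities and insertions, both of which a real implementation would need to handle.

```python
def merge_mates(start1: int, seq1: str, start2: int, seq2: str):
    """Naively merge two mates already projected onto reference coordinates.

    start1, start2: 0-based reference start of each mate
    seq1, seq2:     gapped sequences ('-' = deletion), one char per ref position
    Returns (merged_start, merged_sequence).
    """
    start = min(start1, start2)
    end = max(start1 + len(seq1), start2 + len(seq2))
    merged = []
    for ref_pos in range(start, end):
        # Collect the base(s) each mate reports at this reference position.
        bases = set()
        for mate_start, seq in ((start1, seq1), (start2, seq2)):
            if mate_start <= ref_pos < mate_start + len(seq):
                bases.add(seq[ref_pos - mate_start])
        if len(bases) == 1:
            merged.append(bases.pop())  # covered once, or both mates agree
        else:
            merged.append("N")  # uncovered gap between mates, or mates disagree
    return start, "".join(merged)
```

Marking conflicts as `N` is the most conservative choice; a quality-aware consensus would instead keep the better-supported base.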

Once that exists, translate the aligned reads into an amino acid alignment with either option.

Option 1) would probably take me 1-2 months to do properly (hard to estimate, given my lack of experience).

Option 2) could be quick and handle all corner cases.

gordonkoehn · 2024-12-17T08:54:27Z

This is a tool that Niko shared

Check out this tool VIRULIGN

It appears to do the alignment itself, so it is not what we want, but it reads like there are some difficulties with the translation. This supports my point to use a well-supported tool: if I hack something, Nextclade will be the better choice.

gordonkoehn · 2024-12-17T08:56:07Z

How does V-Pipe align? Is that better in any way than Nextclade?

Ivan mentioned some probabilistic work in the alignment and theorised that Nextclade might do something more simplistic.

gordonkoehn added the bug Something isn't working label Dec 2, 2024

gordonkoehn changed the title ~~Remove Realignment by NextClade, Scale Performance to Fullsize BAM~~ Remove Realignment by NextClade, Write Custom Amino Acid Translation Dec 2, 2024

gordonkoehn changed the title ~~Remove Realignment by NextClade, Write Custom Amino Acid Translation~~ Remove Realignment by NextClade, Write Custom Amino Acid Translation/Insertion Dec 2, 2024

gordonkoehn added the help wanted Extra attention is needed label Dec 3, 2024

gordonkoehn self-assigned this Dec 3, 2024

gordonkoehn changed the title ~~Remove Realignment by NextClade, Write Custom Amino Acid Translation/Insertion~~ Translate & Align V-Pipe Reads Dec 9, 2024

gordonkoehn mentioned this issue Dec 19, 2024

Read Pairing & Merge #54

Open
