Option to avoid ambiguous nucleotides in generated probes #58

med-ss20 · 2024-05-31T07:40:03Z

Dear Dr. Metsky,

Is there a way to configure CATCH to avoid generating probes with ambiguous nucleotides and only use standard nucleotides (A, T, C, G)? I understand that it will lead to a greater number of probes for the given dataset.

Thank you very much for any advice, and for this great tool 👍🏻

Regards,
Sviat

haydenm · 2024-06-03T20:37:46Z

Hi Sviat,

I'm glad that you find CATCH useful. Yes, there is an option to do what you're looking for! It's --expand-n. The help message for that argument is:

Expand each probe so that 'N' bases are replaced by real
bases; for example, the probe 'ANA' would be replaced
with the probes 'AAA', 'ATA', 'ACA', and 'AGA'; this is
done combinatorially across all 'N' bases in a probe, and
thus the number of new probes grows exponentially with the
number of 'N' bases in a probe. If followed by a command-
line argument (INT), this only expands at most INT randomly
selected N bases, and the rest are replaced with random
unambiguous bases (default INT is 3).
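The help text describes the expansion closely enough to sketch it. The following is a minimal Python illustration, not CATCH's actual code. The function name `expand_n`, the base ordering, and the choice to fill the non-expanded Ns once per input probe (so that all expanded copies share those random bases) are assumptions.

```python
import itertools
import random
from typing import List, Optional

BASES = "ATCG"  # order chosen to match the 'ANA' example in the help text


def expand_n(probe: str, max_expand: int = 3,
             rng: Optional[random.Random] = None) -> List[str]:
    """Hypothetical sketch of the --expand-n behavior described above."""
    if rng is None:
        rng = random.Random()
    n_positions = [i for i, b in enumerate(probe) if b == "N"]

    # Pick at most max_expand N positions, at random, to expand combinatorially.
    if len(n_positions) > max_expand:
        expand_positions = sorted(rng.sample(n_positions, max_expand))
    else:
        expand_positions = n_positions

    # Replace every other N with a random unambiguous base (assumption: done
    # once per input probe, so all expanded copies share these bases).
    template = list(probe)
    for i in n_positions:
        if i not in expand_positions:
            template[i] = rng.choice(BASES)

    # Enumerate all 4**k assignments of real bases to the k expanded positions.
    expanded = []
    for combo in itertools.product(BASES, repeat=len(expand_positions)):
        seq = template[:]
        for pos, base in zip(expand_positions, combo):
            seq[pos] = base
        expanded.append("".join(seq))
    return expanded


print(expand_n("ANA"))  # ['AAA', 'ATA', 'ACA', 'AGA']
```

With k expanded Ns, a probe yields 4^k copies. For example, fully expanding a probe that contains 10 Ns gives 4^10 = 1,048,576 probes, which is the exponential growth the help text warns about.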

For example, setting --expand-n 10 combinatorially expands up to 10 N bases into real nucleotides and replaces any remaining Ns with random real nucleotides. You could set the value to the probe length if you want to combinatorially expand all Ns. Note that this does not work with non-N ambiguity characters (e.g., Y); if you have those, my suggestion would be to replace them with N in the input.
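For the suggested workaround of replacing non-N ambiguity characters with N before running CATCH, a small preprocessing step might look like the following. This is a hypothetical helper, not part of CATCH; it assumes plain FASTA input and the standard IUPAC ambiguity codes R, Y, S, W, K, M, B, D, H and V.

```python
import re

# Non-N IUPAC ambiguity codes; matched case-insensitively.
NON_N_AMBIGUITY = re.compile("[RYSWKMBDHV]", re.IGNORECASE)


def mask_ambiguity(fasta_text: str) -> str:
    """Hypothetical helper: replace non-N ambiguity codes in sequence lines with 'N'."""
    out_lines = []
    for line in fasta_text.splitlines(keepends=True):
        if line.startswith(">"):
            out_lines.append(line)  # leave FASTA headers untouched
        else:
            # Lowercase codes are replaced too, always with an uppercase 'N'.
            out_lines.append(NON_N_AMBIGUITY.sub("N", line))
    return "".join(out_lines)
```

One way to apply it is to read the input FASTA file into a string, pass it through `mask_ambiguity`, and write the result to a new file that is then given to CATCH (file names are up to you).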

Hayden