GitHub - bonsai-team/iedera: subset and spaced seed design tool

This branch is 5 commits ahead of laurentnoe/iedera:master.

Name	Name	Last commit message	Last commit date
Latest commit laurentnoe Merge branch 'laurentnoe:master' into master May 28, 2024 278f434 · May 28, 2024 History 442 Commits
.circleci	.circleci	CircleCI Choco Mingw ugly downgrade	Apr 5, 2024
src	src	Updating bioinfo.cristal.univ-lille.fr to bioinfo.univ-lille.fr	Jun 22, 2023
tests	tests	Updating bioinfo.cristal.univ-lille.fr to bioinfo.univ-lille.fr	Jun 22, 2023
.gitignore	.gitignore	Updating ".gitignore" rules with generated test files	Jan 8, 2017
AUTHORS	AUTHORS	First github release (iedera 1.06 alpha9)	Dec 17, 2016
COPYING	COPYING	Updating licences with some cleanup	May 2, 2024
ChangeLog	ChangeLog	Modify "mode change"	Dec 17, 2016
INSTALL	INSTALL	First github release (iedera 1.06 alpha9)	Dec 17, 2016
LICENSE	LICENSE	Updating licences with some cleanup	May 2, 2024
Makefile.am	Makefile.am	Updating tests cleaning and "make test"	Jan 15, 2017
Makefile.in	Makefile.in	Updating Makefile.in	May 7, 2022
NEWS	NEWS	Updating Documentation	May 7, 2022
README	README	Updating bioinfo.cristal.univ-lille.fr to bioinfo.univ-lille.fr	Jun 22, 2023
README.rst	README.rst	rtf Typo	May 2, 2024
aclocal.m4	aclocal.m4	Updating aclocal.m4	May 7, 2022
appveyor.yml	appveyor.yml	Removing cleaning of gcov files, ignoring some	May 13, 2022
codecov.yml	codecov.yml	Changing codecov ignore rules : stars additions	May 13, 2022
configure	configure	Updating bioinfo.cristal.univ-lille.fr to bioinfo.univ-lille.fr	Jun 22, 2023
configure.ac	configure.ac	Updating bioinfo.cristal.univ-lille.fr to bioinfo.univ-lille.fr	Jun 22, 2023
depcomp	depcomp	build: updating install scripts from autoreconf -fi	Apr 10, 2018
doxygenconfig	doxygenconfig	Updating doxygen config file	Jun 29, 2021
install-sh	install-sh	build: updating install scripts from autoreconf -fi	Apr 10, 2018
missing	missing	build: updating install scripts from autoreconf -fi	Apr 10, 2018
mkinstalldirs	mkinstalldirs	build: updating install scripts from autoreconf -fi	Apr 10, 2018
plot_mow_seeds.py	plot_mow_seeds.py	Updating the plot comments, to explain the model used in Mow seeds	May 23, 2022
plot_spaced_seeds.py	plot_spaced_seeds.py	Updating scripts with alignment_length as global (was undefined and t…	May 20, 2022
plot_spaced_seeds_figure.png	plot_spaced_seeds_figure.png	Adding a picture to illustrate	May 20, 2022
tam92.py	tam92.py	Adding the Martin Frith "tam92.py" tool, that generates "-f" command …	Sep 30, 2020
test-driver	test-driver	build: updating install scripts from autoreconf -fi	Apr 10, 2018

Repository files navigation

iedera

(more at <http://bioinfo.univ-lille.fr/yass/iedera.php>)

iedera is a tool to select and design spaced seeds, transition constrained spaced seeds, or more generally subset seeds, and vectorized subset seed patterns.

Installation

(more at <http://bioinfo.univ-lille.fr/yass/iedera.php#downloadiedera>)

Binaries for Windows (x64) and OS X (x64) are available at <https://github.com/laurentnoe/iedera/releases>.

Otherwise, you need a C++ compiler and the autotools. On Linux, you can install g++, autoconf, automake. On Mac, you can install xcode, or the command line developer tools (or you can use macports to install g++-mp-5 for example).

Using the command line, type:

git clone https://github.com/laurentnoe/iedera.git
cd iedera
./configure
make

or:

git clone https://github.com/laurentnoe/iedera.git
cd iedera
autoreconf
./configure
automake
make

you can install iedera to a standard /local/bin directory:

sudo make install

or copy the binary directly to your homedir:

cp src/iedera ~/.

Command-line

(more at <http://bioinfo.univ-lille.fr/yass/iedera.php#quick>)

First, use one of these two parameters :

`-spaced`	for spaced seeds
`-transitive`	for transitive spaced seeds

since they are shortcuts for quite long command lines.

Then you can change the weight, span, and number of seeds being designed:

`-w <N,N>`	for the weight range, where N = [1..16] seems reasonable
`-s <N,N>`	for the span range, where N = [1..32] seems reasonable
`-n <N>`	for the number of seeds, where N = [1..32]

as well as the length of the alignment:

-l <N>

where N = [1..64] seems reasonable

NOTE : since enumeration of all the combination of multiple seeds may take time, if "-n" is chosen with a value greater than one, please consider the two following:

`-r <N>`	to run the tool on N randomly generated seed patterns
`-k`	to activate the hill-climbing algorithm on previous parameter -r

(more at <http://bioinfo.univ-lille.fr/yass/iedera.php#details>)

Examples

Spaced seeds

A very small example where the seed weight is set to 11, and the span is at most 18 (full enumeration):

iedera -spaced -w 11,11 -s 11,18

will give the classical PatternHunter 1 spaced seed

###-#--#-#--##-###    0.999999761581      0.467122       0.532878
(SEED PATTERN)        (selectivity)       (SENSITIVITY)  (distance to 1,1)

A second example where the number of seeds is now set to 2, the alignment length is set to 50, and 10000 seeds will be tested with the hill-climbing algorithm activated:

iedera -spaced -n 2 -w 11,11 -s 11,22 -l 50 -r 10000 -k

Transition seeds

A very small example for transition seeds (hill climbing):

iedera -transitive -w 11,11 -s 11,22 -r 10000 -k

Lossless seeds

A very small example for lossless seeds (from Burkhard&Karkkainen) : find a lossless seed of weight 12, span at most 19, on alignments of length 25 with 2 mismatches:

iedera -spaced -s 12,19 -w 12,12 -l 25 -L 1,0 -X 2

A second example for lossless seeds (from Kucherov,Noe&Roytberg) on the previous problem, but with two seeds of weight 14, and span between 20 and 21 (to ease the search):

iedera -spaced -l 25 -L 1,0 -X 2 -n 2 -s 20,21 -w 14,14  -r 100..some.zeros..00 -k

IUPAC seeds

IUPAC filtered seeds could challenge minimizer based techniques <https://www.biorxiv.org/content/10.1101/2020.07.24.220616v2>, so we have extended the iedera tool to support such seeds

First getting the alignment probabilities, out of the TAM92 model <https://pubmed.ncbi.nlm.nih.gov/1630306/>, then launching the optimization for a starting shape, and with the given probabilities:

iedera -iupac -s 5,17 -m "RYYNNNNN,RRYNNNNN" -i shuffle  -r 10000 -k -z 100 -f  `./tam92.py -p 20 -k 1 --gc 50`

YNYRNNnnNN,RNYRNnnNNN       0.9999961853027 0.912921        0.087079

Here :

N is a mach symbol (equivalent to #)
n is a dont care symbol (equivalent to -)
R and Y (uppercase) are respectively Purine and Pyrimine Matches (e.g. R is A-A or T-T matches but not A-T or T-A; use downcase symbols to allow all)

Input/Ouput and reoptimization

Sometimes, it may be helpful to rerun several times the same experiment, and keep the best result of all runs. This can be easily done with input/ouput:

`-e <filename>`	for input file (filename can be a non existing file)
`-o <filename>`	for output file (filename may be of same name as input)

so running this command-line multiple times:

iedera -spaced -l 25 -L 1,0 -X 2 -n 2 -w 14,14 -s 20,21 -r 10000 -k -e file_n2_w14_l25_x2_lossless.txt -o file_n2_w14_l25_x2_lossless.txt

will probably find a lossless set of two seeds. Running this command-line multiple times:

iedera -spaced -l 64 -n 2 -w 11,11 -s 11,22 -r 10000 -k -e file_n2_w11_l64_lossy.txt -o file_n2_w11_l64_lossy.txt

will also probably improve the sensitivity result.

Polynomial form

Bernoulli model

When the probability p to generate a match is not fixed (for example p=0.7 was set in all the previous examples), Mak & Benson have proposed to use a polynomial form and select what they called dominant seeds. We have noticed that this dominance applies as well for any other i.i.d criteria as the Hit Integration (Chung & Park), for Lossless seeds, and several discrete models ... (see <http://doi.org/10.1186/s13015-017-0092-1>) so the flag:

`-p`	to activate dominant selection and output polynomial coefficients

is added in the current commited version of iedera (master branch).

Other multivariate models

When the probabilitic model is more complex compared to a simple Bernoulli model on a binary alphabet, it is possible to compute the probability as a multivariate polynomial form. For a given seed provided with the -m parameter, the output will contain this polynomial form set in square brackets. Selection of the best seeds is left as an exercice for the reader. The flag -pF <filename> activates the output of the multivariate polynomial on the given model. The next example gives sensitivity of the seed 1101 on alignments of length 8

iedera -spaced -pF model_bernoulli_simple_x_xp.txt  -m "##-#" -l 8

on the bernoulli model provided by the file model_bernoulli_simple_x_xp.txt

Tools provided with iedera

The iedera binary is located in src/iedera. The scripts plot_spaced_seeds.py and plot_mow_seeds.py are provided to plot :

the sensitivity for a 1st hit, on alignments generated with a (parameter-free) bernoulli model,
the frequency for a 1st hit, on alignments generated with an increasing frequency of matches, for a set of given seeds.

References

how to cite this tool:

Kucherov G., Noe L., Roytberg, M., A unifying framework for seed sensitivity and its application to subset seeds, Journal of Bioinformatics and Computational Biology, 4(2):553-569, 2006 <http://doi.org/10.1142/S0219720006001977>

Noe L., Best hits of 11110110111: model-free selection and parameter-free sensitivity calculation of spaced seeds, Algorithms for Molecular Biology, 12(1). 2017 <http://doi.org/10.1186/s13015-017-0092-1>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Repository files navigation

iedera

Installation

Command-line

Examples

Spaced seeds

Transition seeds

Lossless seeds

IUPAC seeds

Input/Ouput and reoptimization

Polynomial form

Bernoulli model

Other multivariate models

Tools provided with iedera

References

About

Licenses found

Releases

Packages

Languages

License

bonsai-team/iedera

Folders and files

Latest commit

History

Repository files navigation

iedera

Installation

Command-line

Examples

Spaced seeds

Transition seeds

Lossless seeds

IUPAC seeds

Input/Ouput and reoptimization

Polynomial form

Bernoulli model

Other multivariate models

Tools provided with iedera

References

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages