Merge branch 'implementNanopolish' into master branch. This implements the full nanopolish protocol with tests
Showing 10 changed files with 615 additions and 22 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
#!/bin/bash | ||
|
||
curdir=`pwd` | ||
-export PATH=$PATH:$curdir/DAZZ_DB:$curdir/DALIGNER:$curdir/nanocorrect:$curdir/poaV2:$curdir/ncbi-blast-2.4.0+/bin | ||
+export PATH=$PATH:$curdir/DAZZ_DB:$curdir/DALIGNER:$curdir/nanocorrect:$curdir/poaV2:$curdir/ncbi-blast-2.4.0+/bin:$curdir/bwa:$curdir/samtools | ||
echo $PATH | ||
|
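The change above appends the bwa and samtools directories to PATH. A minimal sketch of how one might confirm that the environment is set up before running the pipeline follows. The executable names `bwa` and `samtools` are assumptions (the diff only shows directory names), and `missing_tools` is a hypothetical helper, not part of poreFUME.

```python
import shutil

# Assumed executable names; the diff only shows the directories added to PATH.
REQUIRED_TOOLS = ["bwa", "samtools"]


def missing_tools(tools, path=None):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools found on PATH")
```

Running this in the same shell in which env.sh was sourced should report any tool that is still unreachable.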
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
#!/bin/bash | ||
|
||
|
||
# This is a basic integration test. It assumes the user has installed the dependencies listed in INSTALL.md and has run install.sh and env.sh in the current shell. The script downloads a set of 75 sequences, all belonging to PacBio barcode 01, plus a ~40 MB set of raw fast5 files, which nanopolish requires. | ||
# poreFUME is then run; check that the options below match your local settings, such as the number of cores and the directory paths. | ||
# Finally, the output (CARD annotation) is compared with the pre-computed CARD annotation in test/data. The comparison uses diff -q, so the two files must be identical. | ||
|
||
|
||
curdir=`pwd` | ||
echo $curdir | ||
|
||
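if [ ! -f "$curdir/test/data/testSet75.tar.gz" ]; then | ||
cd "$curdir/test/data" | ||
wget http://www.student.dtu.dk/~evand/poreFUME_data/testSet75.tar.gz | ||
tar -zxvf testSet75.tar.gz | ||
# return to the start directory so that later relative paths still resolve | ||
cd "$curdir" | ||
fi | ||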
|
||
if [ ! -f "$curdir/inputData/testSet75.fasta" ]; then | ||
cd "$curdir/inputData" | ||
wget http://www.student.dtu.dk/~evand/poreFUME_data/testSet75.fasta | ||
fi | ||
|
||
cd $curdir | ||
python poreFUME.py inputData/testSet75.fasta inputData/pb_39.fasta --PacBioLegacyBarcode --cores 8 --pathCARD=inputData/n.fasta.protein.homolog.fasta --pathNanocorrect=$curdir/nanocorrect/ --pathRawreads=$curdir/test/data/testSet75 --overwriteNanocorrect --pathNanopolish=$curdir/nanopolish/ --overwriteNanopolish --overwriteDemux --overwriteCARD | ||
|
||
if ! diff -q output/annotation/testSet75/testSet75.afterNP.annotated.csv test/data/testSet75.afterNP.annotated.csv > /dev/null 2>&1; then | ||
echo "Integration test failed: the output in output/annotation/testSet75/testSet75.afterNP.annotated.csv does not match the expected annotation" | ||
# signal the failure to the calling shell as well | ||
exit 1 | ||
else | ||
echo "Integration test passed!" | ||
fi | ||
|
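The test above compares the two annotation files byte for byte with `diff -q`. If the row order of the annotation CSV can vary between runs (an assumption, not something the commit states), an order-insensitive comparison could be sketched as below. `load_rows` and `annotations_match` are hypothetical helpers, not part of poreFUME.

```python
import csv


def load_rows(path):
    """Read a CSV file and return its rows as a sorted list of tuples."""
    with open(path, newline="") as handle:
        return sorted(tuple(row) for row in csv.reader(handle))


def annotations_match(produced_path, expected_path):
    """Return True if both CSV files hold the same rows, in any order."""
    return load_rows(produced_path) == load_rows(expected_path)
```

With the paths used in the test script, the call would be `annotations_match("output/annotation/testSet75/testSet75.afterNP.annotated.csv", "test/data/testSet75.afterNP.annotated.csv")`.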
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
#### This little script filters SAM records by their CIGAR strings: a record is written to stdout only when its alignment has more than a minimum number of matched (M) bases | ||
import sys | ||
import re | ||
from signal import signal, SIGPIPE, SIG_DFL | ||
signal(SIGPIPE,SIG_DFL) | ||
|
||
import optparse | ||
|
||
parser = optparse.OptionParser() | ||
|
||
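# The default used to be the placeholder string "spam", which made int() fail when -m was omitted. | ||
parser.add_option('-m', '--minAlignment', | ||
                  action="store", dest="minAlignment", type="int", | ||
                  help="set the minimal alignment length (default: 0)", default=0) | ||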
|
||
options, args = parser.parse_args() | ||
|
||
|
||
if __name__ == "__main__": | ||
    # A run of digits followed by the CIGAR 'M' (alignment match) operation | ||
    regex = re.compile(r"(\d+)M") | ||
|
||
    for line in sys.stdin: | ||
        # SAM header lines start with '@' and are passed through unchanged | ||
        if line[0:1] == "@": | ||
            sys.stdout.write(line) | ||
            continue | ||
|
||
        # Column 6 (index 5) of a SAM record holds the CIGAR string | ||
        test_str = line.split('\t')[5] | ||
|
||
        # Sum the lengths of all M operations in the CIGAR string | ||
        alignMatch = sum(int(length) for length in regex.findall(test_str)) | ||
|
||
        # Emit the record only when it has more than minAlignment matched bases | ||
        if alignMatch > int(options.minAlignment): | ||
            sys.stdout.write(line) | ||
|
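The filtering rule in the script above can be shown on a single record. The following is a minimal sketch, assuming tab-separated SAM input with the CIGAR string in the sixth column. `aligned_bases` and `keep_record` are hypothetical names introduced for illustration and are not part of the commit.

```python
import re

# A run of digits followed by the CIGAR 'M' (alignment match) operation.
CIGAR_MATCH = re.compile(r"(\d+)M")


def aligned_bases(cigar):
    """Sum the lengths of all M operations in a CIGAR string."""
    return sum(int(length) for length in CIGAR_MATCH.findall(cigar))


def keep_record(sam_line, min_alignment):
    """Return True for header lines and for records with more than
    `min_alignment` matched bases (strict comparison, as in the script)."""
    if sam_line.startswith("@"):
        return True
    fields = sam_line.rstrip("\n").split("\t")
    return aligned_bases(fields[5]) > min_alignment


if __name__ == "__main__":
    print(aligned_bases("10M2I5M3D20M"))  # 10 + 5 + 20 = 35
```

Insertions (I), deletions (D) and clipping operations do not count towards the total, so a read with a long soft clip and a short match is dropped even if the read itself is long.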