We used ShortStack
and the trimmed sRNA-seq read files as input to predict milRNAs in Blumeria graminis f.sp. hordei and Hordeum vulgare. In case of Blumeria graminis f.sp. hordei, we ran the code using the full genome assembly. The code was run for files RSB15248-1 through RSB15248-18, generating a total of 18 ShortStack
outputs.
ShortStack -dicermin 18 -mincov 2 --readfile RSB15248-1.trimmed.fastq --genomefile bgh_dh14_v4.fa
For Hordeum vulgare, we ran the ShortStack
pipeline for each of the 8 Hordeum vulgare chromosomes separetely, since the genome was too large to process as a whole. The codes were run for files RSB15248-1 through RSB15248-18, generating a total of 144 ShortStack
outputs.
ShortStack -dicermin 18 -mincov 2 --readfile RSB15248-1.trimmed.fastq --genomefile Hordeum_vulgare.IBSC_v2.chr1H.fa
ShortStack -dicermin 18 -mincov 2 --readfile RSB15248-1.trimmed.fastq --genomefile Hordeum_vulgare.IBSC_v2.chr2H.fa
ShortStack -dicermin 18 -mincov 2 --readfile RSB15248-1.trimmed.fastq --genomefile Hordeum_vulgare.IBSC_v2.chr3H.fa
ShortStack -dicermin 18 -mincov 2 --readfile RSB15248-1.trimmed.fastq --genomefile Hordeum_vulgare.IBSC_v2.chr4H.fa
ShortStack -dicermin 18 -mincov 2 --readfile RSB15248-1.trimmed.fastq --genomefile Hordeum_vulgare.IBSC_v2.chr5H.fa
ShortStack -dicermin 18 -mincov 2 --readfile RSB15248-1.trimmed.fastq --genomefile Hordeum_vulgare.IBSC_v2.chr6H.fa
ShortStack -dicermin 18 -mincov 2 --readfile RSB15248-1.trimmed.fastq --genomefile Hordeum_vulgare.IBSC_v2.chr7H.fa
ShortStack -dicermin 18 -mincov 2 --readfile RSB15248-1.trimmed.fastq --genomefile Hordeum_vulgare.IBSC_v2.chrUn.fa
ShortStack
generates several files. Results.txt
contains all identified loci and their DicerCall and Phasing scores. The BAM files is the short read mapping file. The GFF3 file ShortStack_D.gff3
contains all identified loci that passed the DicerCall prediction.
Counts.txt
ErrorLogs.txt
Log.txt
MIRNAs
Results.txt
RSB15248-1.bgh.bam
ShortStack_All.gff3
ShortStack_D.gff3
ShortStack_N.gff3
Unplaced.txt
We used samtools
to remove unmapped reads from BAM files. We ran this code on all BAM files produced by bowtie
in the ShortStack
pipeline.
samtools view -F 0x04 -b RSB15248-1.bgh.bam > RSB15248-1.bgh_aligned.bam
In case of Hordeum vulgare, we then combined the BAM files for the respective chromosomes.
samtools merge RSB15248-1.Hvu_merged.bam RSB15248-1.hvu_aligned.chr1H.bam RSB15248-1.hvu_aligned.chr2H.bam RSB15248-1.hvu_aligned.chr3H.bam RSB15248-1.hvu_aligned.chr4H.bam RSB15248-1.hvu_aligned.chr5H.bam RSB15248-1.hvu_aligned.chr6H.bam RSB15248-1.hvu_aligned.chr7H.bam RSB15248-1.hvu_aligned.chrUn.bam
We started by subsetting the Results.txt
files for those clusters that had a MIRNA score of N15 or Y.
#for RSB15248-1
awk '{if ($13 == "Y" || $13 == "N15") print }' Results.txt > Results_N15_RSB15248-1.txt
[...]
#for RSB15248-18
awk '{if ($13 == "Y" || $13 == "N15") print }' Results.txt > Results_N15_RSB15248-18.txt
Then, subsetting the GFF files for these loci.
awk '{print $2}' Results_N15_RSB15248-1.txt | grep -Fwf - ShortStack_All.gff3 > RSB15248-1.gff3
[...]
awk '{print $2}' Results_N15_RSB15248-18.txt | grep -Fwf - ShortStack_All.gff3 > RSB15248-18.gff3
In case of B. hordei, we then used BEDtools
to merge files to nonredundant loci.
cat RSB15248-1.gff3 RSB15248-2.gff3 RSB15248-3.gff3 RSB15248-4.gff3 RSB15248-5.gff3 \
RSB15248-6.gff3 RSB15248-7.gff3 RSB15248-8.gff3 RSB15248-9.gff3 RSB15248-10.gff3 \
RSB15248-11.gff3 RSB15248-12.gff3 RSB15248-13.gff3 RSB15248-14.gff3 RSB15248-15.gff3 \
RSB15248-16.gff3 RSB15248-17.gff3 RSB15248-18.gff3 | bedtools merge > merge_Bgh.gff3
In case of H. vulgare, we performed the bedtools intersect
after merging the filtered Results.txt
and ShortStack_All.gff3
from the various chromosomes. First, it was required to rename the individual clusters according to the respective chromosomes.
# example chrUn, done same way for chr1H, chr2H, etc.
sed 's/Cluster_/Cluster-chrUn_/g' Results.txt > Results.renamed.txt
sed 's/Cluster_/Cluster-chrUn_/g' ShortStack_All.gff3 > ShortStack_All.renamed.gff3
Then, files could be merged without conflict using cat
.
# example RSB15248-1, done same way for all samples
cat ShortStack_RSB15248-1_chr1H/Results.renamed.txt ShortStack_RSB15248-1_chr2H/Results.renamed.txt \
ShortStack_RSB15248-1_chr3H/Results.renamed.txt ShortStack_RSB15248-1_chr4H/Results.renamed.txt \
ShortStack_RSB15248-1_chr5H/Results.renamed.txt ShortStack_RSB15248-1_chr6H/Results.renamed.txt \
ShortStack_RSB15248-1_chr7H/Results.renamed.txt ShortStack_RSB15248-1_chrUn/Results.renamed.txt \
> Results_N15_RSB15248-1_Hvu.txt
cat ShortStack_RSB15248-1_chr1H/ShortStack_All.renamed.txt ShortStack_RSB15248-1_chr2H/ShortStack_All.renamed.txt \
ShortStack_RSB15248-1_chr3H/ShortStack_All.renamed.txt ShortStack_RSB15248-1_chr4H/ShortStack_All.renamed.txt \
ShortStack_RSB15248-1_chr5H/ShortStack_All.renamed.txt ShortStack_RSB15248-1_chr6H/ShortStack_All.renamed.txt \
ShortStack_RSB15248-1_chr7H/ShortStack_All.renamed.txt ShortStack_RSB15248-1_chrUn/ShortStack_All.renamed.txt \
> RSB15248-1_Hvu.gff3
Now, data were ready for bedtools merge
.
cat RSB15248-1_Hvu.gff3 RSB15248-2_Hvu.gff3 RSB15248-3_Hvu.gff3 RSB15248-4_Hvu.gff3 RSB15248-5_Hvu.gff3 \
RSB15248-6_Hvu.gff3 RSB15248-7_Hvu.gff3 RSB15248-8_Hvu.gff3 RSB15248-9_Hvu.gff3 RSB15248-10_Hvu.gff3 \
RSB15248-11_Hvu.gff3 RSB15248-12_Hvu.gff3 RSB15248-13_Hvu.gff3 RSB15248-14_Hvu.gff3 RSB15248-15_Hvu.gff3 \
RSB15248-16_Hvu.gff3 RSB15248-17_Hvu.gff3 RSB15248-18_Hvu.gff3 | bedtools merge > merge_Hvu.gff3