Skip to content

Latest commit

 

History

History
91 lines (87 loc) · 5.85 KB

04.ShortStack analyis for milRNA discovery in H. vulgare and B. graminis f.sp. hordei.md

File metadata and controls

91 lines (87 loc) · 5.85 KB

Running ShortStack to discover candidate milRNAs

We used ShortStack and the trimmed sRNA-seq read files as input to predict milRNAs in Blumeria graminis f.sp. hordei and Hordeum vulgare. In case of Blumeria graminis f.sp. hordei, we ran the code using the full genome assembly. The code was run for files RSB15248-1 through RSB15248-18, generating a total of 18 ShortStack outputs.

ShortStack -dicermin 18 -mincov 2 --readfile RSB15248-1.trimmed.fastq --genomefile bgh_dh14_v4.fa

For Hordeum vulgare, we ran the ShortStack pipeline for each of the 8 Hordeum vulgare chromosomes separetely, since the genome was too large to process as a whole. The codes were run for files RSB15248-1 through RSB15248-18, generating a total of 144 ShortStack outputs.

ShortStack -dicermin 18 -mincov 2 --readfile RSB15248-1.trimmed.fastq --genomefile Hordeum_vulgare.IBSC_v2.chr1H.fa
ShortStack -dicermin 18 -mincov 2 --readfile RSB15248-1.trimmed.fastq --genomefile Hordeum_vulgare.IBSC_v2.chr2H.fa
ShortStack -dicermin 18 -mincov 2 --readfile RSB15248-1.trimmed.fastq --genomefile Hordeum_vulgare.IBSC_v2.chr3H.fa
ShortStack -dicermin 18 -mincov 2 --readfile RSB15248-1.trimmed.fastq --genomefile Hordeum_vulgare.IBSC_v2.chr4H.fa
ShortStack -dicermin 18 -mincov 2 --readfile RSB15248-1.trimmed.fastq --genomefile Hordeum_vulgare.IBSC_v2.chr5H.fa
ShortStack -dicermin 18 -mincov 2 --readfile RSB15248-1.trimmed.fastq --genomefile Hordeum_vulgare.IBSC_v2.chr6H.fa
ShortStack -dicermin 18 -mincov 2 --readfile RSB15248-1.trimmed.fastq --genomefile Hordeum_vulgare.IBSC_v2.chr7H.fa
ShortStack -dicermin 18 -mincov 2 --readfile RSB15248-1.trimmed.fastq --genomefile Hordeum_vulgare.IBSC_v2.chrUn.fa

ShortStack generates several files. Results.txt contains all identified loci and their DicerCall and Phasing scores. The BAM files is the short read mapping file. The GFF3 file ShortStack_D.gff3 contains all identified loci that passed the DicerCall prediction.

Counts.txt
ErrorLogs.txt
Log.txt
MIRNAs
Results.txt
RSB15248-1.bgh.bam
ShortStack_All.gff3
ShortStack_D.gff3
ShortStack_N.gff3
Unplaced.txt

Parsing BAM files

We used samtools to remove unmapped reads from BAM files. We ran this code on all BAM files produced by bowtie in the ShortStack pipeline.

samtools view -F 0x04 -b RSB15248-1.bgh.bam > RSB15248-1.bgh_aligned.bam

In case of Hordeum vulgare, we then combined the BAM files for the respective chromosomes.

samtools merge RSB15248-1.Hvu_merged.bam RSB15248-1.hvu_aligned.chr1H.bam RSB15248-1.hvu_aligned.chr2H.bam RSB15248-1.hvu_aligned.chr3H.bam RSB15248-1.hvu_aligned.chr4H.bam RSB15248-1.hvu_aligned.chr5H.bam RSB15248-1.hvu_aligned.chr6H.bam RSB15248-1.hvu_aligned.chr7H.bam RSB15248-1.hvu_aligned.chrUn.bam  

Parsing tables for obtaining non-redundant milRNA loci

We started by subsetting the Results.txt files for those clusters that had a MIRNA score of N15 or Y.

#for RSB15248-1
awk '{if ($13 == "Y" || $13 == "N15") print }' Results.txt > Results_N15_RSB15248-1.txt
[...]
#for RSB15248-18
awk '{if ($13 == "Y" || $13 == "N15") print }' Results.txt > Results_N15_RSB15248-18.txt

Then, subsetting the GFF files for these loci.

awk '{print $2}' Results_N15_RSB15248-1.txt | grep -Fwf - ShortStack_All.gff3 > RSB15248-1.gff3
[...]
awk '{print $2}' Results_N15_RSB15248-18.txt | grep -Fwf - ShortStack_All.gff3 > RSB15248-18.gff3

In case of B. hordei, we then used BEDtools to merge files to nonredundant loci.

cat RSB15248-1.gff3 RSB15248-2.gff3 RSB15248-3.gff3 RSB15248-4.gff3 RSB15248-5.gff3 \
  RSB15248-6.gff3 RSB15248-7.gff3 RSB15248-8.gff3 RSB15248-9.gff3 RSB15248-10.gff3 \
  RSB15248-11.gff3 RSB15248-12.gff3 RSB15248-13.gff3 RSB15248-14.gff3 RSB15248-15.gff3 \
  RSB15248-16.gff3 RSB15248-17.gff3 RSB15248-18.gff3 | bedtools merge > merge_Bgh.gff3

In case of H. vulgare, we performed the bedtools intersect after merging the filtered Results.txt and ShortStack_All.gff3 from the various chromosomes. First, it was required to rename the individual clusters according to the respective chromosomes.

# example chrUn, done same way for chr1H, chr2H, etc. 
sed 's/Cluster_/Cluster-chrUn_/g' Results.txt > Results.renamed.txt
sed 's/Cluster_/Cluster-chrUn_/g' ShortStack_All.gff3 > ShortStack_All.renamed.gff3

Then, files could be merged without conflict using cat.

# example RSB15248-1, done same way for all samples
cat ShortStack_RSB15248-1_chr1H/Results.renamed.txt ShortStack_RSB15248-1_chr2H/Results.renamed.txt \
  ShortStack_RSB15248-1_chr3H/Results.renamed.txt ShortStack_RSB15248-1_chr4H/Results.renamed.txt \
  ShortStack_RSB15248-1_chr5H/Results.renamed.txt ShortStack_RSB15248-1_chr6H/Results.renamed.txt \
  ShortStack_RSB15248-1_chr7H/Results.renamed.txt ShortStack_RSB15248-1_chrUn/Results.renamed.txt \
  > Results_N15_RSB15248-1_Hvu.txt
cat ShortStack_RSB15248-1_chr1H/ShortStack_All.renamed.txt ShortStack_RSB15248-1_chr2H/ShortStack_All.renamed.txt \
  ShortStack_RSB15248-1_chr3H/ShortStack_All.renamed.txt ShortStack_RSB15248-1_chr4H/ShortStack_All.renamed.txt \
  ShortStack_RSB15248-1_chr5H/ShortStack_All.renamed.txt ShortStack_RSB15248-1_chr6H/ShortStack_All.renamed.txt \
  ShortStack_RSB15248-1_chr7H/ShortStack_All.renamed.txt ShortStack_RSB15248-1_chrUn/ShortStack_All.renamed.txt \
  > RSB15248-1_Hvu.gff3

Now, data were ready for bedtools merge.

cat RSB15248-1_Hvu.gff3 RSB15248-2_Hvu.gff3 RSB15248-3_Hvu.gff3 RSB15248-4_Hvu.gff3 RSB15248-5_Hvu.gff3 \
  RSB15248-6_Hvu.gff3 RSB15248-7_Hvu.gff3 RSB15248-8_Hvu.gff3 RSB15248-9_Hvu.gff3 RSB15248-10_Hvu.gff3 \
  RSB15248-11_Hvu.gff3 RSB15248-12_Hvu.gff3 RSB15248-13_Hvu.gff3 RSB15248-14_Hvu.gff3 RSB15248-15_Hvu.gff3 \
  RSB15248-16_Hvu.gff3 RSB15248-17_Hvu.gff3 RSB15248-18_Hvu.gff3 | bedtools merge > merge_Hvu.gff3