Each of the read files contained reads of 15-75 nt in length. We extracted the read size distributions of the trimmed reads as follows:
awk '{if(NR%4==2) print length($1)}' RSB15248-1.trimmed.fastq | sort -n | uniq -c > RSB15248-1.sizeDistr.txt
This results in a two-column output, where column 1 contains the counts and column 2 the read size. To summarize the results in one table:
paste <(awk '{print $2}' RSB15248-1.sizeDistr.txt) <(awk '{print $1}' RSB15248-1.sizeDistr.txt) <(awk '{print $1}' RSB15248-2.sizeDistr.txt) <(awk '{print $1}' RSB15248-3.sizeDistr.txt) <(awk '{print $1}' RSB15248-4.sizeDistr.txt) <(awk '{print $1}' RSB15248-5.sizeDistr.txt) <(awk '{print $1}' RSB15248-6.sizeDistr.txt) <(awk '{print $1}' RSB15248-7.sizeDistr.txt) <(awk '{print $1}' RSB15248-8.sizeDistr.txt) <(awk '{print $1}' RSB15248-9.sizeDistr.txt) <(awk '{print $1}' RSB15248-10.sizeDistr.txt) <(awk '{print $1}' RSB15248-11.sizeDistr.txt) <(awk '{print $1}' RSB15248-12.sizeDistr.txt) <(awk '{print $1}' RSB15248-13.sizeDistr.txt) <(awk '{print $1}' RSB15248-14.sizeDistr.txt) <(awk '{print $1}' RSB15248-15.sizeDistr.txt) <(awk '{print $1}' RSB15248-16.sizeDistr.txt) <(awk '{print $1}' RSB15248-17.sizeDistr.txt) <(awk '{print $1}' RSB15248-18.sizeDistr.txt) > sizeDistr.txt
Generating a table with read size distribution, which can be inspected by head sizeDistr.txt
:
15 1380177 1290395 1796106 836532 653638 762148 363552 330822 300718 186332 292662 224531 271791 474893 202018 588802 707857 701715
16 4385732 3648088 3364706 924891 596408 882130 472160 477660 419417 240304 504547 289781 377443 415108 286773 1080844 1263246 1179919
17 1558281 1706594 2822221 1458984 946189 1408104 870096 834546 752348 353522 695557 339091 537506 583595 453019 1750946 2012774 1861221
18 1855285 1962225 2984519 2130960 1411695 2025815 1786815 1564398 1448955 503534 904296 436786 479231 642339 511473 2133160 2431974 2223099
19 3604061 3294956 3164676 2126402 1417759 2031843 1591491 1524530 1363873 506642 994414 428036 520634 622376 509871 2223685 2522453 2331232
20 2581484 2549389 2846196 1511700 1025126 1458101 869478 876975 805377 611789 1109874 455417 519656 621036 578042 2488148 2749977 2570071
We also generated statistics of the number of reads mapping to specific features in the two reference genomes, separated by read length. First, we created a read length index.
samtools faidx RSB15248-1.trimmed.fasta
awk '{print $1,$2}' RSB15248-1.trimmed.fasta.fai | sed 's/\:/\_/g' > RSB15248-1.trimmed.fasta.length
We then used the BAM output files from bowtie
mapping of short reads within the ShortStack
pipeline. Mapping was done against the reference genomes of Blumeria graminis f.sp. hordei (Frantzeskakis et al., 2018) and Hordeum vulgare (Hordeum_vulgare.IBSC_v2), respectively. First, we determined the overall mapping to both genomes:
# Blumeria graminis f.sp. hordei
samtools view -F 4 RSB15248-1.bgh_aligned.bam | awk '{print $1}' | sed 's/\:/\_/g' | awk '!x[$0]++' | grep -Fwf - RSB15248-1.trimmed.fasta.length | awk '{print $2}' | sort -n | uniq -c | awk '{print $1,$2}' > RSB15248-1.bgh_all_mapped.txt
# Hordeum vulgare
samtools view -F 4 merged_RSB15248-1.hvu_aligned.bam | awk '{print $1}' | sed 's/\:/\_/g' | awk '!x[$0]++' | grep -Fwf - RSB15248-1.trimmed.fasta.length | awk '{print $2}' | sort -n | uniq -c | awk '{print $1,$2}' > RSB15248-1.hvu_all_mapped.txt
We summarized the results from each sample into one document as follows:
# Blumeria graminis f.sp. hordei
paste RSB15248-1.bgh_all_mapped.txt RSB15248-2.bgh_all_mapped.txt RSB15248-3.bgh_all_mapped.txt RSB15248-4.bgh_all_mapped.txt RSB15248-5.bgh_all_mapped.txt RSB15248-6.bgh_all_mapped.txt RSB15248-7.bgh_all_mapped.txt RSB15248-8.bgh_all_mapped.txt RSB15248-9.bgh_all_mapped.txt RSB15248-10.bgh_all_mapped.txt RSB15248-11.bgh_all_mapped.txt RSB15248-12.bgh_all_mapped.txt RSB15248-13.bgh_all_mapped.txt RSB15248-14.bgh_all_mapped.txt RSB15248-15.bgh_all_mapped.txt RSB15248-16.bgh_all_mapped.txt RSB15248-17.bgh_all_mapped.txt RSB15248-18.bgh_all_mapped.txt | cut -f 1,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36 > bgh_all_mapped.txt
# Hordeum vulgare
paste RSB15248-1.hvu_all_mapped.txt RSB15248-2.hvu_all_mapped.txt RSB15248-3.hvu_all_mapped.txt RSB15248-4.hvu_all_mapped.txt RSB15248-5.hvu_all_mapped.txt RSB15248-6.hvu_all_mapped.txt RSB15248-7.hvu_all_mapped.txt RSB15248-8.hvu_all_mapped.txt RSB15248-9.hvu_all_mapped.txt RSB15248-10.hvu_all_mapped.txt RSB15248-11.hvu_all_mapped.txt RSB15248-12.hvu_all_mapped.txt RSB15248-13.hvu_all_mapped.txt RSB15248-14.hvu_all_mapped.txt RSB15248-15.hvu_all_mapped.txt RSB15248-16.hvu_all_mapped.txt RSB15248-17.hvu_all_mapped.txt RSB15248-18.hvu_all_mapped.txt | cut -f 1,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36 > hvu_all_mapped.txt
In order to determine the number of reads of a given read length mapping to genomic features, i.e., transposable elements, coding genes, or milRNA genes, we first created necessary temporary files, i.e., the SAM header, and a non-repetitive text (SAM) file. The codes used for Blumeria graminis f.sp. hordei:
# SAM header
samtools view -F 4 RSB15248-1.bgh_aligned.bam | grep '@' > RSB15248-1.bgh.head
# temporary non-redundant read mapping file
samtools view -F 4 RSB15248-1.bgh_aligned.bam | awk '!visited[$1]++' | awk '{gsub(":","_",$1)}1' | sed 's/[[:space:]]/\t/g' > RSB15248-1.bgh_aligned.sam
Then, creating the read mapping SAM files for each read length separately:
grep -Fwf <(awk '($2==15)' RSB15248-1.trimmed.fasta.length | awk '{print $1}') RSB15248-1.bgh_aligned.sam | cat RSB15248-1.bgh.head - > RSB15248-1_15.bgh.sam
...
grep -Fwf <(awk '($2==40)' RSB15248-1.trimmed.fasta.length | awk '{print $1}') RSB15248-1.bgh_aligned.sam | cat RSB15248-1.bgh.head - > RSB15248-1_40.bgh.sam
The same for Hordeum vulgare.
# SAM header
samtools view -F 4 RSB15248-1_merged.hvu_aligned.bam | grep '@' > RSB15248-1.hvu.head
# temporary non-redundant read mapping file
samtools view -F 4 RSB15248-1_merged.hvu_aligned.bam | awk '{gsub(":","_",$1)}1' | sed 's/[[:space:]]/\t/g' | awk '!visited[$1]++' > RSB15248-1_merged.hvu_aligned.sam
Creating the read mapping SAM files for each read length separately.
grep -Fwf <(awk '($2==15)' RSB15248-1.trimmed.fasta.length | awk '{print $1}') RSB15248-1_merged.hvu_aligned.sam | cat RSB15248-1.hvu.head - > RSB15248-1_15.hvu.sam
...
grep -Fwf <(awk '($2==40)' RSB15248-1.trimmed.fasta.length | awk '{print $1}') RSB15248-1_merged.hvu_aligned.sam | cat RSB15248-1.hvu.head - > RSB15248-1_40.hvu.sam
We then used featureCounts
to determine the number of reads mapping to the respective features. We used the GTF
annotation files for coding genes, milRNAs, and transposable elements, parsed to look as follows:
$ head Bgh_milRNA_v1.gff
##gff-version 3 ;
scaffold_18 ShortStack nc_RNA 4313703 4313881 . + . ID=BghMilRNA_00001;DicerCall=22;MIRNA=Y;MajorRNA=UGAGAAAGCGGCCAAGAUGAUG;
scaffold_9 ShortStack nc_RNA 86105 86250 . + . ID=BghMilRNA_00002;DicerCall=22;MIRNA=Y;MajorRNA=UAGUCCGGCGUGUUGGACCCUC;
scaffold_8 ShortStack nc_RNA 7899823 7899937 . - . ID=BghMilRNA_00003;DicerCall=22;MIRNA=Y;MajorRNA=UGGGUAGUUGUUGAUGAAAUG;
scaffold_1 ShortStack nc_RNA 102466 102756 . - . ID=BghMilRNA_00004;DicerCall=21;MIRNA=N15;MajorRNA=UGGAGUGAGGAUAUUUGUUGG;
scaffold_1 ShortStack nc_RNA 148126 148145 . - . ID=BghMilRNA_00005;DicerCall=20;MIRNA=N15;MajorRNA=GAAGCGUGCCGUUGGUGGGG;
scaffold_1 ShortStack nc_RNA 235447 235467 . + . ID=BghMilRNA_00006;DicerCall=21;MIRNA=N15;MajorRNA=UGAACUAGGACUUCGGGGGAG;
scaffold_1 ShortStack nc_RNA 352439 352527 . - . ID=BghMilRNA_00007;DicerCall=21;MIRNA=N15;MajorRNA=UGAACUAGGACUUCGGGGGAG;
scaffold_10 ShortStack nc_RNA 30007 30076 . + . ID=BghMilRNA_00008;DicerCall=21;MIRNA=N15;MajorRNA=UUUGGAUCCAACAAUGUUGCG;
scaffold_10 ShortStack nc_RNA 121960 121998 . + . ID=BghMilRNA_00009;DicerCall=22;MIRNA=N15;MajorRNA=UACUUGAGUCGCGUGUCUGAA;
$ head bgh_dh14_v4-2.gff
##gff-version 3
scaffold_1 . gene 17468 19945 . - . ID=BLGH_07057;Name=bghG004133000001001;geneID=BLGH_07057;gene_name=BLGH_07057;Dbxref=Gene3D:G3DSA:1.10.510.10,InterPro:IPR011009,SUPERFAMILY:SSF56112;
scaffold_1 . mRNA 17468 19945 . - . ID=BLGH_07057-mRNA-1;Parent=BLGH_07057;geneID=BLGH_07057;gene_name=BLGH_07057;Dbxref=Gene3D:G3DSA:1.10.510.10,InterPro:IPR011009,SUPERFAMILY:SSF56112;
scaffold_1 . exon 17468 17592 . - . Parent=BLGH_07057-mRNA-1;Name=DB67CD106629CEB4D176A838D8BED4F9;
scaffold_1 . exon 17671 19767 . - . Parent=BLGH_07057-mRNA-1;Name=68A4AAC3A3D822E5D9BB543C42F25956;
scaffold_1 . exon 19814 19945 . - . Parent=BLGH_07057-mRNA-1;Name=1962183D7FA28EA06D835E98484FBFE6;
scaffold_1 . CDS 17549 17592 . - 2 Parent=BLGH_07057-mRNA-1;
scaffold_1 . CDS 17671 19767 . - 2 Parent=BLGH_07057-mRNA-1;
scaffold_1 . CDS 19814 19940 . - 0 Parent=BLGH_07057-mRNA-1;
scaffold_1 . gene 171190 173755 . - . ID=BLGH_07058;Name=bghG004133000001001;geneID=BLGH_07058;gene_name=BLGH_07058;
$ head bgh_dh14_v4.fa.out.gtf
##gff-version 2
##date 2019-04-05
##sequence-region bgh_dh14_v4.fa
scaffold_1 RepeatMasker gene 11 759 18.1 + . gene_id "Motif:HaTad1-2_BG"
scaffold_1 RepeatMasker gene 764 2908 15.5 + . gene_id "Motif:HaTad1-2_BG"
scaffold_1 RepeatMasker gene 2921 8190 6.5 + . gene_id "Motif:Tad1-17_BG"
scaffold_1 RepeatMasker gene 8191 8328 1.4 + . gene_id "Motif:Copia-18_BG-LTR"
scaffold_1 RepeatMasker gene 8329 12888 1.3 + . gene_id "Motif:Copia-18_BG-I"
scaffold_1 RepeatMasker gene 12889 13026 0.7 + . gene_id "Motif:Copia-18_BG-LTR"
scaffold_1 RepeatMasker gene 13027 13609 5.3 + . gene_id "Motif:Tad1-17_BG"
The counts were determined for RSB15248-1 (same procedure for the other 17 samples):
# mapped to milRNAs
featureCounts -T 2 -p -M --fraction -t nc_RNA -g ID -a Bgh_milRNA_v1.gff -o RSB15248-1.bgh-milRNA.txt RSB15248-1_15.bgh.sam RSB15248-1_16.bgh.sam RSB15248-1_17.bgh.sam RSB15248-1_18.bgh.sam RSB15248-1_19.bgh.sam RSB15248-1_20.bgh.sam RSB15248-1_21.bgh.sam RSB15248-1_22.bgh.sam RSB15248-1_23.bgh.sam RSB15248-1_24.bgh.sam RSB15248-1_25.bgh.sam RSB15248-1_26.bgh.sam RSB15248-1_27.bgh.sam RSB15248-1_28.bgh.sam RSB15248-1_29.bgh.sam RSB15248-1_30.bgh.sam RSB15248-1_31.bgh.sam RSB15248-1_32.bgh.sam RSB15248-1_33.bgh.sam RSB15248-1_34.bgh.sam RSB15248-1_35.bgh.sam RSB15248-1_36.bgh.sam RSB15248-1_37.bgh.sam RSB15248-1_38.bgh.sam RSB15248-1_39.bgh.sam RSB15248-1_40.bgh.sam
# mapped to coding genes
featureCounts -T 2 -p -M --fraction -t mRNA -g ID -a bgh_dh14_v4-2.gff -o RSB15248-1.bgh-gene.txt RSB15248-1_15.bgh.sam RSB15248-1_16.bgh.sam RSB15248-1_17.bgh.sam RSB15248-1_18.bgh.sam RSB15248-1_19.bgh.sam RSB15248-1_20.bgh.sam RSB15248-1_21.bgh.sam RSB15248-1_22.bgh.sam RSB15248-1_23.bgh.sam RSB15248-1_24.bgh.sam RSB15248-1_25.bgh.sam RSB15248-1_26.bgh.sam RSB15248-1_27.bgh.sam RSB15248-1_28.bgh.sam RSB15248-1_29.bgh.sam RSB15248-1_30.bgh.sam RSB15248-1_31.bgh.sam RSB15248-1_32.bgh.sam RSB15248-1_33.bgh.sam RSB15248-1_34.bgh.sam RSB15248-1_35.bgh.sam RSB15248-1_36.bgh.sam RSB15248-1_37.bgh.sam RSB15248-1_38.bgh.sam RSB15248-1_39.bgh.sam RSB15248-1_40.bgh.sam
# mapped to transposable elements
featureCounts -T 2 -p -M --fraction -t gene -g gene_id -a bgh_dh14_v4.fa.out.gtf -o RSB15248-1.bgh-TE.txt RSB15248-1_15.bgh.sam RSB15248-1_16.bgh.sam RSB15248-1_17.bgh.sam RSB15248-1_18.bgh.sam RSB15248-1_19.bgh.sam RSB15248-1_20.bgh.sam RSB15248-1_21.bgh.sam RSB15248-1_22.bgh.sam RSB15248-1_23.bgh.sam RSB15248-1_24.bgh.sam RSB15248-1_25.bgh.sam RSB15248-1_26.bgh.sam RSB15248-1_27.bgh.sam RSB15248-1_28.bgh.sam RSB15248-1_29.bgh.sam RSB15248-1_30.bgh.sam RSB15248-1_31.bgh.sam RSB15248-1_32.bgh.sam RSB15248-1_33.bgh.sam RSB15248-1_34.bgh.sam RSB15248-1_35.bgh.sam RSB15248-1_36.bgh.sam RSB15248-1_37.bgh.sam RSB15248-1_38.bgh.sam RSB15248-1_39.bgh.sam RSB15248-1_40.bgh.sam
Summarizing the results, written in .summary
, from all 18 samples in one text file:
grep Assigned *.summary > countsbysize.bgh.summary
The result should look like this:
countsbysize.bgh.summary:countsbyreadsize/RSB15248-10.bgh-gene.txt.summary:Assigned 30441 22983 18490 15850 16682 26754 108109 170422 39187 12580 9068 8145 7746 7191 6614 5545 4518 4285 2760 1954 1530 1128 838 590 386 232
countsbysize.bgh.summary:countsbyreadsize/RSB15248-10.bgh-milRNA.txt.summary:Assigned 148 178 251 514 542 1606 6329 6772 1036 249 92 88 59 116 54 113 34 31 31 11 4 3 3 4 5 0
countsbysize.bgh.summary:countsbyreadsize/RSB15248-10.bgh-TE.txt.summary:Assigned 44869 41885 52992 50749 63731 134833 626134 876226 176699 45947 27956 23742 20686 16090 14713 11030 14971 10693 6714 4996 4015 3427 2276 1392 1004 643
countsbysize.bgh.summary:countsbyreadsize/RSB15248-11.bgh-gene.txt.summary:Assigned 32625 36431 43605 52543 54804 72348 238140 318126 67765 22781 15782 14013 14264 12733 11441 9713 7887 6506 5140 4139 3271 2563 1909 1370 1049 765
countsbysize.bgh.summary:countsbyreadsize/RSB15248-11.bgh-milRNA.txt.summary:Assigned 189 375 586 937 1517 3638 13197 12125 1605 331 169 105 106 276 80 290 37 17 22 18 12 10 10 7 3 3
countsbysize.bgh.summary:countsbyreadsize/RSB15248-11.bgh-TE.txt.summary:Assigned 63577 85295 136516 171610 202025 352249 1400241 1679902 298650 70612 45265 47029 44027 35456 30501 31771 20780 19984 14162 11042 7796 5929 6148 3690 2462 2053
countsbysize.bgh.summary:countsbyreadsize/RSB15248-12.bgh-gene.txt.summary:Assigned 38939 25908 9384 4035 1038 419 1089 918 288 220 154 178 273 368 162 188 258 460 171 87 39 35 21 14 4 25
countsbysize.bgh.summary:countsbyreadsize/RSB15248-12.bgh-milRNA.txt.summary:Assigned 42 69 53 128 25 18 62 57 2 0
countsbysize.bgh.summary:countsbyreadsize/RSB15248-12.bgh-TE.txt.summary:Assigned 45017 39080 25008 39242 4190 2984 4682 4457 1667 1032 688 625 681 564 367 406 438 515 490 458 237 567 131 64 42 14
countsbysize.bgh.summary:countsbyreadsize/RSB15248-13.bgh-gene.txt.summary:Assigned 50095 28226 23555 2903 1502 427 491 593 287 242 211 201 292 535 313 362 662 1585 116 60 51 54 46 19 14 6
To use featureCounts
to determine the number of reads mapping to transposable elements, we needed to run RepeatMasker
for generating information about the transposon landscape in the genome of Hordeum vulgare. This was done for each chromosome separately.
RepeatMasker -pa 4 -species Hordeum -gff -dir RepeatMasker /hpcwork/rwth0146/SequencingData_published/Hordeum_vulgare_v160517/160404_barley_pseudomolecules_masked_chr1H.fasta;
RepeatMasker -pa 4 -species Hordeum -gff -dir RepeatMasker /hpcwork/rwth0146/SequencingData_published/Hordeum_vulgare_v160517/160404_barley_pseudomolecules_masked_chr2H.fasta;
RepeatMasker -pa 4 -species Hordeum -gff -dir RepeatMasker /hpcwork/rwth0146/SequencingData_published/Hordeum_vulgare_v160517/160404_barley_pseudomolecules_masked_chr3H.fasta;
RepeatMasker -pa 4 -species Hordeum -gff -dir RepeatMasker /hpcwork/rwth0146/SequencingData_published/Hordeum_vulgare_v160517/160404_barley_pseudomolecules_masked_chr4H.fasta;
RepeatMasker -pa 4 -species Hordeum -gff -dir RepeatMasker /hpcwork/rwth0146/SequencingData_published/Hordeum_vulgare_v160517/160404_barley_pseudomolecules_masked_chr5H.fasta;
RepeatMasker -pa 4 -species Hordeum -gff -dir RepeatMasker /hpcwork/rwth0146/SequencingData_published/Hordeum_vulgare_v160517/160404_barley_pseudomolecules_masked_chr6H.fasta;
RepeatMasker -pa 4 -species Hordeum -gff -dir RepeatMasker /hpcwork/rwth0146/SequencingData_published/Hordeum_vulgare_v160517/160404_barley_pseudomolecules_masked_chr7H.fasta;
RepeatMasker -pa 4 -species Hordeum -gff -dir RepeatMasker /hpcwork/rwth0146/SequencingData_published/Hordeum_vulgare_v160517/160404_barley_pseudomolecules_masked_chrUn.fasta
After, combining the RepeatMasker output GTF files.
cat 160404_barley_pseudomolecules_masked_chr1H.fasta.out.gff 160404_barley_pseudomolecules_masked_chr2H.fasta.out.gff 160404_barley_pseudomolecules_masked_chr3H.fasta.out.gff 160404_barley_pseudomolecules_masked_chr4H.fasta.out.gff 160404_barley_pseudomolecules_masked_chr5H.fasta.out.gff 160404_barley_pseudomolecules_masked_chr6H.fasta.out.gff 160404_barley_pseudomolecules_masked_chr7H.fasta.out.gff 160404_barley_pseudomolecules_masked_chrUn.fasta.out.gff > 160404_barley_pseudomolecules_masked.fasta.out.gtf
Then, we used GTF files for all the features, as for Blumeria graminis f.sp. hordei, looking like this:
$ head Hvu_milRNA_v1.gff
##gff-version 3
#miRNAs
chr1H ShortStack nc_RNA 3894031 3894149 . + . ID=HvuMilRNA_00001;DicerCall=18;MIRNA=Y;MajorRNA=GGCGGAGGCGGUGGCGGC;
chr1H ShortStack nc_RNA 18775469 18775707 . + ID=HvuMilRNA_00002;DicerCall=21;MIRNA=Y;MajorRNA=GAGGGCGGCGAUAACAUUUUC;
chr1H ShortStack nc_RNA 28809524 28809735 . - ID=HvuMilRNA_00003;DicerCall=23;MIRNA=Y;MajorRNA=CGGUGGGUGCGCCGCCGUCGAGC;AlternativeName=miR5384;
chr1H ShortStack nc_RNA 230391203 230391338 . + ID=HvuMilRNA_00004;DicerCall=23;MIRNA=Y;MajorRNA=CGGUGGGUGCGCCGCCGUCGAGC;AlternativeName=miR5384;
chr1H ShortStack nc_RNA 282589545 282589626 . + ID=HvuMilRNA_00005;DicerCall=19;MIRNA=Y;MajorRNA=UAGUGAUUUUCGUGAUUUG;
chr1H ShortStack nc_RNA 316021162 316021259 . - ID=HvuMilRNA_00006;DicerCall=21;MIRNA=Y;MajorRNA=AAAUGUACGGAAGGGUAUUUC;
chr1H ShortStack nc_RNA 353506970 353507136 . + ID=HvuMilRNA_00007;DicerCall=21;MIRNA=Y;MajorRNA=UCGGACCAGGCUUCAUUCCCC;AlternativeName=miR166;
chr1H ShortStack nc_RNA 389189884 389190124 . - ID=HvuMilRNA_00008;DicerCall=21;MIRNA=Y;MajorRNA=UGAUUGAGCCGUGCCAAUAUC;AlternativeName=miR171;
$ head Hv_IBSC_PGSB_r1_HighConf.gtf
chr1H Hv_IBSC_PGSB_r1 transcript 41811 45049 . + . gene_id "HORVU1Hr1G000010"; transcript_id "HORVU1Hr1G000010.1"; clst_id "TCONS_00000001"; confidence_class "HC_G"
chr1H Hv_IBSC_PGSB_r1 exon 41811 42213 . + . gene_id "HORVU1Hr1G000010"; transcript_id "HORVU1Hr1G000010.1"; exon_number "1"; confidence_class "HC_G"
chr1H Hv_IBSC_PGSB_r1 exon 42300 42338 . + . gene_id "HORVU1Hr1G000010"; transcript_id "HORVU1Hr1G000010.1"; exon_number "2"; confidence_class "HC_G"
chr1H Hv_IBSC_PGSB_r1 exon 42427 42567 . + . gene_id "HORVU1Hr1G000010"; transcript_id "HORVU1Hr1G000010.1"; exon_number "3"; confidence_class "HC_G"
chr1H Hv_IBSC_PGSB_r1 exon 42862 43164 . + . gene_id "HORVU1Hr1G000010"; transcript_id "HORVU1Hr1G000010.1"; exon_number "4"; confidence_class "HC_G"
chr1H Hv_IBSC_PGSB_r1 exon 43649 43714 . + . gene_id "HORVU1Hr1G000010"; transcript_id "HORVU1Hr1G000010.1"; exon_number "5"; confidence_class "HC_G"
chr1H Hv_IBSC_PGSB_r1 exon 43970 44086 . + . gene_id "HORVU1Hr1G000010"; transcript_id "HORVU1Hr1G000010.1"; exon_number "6"; confidence_class "HC_G"
chr1H Hv_IBSC_PGSB_r1 exon 44196 45049 . + . gene_id "HORVU1Hr1G000010"; transcript_id "HORVU1Hr1G000010.1"; exon_number "7"; confidence_class "HC_G"
chr1H Hv_IBSC_PGSB_r1 5UTR 41811 42213 . + . gene_id "HORVU1Hr1G000010"; transcript_id "HORVU1Hr1G000010.1"; 5UTR_number "1"; confidence_class "HC_G"
chr1H Hv_IBSC_PGSB_r1 5UTR 42300 42338 . + . gene_id "HORVU1Hr1G000010"; transcript_id "HORVU1Hr1G000010.1"; 5UTR_number "2"; confidence_class "HC_G"
$ head 160404_barley_pseudomolecules_masked.fasta.out.gtf
##gff-version 2
##date 2021-12-16
##sequence-region 160404_barley_pseudomolecules_masked_chr1H.fasta
chr1H RepeatMasker gene 3 240 16.5 + . gene_id "Motif:BARE1_HV-LTR"
chr1H RepeatMasker gene 642 1193 23.8 + . gene_id "Motif:BARE-2_HV_LTR"
chr1H RepeatMasker gene 1332 1771 25.3 + . gene_id "Motif:BARE-2_HV_LTR"
chr1H RepeatMasker gene 1799 1851 4.1 - . gene_id "Motif:BARE1A_HV-LTR"
chr1H RepeatMasker gene 2009 2466 27.0 - . gene_id "Motif:BARE-2_HV_LTR"
chr1H RepeatMasker gene 2704 3436 21.9 - . gene_id "Motif:BARE1_HV-LTR"
chr1H RepeatMasker gene 3537 4942 16.1 - . gene_id "Motif:TREP37"
The counts were determined for RSB15248-1 (same procedure for the other 17 samples):
# mapped to milRNAs
featureCounts -T 8 -p -M --fraction -t nc_RNA -g ID -a Hvu_milRNA_v1.gff -o RSB15248-1.hvu-milRNA.txt RSB15248-1_15.hvu.sam RSB15248-1_16.hvu.sam RSB15248-1_17.hvu.sam RSB15248-1_18.hvu.sam RSB15248-1_19.hvu.sam RSB15248-1_20.hvu.sam RSB15248-1_21.hvu.sam RSB15248-1_22.hvu.sam RSB15248-1_23.hvu.sam RSB15248-1_24.hvu.sam RSB15248-1_25.hvu.sam RSB15248-1_26.hvu.sam RSB15248-1_27.hvu.sam RSB15248-1_28.hvu.sam RSB15248-1_29.hvu.sam RSB15248-1_30.hvu.sam RSB15248-1_31.hvu.sam RSB15248-1_32.hvu.sam RSB15248-1_33.hvu.sam RSB15248-1_34.hvu.sam RSB15248-1_35.hvu.sam RSB15248-1_36.hvu.sam RSB15248-1_37.hvu.sam RSB15248-1_38.hvu.sam RSB15248-1_39.hvu.sam RSB15248-1_40.hvu.sam
# mapped to coding genes
featureCounts -T 8 -p -M --fraction -t transcript -g gene_id -a Hv_IBSC_PGSB_r1_HighConf.gtf -o RSB15248-1.hvu-gene.txt RSB15248-1_15.hvu.sam RSB15248-1_16.hvu.sam RSB15248-1_17.hvu.sam RSB15248-1_18.hvu.sam RSB15248-1_19.hvu.sam RSB15248-1_20.hvu.sam RSB15248-1_21.hvu.sam RSB15248-1_22.hvu.sam RSB15248-1_23.hvu.sam RSB15248-1_24.hvu.sam RSB15248-1_25.hvu.sam RSB15248-1_26.hvu.sam RSB15248-1_27.hvu.sam RSB15248-1_28.hvu.sam RSB15248-1_29.hvu.sam RSB15248-1_30.hvu.sam RSB15248-1_31.hvu.sam RSB15248-1_32.hvu.sam RSB15248-1_33.hvu.sam RSB15248-1_34.hvu.sam RSB15248-1_35.hvu.sam RSB15248-1_36.hvu.sam RSB15248-1_37.hvu.sam RSB15248-1_38.hvu.sam RSB15248-1_39.hvu.sam RSB15248-1_40.hvu.sam
# mapped to transposable elements
featureCounts -T 2 -p -M --fraction -t gene -g gene_id -a 160404_barley_pseudomolecules_masked.fasta.out.gtf -o RSB15248-1.hvu-TE.txt RSB15248-1_15.hvu.sam RSB15248-1_16.hvu.sam RSB15248-1_17.hvu.sam RSB15248-1_18.hvu.sam RSB15248-1_19.hvu.sam RSB15248-1_20.hvu.sam RSB15248-1_21.hvu.sam RSB15248-1_22.hvu.sam RSB15248-1_23.hvu.sam RSB15248-1_24.hvu.sam RSB15248-1_25.hvu.sam RSB15248-1_26.hvu.sam RSB15248-1_27.hvu.sam RSB15248-1_28.hvu.sam RSB15248-1_29.hvu.sam RSB15248-1_30.hvu.sam RSB15248-1_31.hvu.sam RSB15248-1_32.hvu.sam RSB15248-1_33.hvu.sam RSB15248-1_34.hvu.sam RSB15248-1_35.hvu.sam RSB15248-1_36.hvu.sam RSB15248-1_37.hvu.sam RSB15248-1_38.hvu.sam RSB15248-1_39.hvu.sam RSB15248-1_40.hvu.sam
Summarizing the results, written in .summary
, from all 18 samples in one text file:
grep Assigned *.summary > countsbysize.hvu.summary