Read length distribution

Each of the read files contained reads of 15-75 nt in length. We extracted the read size distributions of the trimmed reads as follows:

awk '{if(NR%4==2) print length($1)}' RSB15248-1.trimmed.fastq | sort -n | uniq -c > RSB15248-1.sizeDistr.txt
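The same command can be applied to all 18 samples in a loop. The following is a minimal sketch: the helper name `size_distr` is hypothetical, and the file names are assumed to follow the `RSB15248-<n>.trimmed.fastq` pattern used above.

```shell
# Hypothetical helper: print "<count> <read length>" lines for one FASTQ file.
# Every 4th line starting at line 2 of a FASTQ record holds the sequence.
size_distr() {
    awk 'NR % 4 == 2 { print length($1) }' "$1" | sort -n | uniq -c
}

# Apply it to each sample (assumed naming: RSB15248-1 ... RSB15248-18).
for i in $(seq 1 18); do
    fq="RSB15248-${i}.trimmed.fastq"
    [ -f "$fq" ] || continue    # skip samples that are not in this directory
    size_distr "$fq" > "RSB15248-${i}.sizeDistr.txt"
done
```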

This results in a two-column output, where column 1 contains the counts and column 2 the read size. To summarize the results in one table:

paste <(awk '{print $2}' RSB15248-1.sizeDistr.txt) <(awk '{print $1}' RSB15248-1.sizeDistr.txt) <(awk '{print $1}' RSB15248-2.sizeDistr.txt) <(awk '{print $1}' RSB15248-3.sizeDistr.txt) <(awk '{print $1}' RSB15248-4.sizeDistr.txt) <(awk '{print $1}' RSB15248-5.sizeDistr.txt) <(awk '{print $1}' RSB15248-6.sizeDistr.txt) <(awk '{print $1}' RSB15248-7.sizeDistr.txt) <(awk '{print $1}' RSB15248-8.sizeDistr.txt) <(awk '{print $1}' RSB15248-9.sizeDistr.txt) <(awk '{print $1}' RSB15248-10.sizeDistr.txt) <(awk '{print $1}' RSB15248-11.sizeDistr.txt) <(awk '{print $1}' RSB15248-12.sizeDistr.txt) <(awk '{print $1}' RSB15248-13.sizeDistr.txt) <(awk '{print $1}' RSB15248-14.sizeDistr.txt) <(awk '{print $1}' RSB15248-15.sizeDistr.txt) <(awk '{print $1}' RSB15248-16.sizeDistr.txt) <(awk '{print $1}' RSB15248-17.sizeDistr.txt) <(awk '{print $1}' RSB15248-18.sizeDistr.txt) > sizeDistr.txt

This generates a table of the read size distribution, with the read length in column 1 followed by the read counts of samples 1 to 18. It can be inspected with head sizeDistr.txt:

15	1380177	1290395	1796106	836532	653638	762148	363552	330822	300718	186332	292662	224531	271791	474893	202018	588802	707857	701715
16	4385732	3648088	3364706	924891	596408	882130	472160	477660	419417	240304	504547	289781	377443	415108	286773	1080844	1263246	1179919
17	1558281	1706594	2822221	1458984	946189	1408104	870096	834546	752348	353522	695557	339091	537506	583595	453019	1750946	2012774	1861221
18	1855285	1962225	2984519	2130960	1411695	2025815	1786815	1564398	1448955	503534	904296	436786	479231	642339	511473	2133160	2431974	2223099
19	3604061	3294956	3164676	2126402	1417759	2031843	1591491	1524530	1363873	506642	994414	428036	520634	622376	509871	2223685	2522453	2331232
20	2581484	2549389	2846196	1511700	1025126	1458101	869478	876975	805377	611789	1109874	455417	519656	621036	578042	2488148	2749977	2570071
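Note that paste combines the files row by row, so the table is only correct if every sample contains the same read lengths in the same order. If a sample lacks a read length, its rows shift relative to the others. As a possible safeguard, the files can be merged by read length instead. The sketch below assumes the two-column "count length" format described above; the function name `merge_by_length` is hypothetical, and missing lengths are filled with 0.

```shell
# Hypothetical helper: merge several "<count> <length>" files by read length.
# Output: length <tab> count in file 1 <tab> count in file 2 ...
# Assumes no input file is empty (an empty file would not get its own column).
merge_by_length() {
    awk '
        FNR == 1 { nfiles++ }                       # a new input file starts
        { count[$2, nfiles] = $1; seen[$2] = 1 }    # key: (length, file index)
        END {
            for (len in seen) {
                line = len
                for (f = 1; f <= nfiles; f++) {
                    key = len SUBSEP f
                    line = line "\t" ((key in count) ? count[key] : 0)
                }
                print line
            }
        }
    ' "$@" | sort -k1,1n
}
```

For example, `merge_by_length RSB15248-1.sizeDistr.txt RSB15248-2.sizeDistr.txt > sizeDistr.txt` would produce the first three columns of the table.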

Read mapping statistics by read length

We also generated statistics of the number of reads mapping to specific features in the two reference genomes, separated by read length. First, we created a read length index.

samtools faidx RSB15248-1.trimmed.fasta
awk '{print $1,$2}' RSB15248-1.trimmed.fasta.fai | sed 's/\:/\_/g' > RSB15248-1.trimmed.fasta.length

Overall read mapping to Blumeria graminis f.sp. hordei and Hordeum vulgare

We then used the BAM files produced by bowtie mapping of short reads within the ShortStack pipeline. Reads were mapped against the reference genomes of Blumeria graminis f.sp. hordei (Frantzeskakis et al., 2018) and Hordeum vulgare (Hordeum_vulgare.IBSC_v2). First, we determined the overall mapping to both genomes:

# Output format for both genomes: read length <tab> number of mapped reads
# (tab-separated, length first, so that the paste/cut summary step selects the right columns)
# Blumeria graminis f.sp. hordei
samtools view -F 4 RSB15248-1.bgh_aligned.bam | awk '{print $1}' | sed 's/\:/\_/g' | awk '!x[$0]++' | grep -Fwf - RSB15248-1.trimmed.fasta.length | awk '{print $2}' | sort -n | uniq -c | awk -v OFS='\t' '{print $2,$1}' > RSB15248-1.bgh_all_mapped.txt
# Hordeum vulgare
samtools view -F 4 merged_RSB15248-1.hvu_aligned.bam | awk '{print $1}' | sed 's/\:/\_/g' | awk '!x[$0]++' | grep -Fwf - RSB15248-1.trimmed.fasta.length | awk '{print $2}' | sort -n | uniq -c | awk -v OFS='\t' '{print $2,$1}' > RSB15248-1.hvu_all_mapped.txt

We summarized the results from each sample into one document as follows:

# Blumeria graminis f.sp. hordei
paste RSB15248-1.bgh_all_mapped.txt RSB15248-2.bgh_all_mapped.txt RSB15248-3.bgh_all_mapped.txt RSB15248-4.bgh_all_mapped.txt RSB15248-5.bgh_all_mapped.txt RSB15248-6.bgh_all_mapped.txt RSB15248-7.bgh_all_mapped.txt RSB15248-8.bgh_all_mapped.txt RSB15248-9.bgh_all_mapped.txt RSB15248-10.bgh_all_mapped.txt RSB15248-11.bgh_all_mapped.txt RSB15248-12.bgh_all_mapped.txt RSB15248-13.bgh_all_mapped.txt RSB15248-14.bgh_all_mapped.txt RSB15248-15.bgh_all_mapped.txt RSB15248-16.bgh_all_mapped.txt RSB15248-17.bgh_all_mapped.txt RSB15248-18.bgh_all_mapped.txt | cut -f 1,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36 > bgh_all_mapped.txt
# Hordeum vulgare
paste RSB15248-1.hvu_all_mapped.txt RSB15248-2.hvu_all_mapped.txt RSB15248-3.hvu_all_mapped.txt RSB15248-4.hvu_all_mapped.txt RSB15248-5.hvu_all_mapped.txt RSB15248-6.hvu_all_mapped.txt RSB15248-7.hvu_all_mapped.txt RSB15248-8.hvu_all_mapped.txt RSB15248-9.hvu_all_mapped.txt RSB15248-10.hvu_all_mapped.txt RSB15248-11.hvu_all_mapped.txt RSB15248-12.hvu_all_mapped.txt RSB15248-13.hvu_all_mapped.txt RSB15248-14.hvu_all_mapped.txt RSB15248-15.hvu_all_mapped.txt RSB15248-16.hvu_all_mapped.txt RSB15248-17.hvu_all_mapped.txt RSB15248-18.hvu_all_mapped.txt | cut -f 1,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36 > hvu_all_mapped.txt

Feature-specific read mapping

To determine the number of reads of a given length mapping to genomic features (transposable elements, coding genes, or milRNA genes), we first created the necessary temporary files: the SAM header and a non-redundant SAM text file in which each read is kept only once. The commands used for Blumeria graminis f.sp. hordei:

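# SAM header (-H prints the header only; without -h or -H, samtools view omits it,
# so grepping for '@' would only catch alignment lines containing that character)
samtools view -H RSB15248-1.bgh_aligned.bam > RSB15248-1.bgh.head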
# temporary non-redundant read mapping file
samtools view -F 4 RSB15248-1.bgh_aligned.bam | awk '!visited[$1]++' | awk '{gsub(":","_",$1)}1' | sed 's/[[:space:]]/\t/g' > RSB15248-1.bgh_aligned.sam

We then created a separate read mapping SAM file for each read length:

grep -Fwf <(awk '($2==15)' RSB15248-1.trimmed.fasta.length | awk '{print $1}') RSB15248-1.bgh_aligned.sam | cat RSB15248-1.bgh.head - > RSB15248-1_15.bgh.sam
...
grep -Fwf <(awk '($2==40)' RSB15248-1.trimmed.fasta.length | awk '{print $1}') RSB15248-1.bgh_aligned.sam | cat RSB15248-1.bgh.head - > RSB15248-1_40.bgh.sam
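The per-length commands differ only in the read length, so they can be generated in a loop. The following is a minimal sketch under the file naming used above; the function name `split_by_length` and its argument order are hypothetical. It uses a pipe instead of process substitution, but otherwise follows the same grep-based selection.

```shell
# Hypothetical helper:
#   split_by_length <sample> <genome tag> <non-redundant SAM> [first length] [last length]
# Expects <sample>.trimmed.fasta.length and <sample>.<tag>.head in the working directory.
split_by_length() {
    sbl_sample=$1
    sbl_tag=$2
    sbl_sam=$3
    sbl_first=${4:-15}
    sbl_last=${5:-40}
    for len in $(seq "$sbl_first" "$sbl_last"); do
        # names of all reads with this length ...
        awk -v len="$len" '$2 == len { print $1 }' "${sbl_sample}.trimmed.fasta.length" |
            # ... select their alignments and prepend the SAM header
            grep -Fwf - "$sbl_sam" |
            cat "${sbl_sample}.${sbl_tag}.head" - > "${sbl_sample}_${len}.${sbl_tag}.sam"
    done
}
```

With the files from above, `split_by_length RSB15248-1 bgh RSB15248-1.bgh_aligned.sam` would write RSB15248-1_15.bgh.sam to RSB15248-1_40.bgh.sam.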

The same was done for Hordeum vulgare:

# SAM header (-H prints the header only; without -h or -H, samtools view omits it)
samtools view -H RSB15248-1_merged.hvu_aligned.bam > RSB15248-1.hvu.head
# temporary non-redundant read mapping file
samtools view -F 4 RSB15248-1_merged.hvu_aligned.bam | awk '{gsub(":","_",$1)}1' | sed 's/[[:space:]]/\t/g' | awk '!visited[$1]++' > RSB15248-1_merged.hvu_aligned.sam

We again created a separate read mapping SAM file for each read length:

grep -Fwf <(awk '($2==15)' RSB15248-1.trimmed.fasta.length | awk '{print $1}') RSB15248-1_merged.hvu_aligned.sam | cat RSB15248-1.hvu.head - > RSB15248-1_15.hvu.sam
...
grep -Fwf <(awk '($2==40)' RSB15248-1.trimmed.fasta.length | awk '{print $1}') RSB15248-1_merged.hvu_aligned.sam | cat RSB15248-1.hvu.head - > RSB15248-1_40.hvu.sam

Counting feature-mapping reads in Blumeria graminis f.sp. hordei

We then used featureCounts to determine the number of reads mapping to the respective features. We used the GFF/GTF annotation files for coding genes, milRNAs, and transposable elements, parsed to look as follows:

$ head Bgh_milRNA_v1.gff
##gff-version 3									;
scaffold_18	ShortStack	nc_RNA	4313703	4313881	.	+	.	ID=BghMilRNA_00001;DicerCall=22;MIRNA=Y;MajorRNA=UGAGAAAGCGGCCAAGAUGAUG;
scaffold_9	ShortStack	nc_RNA	86105	86250	.	+	.	ID=BghMilRNA_00002;DicerCall=22;MIRNA=Y;MajorRNA=UAGUCCGGCGUGUUGGACCCUC;
scaffold_8	ShortStack	nc_RNA	7899823	7899937	.	-	.	ID=BghMilRNA_00003;DicerCall=22;MIRNA=Y;MajorRNA=UGGGUAGUUGUUGAUGAAAUG;
scaffold_1	ShortStack	nc_RNA	102466	102756	.	-	.	ID=BghMilRNA_00004;DicerCall=21;MIRNA=N15;MajorRNA=UGGAGUGAGGAUAUUUGUUGG;
scaffold_1	ShortStack	nc_RNA	148126	148145	.	-	.	ID=BghMilRNA_00005;DicerCall=20;MIRNA=N15;MajorRNA=GAAGCGUGCCGUUGGUGGGG;
scaffold_1	ShortStack	nc_RNA	235447	235467	.	+	.	ID=BghMilRNA_00006;DicerCall=21;MIRNA=N15;MajorRNA=UGAACUAGGACUUCGGGGGAG;
scaffold_1	ShortStack	nc_RNA	352439	352527	.	-	.	ID=BghMilRNA_00007;DicerCall=21;MIRNA=N15;MajorRNA=UGAACUAGGACUUCGGGGGAG;
scaffold_10	ShortStack	nc_RNA	30007	30076	.	+	.	ID=BghMilRNA_00008;DicerCall=21;MIRNA=N15;MajorRNA=UUUGGAUCCAACAAUGUUGCG;
scaffold_10	ShortStack	nc_RNA	121960	121998	.	+	.	ID=BghMilRNA_00009;DicerCall=22;MIRNA=N15;MajorRNA=UACUUGAGUCGCGUGUCUGAA;
$ head bgh_dh14_v4-2.gff
##gff-version 3
scaffold_1	.	gene	17468	19945	.	-	.	ID=BLGH_07057;Name=bghG004133000001001;geneID=BLGH_07057;gene_name=BLGH_07057;Dbxref=Gene3D:G3DSA:1.10.510.10,InterPro:IPR011009,SUPERFAMILY:SSF56112;
scaffold_1	.	mRNA	17468	19945	.	-	.	ID=BLGH_07057-mRNA-1;Parent=BLGH_07057;geneID=BLGH_07057;gene_name=BLGH_07057;Dbxref=Gene3D:G3DSA:1.10.510.10,InterPro:IPR011009,SUPERFAMILY:SSF56112;
scaffold_1	.	exon	17468	17592	.	-	.	Parent=BLGH_07057-mRNA-1;Name=DB67CD106629CEB4D176A838D8BED4F9;
scaffold_1	.	exon	17671	19767	.	-	.	Parent=BLGH_07057-mRNA-1;Name=68A4AAC3A3D822E5D9BB543C42F25956;
scaffold_1	.	exon	19814	19945	.	-	.	Parent=BLGH_07057-mRNA-1;Name=1962183D7FA28EA06D835E98484FBFE6;
scaffold_1	.	CDS	17549	17592	.	-	2	Parent=BLGH_07057-mRNA-1;
scaffold_1	.	CDS	17671	19767	.	-	2	Parent=BLGH_07057-mRNA-1;
scaffold_1	.	CDS	19814	19940	.	-	0	Parent=BLGH_07057-mRNA-1;
scaffold_1	.	gene	171190	173755	.	-	.	ID=BLGH_07058;Name=bghG004133000001001;geneID=BLGH_07058;gene_name=BLGH_07058;
$ head bgh_dh14_v4.fa.out.gtf
##gff-version	2
##date	2019-04-05
##sequence-region	bgh_dh14_v4.fa
scaffold_1	RepeatMasker	gene	11	759	18.1	+	.	gene_id "Motif:HaTad1-2_BG"
scaffold_1	RepeatMasker	gene	764	2908	15.5	+	.	gene_id "Motif:HaTad1-2_BG"
scaffold_1	RepeatMasker	gene	2921	8190	6.5	+	.	gene_id "Motif:Tad1-17_BG"
scaffold_1	RepeatMasker	gene	8191	8328	1.4	+	.	gene_id "Motif:Copia-18_BG-LTR"
scaffold_1	RepeatMasker	gene	8329	12888	1.3	+	.	gene_id "Motif:Copia-18_BG-I"
scaffold_1	RepeatMasker	gene	12889	13026	0.7	+	.	gene_id "Motif:Copia-18_BG-LTR"
scaffold_1	RepeatMasker	gene	13027	13609	5.3	+	.	gene_id "Motif:Tad1-17_BG"

The counts were determined for RSB15248-1 (same procedure for the other 17 samples):

# mapped to milRNAs
featureCounts -T 2 -p -M --fraction -t nc_RNA -g ID -a Bgh_milRNA_v1.gff -o RSB15248-1.bgh-milRNA.txt RSB15248-1_15.bgh.sam RSB15248-1_16.bgh.sam RSB15248-1_17.bgh.sam RSB15248-1_18.bgh.sam RSB15248-1_19.bgh.sam RSB15248-1_20.bgh.sam RSB15248-1_21.bgh.sam RSB15248-1_22.bgh.sam RSB15248-1_23.bgh.sam RSB15248-1_24.bgh.sam RSB15248-1_25.bgh.sam RSB15248-1_26.bgh.sam RSB15248-1_27.bgh.sam RSB15248-1_28.bgh.sam RSB15248-1_29.bgh.sam RSB15248-1_30.bgh.sam RSB15248-1_31.bgh.sam RSB15248-1_32.bgh.sam RSB15248-1_33.bgh.sam RSB15248-1_34.bgh.sam RSB15248-1_35.bgh.sam RSB15248-1_36.bgh.sam RSB15248-1_37.bgh.sam RSB15248-1_38.bgh.sam RSB15248-1_39.bgh.sam RSB15248-1_40.bgh.sam
# mapped to coding genes
featureCounts -T 2 -p -M --fraction -t mRNA -g ID -a bgh_dh14_v4-2.gff -o RSB15248-1.bgh-gene.txt RSB15248-1_15.bgh.sam RSB15248-1_16.bgh.sam RSB15248-1_17.bgh.sam RSB15248-1_18.bgh.sam RSB15248-1_19.bgh.sam RSB15248-1_20.bgh.sam RSB15248-1_21.bgh.sam RSB15248-1_22.bgh.sam RSB15248-1_23.bgh.sam RSB15248-1_24.bgh.sam RSB15248-1_25.bgh.sam RSB15248-1_26.bgh.sam RSB15248-1_27.bgh.sam RSB15248-1_28.bgh.sam RSB15248-1_29.bgh.sam RSB15248-1_30.bgh.sam RSB15248-1_31.bgh.sam RSB15248-1_32.bgh.sam RSB15248-1_33.bgh.sam RSB15248-1_34.bgh.sam RSB15248-1_35.bgh.sam RSB15248-1_36.bgh.sam RSB15248-1_37.bgh.sam RSB15248-1_38.bgh.sam RSB15248-1_39.bgh.sam RSB15248-1_40.bgh.sam
# mapped to transposable elements
featureCounts -T 2 -p -M --fraction -t gene -g gene_id -a bgh_dh14_v4.fa.out.gtf -o RSB15248-1.bgh-TE.txt RSB15248-1_15.bgh.sam RSB15248-1_16.bgh.sam RSB15248-1_17.bgh.sam RSB15248-1_18.bgh.sam RSB15248-1_19.bgh.sam RSB15248-1_20.bgh.sam RSB15248-1_21.bgh.sam RSB15248-1_22.bgh.sam RSB15248-1_23.bgh.sam RSB15248-1_24.bgh.sam RSB15248-1_25.bgh.sam RSB15248-1_26.bgh.sam RSB15248-1_27.bgh.sam RSB15248-1_28.bgh.sam RSB15248-1_29.bgh.sam RSB15248-1_30.bgh.sam RSB15248-1_31.bgh.sam RSB15248-1_32.bgh.sam RSB15248-1_33.bgh.sam RSB15248-1_34.bgh.sam RSB15248-1_35.bgh.sam RSB15248-1_36.bgh.sam RSB15248-1_37.bgh.sam RSB15248-1_38.bgh.sam RSB15248-1_39.bgh.sam RSB15248-1_40.bgh.sam
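The 26 input file names in each command follow one pattern, so the list can be generated rather than typed. In bash, the brace expansion `RSB15248-1_{15..40}.bgh.sam` expands to the same list. The sketch below does the same with a small helper whose name, `sam_list`, is hypothetical.

```shell
# Hypothetical helper: print the per-length SAM file names for one sample.
#   sam_list <sample> <genome tag> [first length] [last length]
sam_list() {
    for len in $(seq "${3:-15}" "${4:-40}"); do
        printf '%s_%s.%s.sam\n' "$1" "$len" "$2"
    done
}

# Example (not executed here): the milRNA count from above with a generated file list.
# featureCounts -T 2 -p -M --fraction -t nc_RNA -g ID -a Bgh_milRNA_v1.gff \
#     -o RSB15248-1.bgh-milRNA.txt $(sam_list RSB15248-1 bgh)
```

Keeping the files in ascending length order matters, because the columns of the featureCounts output follow the order of the input files.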

featureCounts writes the assignment statistics to a .summary file for each output file. We collected the "Assigned" lines from all 18 samples in one text file:

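# restrict the pattern to the featureCounts summaries so that the combined file itself is not matched on a re-run
grep Assigned *.bgh-*.txt.summary > countsbysize.bgh.summary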

The result should look like this:

countsbysize.bgh.summary:countsbyreadsize/RSB15248-10.bgh-gene.txt.summary:Assigned	30441	22983	18490	15850	16682	26754	108109	170422	39187	12580	9068	8145	7746	7191	6614	5545	4518	4285	2760	1954	1530	1128	838	590	386	232
countsbysize.bgh.summary:countsbyreadsize/RSB15248-10.bgh-milRNA.txt.summary:Assigned	148	178	251	514	542	1606	6329	6772	1036	249	92	88	59	116	54	113	34	31	31	11	4	3	3	4	5	0
countsbysize.bgh.summary:countsbyreadsize/RSB15248-10.bgh-TE.txt.summary:Assigned	44869	41885	52992	50749	63731	134833	626134	876226	176699	45947	27956	23742	20686	16090	14713	11030	14971	10693	6714	4996	4015	3427	2276	1392	1004	643
countsbysize.bgh.summary:countsbyreadsize/RSB15248-11.bgh-gene.txt.summary:Assigned	32625	36431	43605	52543	54804	72348	238140	318126	67765	22781	15782	14013	14264	12733	11441	9713	7887	6506	5140	4139	3271	2563	1909	1370	1049	765
countsbysize.bgh.summary:countsbyreadsize/RSB15248-11.bgh-milRNA.txt.summary:Assigned	189	375	586	937	1517	3638	13197	12125	1605	331	169	105	106	276	80	290	37	17	22	18	12	10	10	7	3	3
countsbysize.bgh.summary:countsbyreadsize/RSB15248-11.bgh-TE.txt.summary:Assigned	63577	85295	136516	171610	202025	352249	1400241	1679902	298650	70612	45265	47029	44027	35456	30501	31771	20780	19984	14162	11042	7796	5929	6148	3690	2462	2053
countsbysize.bgh.summary:countsbyreadsize/RSB15248-12.bgh-gene.txt.summary:Assigned	38939	25908	9384	4035	1038	419	1089	918	288	220	154	178	273	368	162	188	258	460	171	87	39	35	21	14	4	25
countsbysize.bgh.summary:countsbyreadsize/RSB15248-12.bgh-milRNA.txt.summary:Assigned	42	69	53	128	25	18	62	57	2	0
countsbysize.bgh.summary:countsbyreadsize/RSB15248-12.bgh-TE.txt.summary:Assigned	45017	39080	25008	39242	4190	2984	4682	4457	1667	1032	688	625	681	564	367	406	438	515	490	458	237	567	131	64	42	14
countsbysize.bgh.summary:countsbyreadsize/RSB15248-13.bgh-gene.txt.summary:Assigned	50095	28226	23555	2903	1502	427	491	593	287	242	211	201	292	535	313	362	662	1585	116	60	51	54	46	19	14	6
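For plotting or further analysis, the wide "Assigned" lines can be reshaped into one row per sample, feature type, and read length. The following is a sketch only: the function name `summary_to_long` is hypothetical, and it assumes that the count columns appear in the order of the input SAM files, i.e. that the first count belongs to read length 15.

```shell
# Hypothetical helper: summary_to_long <summary file> [first read length]
# Output columns: sample, genome, feature type, read length, assigned reads
summary_to_long() {
    awk -F '\t' -v first="${2:-15}" '
        {
            n = split($1, parts, ":")          # last part is "Assigned"
            path = parts[n - 1]                # the part before it is the file path
            sub(/^.*\//, "", path)             # drop the directory
            sub(/\.txt\.summary$/, "", path)   # e.g. RSB15248-10.bgh-gene
            split(path, nm, ".")               # nm[1] = sample, nm[2] = genome-feature
            split(nm[2], gf, "-")              # gf[1] = genome, gf[2] = feature type
            for (i = 2; i <= NF; i++)
                print nm[1] "\t" gf[1] "\t" gf[2] "\t" (first + i - 2) "\t" $i
        }
    ' "$1"
}
```

Called as `summary_to_long countsbysize.bgh.summary`, this would turn the first line shown above into rows such as RSB15248-10, bgh, gene, 15, 30441.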

Counting feature-mapping reads in Hordeum vulgare

To count reads mapping to transposable elements with featureCounts, we first ran RepeatMasker to annotate the transposon landscape of the Hordeum vulgare genome. This was done for each chromosome separately:

RepeatMasker -pa 4 -species Hordeum -gff -dir RepeatMasker /hpcwork/rwth0146/SequencingData_published/Hordeum_vulgare_v160517/160404_barley_pseudomolecules_masked_chr1H.fasta;
RepeatMasker -pa 4 -species Hordeum -gff -dir RepeatMasker /hpcwork/rwth0146/SequencingData_published/Hordeum_vulgare_v160517/160404_barley_pseudomolecules_masked_chr2H.fasta;
RepeatMasker -pa 4 -species Hordeum -gff -dir RepeatMasker /hpcwork/rwth0146/SequencingData_published/Hordeum_vulgare_v160517/160404_barley_pseudomolecules_masked_chr3H.fasta;
RepeatMasker -pa 4 -species Hordeum -gff -dir RepeatMasker /hpcwork/rwth0146/SequencingData_published/Hordeum_vulgare_v160517/160404_barley_pseudomolecules_masked_chr4H.fasta;
RepeatMasker -pa 4 -species Hordeum -gff -dir RepeatMasker /hpcwork/rwth0146/SequencingData_published/Hordeum_vulgare_v160517/160404_barley_pseudomolecules_masked_chr5H.fasta;
RepeatMasker -pa 4 -species Hordeum -gff -dir RepeatMasker /hpcwork/rwth0146/SequencingData_published/Hordeum_vulgare_v160517/160404_barley_pseudomolecules_masked_chr6H.fasta;
RepeatMasker -pa 4 -species Hordeum -gff -dir RepeatMasker /hpcwork/rwth0146/SequencingData_published/Hordeum_vulgare_v160517/160404_barley_pseudomolecules_masked_chr7H.fasta;
RepeatMasker -pa 4 -species Hordeum -gff -dir RepeatMasker /hpcwork/rwth0146/SequencingData_published/Hordeum_vulgare_v160517/160404_barley_pseudomolecules_masked_chrUn.fasta
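Since the eight commands differ only in the chromosome name, they can also be written as a loop. The sketch below is illustrative: the function name `mask_all` and the dry-run default are assumptions, while the RepeatMasker options and paths are those used above. By default it only prints the commands.

```shell
GENOME_DIR=/hpcwork/rwth0146/SequencingData_published/Hordeum_vulgare_v160517

# Hypothetical helper: mask_all [runner]
# The runner defaults to "echo", which prints the commands without running them.
mask_all() {
    runner=${1:-echo}
    for chr in chr1H chr2H chr3H chr4H chr5H chr6H chr7H chrUn; do
        "$runner" RepeatMasker -pa 4 -species Hordeum -gff -dir RepeatMasker \
            "${GENOME_DIR}/160404_barley_pseudomolecules_masked_${chr}.fasta"
    done
}
```

Running `mask_all` lists the eight commands for inspection; `mask_all env` would execute them one after another.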

Afterwards, we combined the RepeatMasker output GFF files of all chromosomes into one file:

cat 160404_barley_pseudomolecules_masked_chr1H.fasta.out.gff 160404_barley_pseudomolecules_masked_chr2H.fasta.out.gff 160404_barley_pseudomolecules_masked_chr3H.fasta.out.gff 160404_barley_pseudomolecules_masked_chr4H.fasta.out.gff 160404_barley_pseudomolecules_masked_chr5H.fasta.out.gff 160404_barley_pseudomolecules_masked_chr6H.fasta.out.gff 160404_barley_pseudomolecules_masked_chr7H.fasta.out.gff 160404_barley_pseudomolecules_masked_chrUn.fasta.out.gff > 160404_barley_pseudomolecules_masked.fasta.out.gtf
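The transposable element annotation used for counting has the feature type gene and a gene_id attribute, which the raw RepeatMasker output does not contain. The conversion step is not part of the commands above, so the following is only a sketch of how it could be done. It assumes that the raw attribute column holds the repeat name as a quoted string (for example `Target "Motif:BARE1_HV-LTR" 1 238`); the function name `rm_gff_to_gtf` is hypothetical.

```shell
# Hypothetical helper: rm_gff_to_gtf <RepeatMasker GFF file> ...
# Sets the feature type to "gene" and rewrites column 9 as gene_id "<repeat name>".
rm_gff_to_gtf() {
    awk -F '\t' -v OFS='\t' '
        /^#/ { print; next }                       # keep header and comment lines
        NF >= 9 && match($9, /"[^"]*"/) {
            motif = substr($9, RSTART, RLENGTH)    # quoted repeat name, quotes included
            $3 = "gene"
            $9 = "gene_id " motif
            print
        }
    ' "$@"
}
```

Lines without a quoted name in column 9 are dropped by this sketch, so the line counts before and after conversion should be compared.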

We then used GFF/GTF files for all features, as for Blumeria graminis f.sp. hordei. They look like this:

$ head Hvu_milRNA_v1.gff
##gff-version 3
#miRNAs
chr1H	ShortStack	nc_RNA	3894031	3894149	.	+	.	ID=HvuMilRNA_00001;DicerCall=18;MIRNA=Y;MajorRNA=GGCGGAGGCGGUGGCGGC;
chr1H	ShortStack	nc_RNA	18775469	18775707	.	+	ID=HvuMilRNA_00002;DicerCall=21;MIRNA=Y;MajorRNA=GAGGGCGGCGAUAACAUUUUC;
chr1H	ShortStack	nc_RNA	28809524	28809735	.	-	ID=HvuMilRNA_00003;DicerCall=23;MIRNA=Y;MajorRNA=CGGUGGGUGCGCCGCCGUCGAGC;AlternativeName=miR5384;
chr1H	ShortStack	nc_RNA	230391203	230391338	.	+	ID=HvuMilRNA_00004;DicerCall=23;MIRNA=Y;MajorRNA=CGGUGGGUGCGCCGCCGUCGAGC;AlternativeName=miR5384;
chr1H	ShortStack	nc_RNA	282589545	282589626	.	+	ID=HvuMilRNA_00005;DicerCall=19;MIRNA=Y;MajorRNA=UAGUGAUUUUCGUGAUUUG;
chr1H	ShortStack	nc_RNA	316021162	316021259	.	-	ID=HvuMilRNA_00006;DicerCall=21;MIRNA=Y;MajorRNA=AAAUGUACGGAAGGGUAUUUC;
chr1H	ShortStack	nc_RNA	353506970	353507136	.	+	ID=HvuMilRNA_00007;DicerCall=21;MIRNA=Y;MajorRNA=UCGGACCAGGCUUCAUUCCCC;AlternativeName=miR166;
chr1H	ShortStack	nc_RNA	389189884	389190124	.	-	ID=HvuMilRNA_00008;DicerCall=21;MIRNA=Y;MajorRNA=UGAUUGAGCCGUGCCAAUAUC;AlternativeName=miR171;
$ head Hv_IBSC_PGSB_r1_HighConf.gtf
chr1H	Hv_IBSC_PGSB_r1	transcript	41811	45049	.	+	.	 gene_id "HORVU1Hr1G000010"; transcript_id "HORVU1Hr1G000010.1"; clst_id "TCONS_00000001"; confidence_class "HC_G"
chr1H	Hv_IBSC_PGSB_r1	exon	41811	42213	.	+	.	 gene_id "HORVU1Hr1G000010"; transcript_id "HORVU1Hr1G000010.1"; exon_number "1"; confidence_class "HC_G"
chr1H	Hv_IBSC_PGSB_r1	exon	42300	42338	.	+	.	 gene_id "HORVU1Hr1G000010"; transcript_id "HORVU1Hr1G000010.1"; exon_number "2"; confidence_class "HC_G"
chr1H	Hv_IBSC_PGSB_r1	exon	42427	42567	.	+	.	 gene_id "HORVU1Hr1G000010"; transcript_id "HORVU1Hr1G000010.1"; exon_number "3"; confidence_class "HC_G"
chr1H	Hv_IBSC_PGSB_r1	exon	42862	43164	.	+	.	 gene_id "HORVU1Hr1G000010"; transcript_id "HORVU1Hr1G000010.1"; exon_number "4"; confidence_class "HC_G"
chr1H	Hv_IBSC_PGSB_r1	exon	43649	43714	.	+	.	 gene_id "HORVU1Hr1G000010"; transcript_id "HORVU1Hr1G000010.1"; exon_number "5"; confidence_class "HC_G"
chr1H	Hv_IBSC_PGSB_r1	exon	43970	44086	.	+	.	 gene_id "HORVU1Hr1G000010"; transcript_id "HORVU1Hr1G000010.1"; exon_number "6"; confidence_class "HC_G"
chr1H	Hv_IBSC_PGSB_r1	exon	44196	45049	.	+	.	 gene_id "HORVU1Hr1G000010"; transcript_id "HORVU1Hr1G000010.1"; exon_number "7"; confidence_class "HC_G"
chr1H	Hv_IBSC_PGSB_r1	5UTR	41811	42213	.	+	.	 gene_id "HORVU1Hr1G000010"; transcript_id "HORVU1Hr1G000010.1"; 5UTR_number "1"; confidence_class "HC_G"
chr1H	Hv_IBSC_PGSB_r1	5UTR	42300	42338	.	+	.	 gene_id "HORVU1Hr1G000010"; transcript_id "HORVU1Hr1G000010.1"; 5UTR_number "2"; confidence_class "HC_G"
$ head 160404_barley_pseudomolecules_masked.fasta.out.gtf
##gff-version 2
##date 2021-12-16
##sequence-region 160404_barley_pseudomolecules_masked_chr1H.fasta
chr1H	RepeatMasker	gene	3	240	16.5	+	.	gene_id "Motif:BARE1_HV-LTR"
chr1H	RepeatMasker	gene	642	1193	23.8	+	.	gene_id "Motif:BARE-2_HV_LTR"
chr1H	RepeatMasker	gene	1332	1771	25.3	+	.	gene_id "Motif:BARE-2_HV_LTR"
chr1H	RepeatMasker	gene	1799	1851	 4.1	-	.	gene_id "Motif:BARE1A_HV-LTR"
chr1H	RepeatMasker	gene	2009	2466	27.0	-	.	gene_id "Motif:BARE-2_HV_LTR"
chr1H	RepeatMasker	gene	2704	3436	21.9	-	.	gene_id "Motif:BARE1_HV-LTR"
chr1H	RepeatMasker	gene	3537	4942	16.1	-	.	gene_id "Motif:TREP37"

The counts were determined for RSB15248-1 (same procedure for the other 17 samples):

# mapped to milRNAs
featureCounts -T 8 -p -M --fraction -t nc_RNA -g ID -a Hvu_milRNA_v1.gff -o RSB15248-1.hvu-milRNA.txt RSB15248-1_15.hvu.sam RSB15248-1_16.hvu.sam RSB15248-1_17.hvu.sam RSB15248-1_18.hvu.sam RSB15248-1_19.hvu.sam RSB15248-1_20.hvu.sam RSB15248-1_21.hvu.sam RSB15248-1_22.hvu.sam RSB15248-1_23.hvu.sam RSB15248-1_24.hvu.sam RSB15248-1_25.hvu.sam RSB15248-1_26.hvu.sam RSB15248-1_27.hvu.sam RSB15248-1_28.hvu.sam RSB15248-1_29.hvu.sam RSB15248-1_30.hvu.sam RSB15248-1_31.hvu.sam RSB15248-1_32.hvu.sam RSB15248-1_33.hvu.sam RSB15248-1_34.hvu.sam RSB15248-1_35.hvu.sam RSB15248-1_36.hvu.sam RSB15248-1_37.hvu.sam RSB15248-1_38.hvu.sam RSB15248-1_39.hvu.sam RSB15248-1_40.hvu.sam
# mapped to coding genes
featureCounts -T 8 -p -M --fraction -t transcript -g gene_id -a Hv_IBSC_PGSB_r1_HighConf.gtf -o RSB15248-1.hvu-gene.txt RSB15248-1_15.hvu.sam RSB15248-1_16.hvu.sam RSB15248-1_17.hvu.sam RSB15248-1_18.hvu.sam RSB15248-1_19.hvu.sam RSB15248-1_20.hvu.sam RSB15248-1_21.hvu.sam RSB15248-1_22.hvu.sam RSB15248-1_23.hvu.sam RSB15248-1_24.hvu.sam RSB15248-1_25.hvu.sam RSB15248-1_26.hvu.sam RSB15248-1_27.hvu.sam RSB15248-1_28.hvu.sam RSB15248-1_29.hvu.sam RSB15248-1_30.hvu.sam RSB15248-1_31.hvu.sam RSB15248-1_32.hvu.sam RSB15248-1_33.hvu.sam RSB15248-1_34.hvu.sam RSB15248-1_35.hvu.sam RSB15248-1_36.hvu.sam RSB15248-1_37.hvu.sam RSB15248-1_38.hvu.sam RSB15248-1_39.hvu.sam RSB15248-1_40.hvu.sam
# mapped to transposable elements
featureCounts -T 2 -p -M --fraction -t gene -g gene_id -a 160404_barley_pseudomolecules_masked.fasta.out.gtf -o RSB15248-1.hvu-TE.txt RSB15248-1_15.hvu.sam RSB15248-1_16.hvu.sam RSB15248-1_17.hvu.sam RSB15248-1_18.hvu.sam RSB15248-1_19.hvu.sam RSB15248-1_20.hvu.sam RSB15248-1_21.hvu.sam RSB15248-1_22.hvu.sam RSB15248-1_23.hvu.sam RSB15248-1_24.hvu.sam RSB15248-1_25.hvu.sam RSB15248-1_26.hvu.sam RSB15248-1_27.hvu.sam RSB15248-1_28.hvu.sam RSB15248-1_29.hvu.sam RSB15248-1_30.hvu.sam RSB15248-1_31.hvu.sam RSB15248-1_32.hvu.sam RSB15248-1_33.hvu.sam RSB15248-1_34.hvu.sam RSB15248-1_35.hvu.sam RSB15248-1_36.hvu.sam RSB15248-1_37.hvu.sam RSB15248-1_38.hvu.sam RSB15248-1_39.hvu.sam RSB15248-1_40.hvu.sam

As before, we collected the "Assigned" lines of the .summary files from all 18 samples in one text file:

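# restrict the pattern to the Hordeum vulgare summaries so that the combined files are not matched
grep Assigned *.hvu-*.txt.summary > countsbysize.hvu.summary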
