-
Notifications
You must be signed in to change notification settings - Fork 14
/
Copy pathlab_quantification.Rmd
69 lines (48 loc) · 2.56 KB
/
lab_quantification.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
---
title: "Quantification"
subtitle: "Workshop on RNA-Seq"
output:
bookdown::html_document2:
highlight: textmate
toc: false
toc_float:
collapsed: true
smooth_scroll: true
print: false
toc_depth: 4
number_sections: false
df_print: default
code_folding: none
self_contained: false
keep_md: false
encoding: 'UTF-8'
css: "assets/lab.css"
include:
after_body: assets/footer-lab.html
---
```{r,child="assets/header-lab.Rmd"}
```
Once all the read samples have been mapped to the reference we want to quantify how many of the reads that are mapping to different parts of the genome. In most cases this regions are the genes but you can ask to get counts for other parts of the genome as long as you can provide a annotation file with that information. In most cases the format GTF or GFF3 are the most common formats for genome annotation and the ones that most programs use.
In our case we are going to use a program called __featureCounts__ that is fast and reliable.
Then by providing the mapped read files and the annotation you can quantify how many reads that mapped to your reference. There are many settings that you can use to decide how you want to do the counting. In our case we will count the reads that are in the annotated exons (`-t exon`) and sum them up per gene (`-g gene`) . In __featureCounts__, this is done by the following command.
```{sh,eval=FALSE,chunk.title=TRUE}
# With only one bam file
featureCounts -T {threads} -s 2 -t exon -g gene_id -a {input.gtf} -o {output.countTable} bamFiles/file1.bam > results/Counts.featureCount.log
# Or multiple bamfiles.
featureCounts -T {threads} -s 2 -t exon -g gene_id -a {input.gtf} -o {output.countTable} bamFiles/*.bam > results/Counts.featureCount.log
```
This will create a file that contains a matrix were the rows represents the genes and the columns represent the samples. Feature count will also include some information per gene to see were it is annotated in the reference. The file is easy to open in R for further analysis.
***
```{sh ,eval=FALSE,chunk.title=TRUE}
##########################################
### RUN featureCounts ###
#########################################
cd ~/RNAseq/
gunzip rawData/references/annotations/Mus_musculus.GRCm38.101.gtf.gz
featureCounts -T 4 -s 2 -t exon -g gene_id \
-a rawData/references/annotations/Mus_musculus.GRCm38.101.gtf \
-o results/04-quantificationFeatureCounts/allSamples.featureCounts \
results/02-mapping/*1M.sorted.bam \
> results/04-quantificationFeatureCounts/allSamples.featureCounts.log
```
***