---
title: "Visualizing Data 3"
author: "Joel Ledford"
date: "Winter 2019"
output:
  html_document:
    theme: spacelab
    toc: yes
    toc_float: yes
  pdf_document:
    toc: yes
---
## Resources
- [ggplot2 cheatsheet](https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf)
- [R for Data Science](https://r4ds.had.co.nz/)
- [R Cookbook](http://www.cookbook-r.com/)
## Learning Goals
*At the end of this exercise, you will be able to:*
1. Build stacked barplots of categorical variables.
2. Build side-by-side barplots using `position= "dodge"`.
3. Customize colors in plots using `RColorBrewer` and `paletteer`.
4. Build histograms and density plots.
## Load the libraries
```{r message=FALSE, warning=FALSE}
library(tidyverse)
```
```{r}
options(scipen=999) #cancels the use of scientific notation for the session
```
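To see what `scipen` changes, here is a minimal sketch that formats the same number under R's default setting and under the setting used in this lab. The variable names `sci` and `fixed` are only for illustration.

```{r}
options(scipen = 0)        # R's default: no penalty against scientific notation
sci <- format(100000)      # "1e+05"

options(scipen = 999)      # the setting used in this lab
fixed <- format(100000)    # "100000"

c(sci, fixed)
```

A large positive `scipen` makes R prefer fixed notation, which keeps axis labels and printed values easier to read.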
## Data
**Database of vertebrate home range sizes.**
Reference: Tamburello N, Cote IM, Dulvy NK (2015) Energy and the scaling of animal space use. The American Naturalist 186(2):196-211. http://dx.doi.org/10.1086/682070.
Data: http://datadryad.org/resource/doi:10.5061/dryad.q5j65/1
```{r message=FALSE, warning=FALSE}
homerange <-
  readr::read_csv("data/Tamburelloetal_HomeRangeDatabase.csv", na = c("", " ", "NA", "#N/A", "-999", "\\"))
```
## Barplots revisited
At this point you should be able to build a barplot that shows counts of observations in a variable using `geom_bar()`. You should also be able to use `stat="identity"` when you want to map a variable to the y-axis yourself instead of plotting counts.
Although we did not use it last time, `geom_col()` is the same as specifying `stat="identity"` using `geom_bar()`.
## Here is the plot using `geom_bar(stat="identity")`
```{r}
homerange %>%
  filter(family=="salmonidae") %>% # filter for salmonid fish only
  ggplot(aes(x=common.name, y=log10.mass))+
  geom_bar(stat="identity")
```
## Practice
1. Make the same plot as above, but use `geom_col()`.
```{r}
homerange %>%
  filter(family=="salmonidae") %>% # filter for salmonid fish only
  ggplot(aes(x=common.name, y=log10.mass))+
  geom_col()
```
## Barplots and multiple variables
Last time we explored the `fill` option in boxplots as a way to bring color to the plot; i.e. we filled by the same variable that we were plotting. What happens when we fill by a different categorical variable?
Let's start by counting how many observations we have in each taxonomic group.
```{r}
homerange %>%
  count(taxon)
```
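`count()` is shorthand for grouping and tallying. The sketch below shows the equivalence on a small made-up tibble; the `toy` data and its values are assumptions for illustration, not the homerange data.

```{r}
library(dplyr)

# Hypothetical toy data standing in for the `taxon` column
toy <- tibble(taxon = c("birds", "birds", "mammals", "snakes", "mammals", "birds"))

# count() returns one row per group plus a column `n` with the group size
by_count <- toy %>%
  count(taxon)

# The same tally written out with group_by() and summarize()
by_summary <- toy %>%
  group_by(taxon) %>%
  summarize(n = n())

by_count
```

`geom_bar()` performs this same tally internally, which is why it only needs an `x` aesthetic.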
Now let's make a barplot of these data.
```{r}
homerange %>%
  ggplot(aes(x=taxon))+
  geom_bar(alpha=0.9, na.rm=TRUE)+
  coord_flip()+
  labs(title = "Observations by Taxon in Homerange Data",
       x = "Taxonomic Group")
```
By specifying `fill=trophic.guild` we build a stacked barplot that shows how many observations in each taxonomic group are herbivores and how many are carnivores.
```{r}
homerange %>%
  ggplot(aes(x=taxon, fill=trophic.guild))+
  geom_bar(alpha=0.9, na.rm=TRUE)+
  coord_flip()+
  labs(title = "Observations by Taxon in Homerange Data",
       x = "Taxonomic Group",
       fill= "Trophic Guild")
```
We can also have counts of each trophic guild within taxonomic group shown side-by-side by specifying `position="dodge"`.
```{r}
homerange %>%
  ggplot(aes(x=taxon, fill=trophic.guild))+
  geom_bar(alpha=0.9, na.rm=TRUE, position="dodge")+
  theme(legend.position = "right",
        axis.text.x = element_text(angle = 60, hjust=1))+
  labs(title = "Observations by Taxon in Homerange Data",
       x = "Taxonomic Group",
       fill= "Trophic Guild")
```
We can also scale all bars to a percentage (or proportion).
```{r}
homerange %>%
  ggplot(aes(x=taxon, fill=trophic.guild))+
  geom_bar(position = position_fill(), stat = "count")+
  scale_y_continuous(labels = scales::percent)+
  coord_flip()
```
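To make the scaling explicit, the proportions that `position_fill()` draws can also be computed by hand. This is a minimal sketch on made-up data (the `toy` tibble and its values are assumptions, not the homerange data): count each combination, then divide by the total for that taxon.

```{r}
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for `taxon` and `trophic.guild`
toy <- tibble(
  taxon = c("birds", "birds", "birds", "birds", "mammals", "mammals"),
  trophic.guild = c("carnivore", "carnivore", "carnivore", "herbivore",
                    "carnivore", "herbivore")
)

# Count each taxon/guild combination, then divide by the taxon total
props <- toy %>%
  count(taxon, trophic.guild) %>%
  group_by(taxon) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

props

# geom_col() plots the precomputed proportions; bars stack to 100% within each taxon
props %>%
  ggplot(aes(x = taxon, y = prop, fill = trophic.guild)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  coord_flip()
```

Computing the table first is useful when you also want to report the numbers, not only plot them.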
## Practice
1. Make a barplot that shows locomotion type by `primarymethod`. Try both a stacked barplot and `position="dodge"`.
```{r}
homerange %>%
  ggplot(aes(x=locomotion, fill=primarymethod))+
  geom_bar(position = position_fill(), stat = "count")+
  scale_y_continuous(labels = scales::percent)+
  coord_flip()
```
## Histograms and Density Plots
Histograms are frequently used by biologists; they show the distribution of continuous variables. As students, you have almost certainly seen histograms of grade distributions. Without getting into the statistics, a histogram divides the range of the data into intervals called bins and counts the observations that fall in each one. For something like grades, this is easy because the bins correspond to the grades A-F. By default, `geom_histogram()` uses 30 bins regardless of the data, so some adjustment with `bins` or `binwidth` is often required.
What does the distribution of body mass look like in the homerange data?
```{r}
homerange %>%
  ggplot(aes(x=log10.mass)) +
  geom_histogram(fill="steelblue", alpha=0.8, color="black")+
  labs(title = "Distribution of Body Mass")
```
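The binning can be controlled in two ways: by fixing the number of bins with `bins`, or by fixing the width of each bin with `binwidth`. The sketch below uses simulated values (the `toy` data frame is an assumption standing in for `log10.mass`) so that it runs on its own.

```{r}
library(ggplot2)

# Hypothetical toy data: 200 simulated values standing in for log10.mass
set.seed(123)
toy <- data.frame(x = rnorm(200, mean = 2, sd = 1))

# Option 1: fix the number of bins
p_bins <- ggplot(toy, aes(x = x)) +
  geom_histogram(bins = 10, fill = "steelblue", alpha = 0.8, color = "black")

# Option 2: fix the width of each bin instead
p_width <- ggplot(toy, aes(x = x)) +
  geom_histogram(binwidth = 0.5, fill = "steelblue", alpha = 0.8, color = "black")

p_bins
p_width
```

Setting either argument also silences the message that `geom_histogram()` prints when it falls back to its default of 30 bins.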
`Density plots` are similar to histograms, but instead of counting observations in bins they use a smoothing function (a kernel density estimate) to draw a continuous curve.
```{r}
homerange %>%
  ggplot(aes(x=log10.mass)) +
  geom_density(fill="steelblue", alpha=0.4)+
  labs(title = "Distribution of Body Mass")
```
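How smooth the curve looks depends on the bandwidth of the smoothing function, which `geom_density()` lets you scale with the `adjust` argument. A minimal sketch on simulated values (the `toy` data frame is an assumption, not the homerange data):

```{r}
library(ggplot2)

# Hypothetical toy data: 200 simulated values standing in for log10.mass
set.seed(123)
toy <- data.frame(x = rnorm(200, mean = 2, sd = 1))

# adjust multiplies the default bandwidth:
# values below 1 follow the data more closely, values above 1 smooth more
p_adjust <- ggplot(toy, aes(x = x)) +
  geom_density(adjust = 0.5, color = "red") +
  geom_density(adjust = 2, color = "blue") +
  labs(title = "Same data, two bandwidths")

p_adjust
```

Neither curve is more correct than the other; a small `adjust` can exaggerate noise, and a large one can hide real structure such as a second peak.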
I like to see both the histogram and the density curve so I often plot them together. Note that I assign the density plot a different color.
```{r message=FALSE}
homerange %>%
  ggplot(aes(x=log10.mass)) +
  # after_stat(density) replaces the deprecated ..density.. notation
  geom_histogram(aes(y = after_stat(density)), binwidth = .4, fill="steelblue", alpha=0.8, color="black")+
  geom_density(color="red")+
  labs(title = "Distribution of Body Mass")
```
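The two layers only line up because the histogram's y-axis is switched from counts to density, so that the bar areas sum to 1 just like the area under the density curve. The sketch below checks this on simulated values; the `toy` data frame is an assumption, and `after_stat(density)` is the current ggplot2 spelling of the older `..density..` notation.

```{r}
library(ggplot2)

# Hypothetical toy data: 200 simulated values standing in for log10.mass
set.seed(123)
toy <- data.frame(x = rnorm(200, mean = 2, sd = 1))

p <- ggplot(toy, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.4)

# layer_data() returns the values ggplot2 computed for each bar
bars <- layer_data(p, 1)

# Bar area = height (density) * width; the areas add up to 1
total_area <- sum(bars$density * (bars$xmax - bars$xmin))
total_area
```

With the default count scale, the bars would be far taller than the density curve and the curve would appear flat along the x-axis.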
## Practice
1. Make a histogram of `log10.hra`. Make sure to add a title.
```{r}
homerange %>%
  ggplot(aes(x=log10.hra)) +
  geom_histogram(fill="steelblue", alpha=0.8, color="black")+
  labs(title = "Distribution of Homerange")
```
2. Now plot the same variable using `geom_density()`.
```{r}
homerange %>%
  ggplot(aes(x=log10.hra)) +
  geom_density(fill="steelblue", alpha=0.8, color="black")+
  labs(title = "Distribution of Homerange")
```
3. Combine them both!
```{r}
homerange %>%
  ggplot(aes(x=log10.hra)) +
  # after_stat(density) replaces the deprecated ..density.. notation
  geom_histogram(aes(y = after_stat(density)), binwidth = .4, fill="steelblue", alpha=0.8, color="black")+
  geom_density(color="red")+
  labs(title = "Distribution of Homerange")
```
## That's it, let's take a break!
--> On to [part 2](https://jmledford3115.github.io/datascibiol/lab6_2.html)