---
title: 'Assignment 4: Exploration, linear and mixed-effects models'
output:
  html_document:
    toc: false
---
*To submit this assignment, upload the full document on Blackboard,
including the original questions, your code, and the output. Submit
your assignment as a knitted `.pdf` (preferred) or `.html` file.*
1. Visualization (3 marks)
Import the tidyverse library. We will be using the same beaver1 dataset that
we used in last week's assignment.
```{r message=FALSE, warning=FALSE}
library(tidyverse)
```
a. Create a histogram to visualize the distribution of the beavers' body
temperatures, separating the temperature data by the beaver's activity level
(after transforming activity into a categorical variable, as you did in your
last assignment). Describe the properties of the distribution. When
creating this plot to evaluate temperature, which argument did you adjust,
and why? (1 mark)
b. What type of variables are temperature and time of day? With this in
mind, create a visualization that will help you get a better understanding
of the relationship between these variables. (0.5 marks)
c. Create a single box plot to simultaneously visualize temperature,
activity, and day. (0.5 marks)
d. What is one prediction you might make about the relationships among your
variables (based on the patterns you observed)? Create a visualization that
illustrates your prediction, improving on your previous plots in at least one
way. State why this plot is an improvement. (1 mark)
2. Outliers (2 marks)
a. In the beaver1 dataset, there are some particularly high/low body
temperature measurements. Give an example of a systematic or random error
(state which) that could have influenced these values. (0.5 marks)
b. Consider whether these values would affect your ability to test whether
temperature varies by activity level. You should generate plots and/or
perform statistical tests with and without these points, and make an
informed decision about whether they should be kept or dropped (Hint: you
may want to either create a second data set or get creative with colour.)
State whether you would remove the points and why. (1.5 marks)
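One way to make the with/without comparison reproducible is to flag candidate outliers in a new logical column, which you can then map to colour (e.g. `aes(colour = outlier)` in ggplot2) or use to build a second, filtered data set. The sketch below uses Tukey's 1.5 × IQR fences as the flagging criterion; that rule is an assumption chosen for illustration (the assignment does not prescribe a criterion), and the toy temperatures and the helper name `flag_iqr_outliers()` are hypothetical, not part of the beaver1 data.

```r
# Hypothetical helper: flag values outside Tukey's fences (k * IQR beyond
# the first and third quartiles). One common convention, not a requirement.
flag_iqr_outliers <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  x < q[[1]] - k * iqr | x > q[[2]] + k * iqr
}

# Toy body temperatures (made-up values, not the beaver1 data)
toy <- data.frame(
  temp = c(36.6, 36.7, 36.8, 36.9, 37.0, 38.5),
  active = factor(c("no", "no", "no", "yes", "yes", "yes"))
)
toy$outlier <- flag_iqr_outliers(toy$temp)

# Group means with all points, then with the flagged points dropped
with_all <- tapply(toy$temp, toy$active, mean)
without <- tapply(toy$temp[!toy$outlier], toy$active[!toy$outlier], mean)
print(with_all)
print(without)
```

Keeping the flag as a column, rather than deleting rows, means the same data frame can drive both the "with" and "without" plots or tests.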
3. Linear models (3 marks)
Run the following code to load the CO2 dataset.
```{r}
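# as.matrix() drops CO2's grouped-data classes (all columns become character);
# as_tibble() replaces the deprecated as_data_frame(). Restore numeric columns.
co2_df <- as_tibble(as.matrix(CO2)) %>%
  mutate(conc = as.integer(conc),
         uptake = as.numeric(uptake))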
```
a. Look through the help documentation (`?CO2`) to understand what each
variable means. Imagine you were running a statistical model to assess the
effects of chilling on plant CO~2~ uptake. What would the $y$ and $x$
variables be in such a model? What if you were instead trying to assess the
relationship between ambient CO~2~ concentrations and plant uptake? Briefly
defend these choices. (1 mark)
b. Using a linear model of `uptake` as a function of `conc`, how much does
`uptake` change when `conc` increases by 10 mL/L? Write out the interpretation
as a simple statement of the effect of `conc` on `uptake`. How much CO~2~
would you predict plants to take up if atmospheric concentrations were
2,450 mL/L? Show your work. (2 marks)
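The arithmetic behind this question follows from the fitted line $\hat{y} = \beta_0 + \beta_1 x$: a 10-unit increase in $x$ changes $\hat{y}$ by $10\beta_1$, and a prediction plugs the new $x$ into the same equation. The sketch below checks this on made-up, perfectly linear toy data (not the CO2 data; the names `toy`, `x`, and `y` are placeholders), comparing the hand calculation with `predict()`.

```r
# Toy data lying exactly on y = 2 + 0.5 * x (made up for illustration)
toy <- data.frame(x = c(0, 10, 20, 30, 40))
toy$y <- 2 + 0.5 * toy$x

fit <- lm(y ~ x, data = toy)
b0 <- coef(fit)[["(Intercept)"]]
b1 <- coef(fit)[["x"]]

# Change in the predicted response when x goes up by 10 units
delta_10 <- 10 * b1

# Prediction at a new x, by hand and with predict(); the two should agree
x_new <- 100
by_hand <- b0 + b1 * x_new
by_predict <- unname(predict(fit, newdata = data.frame(x = x_new)))

print(c(delta_10 = delta_10, by_hand = by_hand, by_predict = by_predict))
```

The `conc` values in the CO2 data range from 95 to 1000 mL/L, so a prediction at 2,450 mL/L is an extrapolation beyond the observed data.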
4. Linear mixed-effects models (4 marks)
Santangelo _et al._ (2018) were interested in understanding how plant
defenses, herbivores, and pollinators influence the expression of plant
floral traits (e.g. flower size). Their experiment had 3 treatments, each
with 2 levels: plant defense (defended vs. undefended), herbivory (reduced
vs. ambient), and pollination (open vs. supplemental). These treatments
were fully crossed for a total of 8
treatment combinations. In each treatment combination, they grew 4
individuals from each of 25 plant genotypes for a total of 800 plants (8
treatment combinations x 25 genotypes x 4 individuals per genotype). Plants
were grown in a common garden at the Koffler Scientific Reserve (UofT's field
research station) and 6 floral traits were measured on all plants throughout
the summer. We will analyze how the treatments influenced one of these
traits in this exercise. Run the code chunk below to download the data,
which includes only a subset of the columns from the full dataset:
```{r}
library(tidyverse)
plant_data <- "https://uoftcoders.github.io/rcourse/data/Santangelo_JEB_2018.csv"
download.file(plant_data, "Santangelo_JEB_2018.csv")
plant_data <- read_csv("Santangelo_JEB_2018.csv",
                       col_names = TRUE)
glimpse(plant_data)
head(plant_data)
```
You can see that the data contain 792 observations (i.e., plants; 8 died
during the experiment). There are 50 genotypes across 3 treatments:
Herbivory, Pollination, and HCN (i.e. hydrogen cyanide, a plant defense).
There are 6 plant floral traits: Number of days to first flower, banner
petal length, banner petal width, plant biomass, number of flowers, and
number of inflorescences. Finally, since plants that are closer in space in
the common garden may have similar trait expression due to more similar
environments, the authors included 6 spatial "blocks" to account for this
environmental variation (i.e. Plant from block A "share" an environment and
those from block B "share" an environment, etc.). Also keep in mind that
each treatment combination contains 4 individuals of each genotype, which
are likely to have similar trait expression due simply to shared genetics.
a. Use the `lme4` and `lmerTest` R packages to run a linear mixed-effects
model examining how herbivores (`Herbivory`), pollinators (`Pollination`),
plant defenses (`HCN`), _and all interactions_ influence the width of
banner petals (`Avg.Bnr.Wdth`) produced by plants while accounting for
variation due to spatial block and plant genotype. Also allow the intercept
for `Genotype` to vary across the levels of the herbivory treatment. (1
mark: 0.5 for correct fixed effects specification and 0.5 for correct random
effects structure). You only need to specify the model for this part of the
question.
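In R's model-formula syntax, `A * B * C` is shorthand for all main effects plus every two- and three-way interaction among the three terms. The base-R sketch below uses placeholder factor names `A`, `B`, and `C` (hypothetical, not the dataset's columns) and lists the terms that `*` expands to. In `lme4`, random intercepts are added to such a formula with terms of the form `(1 | group)`; the exact random-effects structure for this question is left to you.

```r
# Placeholder formula with three factors; '*' expands to all main effects
# and all interactions among them
f <- y ~ A * B * C

# term.labels lists the fixed-effect terms the formula expands to
expanded <- attr(terms(f), "term.labels")
print(expanded)
```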
b. Summarize the model that you ran in part (a) (i.e., print its output). Did
any of the treatments have a significant effect on banner petal width? If
so, which ones? Based on your examination of the model output, how can you
tell which level of the significant treatments resulted in wider or narrower
mean banner petals? Make a statement for each significant **main**
effect in the model (i.e., not interactions). If none of the main effects
are significant, then simply write "there are no significant main effects in
the model" (0.5 marks).
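With R's default treatment contrasts, which `lmer()` also uses for unordered factors, the intercept is the mean of the reference (first) level and each factor coefficient is the difference between the named level and that reference. A positive estimate therefore means the named level has a larger mean response. The toy `lm()` fit below (made-up data, hypothetical level names `control` and `treated`) shows this. In a model with interactions, a main-effect coefficient is this difference evaluated at the reference levels of the other factors.

```r
# Made-up responses for a two-level factor; "control" sorts first
# alphabetically, so it becomes the reference level by default
toy <- data.frame(
  group = factor(c("control", "control", "treated", "treated")),
  y = c(9, 11, 11, 13)
)

fit <- lm(y ~ group, data = toy)
print(coef(fit))

# Intercept = mean of the reference level; 'grouptreated' = difference in means
means <- tapply(toy$y, toy$group, mean)
print(means)
```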
c. Using `dplyr` and `ggplot2`, plot the mean banner width for one of the
significant interactions in the model above (whichever you choose). The idea
is to show how both treatments interact to influence the mean width of
banner petals using a combination of different colours, linetypes, shapes,
etc. on the same plot (i.e., no faceting). Avoid overlapping points in the
figure. Also include error bars/bands with one standard error around the
mean. As a reminder, I have included the formula to calculate the standard
error of the mean below. (1.5 marks).
$$ SE = \frac{sd}{\sqrt{n}} $$
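A minimal sketch of the summarise-then-plot workflow, on made-up toy data with two hypothetical two-level factors (`factor_a`, `factor_b`) rather than the real treatment columns. It computes the standard error with the formula above and uses `position_dodge()` so points and error bars for different groups do not overlap; the dodge width of 0.3 is an arbitrary choice.

```r
library(dplyr)
library(ggplot2)

# Standard error of the mean: SE = sd / sqrt(n)
se <- function(x) sd(x) / sqrt(length(x))

# Toy data: 3 made-up observations for each combination of two factors
toy <- data.frame(
  factor_a = rep(c("a1", "a2"), each = 6),
  factor_b = rep(rep(c("b1", "b2"), each = 3), times = 2),
  response = c(1, 2, 3, 4, 5, 6, 2, 3, 4, 8, 9, 10)
)

# One row per factor combination, with its mean and SE
summary_df <- toy %>%
  group_by(factor_a, factor_b) %>%
  summarise(mean_resp = mean(response),
            se_resp = se(response),
            .groups = "drop")

# Shared dodge so points, lines, and error bars shift together
dodge <- position_dodge(width = 0.3)

p <- ggplot(summary_df,
            aes(x = factor_a, y = mean_resp,
                colour = factor_b, group = factor_b)) +
  geom_point(position = dodge, size = 3) +
  geom_line(position = dodge) +
  geom_errorbar(aes(ymin = mean_resp - se_resp, ymax = mean_resp + se_resp),
                width = 0.1, position = dodge) +
  labs(x = "Factor A", y = "Mean response (+/- 1 SE)", colour = "Factor B")

print(p)
```

Mapping the second factor to `colour` (and optionally `linetype` or `shape`) keeps both treatments on a single panel, and non-parallel lines are the visual signature of an interaction.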
d. After accounting for the fixed effects, what percentage of the variation
in banner petal width was explained by each of the random effects in the
model? Show your work. (0.5 marks)
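One common way to express this is to divide each random-effect variance by the total variance (all random-effect variances plus the residual variance) and multiply by 100, i.e. $100 \times \sigma^2_k / (\sum_j \sigma^2_j + \sigma^2_\varepsilon)$. In `lme4`, `as.data.frame(VarCorr(model))` returns the variance components in a `vcov` column alongside a `grp` column. The sketch below assumes a data frame of that shape but uses made-up group names and variances so it runs without fitting a model; `percent_variance()` is a hypothetical helper.

```r
# Hypothetical helper: percentage of total variance attributable to each
# component. 'vc' is assumed to look like as.data.frame(VarCorr(m)) from
# lme4, i.e. with 'grp', 'var2', and 'vcov' columns.
percent_variance <- function(vc) {
  # Keep variance rows only (covariance rows have a non-missing var2)
  vc <- vc[is.na(vc$var2), ]
  setNames(100 * vc$vcov / sum(vc$vcov), vc$grp)
}

# Made-up variance components, for illustration only
toy_vc <- data.frame(
  grp = c("group_a", "group_b", "Residual"),
  var1 = c("(Intercept)", "(Intercept)", NA),
  var2 = c(NA, NA, NA),
  vcov = c(0.02, 0.01, 0.07)
)

print(percent_variance(toy_vc))
```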
e. Describe the pattern you see in the figure generated in part (c). Why do
you think the interaction you plotted was significant in the model? (0.5 marks)