# One-way ANOVA and Kruskal-Wallis
```{r}
#| results: "asis"
#| echo: false
source("_common.R")
status("complete")
```
## Overview
In the last chapter, we learnt how to use and interpret the general
linear model when the *x* variable was categorical with two groups. You
will now extend that to situations when there are more than two groups.
This is often known as the one-way ANOVA (**an**alysis **o**f
**var**iance). We will also learn about the Kruskal-Wallis test
[@kruskal1952] which can be used when the assumptions of the general
linear model are not met.
We use `lm()` to carry out a one-way ANOVA. General linear models
applied with `lm()` are based on the normal distribution and are known
as parametric tests because they use the parameters of the normal
distribution (the mean and standard deviation) to determine if an effect
is significant. Null hypotheses are about a mean or difference between
means. The assumptions need to be met for the *p*-values generated to be
accurate.
If the assumptions are not met, we can use the non-parametric equivalent
known as the Kruskal-Wallis test. Like other non-parametric tests, the
Kruskal-Wallis test:

-   is based on the ranks of the values rather than the actual values
    themselves (see the short sketch after this list)
-   has a null hypothesis about the mean rank rather than the mean
-   has fewer assumptions and can be used in more situations
-   tends to be less powerful than a parametric test when the
    assumptions are met
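
To make "based on the ranks" concrete, here is a minimal sketch with made-up values (these are not the data used later in this chapter): each value is replaced by its position when the values are sorted from smallest to largest, and the test works on those positions.

```{r}
# four made-up colony diameters (mm) - illustration only
diameters <- c(10.1, 9.8, 11.2, 10.5)

# rank() gives each value's position when sorted smallest to largest
rank(diameters)
```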
<!-- Why not do several two-sample tests? ANOVA terminology and concepts -->
The process of using `lm()` to conduct a one-way ANOVA is very like the
process for using `lm()` to conduct a two-sample *t*-test but with an
important addition. When we get a significant effect of our explanatory
variable, it only tells us that at least two of the means differ. To
find out which means differ, we need a post-hoc test. A post-hoc ("after
this") test is done after a significant ANOVA test. There are several
possible post-hoc tests and we will be using Tukey's HSD (honestly
significant difference) test [@tukey1949] implemented in the
**`emmeans`** [@emmeans] package. Post-hoc tests make adjustments to the
*p*-values to account for the fact that we are doing multiple
comparisons. A Type I error happens when we reject a null hypothesis
that is true; for a single test at a significance level of 0.05 it
occurs with a probability of 0.05. Doing lots of comparisons makes it
more likely we will get at least one significant result just by chance.
The post-hoc test adjusts the *p*-values to account for this increased
risk.
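
As a rough illustration of why this adjustment matters, the chance of making at least one Type I error grows quickly with the number of comparisons (assuming, for simplicity, that the tests are independent, which pairwise comparisons are not exactly):

```{r}
# approximate probability of at least one Type I error across
# k independent tests, each carried out at alpha = 0.05
k <- 1:6
data.frame(comparisons = k,
           p_at_least_one_error = 1 - (1 - 0.05)^k)
```

With three groups there are three pairwise comparisons, so the unadjusted risk is already about 0.14 rather than 0.05.
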
### Model assumptions
The assumptions for a general linear model where the explanatory
variable has two or more groups are the same as for two groups: the
residuals are normally distributed and have homogeneity of variance.
If we have a continuous response and a categorical explanatory variable
with three or more groups, we usually apply the general linear model
with `lm()` and *then* check the assumptions. However, we can sometimes
tell that a non-parametric test would be more appropriate before that:

-   Use common sense - the response should be continuous (or nearly
    continuous, see [Ideas about data: Theory and
    practice](ideas_about_data.html#theory-and-practice)). Consider
    whether you would expect the response to be continuous
-   There should be decimal places and few repeated values.

To examine the assumptions after fitting the linear model, we plot the
residuals and test them against the normal distribution in the same way
as we did for single linear regression.
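
As a preview, for a fitted model object called `mod` (we fit one later in this chapter) those checks look like this; the chunk is not run here because we have not yet fitted a model:

```{r}
#| eval: false
# residuals vs fitted values: checks homogeneity of variance
plot(mod, which = 1)

# histogram of the residuals: checks normality
ggplot(mapping = aes(x = mod$residuals)) +
  geom_histogram(bins = 8)

# Shapiro-Wilk test of the residuals: checks normality
shapiro.test(mod$residuals)
```
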
### Reporting
In reporting the result of a one-way ANOVA or Kruskal-Wallis test, we
include:
1. the significance of the effect
- parametric: The *F*-statistic and *p*-value
- non-parametric: The Chi-squared statistic and *p*-value
2. the direction of effect - which of the means/medians is greater
- Post-hoc test
3. the magnitude of effect - how big is the difference between the
means/medians
- parametric: the means and standard errors for each group
- non-parametric: the medians for each group
Figures should reflect what you have said in the statements. Ideally
they should show both the raw data and the statistical model:
- parametric: means and standard errors
- non-parametric: boxplots with medians and interquartile range
We will explore all of these ideas with some examples.
## 🎬 Your turn!
If you want to code along you will need to start a new [RStudio
project](workflow_rstudio.html#rstudio-projects), add a `data-raw`
folder and open a new script. You will also need to load the
**`tidyverse`** package [@tidyverse].
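
The top of your script might look like this (not run here):

```{r}
#| eval: false
# load the tidyverse packages
library(tidyverse)
```
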
## One-way ANOVA
Researchers wanted to determine the best growth medium for growing
bacterial cultures. They grew bacterial cultures on three different
media formulations and measured the diameter of the colonies. The three
formulations were:
- Control - a generic medium as formulated by the manufacturer
- sugar added - the generic medium with added sugar
- sugar and amino acids added - the generic medium with added sugar
and amino acids
The data are in [culture.csv](data-raw/culture.csv).
### Import and explore
Import the data:
```{r}
culture <- read_csv("data-raw/culture.csv")
```
```{r}
#| echo: false
knitr::kable(culture) |>
kableExtra::kable_styling() |>
kableExtra::scroll_box(height = "200px")
```
The response variable is colony diameter in millimetres and we would
expect it to be continuous. The explanatory variable is the type of
medium and is categorical with three groups. The analysis is known as
“one-way ANOVA” or “one-factor ANOVA” because there is only one
explanatory variable. It would still be a one-way ANOVA if we had 4, 20
or 100 media.
These data are in tidy format [@Wickham2014-nl] - all the diameter
values are in one column with another column indicating the media. This
means they are well formatted for analysis and plotting.
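
A quick way to confirm what has been imported is `glimpse()`, which shows each column, its type and the first few values:

```{r}
glimpse(culture)
```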
In the first instance it is sensible to create a rough plot of our data.
This gives us an overview and helps identify any issues such as missing
or extreme values. It also gives us an idea of what to expect from the
analysis, which will make it easier to spot a mistake in applying that
analysis.
Violin plots (`geom_violin()`, see @fig-culture-rough), box plots
(`geom_boxplot()`) or scatter plots (`geom_point()`) all make good
choices for exploratory plotting and it does not matter which of these
you choose.
```{r}
#| label: fig-culture-rough
#| fig-cap: "The diameters of bacterial colonies when grown in one of three media. A violin plot is a useful way to get an overview of the data and helps us identify any issues such as missing or extreme values. It also tells us what to expect from the analysis."
#|
ggplot(data = culture,
aes(x = medium, y = diameter)) +
geom_violin()
```
R will order the groups alphabetically by default.
The figure suggests that adding sugar and amino acids to the medium
increases the diameter of the colonies.
Summarising the data for each medium is the next sensible step. The most
useful summary statistics are the means, standard deviations, sample
sizes and standard errors. I recommend the `group_by()` and
`summarise()` approach:
```{r}
culture_summary <- culture |>
  group_by(medium) |>
  summarise(mean = mean(diameter),
            std = sd(diameter),
            n = length(diameter),
            se = std / sqrt(n))
```
We have saved the results to `culture_summary` so that we can use the
means and standard errors in our plot later.
```{r}
culture_summary
```
### Apply `lm()`
We can create a one-way ANOVA model like this:
```{r}
mod <- lm(data = culture, diameter ~ medium)
```
And examine the model with:
```{r}
summary(mod)
```
The Estimates in the Coefficients table give:

-   `(Intercept)`, known as $\beta_0$, is the mean of the control group
    (@fig-one-way-anova-lm-model). Just as the intercept is the value of
    the *y* (the response) when the value of *x* (the explanatory) is
    zero in a simple linear regression, this is the value of `diameter`
    when the `medium` is at its first level. The order of the levels is
    alphabetical by default.
-   `mediumsugar added`, known as $\beta_1$, is what needs to be added
    to the mean of the control group to get the mean of the 'medium
    sugar added' group (@fig-one-way-anova-lm-model). Just as the slope
    is the amount of *y* that needs to be added for each unit of *x* in
    a simple linear regression, this is the amount of `diameter` that
    needs to be added when the `medium` goes from its first level to its
    second level (*i.e.*, one unit). The `mediumsugar added` estimate is
    positive so the 'medium sugar added' group mean is higher than the
    control group mean.
-   `mediumsugar and amino acids added`, known as $\beta_2$, is what
    needs to be added to the mean of the control group to get the mean
    of the 'medium sugar and amino acids added' group
    (@fig-one-way-anova-lm-model). Note that it is the amount added to
    the *intercept* (the control in this case). The
    `mediumsugar and amino acids added` estimate is positive so the
    'medium sugar and amino acids added' group mean is higher than the
    control group mean.

If we had more groups, we would have more estimates and all would be
compared to the control group mean.
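
You can convince yourself of this by adding the coefficients together and comparing the results with the group means in `culture_summary`:

```{r}
# the model coefficients
coef(mod)

# group means reconstructed from the coefficients
coef(mod)[1]                  # control mean (beta0)
coef(mod)[1] + coef(mod)[2]   # 'sugar added' group mean (beta0 + beta1)
coef(mod)[1] + coef(mod)[3]   # 'sugar and amino acids added' group mean (beta0 + beta2)
```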
The *p*-values on each line are tests of whether that coefficient is
different from zero.

-   `(Intercept) 10.0700 0.2930 34.370 < 2e-16 ***`
    tells us that the control group mean is significantly different from
    zero. This is not very interesting; it just means the control
    colonies have a diameter.
-   `mediumsugar added 0.1700 0.4143 0.410 0.68483`
    tells us that the 'medium sugar added' group mean is not
    significantly different from the control group mean.
-   `mediumsugar and amino acids added 1.3310 0.4143 3.212 0.00339 **`
    tells us that the 'medium sugar and amino acids added' group mean
    *is* significantly different from the control group mean.

Note: none of this output tells us whether the 'medium sugar and amino
acids added' group mean *is* significantly different from the 'medium
sugar added' group mean. We need to do a post-hoc test for that.

The *F* value and *p*-value in the last line are a test of whether the
model as a whole explains a significant amount of variation in the
response variable.
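
If you prefer to see this *F* test laid out as a traditional ANOVA table, you can pass the model to `anova()`:

```{r}
anova(mod)
```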
```{r}
#| echo: false
#| label: fig-one-way-anova-lm-model
#| fig-cap: "In an one-way ANOVA model with three groups, the first estimate is the intercept which is the mean of the first group. The second estimate is the 'slope' which is what has to added to the intercept to get the second group mean. The third estimate is the 'slope' which is what has to added to the intercept to get the third group mean. Note that y axis starts at 15 to create more space for the annotations."
ggplot() +
geom_errorbar(data = culture_summary,
aes(x = medium, ymin = mean, ymax = mean),
colour = pal3[1], linewidth = 1,
width = 1) +
scale_x_discrete(expand = c(0,0)) +
scale_y_continuous(expand = c(0,0),
limits = c(9, 12),
name = "Diameter (mm)") +
geom_segment(aes(x = 0.8,
xend = -Inf,
y = mod$coefficients[1] + 1,
yend = mod$coefficients[1]),
colour = pal3[2]) +
annotate("text",
x = 0.9,
y = mod$coefficients[1] + 1.15,
label = glue::glue("Intercept (β0) is mean\nof { culture_summary$medium[1] }: { mod$coefficients[1]|> round(2) }"),
colour = pal3[2],
size = 3) +
geom_segment(aes(x = 1.5,
xend = 1.5,
y = mod$coefficients[1],
yend = mod$coefficients[1] + mod$coefficients[2]),
colour = pal3[3],
arrow = arrow(length = unit(0.03, "npc"),
ends = "both")) +
geom_segment(aes(x = 1.5,
xend = 1.7,
yend = mod$coefficients[1] - 0.25,
y = mod$coefficients[1] + 0.1),
colour = pal3[2]) +
annotate("text",
x = 2,
y = mod$coefficients[1] - 0.35,
label = glue::glue("mediumsugar added (β1) is the difference\nbetween { culture_summary$medium[1] } mean and { culture_summary$medium[2] } mean: { mod$coefficients[2] |> round(2) }"),
colour = pal3[2],
size = 3) +
geom_segment(aes(x = 3,
xend = 3,
y = mod$coefficients[1],
yend = mod$coefficients[1] + mod$coefficients[3]),
colour = pal3[3],
arrow = arrow(length = unit(0.03, "npc"),
ends = "both")) +
geom_segment(aes(x = 3,
xend = 2.1,
yend = mod$coefficients[1] + mod$coefficients[3] - 0.5,
y = mod$coefficients[1] + mod$coefficients[3] - 0.3),
colour = pal3[2]) +
annotate("text",
x = 2.1,
y = mod$coefficients[1] + mod$coefficients[3] - 0.75,
label = glue::glue("mediumsugar and amino acids added (β2)\nis the difference between { culture_summary$medium[1] } mean\nand { culture_summary$medium[3] } mean: { mod$coefficients[3] |> round(2) }"),
colour = pal3[2],
size = 3) +
geom_segment(aes(x = 1.5,
xend = 3,
yend = mod$coefficients[1],
y = mod$coefficients[1]),
colour = pal3[1],
linetype = "dashed") +
theme_classic()
```
The ANOVA is significant but this only tells us that growth medium matters,
meaning at least two of the means differ. To find out which means
differ, we need a post-hoc test. A post-hoc ("after this") test is done
after a significant ANOVA test. There are several possible post-hoc
tests and we will be using Tukey's HSD (honestly significant difference)
test [@tukey1949] implemented in the **`emmeans`** [@emmeans] package.
We need to load the package:
```{r}
library(emmeans)
```
Then carry out the post-hoc test:
```{r}
emmeans(mod, ~ medium) |> pairs()
```
Each row is a comparison between the two means in the 'contrast' column.
The 'estimate' column is the difference between those means and the
'p.value' indicates whether that difference is significant.
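
If you want these comparisons as a data frame, for example to format them into a table for a report, one option is to convert the output with `as.data.frame()`:

```{r}
emmeans(mod, ~ medium) |> pairs() |> as.data.frame()
```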
A plot can be used to visualise the result of the post-hoc test, which
can be especially useful when there are many comparisons.
```{r}
emmeans(mod, ~ medium) |> plot()
```
Where the purple bars overlap, there is no significant difference.
We have found that colony diameters are significantly greater when sugar
and amino acids are added but that adding sugar alone does not
significantly increase colony diameter.
### Check assumptions
Check the assumptions: All general linear models assume the "residuals"
are normally distributed and have "homogeneity" of variance.
Our first check of these assumptions is to use common sense: diameter is
continuous and we would expect it to be normally distributed, thus we
would expect the residuals to be normally distributed.
We then proceed by plotting residuals. The `plot()` function can be used
to plot the residuals against the fitted values (See @fig-anova1-plot1).
This is a good way to check for homogeneity of variance.
```{r}
#| label: fig-anova1-plot1
#| fig-cap: "A plot of the residuals against the fitted values shows whether the points are distributed similarly in each group. Any difference seems small but perhaps the residuals are more variable for the highest mean."
plot(mod, which = 1)
```
Perhaps the variance is higher for the highest mean?
We can also use a histogram to check for normality (See
@fig-anova1-plot2).
```{r}
#| label: fig-anova1-plot2
#| fig-cap: "A histogram of residuals is symetrical and seems consistent with a normal distribution. This is a good sign for the assumption of normally distributed residuals."
ggplot(mapping = aes(x = mod$residuals)) +
geom_histogram(bins = 8)
```
Finally, we can use the Shapiro-Wilk test to test for normality.
```{r}
shapiro.test(mod$residuals)
```
The p-value is greater than 0.05 so this test of the normality
assumption is not significant.
Taken together, these results suggest that the assumptions of normality
and homogeneity of variance are probably not violated.
### Report
There was a significant effect of media on the diameter of bacterial
colonies (*F* = 6.11; *d.f.* = 2, 27; *p* = 0.006). Post-hoc testing
with Tukey's Honestly Significant Difference test [@tukey1949] revealed
the colony diameters were significantly larger when grown with both
sugar and amino acids ($\bar{x} \pm s.e$: 11.4 $\pm$ 0.37 mm) than with
neither
(10.2 $\pm$ 0.26 mm; *p* = 0.0092) or just sugar (10.1 $\pm$ 0.23 mm;
*p* = 0.0244). See @fig-culture.
::: {#fig-culture}
```{r}
#| code-fold: true
ggplot() +
geom_point(data = culture, aes(x = medium, y = diameter),
position = position_jitter(width = 0.1, height = 0),
colour = "gray50") +
geom_errorbar(data = culture_summary,
aes(x = medium, ymin = mean - se, ymax = mean + se),
width = 0.3) +
geom_errorbar(data = culture_summary,
aes(x = medium, ymin = mean, ymax = mean),
width = 0.2) +
scale_y_continuous(name = "Diameter (mm)",
limits = c(0, 16.5),
expand = c(0, 0)) +
scale_x_discrete(name = "Medium",
labels = c("Control",
"Sugar added",
"Sugar and amino acids added")) +
annotate("segment", x = 2, xend = 3,
y = 14, yend = 14,
colour = "black") +
annotate("text", x = 2.5, y = 14.5,
label = expression(italic(p)~"= 0.0244")) +
annotate("segment", x = 1, xend = 3,
y = 15.5, yend = 15.5,
colour = "black") +
annotate("text", x = 2, y = 16,
label = expression(italic(p)~"= 0.0092")) +
theme_classic()
```
**Medium affects bacterial colony diameter**. Ten replicate colonies
were grown on three types of media: control, with sugar added and with
both sugar and amino acids added. Error bars are means $\pm$ 1 standard
error. There was a significant effect of media on the diameter of
bacterial colonies (*F* = 6.11; *d.f.* = 2, 27; *p* = 0.006). Post-hoc
testing with Tukey's Honestly Significant Difference test [@tukey1949]
revealed the colony diameters were significantly larger when grown with
both sugar and amino acids than with neither or just sugar. Data
analysis was conducted in R [@R-core] with tidyverse packages [@tidyverse].
:::
## Kruskal-Wallis
Our examination of the assumptions revealed a possible violation of the
assumption of homogeneity of variance. We might reasonably apply a
non-parametric test to these data.

The Kruskal-Wallis test [@kruskal1952] is the non-parametric equivalent
of a one-way ANOVA. The general question you have about your data - do
these groups differ (or does the medium affect diameter) - is the same,
but one or more of the following is true:

-   the response variable is not continuous
-   the residuals are not normally distributed
-   the sample size is too small to tell if they are normally
    distributed
-   the variance is not homogeneous
Summarising the data using the median and interquartile range is more
aligned to the type of analysis than using means and standard
deviations:
```{r}
culture_summary <- culture |>
group_by(medium) |>
summarise(median = median(diameter),
interquartile = IQR(diameter),
n = length(diameter))
```
View the results:
```{r}
culture_summary
```
### Apply `kruskal.test()`
We pass the dataframe and variables to `kruskal.test()` in the same way
as we did for `lm()`. We give the data argument and a "formula" which
says `diameter ~ medium` meaning "explain diameter by medium".
```{r}
kruskal.test(data = culture, diameter ~ medium)
```
The result of the test is given on this line:
`Kruskal-Wallis chi-squared = 8.1005, df = 2, p-value = 0.01742`.
`Chi-squared` is the test statistic. The *p*-value is less than 0.05
meaning there is a significant effect of medium on diameter.
Notice that the *p*-value is a little larger than for the ANOVA. This is
because non-parametric tests are generally more conservative (less
powerful) than their parametric equivalents.
A significant Kruskal-Wallis tells us at least two of the groups differ
but where do the differences lie? The Dunn test [@dunn1964] is a
post-hoc multiple comparison test for a significant Kruskal-Wallis. It
is available in the package **`FSA`** [@FSA].
Load the package using:
```{r}
library(FSA)
```
Then run the post-hoc test with:
```{r}
dunnTest(data = culture, diameter ~ medium)
```
The `P.adj` column gives the *p*-value for the comparison listed in the
first column. `Z` is the test statistic. The *p*-values for the
`control - sugar and amino acids added` and
`sugar added - sugar and amino acids added` comparisons are a little
larger than those from the Tukey test but are still less than 0.05. This
means our conclusions are the same as for the ANOVA.
### Report
There is a significant effect of media on the diameter of bacterial
colonies (Kruskal-Wallis: *chi-squared* = 8.10; *df* = 2; *p*-value =
0.017) with colonies growing significantly better when both sugar and
amino acids are added to the medium. Post-hoc testing with the Dunn test
[@dunn1964] revealed the colony diameters were significantly larger when
grown with both sugar and amino acids (median = 11.3 mm) than with
neither (median = 10.2 mm; *p* = 0.031) or just sugar
(median = 10.2 mm; *p* = 0.038). See @fig-culture-kw.
::: {#fig-culture-kw}
```{r}
#| code-fold: true
ggplot(data = culture, aes(x = medium, y = diameter)) +
geom_boxplot() +
scale_y_continuous(name = "Diameter (mm)",
limits = c(0, 16.5),
expand = c(0, 0)) +
scale_x_discrete(name = "Medium",
labels = c("Control",
"Sugar added",
"Sugar and amino acids added")) +
annotate("segment", x = 2, xend = 3,
y = 14, yend = 14,
colour = "black") +
annotate("text", x = 2.5, y = 14.5,
label = expression(italic(p)~"= 0.038")) +
annotate("segment", x = 1, xend = 3,
y = 15.5, yend = 15.5,
colour = "black") +
annotate("text", x = 2, y = 16,
label = expression(italic(p)~"= 0.031")) +
theme_classic()
```
**Medium affects bacterial colony diameter**. Ten replicate colonies
were grown on three types of media: control, with sugar added and with
both sugar and amino acids added. The heavy lines
indicate median diameter, boxes indicate the interquartile range
and whiskers the range. There was a significant effect of media on the
diameter of bacterial colonies (Kruskal-Wallis: *chi-squared* = 8.10,
*df* = 2, *p*-value = 0.017). Post-hoc testing with the Dunn test
[@dunn1964] revealed the colony diameters were significantly larger when
grown with both sugar and amino acids than with neither or just sugar.
Data analysis was conducted in R [@R-core] with
tidyverse packages [@tidyverse].
:::
## Summary
1. A linear model with one explanatory variable with two or more groups
is also known as a **one-way ANOVA**.
2. We estimate the **coefficients** (also called the **parameters**) of
the model. For a one-way ANOVA with three groups these are the mean
of the first group, $\beta_0$, the difference between the means of
the first and second groups, $\beta_1$, and the difference between
the means of the first and third groups, $\beta_2$. We test whether
the parameters differ significantly from zero.
3. We can use `lm()` to carry out a one-way ANOVA in R.
4. In the output of `lm()` the coefficients are listed in a table in the
Estimates column. The *p*-value for each coefficient is a test of
whether it differs from zero. At the bottom of the output there
is an $F$ test of the model *overall*. Now that we have more than two
parameters, this is different from the test on any one parameter. The
R-squared value is the proportion of the variance in the response
variable that is explained by the model. It tells us whether the
explanatory variable is useful in predicting the response variable
overall.
5. When the $F$ test is significant there is a significant effect of
the explanatory variable on the response variable. To find out which
means differ, we need a **post-hoc** test. Here we use Tukey’s HSD
applied with the `emmeans()` and `pairs()` functions from the
**`emmeans`** package. Post-hoc tests make adjustments to the
*p*-values to account for the fact that we are doing multiple tests.
6. The assumptions of the general linear model are that the residuals
are normally distributed and have homogeneity of variance. A residual
is the difference between the predicted value and the observed value.
7. We examine a histogram of the residuals and use the Shapiro-Wilk
normality test to check the normality assumption. We check the
variance of the residuals is the same for all fitted values with
a residuals vs fitted plot.
8. If the assumptions are not met, we can use the Kruskal-Wallis test
applied with `kruskal.test()` in R and follow it with the Dunn test
applied with `dunnTest()` in the package **`FSA`**.
9. When reporting the results of a test we give the significance,
direction and size of the effect. Our figures and the values we give
should reflect the type of test we have used. We use means and
standard errors for parametric tests and medians and interquartile
ranges for non-parametric tests. We also give the test statistic, the
degrees of freedom (parametric) or sample size (non-parametric) and
the *p*-value. We annotate our figures with the *p*-value, making clear
which comparison it applies to.