# Two-sample tests
```{r}
#| results: "asis"
#| echo: false
source("_common.R")
status("complete")
```
## Overview
In the last chapter we learned about single linear regression, a
technique used when the explanatory variable is continuous. We now turn
our attention to cases where our explanatory variable is categorical and
has two groups. For example, we might want to know if there is a
difference in mass between two subspecies of chaffinch, or if marks in
two subjects are the same.
We use `lm()` or `wilcox.test()` to carry out a two-sample test
depending on whether the assumptions of `lm()` are met. The general
linear models applied with `lm()` are based on the normal distribution
and are known as parametric tests because they use the parameters of the
normal distribution (the mean and standard deviation) to determine if an
effect is significant. Null hypotheses are about a mean or difference
between means. The assumptions need to be met for the *p*-values
generated to be accurate.
If the assumptions are not met, we can use alternatives known as
non-parametric tests. Non-parametric tests are based on the ranks of
values rather than the actual values themselves. Null hypotheses are
about the mean rank rather than the mean. These tests have fewer
assumptions and can be used in more situations but the downside is they
tend to be less powerful. This means they are less able to detect a
difference where one exists.
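To see what "based on the ranks" means, here is a tiny illustration with made-up numbers (not one of this chapter's datasets): only the ordering of the values matters, not their size.

```{r}
# 200 is the largest value so it gets rank 4 - exactly the rank it
# would get if it were 20. Extreme values have limited influence.
rank(c(12, 15, 11, 200))
```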
### Independent samples and paired samples
An important consideration in conducting tests is whether the values in
one group are independent of the values in another group.
Non-independence occurs when the two measures are linked in some way.
The link could be that they are from the same individual, the same time
or the same location. For example, if we want to evaluate a treatment
for high blood pressure we might measure blood pressure before and after
the treatment on the same individuals. This would mean the before and
after measures were not independent. If pairs of observations in
different groups have something in common that makes them more similar to
each other than to other observations, then those observations are not
independent. We use a different testing approach for independent and
non-independent samples.
### *T*-tests
A linear model with one explanatory variable with two groups is also
known as a **two-sample *t*-test** when the samples are independent and
as a **paired-samples *t*-test** when they are not. R does have a
`t.test()` function which allows you to fit a linear model with just two
groups. However, here we teach you to use and interpret the `lm()`
function because it is more generalisable. You can use `lm()` when you
have three or more groups or additional explanatory variables. The
output of `lm()` is also in the same form as many other statistical
functions in R. This means what you learn in performing *t*-tests with
`lm()` will help you learn other methods more easily. However, it is
definitely not wrong to use `t.test()` rather than `lm()` for two-group
situations - the procedures are identical and the *p*-values will be the
same.
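If you want to see this equivalence for yourself, here is a minimal sketch using simulated data (not one of this chapter's datasets). Note `var.equal = TRUE`, which makes `t.test()` use the same equal-variance assumption as `lm()`:

```{r}
# two made-up groups of 10 values each
set.seed(1)
sim <- data.frame(group = rep(c("A", "B"), each = 10),
                  y = c(rnorm(10, mean = 10), rnorm(10, mean = 12)))

# the t statistic and p-value from t.test() ...
t.test(y ~ group, data = sim, var.equal = TRUE)

# ... match those for the groupB coefficient from lm()
summary(lm(y ~ group, data = sim))
```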
### Model assumptions
The assumptions of the general linear model are that the residuals are
normally distributed and have homogeneity of variance. A residual is the
difference between the predicted value and the observed value.
If we have a continuous response and a categorical explanatory variable
with two groups, we usually apply the general linear model with `lm()`
and *then* check the assumptions. However, we can sometimes tell before
fitting the model that a non-parametric test would be more appropriate:
- Use common sense - consider whether you would expect the response to
  be continuous (or nearly continuous, see [Ideas about data: Theory
  and practice](ideas_about_data.html#theory-and-practice)).
- For a continuous response, we expect decimal places and few repeated
  values.
To examine the assumptions after fitting the linear model, we plot the
residuals and test them against the normal distribution in the same way
as we did for single linear regression.
### Reporting
In reporting the result of a two-sample test we give:
1. the significance of effect - whether there is a difference between
the groups
    - parametric: whether there is a difference between the group
      means
    - non-parametric: whether there is a difference between the group
      medians
2. the direction of effect - which of the means/medians is greater
3. the magnitude of effect - how big the difference between the
means/medians is
    - parametric: the means and standard errors for each group or the
      mean difference for paired samples
    - non-parametric: the medians for each group or the median
      difference for paired samples
Figures should reflect what you have said in the statements. Ideally
they should show both the raw data and the statistical model:
- parametric: means and standard errors
- non-parametric: boxplots with medians and interquartile range
We will explore all of these ideas with some examples.
## 🎬 Your turn!
If you want to code along you will need to start a new [RStudio
project](workflow_rstudio.html#rstudio-projects), add a `data-raw`
folder and open a new script. You will also need to load the
**`tidyverse`** package [@tidyverse].
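Loading the package at the top of your script looks like this:

```{r}
#| eval: false
library(tidyverse)
```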
## Two independent samples, parametric
A number of [subspecies of the common chaffinch, *Fringilla coelebs*](https://en.wikipedia.org/wiki/Common_chaffinch), have been described based principally on differences in the pattern and colour of the adult male plumage [@suárez2009]. Two groups of these subspecies are:
- the "coelebs group" (@fig-coelebs) that occurs in Europe and Asia
- the "canariensis group" (@fig-canariensis) that occurs on the Canary Islands.
::: {#fig-subspecies layout-ncol="2"}
![*F. c.
coelebs*](images/512px-Chaffinch_(Fringilla_coelebs).jpg){#fig-coelebs}
![*F. c.
palmae*](images/512px-Fringilla_coelebs_palmae_-_Los_Tilos.jpg){#fig-canariensis}
Adult male *Fringilla coelebs* of the coelebs group on the left (Andreas
Trepte, CC BY-SA 2.5 <https://creativecommons.org/licenses/by-sa/2.5>,
via Wikimedia Commons) and of the canariensis group on the right (H.
Zell, CC BY-SA 3.0 <https://creativecommons.org/licenses/by-sa/3.0>, via
Wikimedia Commons).
:::
The data in [chaff.txt](data-raw/chaff.txt) give the masses of twenty
individuals from each subspecies. We want to know if the subspecies
differ in mass. These groups are independent - there is no link between
values in one group and any value in the other group. In this scenario
our null hypothesis, $H_0$, is that there is no difference between the
two subspecies in mass or that subspecies has no effect on mass.
This is written as: $H_0: \beta_1 = 0$.
### Import and explore
Import the data:
```{r}
chaff <- read_table("data-raw/chaff.txt")
```
```{r}
#| echo: false
knitr::kable(chaff) |>
kableExtra::kable_styling() |>
kableExtra::scroll_box(height = "200px")
```
These data are in tidy format [@Wickham2014-nl] - all the mass values
are in one column with another column indicating the subspecies. This
means they are well formatted for analysis and plotting.
In the first instance, it is always sensible to create a rough plot of
our data. This is to give us an overview and help identify if there are
any issues like missing or extreme values. It also gives us an idea of
what to expect from the analysis, making it easier to notice if we make
a mistake in applying that analysis.
Violin plots (`geom_violin()`), box plots (`geom_boxplot()`, see
@fig-chaff-rough) or scatter plots (`geom_point()`) all make good
choices for exploratory plotting and it does not matter which of these
you choose.
```{r}
#| label: fig-chaff-rough
#| fig-cap: "The mass of two subspecies of chaffinch. A boxplot is a useful way to get an overview of the data and helps us identify any issues such as missing or extreme values. It also tells us what to expect from the analysis."
ggplot(data = chaff,
aes(x = subspecies, y = mass)) +
geom_boxplot()
```
R will order the groups alphabetically by default.
The figure suggests that the canariensis group is heavier than the
coelebs group.
Summarising the data for each subspecies group is the next sensible
step. The most useful summary statistics are the means, standard
deviations, sample sizes and standard errors. I recommend the
`group_by()` and `summarise()` approach:
```{r}
chaff_summary <- chaff |>
group_by(subspecies) |>
summarise(mean = mean(mass),
std = sd(mass),
n = length(mass),
se = std/sqrt(n))
```
We have saved the results to `chaff_summary` so that we can use the means
and standard errors in our plot later.
```{r}
chaff_summary
```
### Apply `lm()`
We can create a two-sample model like this:
```{r}
mod <- lm(data = chaff, mass ~ subspecies)
```
And examine the model with:
```{r}
summary(mod)
```
The Estimates in the Coefficients table give:
- `(Intercept)` known as $\beta_0$. The mean of the canariensis group
(@fig-two-sample-lm-model). Just as the intercept is the value of
the *y* (the response) when the value of *x* (the explanatory) is
zero in a simple linear regression, this is the value of `mass` when
the `subspecies` is at its first level. The order of the levels is
alphabetical by default.
- `subspeciescoelebs`, known as $\beta_1$, is what needs to be added
to the mean of the canariensis group to get the mean of the coelebs
group (@fig-two-sample-lm-model). Just as the slope is the amount of
*y* that needs to be added for each unit of *x* in a simple linear
regression, this is the amount of `mass` that needs to be added when
the `subspecies` goes from its first level to its second level
(*i.e.*, one unit). The `subspeciescoelebs` estimate is negative so
the coelebs group mean is lower than the canariensis group mean.
The *p*-values on each line are tests of whether that coefficient is
different from zero. Thus it is:
`subspeciescoelebs -1.7950 0.6781 -2.647 0.0118 *`
that tells us the difference between the means is significant.
The *F* value and *p*-value in the last line are a test of whether the
model as a whole explains a significant amount of variation in the
response variable. For a two-sample test, just like a regression, this
is exactly equivalent to the test of the slope against zero and the two
*p*-values will be the same.
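If you would like to verify this equivalence, `anova()` on the same model reports the *F*-statistic for the model as a whole; its *p*-value matches the one given for the `subspeciescoelebs` coefficient.

```{r}
# the overall F-test for a model with a single two-level explanatory
# variable is the same test as that of the group coefficient
anova(mod)
```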
```{r}
#| echo: false
#| label: fig-two-sample-lm-model
#| fig-cap: "In a two-sample linear model, the first estimate is the intercept which is the mean of the first group. The second estimate is the 'slope' which is what has to added to the intercept to get the second group mean. Note that y axis starts at 15 to create more space for the annotations."
ggplot() +
geom_errorbar(data = chaff_summary,
aes(x = subspecies, ymin = mean, ymax = mean),
colour = pal3[1], linewidth = 1,
width = 1) +
scale_x_discrete(expand = c(0,0)) +
scale_y_continuous(expand = c(0,0),
limits = c(15, 30),
name = "Mass (g)") +
geom_segment(aes(x = 0.8,
xend = -Inf,
y = mod$coefficients[1] + 3,
yend = mod$coefficients[1]),
colour = pal3[2]) +
annotate("text",
x = 1.1,
y = mod$coefficients[1] + 4,
label = glue::glue("Intercept (β0) is mean of\n{ chaff_summary$subspecies[1] }: { mod$coefficients[1] |> round(2) }"),
colour = pal3[2],
size = 4) +
geom_segment(aes(x = 1.5,
xend = 1.5,
y = mod$coefficients[1],
yend = mod$coefficients[1] + mod$coefficients[2]),
colour = pal3[3],
arrow = arrow(length = unit(0.03, "npc"),
ends = "both")) +
geom_segment(aes(x = 1.7,
xend = 1.5,
y = mod$coefficients[1] + 3,
yend = mod$coefficients[1] - 1),
colour = pal3[2]) +
annotate("text",
x = 2,
y = mod$coefficients[1] + 5,
label = glue::glue("subspeciescoelebs (β1) is the\ndifference between mean of\n{ chaff_summary$subspecies[1] } and { chaff_summary$subspecies[2] }: { mod$coefficients[2] |> round(2) }"),
colour = pal3[2],
size = 4) +
theme_classic()
```
### Check assumptions
Check the assumptions: All general linear models assume the "residuals"
are normally distributed and have "homogeneity" of variance.
Our first check of these assumptions is to use common sense: mass is a
continuous variable and we would expect it to be normally distributed
thus we would also expect the residuals to be normally distributed.
We then plot the residuals. The `plot()` function can be used to plot
the residuals against the fitted values (See @fig-lm-plot1). This is a
good way to check for homogeneity of variance.
```{r}
#| label: fig-lm-plot1
#| fig-cap: "A plot of the residuals against the fitted values shows the points are distributed similarly in each group. This is a good sign for the assumption of homogeneity of variance."
plot(mod, which = 1)
```
We can also use a histogram to check for normality (See @fig-lm-plot2).
```{r}
#| label: fig-lm-plot2
#| fig-cap: "A histogram of residuals is symetrical and seems consistent with a normal distribution. This is a good sign for the assumption of normally distributed residuals."
ggplot(mapping = aes(x = mod$residuals)) +
geom_histogram(bins = 10)
```
Finally, we can use the Shapiro-Wilk test to test for normality.
```{r}
shapiro.test(mod$residuals)
```
The *p*-value is greater than 0.05, so this test of the normality
assumption is not significant.
Taken together, these results suggest that the assumptions of normality
and homogeneity of variance are not violated.
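A normal quantile-quantile (Q-Q) plot is another common way to check the normality assumption: if the residuals are normally distributed, the points lie close to the straight line.

```{r}
# the second diagnostic plot for an lm object is a Q-Q plot of the
# residuals against normal quantiles
plot(mod, which = 2)
```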
### Report
Canariensis chaffinches ($\bar{x} \pm s.e$: 22.48 $\pm$ 0.48) were
significantly heavier than Coelebs (20.28 $\pm$ 0.48) (*t* = 2.65;
*d.f.* = 38; *p* = 0.012). See @fig-chaff.
::: {#fig-chaff}
```{r}
#| code-fold: true
ggplot() +
geom_point(data = chaff, aes(x = subspecies, y = mass),
position = position_jitter(width = 0.1, height = 0),
colour = "gray50") +
geom_errorbar(data = chaff_summary,
aes(x = subspecies, ymin = mean - se, ymax = mean + se),
width = 0.3) +
geom_errorbar(data = chaff_summary,
aes(x = subspecies, ymin = mean, ymax = mean),
width = 0.2) +
scale_y_continuous(name = "Mass (g)",
limits = c(0, 30),
expand = c(0, 0)) +
scale_x_discrete(name = "Subspecies",
labels = c("Canariensis", "Coelebs")) +
annotate("segment", x = 1, xend = 2,
y = 28, yend = 28,
colour = "black") +
annotate("text", x = 1.5, y = 29,
label = expression(italic(p)~"= 0.012")) +
theme_classic()
```
**Canariensis chaffinches are heavier than Coelebs chaffinches**. The
mean mass of 20 randomly sampled males from each subspecies was
determined. Error bars are $\pm$ 1 standard error. Canariensis
chaffinches were significantly heavier than Coelebs (*t* = 2.65; *d.f.*
= 38; *p* = 0.012). Data analysis was conducted in R [@R-core] with
tidyverse packages [@tidyverse].
:::
## Two independent samples, non-parametric
The non-parametric equivalent of the linear model with two independent
samples is the "Wilcoxon rank sum test" [@wilcoxon1945]. It is commonly
also known as the Mann-Whitney or Wilcoxon–Mann–Whitney.
The general question you have about your data - are these two groups
different - is the same, but one or more of the following is true:
- the response variable is not continuous
- the residuals are not normally distributed
- the sample size is too small to tell if they are normally
distributed.
- the variance is not homogeneous
The test is applied in R with the `wilcox.test()` function.
The data in [arabidopsis.txt](data-raw/arabidopsis.txt) give the number
of leaves on eight wildtype and eight mutant *Arabidopsis thaliana*
plants. We want to know if the two types of plants have differing
numbers of leaves. These are counts, so they are not continuous and the
sample sizes are quite small. A non-parametric test is a safer option.
### Import and explore
```{r}
arabidopsis <- read_table("data-raw/arabidopsis.txt")
```
These data are in tidy format [@Wickham2014-nl] - the numbers of leaves
are in one column with another column indicating whether the observation
comes from a wildtype or mutant *Arabidopsis*. This means they are well
formatted for analysis and plotting.
Create a quick plot of the data:
```{r}
#| label: fig-arabid-rough
#| fig-cap: "The number of leaves on mutant and wildtype plants. A boxplot is a useful way to get an overview of the data and helps us identify any issues such as missing or extreme values. It also tells us what to expect from the analysis."
ggplot(data = arabidopsis,
aes(x = type, y = leaves)) +
geom_boxplot()
```
Our rough plot shows that the mutant plants have fewer leaves than the
wildtype plants.
Summarising the data using the median and interquartile range is more
aligned to the type of data and the type of analysis than using means
and standard deviations:
```{r}
arabidopsis_summary <- arabidopsis |>
group_by(type) |>
summarise(median = median(leaves),
interquartile = IQR(leaves),
n = length(leaves))
```
View the results:
```{r}
arabidopsis_summary
```
### Apply `wilcox.test()`
We pass the dataframe and variables to `wilcox.test()` in the same way
as we did for `lm()`. We give the data argument and a "formula" which
says `leaves ~ type` meaning "explain leaves by type".
```{r}
wilcox.test(data = arabidopsis, leaves ~ type)
```
The warning message "Warning: cannot compute exact p-value with ties" is
not something to worry about too much. It is a warning rather than an
indication that your results are incorrect. It means the *p*-value is
based on an approximation rather than being exact because there are ties
(some values are the same).
The result of the test is given on this line:
`W = 5, p-value = 0.005051`. `W` is the test statistic. The *p*-value is
less than 0.05 meaning there is a significant difference in the number
of leaves on wildtype and mutant plants.
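The warning can also be avoided by requesting the normal approximation explicitly; with ties this is what `wilcox.test()` does anyway, so the result is unchanged.

```{r}
# exact = FALSE requests the normal approximation directly and
# avoids the "cannot compute exact p-value with ties" warning
wilcox.test(data = arabidopsis, leaves ~ type, exact = FALSE)
```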
### Report
There are significantly more leaves on wildtype (median = 8.5) than
mutant (median = 5) plants (Wilcoxon rank sum test: *W* = 5, $n_1$ = 8,
$n_2$ = 8, *p* = 0.005). See @fig-arabid.
::: {#fig-arabid}
```{r}
#| code-fold: true
ggplot(data = arabidopsis,
aes(x = type, y = leaves)) +
geom_boxplot() +
scale_y_continuous(name = "Number of leaves",
limits = c(0, 12),
expand = c(0, 0)) +
scale_x_discrete(name = "",
labels = c("Mutatnt", "Wildtype")) +
annotate("segment", x = 1, xend = 2,
y = 10.5, yend = 10.5,
colour = "black") +
annotate("text", x = 1.5, y = 11,
label = expression(italic(p)~"= 0.005")) +
theme_classic()
```
**Mutant *Arabidopsis thaliana* have fewer leaves**. There are
significantly more leaves on wildtype than mutant plants (Wilcoxon rank
sum test: *W* = 5, $n_1$ = 8, $n_2$ = 8, *p* = 0.005). The heavy lines
indicate the median number of leaves, boxes indicate the interquartile range
and whiskers the range. Data analysis was conducted in R [@R-core] with
tidyverse packages [@tidyverse].
:::
## Two paired-samples, parametric
The data in [marks.csv](data-raw/marks.csv) give the marks for ten
students in two subjects: Data Analysis and Biology. These data are
paired because we have two marks from one student so that a mark in one
group has a closer relationship with one of the marks in the other group
than with any of the other values. We want to know if students do equally
well in both subjects.
### Import and explore
Import the data:
```{r}
marks <- read_csv("data-raw/marks.csv")
```
Since these data are paired, it makes sense to highlight how the marks
differ for each student. One way of doing that is to draw a line linking
their marks in each subject. This is known as a spaghetti plot. We can
use two geoms: `geom_point()` and `geom_line()`. To join a student's
marks, we need to set the `group` aesthetic to `student`.[^two_sample_tests-1]
[^two_sample_tests-1]: You might like to try removing
`aes(group = student)` to see what ggplot does when the lines are
*not* grouped by student.
```{r}
ggplot(data = marks, aes(x = subject, y = mark)) +
geom_point() +
geom_line(aes(group = student))
```
Summarise the data so that we can use the means in plots later:
```{r}
marks_summary <- marks |>
group_by(subject) |>
summarise(mean = mean(mark))
```
A paired test requires us to take into account the variation between
students.
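One way to see what the pairing does: a paired test is equivalent to a one-sample test on the per-student differences. Here is a minimal sketch, assuming the subject levels are `Biology` and `DataAnalysis` (as the `lm()` coefficient names below suggest):

```{r}
# reshape to one row per student, then test whether the mean
# per-student difference is zero (column names come from the
# assumed subject levels)
marks_wide <- marks |>
  pivot_wider(names_from = subject, values_from = mark)

t.test(marks_wide$DataAnalysis - marks_wide$Biology)
```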
### Apply `lm()`
We can create a paired-sample model with the `lm()`
function[^two_sample_tests-2] like this:
[^two_sample_tests-2]: This is not the only way to apply a paired test.
When there are only two groups and no other explanatory variables, we
can use `t.test(data = marks, mark ~ subject, paired = TRUE)`. A
more general method that works when you have two or more
non-independent values (e.g., more than two subjects) or additional
explanatory variables is to create a "linear mixed model" with
`lmer()`.
```{r}
mod <- lm(mark ~ subject + factor(student), data = marks)
```
- `mark` is the dependent variable (response).
- `subject` is the independent variable (explanatory factor).
- `factor(student)` accounts for the pairing by treating `student` as
  another explanatory variable. We have used `factor()` because the
  values in `student` are the numbers 1 to 10 and we want `student` to
  be treated as a category, not a number.
```{r}
summary(mod)
```
The coefficient for `(Intercept)` gives the mean Biology mark and that
for `subjectDataAnalysis` is the amount by which Data Analysis marks are
higher than Biology marks on average. The *p*-value tests whether this
difference is significantly different from zero. The rest of the output
considers how students differ; you can ignore it here.
### Check assumptions
We might expect marks to be normally distributed. However, this is a
very small sample, and choosing a non-parametric test instead would be
reasonable. Nevertheless, we will continue with this example to
demonstrate how to interpret and report on the result of a parametric
paired-samples test (paired-samples *t*-test).
A plot of the residuals against the fitted values (`plot(mod, which = 1)`)
is not useful for a paired test. The normality of the residuals should
be checked.
```{r}
ggplot(mapping = aes(x = mod$residuals)) +
geom_histogram(bins = 3)
```
We only have 10 values, so the distribution is never going to look
smooth. We can't draw strong conclusions from this, but we do at least
have a peak at 0. Similarly, a normality test is likely to be
non-significant simply because the small sample size gives the test
little power. A non-significant result is therefore not strong
evidence that the residuals follow a normal distribution:
```{r}
shapiro.test(mod$residuals)
```
### Report
Individual students score significantly higher in Data Analysis than in
Biology (*t* = 2.47; *d.f.* = 9; *p* = 0.0355) with an average
difference of 6.5%. See @fig-marks.
::: {#fig-marks}
```{r}
#| code-fold: true
ggplot(data = marks, aes(x = subject, y = mark)) +
geom_point(pch = 1, size = 3) +
geom_line(aes(group = student), linetype = 3) +
geom_point(data = marks_summary,
aes(x = subject, y = mean),
size = 3) +
scale_x_discrete(name = "") +
scale_y_continuous(name = "Mark",
expand = c(0, 0),
limits = c(0, 110)) +
annotate("segment", x = 1, xend = 2,
y = 105, yend = 105,
colour = "black") +
annotate("text", x = 1.5, y = 108,
label = expression(italic(p)~"= 0.0355")) +
theme_classic()
```
**Students score higher in Data Analysis than in Biology**. Open circles indicate an individual student's marks, with dotted lines joining their marks in the two subjects. The filled circles indicate the mean mark for each subject. Individual students score significantly higher in Data Analysis than in
Biology (*t* = 2.47; *d.f.* = 9; *p* = 0.0355) with an average difference of 6.5%. Data analysis was conducted in R [@R-core] with tidyverse packages [@tidyverse].
:::
## Two paired-samples, non-parametric
We have the marks for just 10 students. This sample is too small for us to judge whether the marks are normally distributed. We will use a non-parametric test instead. The "Wilcoxon signed-rank" test is the non-parametric equivalent of the paired-samples *t*-test. This is often referred to as the paired-sample
Wilcoxon test, or just the Wilcoxon test.
The test is also applied in R with the `wilcox.test()` function but we add the `paired = TRUE` argument. We can also use the tidy format of the data, i.e., the long format with marks in one column and another column indicating the subject.
### Apply `wilcox.test()`
To apply a paired test with `wilcox.test()` we need to use the long
format data (tidy) and add the `paired = TRUE` argument.
```{r}
wilcox.test(data = marks, mark ~ subject, paired = TRUE)
```
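Note: in recent versions of R the formula method of `wilcox.test()` no longer accepts `paired = TRUE`. If the line above errors for you, an equivalent approach is to pass the two sets of marks as vectors, keeping them in the same student order. This is a sketch assuming the subject levels are `Biology` and `DataAnalysis`, as in the `lm()` output above:

```{r}
#| eval: false
# extract each subject's marks, ordered by student so the pairing
# is preserved
biology <- marks |>
  filter(subject == "Biology") |>
  arrange(student) |>
  pull(mark)
data_analysis <- marks |>
  filter(subject == "DataAnalysis") |>
  arrange(student) |>
  pull(mark)

wilcox.test(data_analysis, biology, paired = TRUE)
```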
### Report
Individual students score significantly higher in Data Analysis than in
Biology (Wilcoxon signed rank test: *V* = 6.55; $n$ = 10; *p* = 0.036).
See @fig-marks.
## Summary
1. A linear model with one explanatory variable with two groups and
one continuous response is "a two-sample test".
2. If pairs of observations in the groups have something in common
that makes them more similar to each other than to other
observations, then those observations are not independent. A
**paired-samples test** is used when the observations are not
independent.
3. A linear model with one explanatory variable with two groups and
one continuous response is also known as a **two-sample *t*-test**
when the samples are independent and as a **paired-samples *t*-test**
when they are not.
4. We can use `lm()` to do two-sample and paired sample tests. We can
also use `t.test()` for these but using `lm()` helps us understand
tests with more groups and/or more variables where we will have to
use `lm()`. The output of `lm()` is also more typical of the output
of statistical functions in R.
5. We estimate the **coefficients** (also called the **parameters**) of
the model. For a two-sample test these are the mean of the first
group, $\beta_0$ (which might also be called the intercept) and the
difference between the means of the first and second groups,
$\beta_1$ (which might also be called the slope). For a
paired-sample test there is just one parameter, the mean difference
between pairs of values, $\beta_0$ (which might also be called the
intercept). We test whether the parameters differ significantly from
zero.
6. We can use `lm()` to do a linear regression.
7. In the output of `lm()` the coefficients are listed in a table in
the Estimate column. The *p*-value for each coefficient is a test of
whether it differs from zero. At the bottom of the output there is a
test of the model *overall*. In this case, this is exactly the same as
the test of $\beta_1$ and the *p*-values are identical. The R-squared
value is the proportion of the variance in the response variable that
is explained by the model.
8. The assumptions of the general linear model are that the residuals
are normally distributed and have homogeneity of variance. A residual
is the difference between the predicted value and the observed value.
9. We examine a histogram of the residuals and use the Shapiro-Wilk
normality test to check the normality assumption. We check the
variance of the residuals is the same for all fitted values with
a residuals vs fitted plot.
10. If the assumptions are not met, we can use alternatives known as
non-parametric tests. These are applied with `wilcox.test()` in R.
11. When reporting the results of a test we give the significance,
direction and size of the effect. Our figures and the values we give
should reflect the type of test we have used. We use means and
standard errors for parametric tests and medians and interquartile
ranges for non-parametric tests. We also give the test statistic, the
degrees of freedom (parametric) or sample size (non-parametric) and
the *p*-value. We annotate our figures with the *p*-value, making clear
which comparison it applies to.