---
title: "Reproducible Research: Peer Assessment 1"
output:
  html_document:
    keep_md: true
---
## Loading and preprocessing the data
```{r}
activityZip <- 'activity.zip'
activityFilename <- 'activity.csv'
# Extract the CSV only if it has not been unzipped yet
if (!file.exists(activityFilename)) {
  unzip(activityZip)
}
activity <- read.csv(activityFilename, header = TRUE, na.strings = "NA")
summary(activity)
```
## What is mean total number of steps taken per day?
Aggregate activity by date - sum steps for each day
```{r}
# Sum only the steps column; the formula method drops rows with missing steps
aggregatedByDate <- aggregate(steps ~ date, data = activity, FUN = sum)
```
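As a small illustration of how the formula interface of `aggregate()` treats missing values, here is a sketch on toy data. The data frame `toy` and its values are hypothetical and are not taken from `activity.csv`.

```{r}
# Toy data (hypothetical values): day d2 is entirely missing, day d3 partly
toy <- data.frame(
  date  = c("d1", "d1", "d2", "d2", "d3", "d3"),
  steps = c(10, 20, NA, NA, 5, NA)
)
# The formula method uses na.action = na.omit by default,
# so rows with NA are removed before the sums are computed
toyByDate <- aggregate(steps ~ date, data = toy, FUN = sum)
toyByDate
```

In this sketch `d2` does not appear in the result at all, and `d3` is summed over its single observed value, so a day without any observations is left out of the histogram rather than counted as zero steps.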
Histogram of total steps per day
```{r}
hist(aggregatedByDate$steps, breaks = 30, xlab = "Total steps per day", main = "Histogram of total steps per day", col = 3)
```
Calculate mean of total steps per day
```{r}
meanTotalSteps <- mean(aggregatedByDate$steps)
meanTotalSteps
```
Calculate median of total steps per day
```{r}
medianTotalSteps <- median(aggregatedByDate$steps)
medianTotalSteps
```
## What is the average daily activity pattern?
Aggregate activity by intervals - average steps for each interval
```{r}
aggregatedByInterval <- aggregate(steps ~ interval, data = activity, FUN = mean)
```
Plot of average steps per interval
```{r}
plot(x = aggregatedByInterval$interval, y = aggregatedByInterval$steps, type = "l", xlab = "Interval", ylab="Average steps" )
```
Which interval has the highest number of steps, averaged across all days?
```{r}
aggregatedByInterval[which.max(aggregatedByInterval$steps), ]
```
## Imputing missing values
Total number of missing values in the dataset (i.e. the total number of rows with NAs)
```{r}
sum(!complete.cases(activity))
```
Fill in each missing value with the mean number of steps for its interval
```{r}
activityNoNA <- activity
# Look up the interval mean for every row whose step count is missing
missingRows <- is.na(activityNoNA$steps)
activityNoNA$steps[missingRows] <- aggregatedByInterval$steps[
  match(activityNoNA$interval[missingRows], aggregatedByInterval$interval)
]
summary(activityNoNA)
```
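The interval-mean imputation above can also be sketched with base R's `ave()`, which computes a per-group statistic and returns it aligned with the original rows. The data frame `toyActivity` and its values are hypothetical and only illustrate the technique.

```{r}
# Toy data (hypothetical values): two intervals observed on three days,
# with both intervals missing on the second day
toyActivity <- data.frame(
  interval = c(0, 5, 0, 5, 0, 5),
  steps    = c(2, 10, NA, NA, 4, 20)
)
# Mean of the observed steps within each interval, repeated for every row
intervalMean <- ave(toyActivity$steps, toyActivity$interval,
                    FUN = function(x) mean(x, na.rm = TRUE))
# Copy the data and overwrite only the missing entries
toyImputed <- toyActivity
toyMissing <- is.na(toyImputed$steps)
toyImputed$steps[toyMissing] <- intervalMean[toyMissing]
toyImputed
```

Here the two missing values become 3 and 15, the means of the observed values for intervals 0 and 5 respectively.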
Histogram of the total number of steps taken each day, after imputation
```{r}
aggregatedByDateImputed <- aggregate(steps ~ date, data = activityNoNA, FUN = sum)
hist(aggregatedByDateImputed$steps, breaks = 30, xlab = "Total steps per day", main = "Histogram of total steps per day (imputed)", col = 3)
```
Effect of imputation on the estimates
Mean imputed total steps:
```{r}
meanImputedTotalSteps <- mean(aggregatedByDateImputed$steps)
meanImputedTotalSteps
```
Median imputed total steps:
```{r}
medianImputedTotalSteps <- median(aggregatedByDateImputed$steps)
medianImputedTotalSteps
```
Difference of mean value:
```{r}
meanabsdiff <- abs((meanTotalSteps - meanImputedTotalSteps) / meanTotalSteps * 100)
meanTotalSteps - meanImputedTotalSteps
```
The mean differs by `r meanabsdiff` %, so the imputation leaves the mean unchanged.
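One possible explanation for an unchanged mean, assuming the missing values occur only as entirely missing days (an assumption about the data that is not verified here): a day filled with interval means gets a total equal to the sum of those means, which is the mean daily total of the observed days. A minimal sketch on hypothetical toy data:

```{r}
# Toy data (hypothetical values): 3 days x 2 intervals, day d2 entirely missing
toyDays <- data.frame(
  date     = rep(c("d1", "d2", "d3"), each = 2),
  interval = rep(c(0, 5), times = 3),
  steps    = c(2, 10, NA, NA, 4, 20)
)
# Mean daily total before imputation (d2 is dropped by the formula method)
meanBefore <- mean(aggregate(steps ~ date, data = toyDays, FUN = sum)$steps)
# Fill the missing entries with the mean of their interval
toyFilled <- toyDays
toyFilled$steps <- ifelse(
  is.na(toyDays$steps),
  ave(toyDays$steps, toyDays$interval, FUN = function(x) mean(x, na.rm = TRUE)),
  toyDays$steps
)
# Mean daily total after imputation
meanAfter <- mean(aggregate(steps ~ date, data = toyFilled, FUN = sum)$steps)
c(before = meanBefore, after = meanAfter)
```

Both values are 18 in this toy case. If a day were only partly missing, the two means would generally differ.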
Difference of median value:
```{r}
medianabsdiff <- abs((medianTotalSteps - medianImputedTotalSteps) / medianTotalSteps * 100)
medianTotalSteps - medianImputedTotalSteps
```
The median differs by `r medianabsdiff` %.
## Are there differences in activity patterns between weekdays and weekends?
Introduce a new factor variable `dayType` with two levels, weekday and weekend
```{r}
# weekdays() returns names in the session locale; English day names are assumed here
dayName <- weekdays(as.Date(activityNoNA$date))
activityNoNA$dayType <- factor(ifelse(dayName %in% c('Saturday', 'Sunday'), "weekend", "weekday"))
summary(activityNoNA$dayType)
```
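Day names returned by `weekdays()` depend on the session locale, so the comparison with `'Saturday'` and `'Sunday'` only works in an English locale. A locale-independent alternative, sketched here on a few sample dates, uses the numeric `wday` field of `POSIXlt` (0 to 6, starting on Sunday). The names `sampleDates`, `sampleWday` and `sampleType` are illustrative only.

```{r}
# Friday to Monday in the first full week of October 2012
sampleDates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))
# wday is 0 for Sunday and 6 for Saturday, regardless of locale
sampleWday <- as.POSIXlt(sampleDates)$wday
sampleType <- factor(ifelse(sampleWday %in% c(0, 6), "weekend", "weekday"))
data.frame(date = sampleDates, wday = sampleWday, dayType = sampleType)
```

The same expression could be applied to `as.Date(activityNoNA$date)` to build `dayType` without relying on day names.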
Average the steps in each interval by day type across all days, then plot the averages for each day type
```{r}
aggregatedByDateTypeImputed <- aggregate(steps ~ interval + dayType, data = activityNoNA, FUN = mean)
library(ggplot2)
# ggplot() is used instead of qplot(), which is deprecated in recent ggplot2 releases
ggplot(aggregatedByDateTypeImputed, aes(x = interval, y = steps)) +
  geom_line() +
  labs(x = "Interval", y = "Number of steps") +
  facet_wrap(~ dayType, ncol = 1)
```