---
title: "Understanding 911 calls around UW"
output: html_notebook
---
_This analysis is adapted from [Walking the Beat: Mining Seattle’s Police Report Data](https://www.bayesimpact.org/stories/?name=walking-the-beat-mining-seattles-police-report-data) by Jeff Wong._

This is an [R Markdown](http://rmarkdown.rstudio.com) Notebook. When you execute code within the notebook, the results appear beneath the code.

Try executing this chunk by clicking the *Run* button within the chunk or by placing your cursor inside it and pressing *Cmd+Shift+Enter*.

# 1. Including a few packages
```{r}
if (!require(data.table)) { install.packages('data.table'); require(data.table) }
if (!require(ggplot2)) { install.packages('ggplot2'); require(ggplot2) }
if (!require(reshape2)) { install.packages('reshape2'); require(reshape2) }
if (!require(devtools)) { install.packages('devtools'); require(devtools) }
if (!require(ggmap)) { install.packages('ggmap'); require(ggmap) }
if (!require(glmnet)) { install.packages('glmnet'); require(glmnet) }
if (!require(fields)) { install.packages('fields'); require(fields) }
if (!require(dplyr)) { install.packages('dplyr'); require(dplyr) }
if (!require(lubridate)) { install.packages('lubridate'); require(lubridate) }
```

# 2. Loading the data
```{r}
MGH_LONGITUDE = -122.307987
MGH_LATITUDE = 47.655038
PATH_DATA_ALL = paste(getwd(), "/data/Seattle_Police_Department_911_Incident_Response.csv", sep = "")
PATH_DATA_MGH = paste(getwd(), "/data/Seattle_Police_Department_911_Incident_Response_Near_MGH.csv", sep = "")
PATH_DATA_MGH_TINY = paste(getwd(), "/data/Seattle_Police_Department_911_Incident_Response_Near_MGH_tiny.csv", sep = "")
if (file.exists(PATH_DATA_MGH)) { # switch to PATH_DATA_MGH_TINY if there are issues with the scale of the data
  data <- fread(PATH_DATA_MGH, header = T, sep = ",")
} else {
  data_all <- fread(PATH_DATA_ALL, header = T, sep = ",")
  data_all <- data_all %>% mutate(dist_mgh = sqrt((MGH_LATITUDE - Latitude)^2 + (MGH_LONGITUDE - Longitude)^2)) # straight-line distance (in degrees) from MGH
  data_mgh <- filter(data_all, dist_mgh < 0.01)        # 22,110 rows
  data_mgh_tiny <- filter(data_mgh, dist_mgh < 0.007)  # 5,100 rows
  data <- as.data.table(data_mgh) # keep the MGH subset as a data.table so the := assignments below work
}
#adding columns to the data
data[,at_scene_time_ts := as.POSIXct(strptime(`At Scene Time`, "%m/%d/%Y %I:%M:%S %p"))] #converting time from String to date and time representation (POSIXct)
data[,at_scene_time_hr := hour(at_scene_time_ts)] # hour of day (extracting directly from the POSIXct avoids NAs for midnight timestamps)
data[,at_scene_time_date := as.Date(at_scene_time_ts)]
data[,at_scene_time_week := floor(as.numeric(at_scene_time_date - min(at_scene_time_date, na.rm=T)) / 7) + 1]
data[,event_clearance_ts := as.POSIXct(strptime(`Event Clearance Date`, "%m/%d/%Y %I:%M:%S %p"))]
data[,event_clearance_date := as.Date(event_clearance_ts)]
data[,event_clearance_hr := hour(event_clearance_ts)]
data[,time_until_event_clear := as.numeric(difftime(event_clearance_ts, at_scene_time_ts, units = "secs"))] # seconds between arriving at the scene and clearing the event
data[,`Initial Type Group` := factor(`Initial Type Group`)]
data[,`Event Clearance Group` := factor(`Event Clearance Group`)]
data[,`Zone/Beat` := factor(`Zone/Beat`)]
data[,LatitudeBin := round(Latitude, 3)]
data[,LongitudeBin := round(Longitude, 3)]
# NOTE: crimes.violent and crimes.serious (character vectors of `Event Clearance Group` values) are assumed to be defined elsewhere; crime_type is not used in the sections below
data[,crime_type := ifelse(`Event Clearance Group` %in% crimes.violent, "Violent", ifelse(`Event Clearance Group` %in% crimes.serious, "Serious", "Minor"))]
View(data)
```

# 3. Missing Data

Missing data can be a problem. If a large proportion of the data is missing, we may end up with results that are not representative of the population.

*TODO* There are several date/time fields (at_scene_time_ts, event_clearance_ts). Figure out what proportion of reports have values for each.

_hint_: is.na() can be helpful
```{r}
?is.na
total_reports = nrow(data)
total_reports
blank_times = filter(data, is.na(at_scene_time_ts))
blank_times_count = nrow(blank_times)
prop_blank = blank_times_count/total_reports
prop_blank
# write code below!
```
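
The chunk above checks `at_scene_time_ts`; a minimal sketch of the same check for the other date/time field, `event_clearance_ts`, could look like this (it assumes the columns created in section 2):

```{r}
# sketch: proportion of reports with no event clearance timestamp, mirroring the at_scene_time_ts check above
blank_clearance_count = nrow(filter(data, is.na(event_clearance_ts)))
prop_blank_clearance = blank_clearance_count / total_reports
prop_blank_clearance
```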

# 4. Frequency of Crimes by Day of Week

Let's do a quick sanity check of our data by looking at the number of reports by day of week. Because police services are available every day of the week, we would expect at least some reports occurring on each day of the week. Run the code below to verify that.
```{r}
sanityCheck = data.frame(data[,table(weekdays(event_clearance_date))]); # frequency of crimes by day of week
View(sanityCheck)
sanityCheck$Var1 = factor(sanityCheck$Var1, levels = rev(c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")))
ggplot(sanityCheck,
aes(x = Var1, y = Freq, fill = as.numeric(Var1) %% 2)) +
geom_bar(stat = 'identity') +
coord_flip() +
theme_bw() +
xlab ("Day of Week") + ylab("Count") +
guides(fill=F)
```

# 5. Frequency of Crimes by Time of Day

Perhaps we should have more officers patrolling during the times of day when crime is high.

*TODO* Plot the number of reported incidents by hour of day and see what suggestions you may make.

_hint_: this code should look similar to the previous block (Frequency of Crimes by Day of Week).

_challenge_: Crime by time of day may vary between weekdays and weekend days. Plot crime by time of day for weekdays and weekend days separately (see the sketch after the chunk below).
```{r}
# Your code below!
sanityCheckHr = data.frame(data[,table(at_scene_time_hr)]); # frequency of crimes by hour of day
View(sanityCheckHr)
ggplot(sanityCheckHr,
aes(x = at_scene_time_hr, y = Freq, fill = as.numeric(at_scene_time_hr) %% 2)) +
geom_bar(stat = 'identity') +
coord_flip() +
theme_bw() +
xlab ("Hour of Day") + ylab("Count") +
guides(fill=F)
```
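
One way to tackle the challenge is to flag weekend days and facet the hourly counts on that flag. A minimal sketch, using lubridate's `wday()` (loaded in section 1) and the columns built in section 2:

```{r}
# sketch for the challenge: hourly counts split by weekday vs. weekend
data[, is_weekend := wday(at_scene_time_date) %in% c(1, 7)] # lubridate: 1 = Sunday, 7 = Saturday
hourly = data[!is.na(at_scene_time_hr), list(Freq = .N), list(at_scene_time_hr, is_weekend)]
ggplot(hourly,
       aes(x = at_scene_time_hr, y = Freq)) +
  geom_bar(stat = 'identity') +
  facet_wrap(~ is_weekend, labeller = label_both) +
  theme_bw() +
  xlab("Hour of Day") + ylab("Count")
```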

# 6. Types of contacts
Not all crime is the same! Let's see the frequency of each type of crime. We provide a function ggplot.freqtable.1d to generate the plot. Run the code below.
```{r}
#Uses a frequency table to build a bar chart that is ordered by the counts
ggplot.freqtable.1d = function(x) {
freq = data.frame(x); colnames(freq)[1] = "label"
freq.filtered = droplevels(subset(freq, label != "" & !is.na(label)))
freq.filtered$label = factor(freq.filtered$label,
levels = as.character(freq.filtered[order(freq.filtered$Freq, decreasing = F),]$label),
ordered = F)
ggplot(freq.filtered,
aes(x = label, y = Freq, fill = label)) +
geom_bar(stat = 'identity') +
coord_flip() +
theme_grey()
}
# Plotting frequency of each type of report
ggplot.freqtable.1d(data[,table(`Event Clearance Group`)]) +
xlab("Contact Type") + ylab("Count") +
theme_bw() +
guides(fill=F)
```

# 7. Digging deeper into the type of event

There are a lot of "Disturbances" and "Suspicious Circumstances" reported, but what does that mean? The "Event Clearance Description" field may provide more information.

*TODO*: Using the ggplot.freqtable.1d function from above, generate plots that provide more information on the types of crimes labeled as "Disturbances" (a sketch of one answer follows the empty chunk below).

_hint_: Filter the data by `Event Clearance Group`

_challenge_: Do the same analysis for the "Suspicious Circumstances" group.
```{r}
#TODO
```
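
A minimal sketch of one possible answer, assuming the group label in the data is the upper-case "DISTURBANCES" (inspect the levels of `Event Clearance Group` if your extract uses a different label):

```{r}
# sketch: frequency of each Event Clearance Description within the Disturbances group
# NOTE: "DISTURBANCES" is an assumed label; check levels(data$`Event Clearance Group`) to confirm
disturbances = data[`Event Clearance Group` == "DISTURBANCES"]
ggplot.freqtable.1d(disturbances[,table(`Event Clearance Description`)]) +
  xlab("Event Clearance Description") + ylab("Count") +
  theme_bw() +
  guides(fill=F)
```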

# 8. Heatmap of Crimes

Some places may be hotbeds of crime. Run the code below to figure out which areas have more reported crimes.
```{r}
ZOOM <- 15 # number between 1 and 21
# NOTE: newer versions of ggmap may require a Google Maps API key for this call; see ?register_google
seattle = get_map(location = c(lon = MGH_LONGITUDE, lat = MGH_LATITUDE), zoom = ZOOM, maptype = 'roadmap')
data.filtered = data[!is.na(time_until_event_clear) & time_until_event_clear > 0,
list(count = .N),
list(LongitudeBin, LatitudeBin)]
# If you run into an error about GeomRasterAnn being built on an incompatible version of ggproto, you must reinstall ggmap. Use this command: install.packages("ggmap", type = "source")
ggmap(seattle) +
geom_point(data = data.filtered[count > 10],
mapping = aes(x = LongitudeBin, y = LatitudeBin, size = log10(count), alpha = log10(count)), color = 'blue') +
theme_grey() +
theme(legend.position="top") +
xlab("Longitude") + ylab("Latitude") + labs(title = "Heatmap of Crimes")
```

# 9. Responding to casualties

Crimes where people are injured require a fast response. Run this code to look at a heatmap of the time it takes to clear crimes involving injury.

*TODO*: Change the type of crime to determine whether clearance time varies by crime (see the sketch after the chunk below).
```{r}
crime = 'PERSON DOWN/INJURY'
data.filtered = data[!is.na(time_until_event_clear) & time_until_event_clear > 0]
data.filtered.contact = data.filtered[`Event Clearance Group` == crime]
time_until_event_clear.q01 = quantile(data.filtered.contact$time_until_event_clear, .01); time_until_event_clear.q99 = quantile(data.filtered.contact$time_until_event_clear, .99)
data.filtered.contact = data.filtered.contact[time_until_event_clear > time_until_event_clear.q01 & time_until_event_clear < time_until_event_clear.q99]
data.filtered.contact2 = data.filtered.contact[,list(Hours = mean(time_until_event_clear / 3600), count = .N), list(Longitude = LongitudeBin, Latitude = LatitudeBin)]
ggmap(seattle) +
geom_point(data = data.filtered.contact2,
mapping = aes(x = Longitude, y = Latitude, color = Hours, alpha = Hours, size = count)) +
scale_colour_gradientn(colours=c("blue", "red")) +
scale_alpha(range = c(0.2, 0.7)) +
scale_size_continuous(range = c(5,20)) +
guides(alpha=F,size=F) +
theme_grey() +
theme(legend.position="top") +
xlab("Lon") + ylab("Lat") +
labs(title = sprintf("Time Until %s Cleared", crime))
```
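
For the TODO, it can help to first compare typical clearance times across contact types and then plug one of them into `crime` above. A minimal sketch (the `count > 50` cutoff is an arbitrary threshold to drop rare contact types):

```{r}
# sketch: median clearance time (in hours) per Event Clearance Group, for reasonably common groups
clearance_by_group = data[!is.na(time_until_event_clear) & time_until_event_clear > 0,
                          list(median_hours = median(time_until_event_clear) / 3600, count = .N),
                          list(`Event Clearance Group`)]
clearance_by_group[count > 50][order(-median_hours)]
```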