Skip to content

Commit

Permalink
add rmd file on estimating densities from point patterns
Browse files Browse the repository at this point in the history
  • Loading branch information
ThierryO committed Jun 28, 2017
1 parent e9436f6 commit e09be23
Show file tree
Hide file tree
Showing 3 changed files with 10,585 additions and 0 deletions.
2 changes: 2 additions & 0 deletions source/_site.yml
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,8 @@ navbar:
href: "data-handling-large-files-R.html"
- text: "Read data from INBO databases (SQL Server) from R"
href: "data-handling-database-R.html"
- text: "Estimate density from point pattern"
href: "point-pattern-R.html"
- text: "Git tutorials"
menu:
- text: "Git hands-on session using Github Desktop"
Expand Down
132 changes: 132 additions & 0 deletions source/point-pattern-R.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,132 @@
---
title: Estimating densities from a point pattern
maintainer: Thierry Onkelinx
date: '2017-06-28'
output: html_document
params:
cellsize: 20
---

In this example we focus on a set of 10450 coordinates in a small area. The goal is to estimate the local density of points, expressed as the number of point per unit area. The raw coordinates are given in [WGS84 (EPSG:4326)](https://epsg.io/4326), which is a geodetic coordinate system. That is not suited for calculating distances, so we need to re-project the points into a local projected coordinate system. In this case we use [Lambert72 (EPSG:3170)](https://epsg.io/31370). Next we calculate the density. To visualise the density, we have to transform the results back in to WGS84.

The data used in this example is real data by centred to a different location for privacy reasons. The dataset is available on [GitHub](https://github.com/ThierryO/my_blog/tree/master/data/20170628).

First we must read the data into R. Plotting the raw data helps to check errors in the data.

```{r read-data, fig.cap = "Raw data"}
points <- read.delim("point-pattern/points.txt", sep = " ")
library(ggmap)
map <- get_map(
location = c(lon = mean(points$lon), lat = mean(points$lat)),
zoom = 17,
maptype = "satellite",
source = "google"
)
ggmap(map) +
geom_point(data = points, alpha = 0.1, colour = "blue", shape = 4)
```

The next step is to convert the dataset in to a `SpatialPoints` object with WGS84 project and re-project it into Lambert72. `sp::CRS()` defines the coordinate systems. `sp::coordinates()<-` is an easy way to convert a `data.frame` into a `SpatialPointsDataFrame`, but without specifying a coordinate system. Therefore we need to override the `proj4string` slot with the correct coordinate system. `sp::spTransform()` converts the spatial object from the current coordinate system to another coordinate system.

```{r reproject}
library(sp)
crs_wgs84 <- CRS("+init=epsg:4326")
crs_lambert <- CRS("+init=epsg:31370")
coordinates(points) <- ~lon + lat
points@proj4string <- crs_wgs84
points_lambert <- spTransform(points, crs_lambert)
```

Once we have the points into a projected coordinate system, we can calculate the densities. We start by defining a grid. `cellsize` is the dimension of the square grid cell in the units of the projected coordinate system. Meters in case of Lambert72. The boundaries of the grid are defined using `pretty()`, which turns a vector of numbers into a "pretty" vector with rounded numbers. The combination of the boundaries and the cell size determine the number of grid cells `n` in each dimension. `diff()` calculates the difference between to adjacent numbers of a vector. The density is calculated with `MASS::kde2d()` based on the vectors with the longitude and latitude, the number of grid cells in each dimension and the boundaries of the grid. This returns the grid as a list with elements `x` (a vector of longitude coordinates of the centroids), `y` (a vector of latitude coordinates of the centroids) and `z` (a matrix with densities). The values in `z` are densities for the 'average' point per unit area. When we multiply the value `z` with the area of the grid cell and sum all of them we get 1. So if we multiple `z` with the number of points we get the density of the points per unit area.

We use [`dplyr::mutate()`](http://dplyr.tidyverse.org/) to convert it into a `data.frame`. The last two steps convert the centroids into a set of coordinates for square polygons.

```{r density}
library(MASS)
library(dplyr)
xlim <- range(pretty(points_lambert$lon)) + c(-100, 100)
ylim <- range(pretty(points_lambert$lat)) + c(-100, 100)
n <- c(
diff(xlim),
diff(ylim)
) / params$cellsize + 1
dens <- kde2d(
x = points_lambert$lon,
y = points_lambert$lat,
n = n,
lims = c(xlim, ylim)
)
dx <- diff(dens$x[1:2])
dy <- diff(dens$y[1:2])
sum(dens$z * dx * dy)
dens <- expand.grid(
lon = dens$x,
lat = dens$y
) %>%
mutate(
density = as.vector(dens$z) * length(points_lambert),
id = seq_along(density)
) %>%
merge(
data.frame(
x = dx * (c(0, 0, 1, 1, 0) - 0.5),
y = dy * (c(0, 1, 1, 0, 0) - 0.5)
)
) %>%
mutate(
lon = lon + x,
lat = lat + y
)
```

In order to visualise the result, we have to re-project the coordinates back to WGS84. Then we can display the raster with a web based background image.

```{r ggmap, fig.cap = "Static image of density"}
coordinates(dens) <- ~lon + lat
dens@proj4string <- crs_lambert
dens_wgs <- spTransform(dens, crs_wgs84) %>%
as.data.frame()
ggmap(map) +
geom_polygon(data = dens_wgs, aes(group = id, fill = density), alpha = 0.5) +
scale_fill_gradientn(
"density\n(#/m²)",
colours = rev(rainbow(100, start = 0, end = .7)),
limits = c(0, NA)
)
```

Using `leaflet` to generate a map was a bit more laborious. Using the `data.frame dens_wgs`directly failed. So we converted the `data.frame` in a `SpatialPolygonsDataframe`, which is a combination of a `SpatialPolygons` and a `data.frame`. The `SpatialPolygons` consists of a list of `Polygons`, one for each row of the `data.frame`. A `Polygons` object consist of a list of one or more `Polygon` object. In this example a single polygon which represents the grid cell.

```{r convert-to-Spatial-Polygons}
dens_sp <- lapply(
unique(dens_wgs$id),
function(i){
filter(dens_wgs, id == i) %>%
select(lon, lat) %>%
Polygon() %>%
list() %>%
Polygons(ID = i)
}
) %>%
SpatialPolygons() %>%
SpatialPolygonsDataFrame(
data = dens_wgs %>%
distinct(id, density),
match.ID = FALSE
)
```

`leaflet` requires a predefined function with a colour pallet. We use `leaflet::colorNumeric()` to get a continuous pallet. Setting `stroke = FALSE` removes the borders of the polygon. `fillOpacity` sets the transparency of the polygons.

```{r leaflet, fig.cap = "Dynamic map of density"}
library(leaflet)
pal <- colorNumeric(
palette = rev(rainbow(100, start = 0, end = .7)),
domain = c(0, dens_sp$density)
)
leaflet(dens_sp) %>%
addTiles() %>%
addPolygons(color = ~pal(density), stroke = FALSE, fillOpacity = 0.5) %>%
addLegend(pal = pal, values = ~density)
```

Loading

0 comments on commit e09be23

Please sign in to comment.