-
Notifications
You must be signed in to change notification settings - Fork 5
/
Copy path02-geodata.Rmd
1034 lines (874 loc) · 60.6 KB
/
02-geodata.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
# Spatial data in R {#geodata}
<!-- # Geographic data in R {#geodata} -->
<!-- to discuss -->
\index{spatial data}
## Introduction {#intro-geodata}
Vector and raster data models are two basic models used to represent spatial data.
These spatial data models are closely related to map making, with each model having its own pros and cons.
<!-- - few introduction sections -->
<!-- - mention GDAL, PROJ, and GEOS -->
<!-- - maybe also mention some data sources -->
This chapter stars by describing several popular spatial data models (section \@ref(data-models)).
Each data model is introduced, explained how it is built, and how it is stored using different file formats.
Next, this chapter presents how these different data models are implemented in R (section \@ref(spatial-data-representations-in-r)).
It includes showing how to read different spatial data formats, how to understand spatial R objects, and where to find more information about preprocessing spatial data.
Finally, it focuses on the map projections (section \@ref(crs)).
This section gives a background on why do we need map projections and how to translate spatial data from an ellipsoid into a flat surface or computer screen.
It also explains basic terms and gives an overview of map projections.
<!-- - maybe also references to some books (either here or in the next section or both) -->
<!-- geocompr, spatial data science, some crs book? -->
<!-- explain that often there is a need to prepare spatial data before mapping -->
## Data models
Traditionally, spatial data is described by two basic data models:
vector data model aimed at (section \@ref(vector-data-model)) representing the world using points, lines, and polygons, and raster data model focused on representing surfaces (section \@ref(raster-data-model)).
Additionally, now we have an abundance of available spatial data and a variety of ways to obtain it.
It includes having many district variables and repeated measurements for the same area.
Therefore, we also present the concept of spatial data cubes (section \@ref(spatial-data-cubes)).
### Vector data model
\index{vector data model}
\index{spatial geometries}
\index{spatial attributes}
The vector data model represent the world as a set of spatial geometries with non-spatial attributes (Figure \@ref(fig:vector-data-model)).
The role of geometry is to describe the location and shape of spatial objects.
Attributes, on the other hand, are used to store the properties of the data.
\index{spatial geometries}
There are three basic types of geometries: points, lines, and polygons, all of them are made up of coordinates (left part of Figure \@ref(fig:vector-data-model)).
A point is represented by a pair of coordinates, usually described as X and Y, allowing for locating this point in some space.
X and Y could be unitless, in degrees, or in some measure units, such as meters (extended discussion on coordinates and related topics is in section \@ref(crs)).
Points can represent features on different spatial scales, from a GPS position, location of a bench in a park, to a city on a small scale map.
They are also used to express abstract features, such as locations of map labels.
Properties of points<!--,such as ...--> can be expressed on maps by different point sizes, colors, or shapes<!--(markers/images) -->.
A line extends the idea of a point.
It consists of several points with coordinates (called vertices) that are arranged in some order.
Consecutive points are connected by straight lines.
Therefore, a straight spatial line consists of two points (two pairs of coordinates), while complex spatial lines could be created based on a large number of points.<!--to rewrite-->
It gives the illusion that the line is curved.
Lines are used to representing linear features, such as roads, rivers, boundaries, footpaths, etc.
In this case, we can express line features' attributes using either lines' color or their widths.
<!-- ways to adjust lines aesthetics: colors, lwd (line width) -->
<!-- in theory lty could be also used - but it is not implemented in tmap -->
A polygon is again a set of ordered points (vertices) connected by straight lines.
Its only difference from the line is that the first and the last point in a polygon has the same coordinates, and thus close the object.
<!-- examples of polygons -->
The polygon representation is used to represent shapes and locations of different objects, from a lake or a patch of vegetation, through a building or a city block, to some administrative units.
Polygons also have one unique feature - they could have holes.
A polygon hole represents an area inside of the polygon but does not belong to it.
For example, a lake with an island can be depicted as a polygon with a hole.
The values of polygons' attributes can be represented by the areas (fill) colors.
\index{spatial attributes}
The second part of the vector data model relates to non-spatial attributes (right part of Figure \@ref(fig:vector-data-model)).
Attributes are usually stored as a table, in which each column depicts some property, such as an identification number, a name of a feature, or a value of some characteristic.
Each row, on the other hand, relates to a single spatial geometry.
```{r, echo=FALSE, message=FALSE}
library(sf)
library(tmap)
point_data = st_sf(
id = as.factor(c(1, 2)),
type = c("streetlight", "bench"),
geom = st_sfc(
st_multipoint(rbind(c(1, 2))),
st_multipoint(rbind(c(1.4, 1.2), c(2, 1.5))),
crs = 4326)
)
```
```{r, echo=FALSE}
linestring_sfg1 = st_linestring(rbind(c(0.9, 1.9), c(1.6, 1.9), c(1.8, 1.8)))
linestring_sfg2 = st_linestring(rbind(c(0.8, 1.6), c(1.4, 1.5), c(2, 1.6)))
linestring_sfg3 = st_linestring(rbind(c(1.1, 1), c(1.6, 1.2), c(1.9, 1.2)))
lines_data = st_sf(
id = as.factor(1:3),
type = c("a", "b", "c"),
geom = st_sfc(linestring_sfg1,
linestring_sfg2,
linestring_sfg3,
crs = 4326)
)
lines_data_p = st_cast(lines_data, "MULTIPOINT")
```
```{r, echo=FALSE}
polygon_sfg1 = st_polygon(list(rbind(c(1.4, 1.6),
c(1.6, 1.6),
c(1.6, 1.8),
c(1.4, 1.8),
c(1.4, 1.6))))
polygon_sfg2 = st_multipolygon(list(list(rbind(
c(1.6, 1.3), c(1.8, 1.45), c(1.6, 1.45), c(1.6, 1.3)
)),
st_polygon(
list(rbind(
c(0.9, 1.1), c(1.2, 1.1), c(1.2, 1.4), c(0.9, 1.4), c(0.9, 1.1)
),
rbind(
c(1.0, 1.2), c(1.1, 1.3), c(1.1, 1.2), c(1.0, 1.2)
))
)))
polygon_data = st_sf(
id = as.factor(1:2),
type = c("a", "b"),
geom = st_sfc(polygon_sfg1,
polygon_sfg2,
crs = 4326)
)
polygon_data_p = st_cast(polygon_data, "MULTIPOINT")
```
```{r vector-data-model, echo=FALSE, fig.width = 5, fig.asp = 1, fig.cap="Instances of spatial vector data model: POINTS, LINES, and POLYGONS.", message=FALSE}
source("code/data_model_figures.R")
draw_vector_data(scale = .75)
```
\index{simple feature}
The above ideas could be implemented in many ways. <!--...-->
Currently, [the Simple Feature Access](http://portal.opengeospatial.org/files/?artifact_id=25355) seems to be the most widely used standard.
In it, a feature is every object or concept that have spatial location or extent.
Simple feature standard makes a clear distinction between single- and multi-element features.
We can have a POINT feature and a MULTIPOINT feature, and similarly LINESTRING and MULTILINESTRING, and POLYGON and MULTIPOLYGON.
The main difference between single element features (such as POINT or POLYGON) and multi-element features (such as MULTIPOINT or MULTIPOLYGON) can be clearly seen by looking at attribute tables.
For example, six points stored as POINT features fill six separate rows, while six points stored as just one MULTIPOINT feature occupy just one row.
<!-- redundancy -->
Examples of single- and multi-element features can be seen in Figure \@ref(fig:vector-data-model).
The top example shows point data represented as MULTIPOINT feature: although we have seven points (seven distinct pairs of coordinates), they are gathered into two groups, green and orange, which can be seen in the associated attribute table.
The central example, on the other hand, uses single-element features, where each line geometry relates to one row in the attribute table.
Finally, the bottom example again uses multi-element features, where the second feature (`Country B`) consist of two separate geometries.
The simple feature standard also describes a number of additional geometry types, including Curve, Surface, or Triangle.
Finally, GeometryCollection exists that contains all of the possible geometry types.
<!-- JN: maybe too much information-->
<!-- simple features standard also defines possible topological rules -->
\index{spatial file formats}
A couple hundreds of file formats exist to store spatial vector data.
One of the simplest ways to store spatial data is in the form of a text file (`.csv`) or as a spreadsheet (`.xls` or `.xlsx`).
While it makes storing point data simple, with two columns representing coordinates, it is not easy to store more complex objects in this way.
Text files are also not suitable for storing information about the coordinate reference system used (section \@ref(crs)).
Historically, the shapefile format (`.shp`) developed by the ESRI company gained a lot of interest and become the most widely supported spatial vector file format.
Despite its popularity, this format has a number of shortcomings, including the need to store several files, attribute names limited to ten characters, the ability to store up to 255 attributes and files up to 2GB, and many more.
A fairly recent file format, OGC GeoPackage (`.gpkg`), was developed as an alternative.
It is a single file database free from the limitation of the shapefile format.
Other popular spatial vector file formats include GeoJSON (`.geojson`), GPX (`.gpx`), and KML (`.kml`).
<!-- FlatGeobuf?? -->
<!-- - advantages/disadvantages -->
### Raster data model
\index{raster data model}
The raster data model represents the world using a continuous grid of cells<!--pixels-->, where each cell has a single associated value (Figure \@ref(fig:raster-intro)).
Depending on the type of values, we can distinguish continuous and categorical rasters.
In continuous rasters, such as elevation or precipitation, values vary progressively.
Categorical rasters, on the other hand, uses integer values to represent classes.
Their examples include land cover or soil types maps.
Raster data can also contain cells for which we do not know the value (Figure \@ref(fig:raster-intro)).
For example, data for this part of the area was not collected, or these locations are outside of our area of interest.
```{r raster-intro, echo=FALSE, warning=FALSE, fig.asp = .4, fig.cap="Basic representation of the raster data model: (1) Cell IDs, (2) Cell values, and (3) A raster map", message=FALSE}
#replace example later
library(tmap)
library(spData)
library(stars)
set.seed(2020-06-25)
mat = matrix(1:20, nrow = 5, ncol = 4)
dim(mat) = c(x = 5, y = 4)
mat2 = matrix(sample.int(100, 20), nrow = 5, ncol = 4)
mat2[c(4, 12)] = NA
dim(mat2) = c(x = 5, y = 4)
sta = c(st_as_stars(mat), st_as_stars(mat2))
attr(sta, "dimensions")[[2]]$delta = -1
sta_sf = st_as_sf(sta)
sta_sf$A1.2 = as.character(as.factor(sta_sf$A1.1))
sta_sf$A1.2[is.na(sta_sf$A1.1)] = "NA"
tm_raster1 = tm_shape(sta_sf) +
tm_polygons() +
tm_text("A1") +
tm_layout(frame = FALSE, main.title = "1. Cell IDs")
tm_raster2 = tm_shape(sta_sf) +
tm_polygons("A1.1") +
tm_text("A1.2") +
tm_layout(frame = FALSE, legend.show = FALSE,
main.title = "2. Cell values")
tm_raster3 = tm_shape(sta) +
tm_raster("A1.1") +
tm_layout(frame = FALSE, legend.show = FALSE, inner.margins = 0.02,
main.title = "3. Raster map")
tmap_arrange(tm_raster1, tm_raster2, tm_raster3, nrow = 1)
```
\index{raster data grid types}
When we think about raster data, most of the time we are referring to regular grids (Figure \@ref(fig:grid-types)).
In regular grids, each cell has the same, constant size, and coordinates change from top to bottom and from left to right^[Regular grids can also have coordinated changing in different directions, e.g., from bottom to top.].
<!-- I know it is a simplification-->
Regular rasters can be transformed into rotated and sheared rasters (Figure \@ref(fig:grid-types)).
Rotated grids are the result of transforming both coordinated, $x$ and $y$ using the same rotation coefficients.
Sheared grids are created when the rotation coefficients are not equal.
Rectilinear grids, on the other hand, have orthogonal axes, but consist of rectangular cells with different sizes and shapes (Figure \@ref(fig:grid-types)).
In the last type of raster data grids, curvilinear grids, cells are cuboids of different sizes and shapes (Figure \@ref(fig:grid-types)).
<!-- Q:should the counting in the below figure start from bottom left? -->
```{r grid-types, echo=FALSE, warning=FALSE, fig.asp = .4, fig.cap="Main types of raster data grids: (1) Regular, (2) Rotated, (3) Sheared, (4) Rectilinear, and (5) Curvilinear"}
# regular grid
tm_sta_regular = tm_shape(sta_sf) +
tm_polygons() +
tm_text("A1") +
tm_layout(frame = FALSE, main.title = "Regular")
# rotated grids
sta_rotated = sta
attr(attr(sta_rotated, "dimensions"), "raster")$affine = c(0.1, 0.1)
sta_rotated_sf = st_as_sf(sta_rotated)
tm_sta_rotated = tm_shape(sta_rotated_sf) +
tm_polygons() +
tm_text("A1") +
tm_layout(frame = FALSE, main.title = "Rotated")
# sheared grids
sta_sheared = sta
attr(attr(sta_sheared, "dimensions"), "raster")$affine = c(0.1, 0.2)
sta_sheared_sf = st_as_sf(sta_sheared)
tm_sta_sheared = tm_shape(sta_sheared_sf) +
tm_polygons() +
tm_text("A1") +
tm_layout(frame = FALSE, main.title = "Sheared")
# rectilinear grids
x = c(0, 0.5, 1.5, 2.1, 3, 5)
y = rev(c(0.1, 1, 1.5, 2, 4))
sta_rectilinear = st_as_stars(list(m = mat),
dimensions = st_dimensions(x = x, y = y))
sta_rectilinear_sf = st_as_sf(sta_rectilinear)
tm_sta_rectilinear = tm_shape(sta_rectilinear_sf) +
tm_polygons() +
tm_text("m") +
tm_layout(frame = FALSE, main.title = "Rectilinear")
# curvilinear grids
sta_curvilinear = sta
X1 = matrix(rep(1:5, times = 4), nrow = 5, ncol = 4)
X2 = matrix(c(seq(3.36, 1, length.out = 4),
seq(3.52, 1.16, length.out = 4),
seq(3.68, 1.32, length.out = 4),
seq(3.68, 1.32, length.out = 4),
seq(3.68, 1.32, length.out = 4)),
nrow = 5, ncol = 4,
byrow = TRUE)
am = matrix(1:20, nrow = 5, ncol = 4)
sta_curvilinear = st_as_stars(am)
sta_curvilinear = st_as_stars(sta_curvilinear,
curvilinear = list(X1 = X1,
X2 = X2))
sta_curvilinear_sf = st_as_sf(sta_curvilinear)
tm_sta_curvilinear = tm_shape(sta_curvilinear_sf) +
tm_polygons() +
tm_text("A1") +
# tm_grid() +
tm_layout(frame = FALSE, main.title = "Curvilinear")
# all
tm_sta_all = tmap_arrange(tm_sta_regular, tm_sta_rotated, tm_sta_sheared,
tm_sta_rectilinear, tm_sta_curvilinear, nrow = 1)
tm_sta_all
```
Contrary to spatial vector data, a basic raster data stores just one attribute.
It is, however, possible to stack together many single rasters (also known as raster layers).
This allows us to store and operate on many rasters having the same dimensions at the same time.
Examples of multi-layer rasters include satellite imageries or temporal rasters.
Satellite imageries usually consist of many bands (layers) for different wavelengths.
The most basic bands, representing the colors red, green, and blue, can be connected together to create one composite image with true colors (Figure \@ref(fig:rgb-raster)).
Temporal rasters store one attribute, but for many moments in time.
<!-- + comparing different attributes for the same area -->
Additional information about multi-layer rasters can be also found in Section \@ref(spatial-data-cubes).
```{r rgb-raster, echo=FALSE, message=FALSE, warning=FALSE, fig.asp = .4, fig.cap="Example of three satellite imagery bands: red, green, blue, and the composite image with true colors created using these three bands."}
# we can/should change this example later
landsat234 = raster::stack(system.file("raster/landsat.tif", package = "spDataLarge"))[[1:3]]
tmrgb1 = tm_shape(raster::brick(landsat234[[3]])) +
tm_raster(palette = "Greys") +
tm_layout(frame = FALSE, legend.show = FALSE, main.title = "Red")
tmrgb2 = tm_shape(raster::brick(landsat234[[2]])) +
tm_raster(palette = "Greys") +
tm_layout(frame = FALSE, legend.show = FALSE, main.title = "Green")
tmrgb3 = tm_shape(raster::brick(landsat234[[1]])) +
tm_raster(palette = "Greys") +
tm_layout(frame = FALSE, legend.show = FALSE, main.title = "Blue")
tmrgb4 = tm_shape(landsat234) +
tm_rgb(r = 3, g = 2, b = 1, max.value = 25780) +
tm_layout(frame = FALSE, main.title = "Composite")
tmap_arrange(tmrgb1, tmrgb2, tmrgb3, tmrgb4, nrow = 1)
```
\index{spatial file formats}
Similarly to vector data, a large number of raster file formats exists.
<!-- text files ?-->
Currently, the GeoTIFF format (`.tif` or `.tiff`) is one of the most popular spatial raster formats.
It is an extended image TIF format that stores spatial metadata (e.g., map projection) along the values.
Another popular spatial raster formats include Arc ASCII (`.asc`) and ERDAS Imagine (`.img`).
<!-- ncdf??? -->
### Spatial data cubes
\index{spatial data cubes}
Traditionally, spatial vector and raster data models refer to a unique set of locations.
For example, each feature in a polygon dataset and each cell in a raster dataset refer to one specific area.
However, to solve real-life problems, we need to store and operate on more complex data structures.
It includes situations when we have many attributes, often for several moments in time.
\index{spatial vector data cubes}
Storing multiple attributes is not a problem for the vector data model, when an attribute table can have many columns.
The question is how to extend the spatial vector data model to include measurements for many times.
For example, let's consider a polygon data with many attributes representing shares of land-use types for several years (Figure \@ref(fig:vector-data-cubes)).
One approach would be to create a separate column for each variable in each year.<!--wide--><!--pros and cons-->
Alternatively, we can have one column representing the year and one column for each attribute, however, this approach would require multiplying each geometry as many times as we have time stamps.
<!--long--><!--pros and cons-->
The third approach involves separating geometries from attributes, and where attributes for each moment are stored independently.
The last idea is used in spatial vector data cubes (section \@ref(the-stars-package)).
An example of the spatial vector data cubes idea can be seen in Figure \@ref(fig:vector-data-cubes).
It consists of two elements: a geometry (MULTIPOLYGON) of provinces of the Netherlands and an array connected to it that stores shares of land-use types for several years.
```{r vector-data-cubes, fig.width = 6, fig.asp=0.60606, message = FALSE, warning = FALSE, echo=FALSE, fig.cap="Vector data cube."}
source("code/data_model_figures.R")
draw_vector_cubes()
```
\index{spatial raster data cubes}
A single raster dataset can store just one variable for a given area.
To store several attributes, we can connect rasters representing different attributes for the same extent, creating multi-layer rasters (section \@ref(raster-data-model)).
Additionally, each of the aforementioned rasters can be collected for many moments in time, adding other layers to the data.
<!--pros and cons-->
The question here is how to efficiently store multi-layer raster data to understand what layers relate to which attribute and time.
Similarly to spatial vector data cubes, we can think of separating spatial dimensions from non-spatial attributes and create spatial raster data cubes (section \@ref(the-stars-package)).
Figure \@ref(fig:raster-data-cubes) gives an example of a raster data cube.
It consists of several single-layer rasters with the same spatial properties, such as resolution, extent, and CRS.
These rasters are organized to store four-dimensions of the data: latitude, longitude, time, and attributes.
It has values of three attributes for five moments in time in total.
```{r raster-data-cubes, fig.asp=0.25, message = FALSE, echo=FALSE, fig.cap="Raster data cube."}
source("code/data_model_figures.R")
draw_data_cubes()
```
Spatial data cubes are suitable for many real-life applications.
For example, time-series of climate measurements for several stations, demographic data on a country level gathered for many years, or satellite imageries over some period of time.
\index{spatial file formats}
One way to create spatial data cubes is by connecting many independent vector or raster objects.
<!-- mention it in the stars section? -->
Second way is to read a spatial data cube from one of the file formats allowing for storing complex data.
It includes formats such as NetCDF (`.nc`) and HDF (`.hdf`).
<!-- spatial vector data cubes file formats? -->
<!-- converting between spatial vector data cube and spatial raster data cube -->
## Spatial data representations in R
\index{vector data model}
R has several packages aimed to represent spatial vector data.
<!-- Recently, the **terra** package has been released containing a new vector data representation. -->
For more than a decade, the **sp** package <!--REF--> was a standard of vector data representation in R.
However, now this package is in the maintenance mode only, and its successor, **sf** is recommended.
The **tmap** package has been using **sf** since version 2.0.
\index{raster data model}
\index{spatial data cubes}
Several R packages can be used to represent spatial raster data, including **raster** and its successor **terra**.
The **raster** package was used as a backbone of raster data visualization until **tmap** version 3.0.
Nowadays, the **stars** package is used by **tmap** to operate on raster data and spatial data cubes.
In the two next sections, we introduce the **sf** package (\@ref(the-sf-package)) and the **stars** package (section \@ref(the-stars-package)).
<!-- spatial data cubes -->
<!-- https://github.com/appelmar/gdalcubes_R -->
<!-- https://ropensci.org/blog/2019/11/05/tidync/ -->
### The sf package
\index{sf}
\index{sf (package)|see {sf}}
The **sf** package implements ideas behind the Simple Feature standard, which describe how to represent spatial vector data.
Its main class, `sf`, has the form of an extended data frame, where each row is a spatial feature.
In it, attributes of the vector data are stored as columns.
It also has one additional column, most often named `geom` or `geometry`^[However, any other names are also possible.].
This column contains geometries in a form of well-known text (WKT), storing all of the coordinates.
<!-- - how to read sf objects from files -->
The **sf** package can read all of the spatial data formats mentioned in section \@ref(vector-data-model) using the `read_sf()` function^[It is also possible to read spatial vector data using the `st_read()` function, which differs from `read_sf()` by having different default arguments.].
<!--improve example-->
```{r}
library(sf)
worldvector = read_sf("data/worldvector.gpkg")
```
The new object, `worldvector`, has a `sf` class.
It has `r nrow(worldvector)` features (rows or geometries) and `r ncol(worldvector) - 1` fields (columns with attributes).
There is also an `r ncol(worldvector)`th column, `geom`, that stores geometries of each feature.
Objects of class `sf` also display a header containing spatial metadata.
It includes geometry type, dimension (`XY`, `XYZ`, `XYM`, `XYZM`), bounding box (`bbox`), and information about the used Coordinate Reference System (`CRS`).
```{r}
worldvector
```
The `worldvector` object has MULTIPOLYGON geometry type, where each feature (row) can consist of one or more polygons.
Each polygon's vertices are represented by a pair of values (`dimension: XY`).
Bounding box allows to quickly understand the spatial extension of the input data.
<!--...-->
Finally, it has `r ifelse(sf::st_crs(worldvector)$IsGeographic, "geographic", "projected")` CRS named `r sf::st_crs(worldvector)$Name`.
You can learn more about Coordinate Reference Systems in section \@ref(crs).
<!-- ref to CRS section -->
Spatial vector data of class `sf` can be also obtained using some of other R data packages.
<!-- add REFs-->
For example, **rnaturalearth** allows to download world map data, **osmdata** imports OpenStreetMap data as `sf` objects, and **tigris** loads TIGER/Line data.
<!-- add reference to geocompr -->
<!-- add reference to https://cran.r-project.org/web/views/Spatial.html (after my updates) -->
The **tmap** package accepts spatial vector data objects from both **sf** and **sp** packages.
In case of having vector objects in a different representation, they should be converted into `sf` objects first, before making maps.
The **sf** package has the `st_as_sf()` function that translates objects of many classes, including `Spatial` (from the **sp** package), `ppp`, `psp`, and `lpp` (from the **spatstat** package), to the objects of class `sf`.
The `st_as_sf()` function also allows to turn data frames into `sf` objects - the user needs to provide the input data frame, names of columns with coordinates, and additionally definition of the CRS of the data.
For example `my_sf = st_as_sf(my_df, coords = c("Xcolumn", "Ycolumn"), crs = 4326)`.
If you want to learn more about operating on `sf` objects, we recommend visiting the package website and vignettes at https://r-spatial.github.io/sf/index.html and reading [the Geocomputation with R book](https://geocompr.github.io/) [@lovelace2019geocomputation].
<!-- - vector simplification? -->
### The stars package
\index{stars}
\index{stars (package)|see {stars}}
<!-- intro stars -->
The **stars** package allows for reading and processing raster data in R.
This package also has support for both spatial vector and raster data cubes.
Its main class, `stars`, is built as a list of matrices or arrays with metadata describing their dimensions.
The **stars** package is also well integrated with **sf**, with many `st_` functions (such as `st_crs()`) working also on `stars` objects.
<!-- - how to read stars objects from files -->
The `read_stars()` function allow to read spatial raster data from a file^[The **stars** package also has a function `read_ncdf()` aimed at improved reading of NetCDF files.].
This function requires at least one argument with a filename to be read.
<!--improve example-->
```{r}
library(stars)
worldelevation = read_stars("data/worldelevation.tif")
```
The new object, `worldelevation`, is of a `stars` class.
It has two dimensions, `x` and `y`, and one attribute `worldelevation.tif`.
```{r}
worldelevation
```
The `worldelevation.tif` attribute is a matrix, where each cell represents an elevation value.
The `x` dimension has 1080 elements (columns), starting from a coordinate (`offset`) of a cell boundary of `-180`.
Next, the coordinates of further cells increase by `0.333333` (`delta`) - resolution in the `x` dimension.
The `y` dimension has 540 elements (rows), starting from a coordinate (`offset`) of a cell boundary of `90`.
For the `y` dimension, each further cell's coordinated decreases by `0.333333` (notice the negative value of `delta`) - resolution in the `y` dimension.
Both dimensions also have the same CRS - `WGS 84`.
`read_stars()` also has several additional arguments including `RasterIO`, which gives control over the input data extent and resolution.
For example, the below code will read just the first and second bands (results not shown).
<!-- - including reading chunks, changing resolution, and selecting bands -->
<!--improve example-->
```{r}
file_path3 = system.file("raster/landsat.tif", package = "spDataLarge")
x3 = read_stars(file_path3, RasterIO = list(bands = c(1, 2)))
```
Internally, a `stars` object is a list of `matrix` or `array` objects with additional attributes describing spatial metadata, such as a number of columns and rows, resolution, coordinate reference system, etc.
All of this information is read from the input file.
Stars objects are constructed by dimensions and attributes.
Dimensions relate to what kind of objects are stored as list elements.
For example, when it is a `matrix` then we just have two dimensions representing columns and rows.
However, it is also possible to store multidimensional `array`s, which allow having many additional dimensions for bands, times, etc.
Attributes, on the other hand, are stored as list elements.
Each attribute can relate, for example to a different variable.
Reading a simple GeoTIFF file would result in having just two dimensions and one attribute (a `matrix`).
On the other hand, reading complex raster file formats, such as NetCDF could result in having more than two dimensions (e.g. time) and many attributes (e.g., an `array` with temperature, precipitation, humidity).
<!-- how it relates to mapping? -->
<!-- - stars proxy -->
<!-- more than 1e8 cells to read -->
Before reading the file, the **stars** package checks if the input data is a curvilinear grid and what is the number of cells in the data.
When the input data is small or curvilinear then the full data is read in computer memory.
Otherwise, a `stars proxy` approach is used, where only metadata is read including pointers to where the complete data is.
When we want to plot large raster data, then it is read at a lower resolution than the native one.
<!-- ref to the section where we are explaining max.plot options -->
The **stars** package also has support for vector data cubes, where each geometry is just stored once (as a dimension), and each attribute is a `matrix` or an `array` with the number of rows equals to the number of geometries, the number of columns equals to another dimension (e.g., time), and possibly the number of `array` layers equals for additional dimensions.
<!-- can we plot them in tmap? -->
<!-- if so - there should be an example in the book + reference -->
<!-- The **tmap** package accepts spatial raster data objects from both **stars** and **raster** packages. -->
More information on how the `stars` objects are organized and how to operate on them can be found in the **stars** package vignettes at https://r-spatial.github.io/stars.
<!-- - advice: sometimes/often it is better to prepare spatial object before the mapping, than trying to over-customize the map -->
## Map projections (CRS) {#crs}
<!-- intro -->
### What are map projections?
```{r crs-01, eval=TRUE, echo=FALSE, results='hide', warning=FALSE, message=FALSE}
library(sf)
library(tmap)
library(grid)
library(dplyr)
library(kableExtra)
data(World)
clr_peel = "#FD8A04"
clr_orange_land = "#FB6801"
clr_peel_back = "#D24010"
clr_bg = "grey95"
clr_bg2 = "grey90" #"#D24010"
clr_land = "grey85"
clr_grat = "grey50"
clr_border = "grey30"
#source("code/crs_examples.R", local = knitr::knit_global()) # something strange going on: when saving this R script, the content isn't updated in the envir, or there is a cashing problem. Therefore, I had to add source(... ) to every chunk below.
```
\index{map projections}
We use maps so often in everyday life that most of us probably forget that a map is just a two-dimensional representation of a three-dimensional object, namely the earth.
For centuries, geographers and mathematicians wondered what the best way is to do this.
Let us wonder with them for a second.
The world is shown as an orange below, not just to stimulate your appetite for this subject, but also since an orange peel is a good analogy for a two-dimensional map.
A world map can be seen as an orange peel that is put flat on the table.
The question is how to do this.
```{r crs-02, echo=FALSE, results='hide', warning=FALSE, eval=FALSE}
source("code/crs_examples.R")
map_orange_world()
```
```{r orange, echo=FALSE, message=FALSE, fig.cap="How to peel an orange?", fig.scap="How to peel an orange?"}
knitr::include_graphics("images/orange_world.png")
```
When we peel the orange, ideally we want to rip the peel near areas of the earth that are less interesting.
What is interesting depends on the application; for applications where land mass is more important than wetlands, it is a good idea to make the rips in the oceans.
The (interrupted) Goode homolosine projection, which is shown below, embodies this idea.
All continents and countries are preserved, except Antarctica and Greenland.
There is also a version of the Goode homolosine projection that focuses on preserving the oceans.
```{r crs-goode, echo=FALSE, results='hide', fig.asp=0.45, warning=FALSE, fig.cap="The (interrupted) Goode homolosine projection"}
source("code/crs_examples.R")
goode = map_goode()
tm_shape(goode$bg) +
tm_polygons(clr_peel) +
tm_shape(goode$land) +
tm_polygons(clr_orange_land, border.col = clr_border, lwd = 1) +
tm_shape(goode$grat) +
tm_lines(col = clr_grat, lwd = 1) +
tm_layout(frame = FALSE)
```
To make the analogy between the orange peel and the surface of the earth complete, we have to assign two fictitious properties to the orange peel, namely that it is stretchable and deformable.
These properties are needed in order to make a non-interrupted map, as we will see in the next sections.
\index{map projections}
\index{coordinate reference system (CRS)}
A method to flatten down the earth, for which the Goode homolosine projection shown Figure \@ref(fig:crs-goode) is an example, is called a *map projection*.
Technically, it is also known as a *coordinate reference system* (*CRS*), which specifies the corresponding coordinate system, as well as the transformations to other map projections.
### A model of the Earth
\index{ellipsoid}
<!--we should use either "Earth" or "earth" (both are right now in the text)-->
The orange and the Earth have another thing in common; both are spheres, but not perfect ones.
The Earth is metaphorically speaking a little fat: the circumference around the equator is 40,075 km whereas around the circumference that crosses both poles is 40,009 km.
<!--source: https://en.wikipedia.org/wiki/Earth_physical_characteristics_tables-->
Therefore, the earth can better be described as an ellipsoid.
The same applies to an orange; every orange is a little different, but probably very few oranges are perfect spheres.
Although the ellipsoid is a good mathematical model to describe the earth's surface, keep in mind that the surface of the earth is not smooth;
land mass usually lies on a higher altitude than sea level.
We could potentially map each point on the surface of the earth using a three-dimensional $(x, y, z)$ Cartesian coordinate system with the center of the mass of the Earth being the origin (0, 0, 0).
However, since this has many mathematical complications, the ellipsoid is often sufficient as a model of the surface of the earth.
\index{datum}
This ellipsoid model and its translation to the Earth' surface is called a *(geodetic) datum*.
The most popular datum is WGS84, which has been introduced in 1984 as an international standard, and has been last revised in 2004.
There are many (slightly) different datums, which are often tailored for local applications.
For instance, NAD83, ETRS89, and GDA94 are slightly better models for North-America, Europe, and Australia respectively.
However, since WGS84 is a very good approximation of the earth as a whole, it has been widely adopted worldwide and is also used by the Global Positioning System (GPS).
\index{latitude and longitude}
When we have specified a datum, we are able to specify geographic locations with two familiar variables, namely *latitude* and *longitude*.
The latitude specifies the north-south position in degrees, where latitude = 0$^\circ$ is the equator.
The latitudes for the north and south pole are 90$^\circ$ and $-90^\circ$ respectively.
The longitude specifies the east-west position in degrees, where by convention, the longitude = 0$^\circ$ meridian crosses the Royal Observatory in Greenwich, UK.
The Longitude range is -180$^\circ$ to 180$^\circ$, and since this is a full circle, -180$^\circ$ and $^\circ$ specify the same longitude.
\index{graticule}
When we see the earth in its three-dimensional form, as in Figure \@ref(fig:orange), the latitude parallels are the horizontal lines around the earth, and the longitude meridians are the vertical lines around the earth.
The set of longitude meridians and latitude parallels is also referred to as *graticule*.
In all the figures in this section, latitude parallels are shown as gray lines for $-60^\circ$, $-30^\circ$, $0^\circ$, $30^\circ$ and $60^\circ$, and longitude meridians from $-180^\circ$ to $180^\circ$ at every $30^\circ$.
Please keep in mind that only a latitude and longitude are not sufficient to specify a geographic location.
A datum is required.
When people exchange latitude-longitude data, it is safe to assume that they implicitly have used the WGS84 datum.
However, it is good practice to specify the datum explicitly.
### Platte Carrée and Web Mercator
\index{web mercator}
\index{EPSG}
Let's take a closer look at two widely used map projections, namely the plain latitude-longitude coordinate system (using the WGS84 datum) and the Web Mercator projection, which is currently the de facto standard for interactive maps.
These projections are indexed as EPSG4326 and EPSG3857 respectively.
EPSG is an institute that maintains a database of standard map projections.
<!--https://geographx.co.nz/map-projections/-->
```{r crs-04, echo=FALSE, results='hide', warning=FALSE, fig.cap="Latitude longitude coordinates (EPSG 4326)", message=FALSE, fig.cap="The WGS84 coordinate system (EPSG4326)", eval=FALSE}
source("code/crs_examples.R")
m4326 = map_4326()
m4326_cyl = map_4326_cyl()
tmap_arrange({
tm_shape(m4326_cyl$bg_back) +
tm_polygons(clr_bg2) +
tm_shape(m4326_cyl$bg_front) +
tm_polygons(clr_bg, border.col = clr_border) +
tm_shape(m4326_cyl$land) +
tm_polygons(clr_land) +
tm_shape(m4326_cyl$grat) +
tm_lines(col = clr_grat, lwd = 1) +
tm_layout(frame = FALSE)},
{ tm_shape(m4326$bg) +
tm_fill(clr_bg) +
tm_shape(m4326$land) +
tm_polygons(clr_land) +
#tm_shape(m4326$grat) +
#tm_lines(col = clr_grat, lwd = 1) +
tm_graticules(x = seq(-180, 180, by = 30), y = seq(-60, 60, by = 30), col = clr_grat, labels.size = .6, labels.cardinal = FALSE, lines = TRUE) +
tm_layout(frame = TRUE, inner.margins = 0)
}, widths = c(.3, .7), asp = NULL)
```
When we fictitiously make little holes in the orange peel at both poles, and stretch these open so wide that they have the same width as the equator, we obtain the cylinder depicted in Figure \@ref(fig:crs-04) (left).
Note that the longitude lines have become straight vertical lines.
When we unroll this cylinder, we obtain a map where the $x$ and $y$ coordinates are the longitude and latitude respectively.
This CRS, which is known as EPSG4326, is shown in Figure \@ref(fig:crs-04) (right).
\index{coordinate reference system (CRS)}
\index{unprojected coordinate reference system (CRS)}
\index{projected coordinate reference system (CRS)}
EPSG4326 is an *unprojected* CRS, since the longitude and latitude have not been transformed.
With *projected* CRSs, the $x$ and $y$ coordinates refer to specific measurement units, usually meters.
The projected variant of this CRS is called the *Platte Carrée* (EPSG4087), and is exactly the same map as shown in Figure \@ref(fig:crs-04) (right), but with other $x$ and $y$ value ranges.
Observe since we stretched the poles open, the area near the poles have been stretched out as well.
More specifically, the closer the land is to one of the poles, the more it has been stretched out.
Since the stretching direction is only horizontally, the shapes of the areas have become wider.
A good example is Greenland, which is normally a 'tall' area (as can be seen in Figure \@ref(fig:orange)).
\index{web mercator}
In order to fix these deformed areas, Gerardus Mercator, a Flemish geographer in the 16th century introduced a method to compensate for this by inflating the areas near the poles even more, but now only in a vertical direction.
This projection is called the Mercator projection.
For web applications, this projection has been slightly modified and renamed to the Web Mercator projection (EPSG3857).
The cylinder and plain map that uses this projection are shown in Figure \@ref(fig:crs-05).
```{r crs-05, echo=FALSE, results='hide', warning=FALSE, fig.cap="Web Mercator projection (EPSG3857)", message=FALSE, eval=FALSE}
source("code/crs_examples.R")
m3857 = map_3857()
m3857_cyl = map_3857_cyl()
tmap_arrange({
tm_shape(m3857_cyl$bg_back) +
tm_polygons(clr_bg2) +
tm_shape(m3857_cyl$bg_front) +
tm_polygons(clr_bg, border.col = clr_border) +
tm_shape(m3857_cyl$land) +
tm_polygons(clr_land) +
tm_shape(m3857_cyl$grat) +
tm_lines(col = clr_grat, lwd = 1) +
tm_layout(frame = FALSE)},
{ tm_shape(m3857$bg) +
tm_fill(clr_bg) +
tm_shape(m3857$land) +
tm_polygons(clr_land) +
#tm_shape(m3857$grat) +
#tm_lines(col = clr_grat, lwd = 1) +
tm_graticules(x = seq(-180, 180, by = 30), y = seq(-60, 60, by = 30), col = clr_grat, labels.size = .6, labels.cardinal = FALSE, lines = TRUE) +
tm_layout(frame = TRUE, inner.margins = 0)
}, widths = c(.3, .7), asp = NULL)
```
Although the areas near the poles have been inflated quite a lot, especially Antarctica and Greenland, the shape of the areas is more or less correct, in particular regarding small areas (which can be seen by comparing with Figure \@ref(fig:orange)).
The Mercator projection is very useful for navigational purposes, and has therefore been embraced by sailors ever since.
Also today, the Web Mercator is the de facto standard for interactive maps and navigation services.
However, for maps that show data the (Web) Mercator projection should be used with great caution, because the hugely inflated areas will influence how we perceive spatial data.
We will discuss this in the next section.
### Types of map projections
<!-- https://en.wikipedia.org/wiki/List_of_map_projections
http://www.geog.uoregon.edu/shinker/geog311/Labs/lab02/properties.htm
https://www.researchgate.net/publication/303311220_Projection_Wizard_-_An_Online_Map_Projection_Selection_Tool
https://projectionwizard.org/
https://kartoweb.itc.nl/geometrics/Map%20projections/body.htm
http://www.geo.hunter.cuny.edu/~jochen/gtech201/lectures/lec6concepts/map%20coordinate%20systems/how%20to%20choose%20a%20projection.htm
https://books.google.nl/books?id=E0JZDwAAQBAJ&pg=PA244&lpg=PA244&dq=Equidistant+projections+important&source=bl&ots=UqDt0ZBgEP&sig=ACfU3U3R1XN0i33v6Izh8fQZGJbpLF9ULw&hl=en&sa=X&ved=2ahUKEwi84tT_68rqAhUQ26QKHRcWD3AQ6AEwEHoECAgQAQ#v=onepage&q=Equidistant%20projections%20important&f=false
-->
\index{map projections}
Let us go back to the original question: how can we make a two-dimensional image of our three-dimensional earth?
Although there are many ways, four basic map projection types can be distinguished.
These are depicted in Figure \@ref(fig:crs-types).
```{r crs-types, echo=FALSE, message=FALSE, out.width="100%", fig.asp=0.35, fig.cap="Four types of map projections"}
source("code/crs_examples.R")
crs_types()
```
\index{cylindrical map projections}
Examples for cylindrical projections have already been given in the previous section; both Platte Carrée and Web-Mercator are cylindrical.
Another widely used cylindrical map projection is the *Universal Transverse Mercator (UTM)*.
The cylinder is not placed upright, but horizontal.
There are 60 positions in which this cylinder can be placed, where in each position, the cylinder faces a longitude range of 6 degrees.
In other words the UTM is not a single projection, but a series of 60 projections.
There are many projections which are pseudo-cylinders in the sense that the radius around the poles is smaller than around the equator.
An example is the Robinson projection shown in Figure \@ref(fig:crs-robin).
Almost all commonly used standard World map projections are (pseudo-)cylindrical.
```{r crs-robin, echo=FALSE, message=FALSE, out.width="100%", fig.asp=0.45, fig.cap="The Robinson projection, which is pseudo-cylindrical."}
data(World)
tm_shape(World, projection = "+proj=robin") +
tm_polygons(clr_land, border.col = clr_border) +
tm_layout(earth.boundary = TRUE,
earth.boundary.color = clr_grat,
bg.color = clr_bg,
space.color = "white",
frame = FALSE) +
tm_graticules(x = seq(-180, 180, by = 30), y = seq(-60, 60, by = 30), col = clr_grat, labels.cardinal = FALSE, ticks = FALSE, labels.show = FALSE)
```
\index{conic map projections}
An example of a conic map projection is shown in \@ref(fig:crs-conic-planar)(a).
As a result of unfolding a cone on a flat surface, a gap is created.
The size (angle) of this gap depends on the width of the cone.
There are also pseudo-conic map projections in which some meridians (longitude lines) are curved.
Conic map projections are useful for mid-latitude areas where the surfaces of the earth and the cone are almost parallel to each other.
\index{planar map projections}
Planar map projections, also known as azimuthal projections, project the Earth on a disk.
This can be done in several ways.
This can best be explained by the position of an imaginary light source.
It can be placed inside the globe, at the surface of the globe opposite to the disk, and at an infinite distance opposite to the disk.
The corresponding families of projections are called gnomonic, stereographic, and orthogonal projections.
Planar map projections are often used for a specific country or continent.
An example is the Lambert Azimuthal Equal-Area projection (EPSG3035), shown in \@ref(fig:crs-conic-planar)(b), which is optimized for Europe.
It can be classified as a stereographic projection, although the light beams are not straight but curved.
Another example of a planar map projection is the orange shown in Figure \@ref(fig:orange).
This is an orthogonal projection.
```{r crs-conic-planar, echo=FALSE, results='hide', warning=FALSE, fig.cap="Examples of a conic (a) and a planar (b) projection.", message=FALSE}
source("code/crs_examples.R")
meqdc = map_eqdc()
m3035 = map_3035()
tmap_arrange({
tm_shape(meqdc$bg) +
tm_polygons(clr_bg, border.col = clr_border) +
tm_shape(meqdc$land) +
tm_polygons(clr_land) +
tm_shape(meqdc$grat) +
tm_lines(col = clr_grat, lwd = 1) +
tm_layout(frame = FALSE) +
tm_layout(title="(a) World Equidistant Conic projection", title.position = c("center", "bottom"), title.size = 0.9, title.color = "#777777", inner.margins = c(.1, 0.02, 0.02, 0.02))},
{tm_shape(m3035$bg) +
tm_polygons(clr_bg, border.col = clr_border) +
tm_shape(m3035$land) +
tm_polygons(clr_land) +
tm_shape(m3035$grat) +
tm_lines(col = clr_grat, lwd = 1) +
tm_layout(frame = FALSE) +
tm_layout(title="(b) Lambert Azimuthal Equal-Area projection", title.position = c("center", "bottom"), title.size = 0.9, title.color = "#777777", inner.margins = c(.1, 0.02, 0.02, 0.02))
})
```
\index{interrupted map projections}
The (interrupted) Goode homolosine projection shown in Figure \@ref(fig:crs-goode) is an example of an interrupted projection.
A special class of these projections are polyhedral projections, which consists of planar faces.
In Figure \@ref(fig:crs-types) a polyhedral of six faces is illustrated.
There is no limit of the number of faces, as the myriahedral projections (TODO reference Van Wijck paper) illustrate.
### Which projection to choose?
\index{map projections}
Hopefully it is clear that there is no perfect projection, since each projection has its pros and cons.
Whether a projection is good for a certain application, depends on two factors.
The first factor is the type of application and in particular which map projection properties are useful or even required for that application.
For instance, navigation requires other map projection properties than statistical maps.
The second factor is the area of interest.
Is the whole World visualized or only a part, and in the latter case, which part?
In this section, guidelines are provided to choose a proper projection based on those two aspects.
Before we go deeper into selecting a projection, it is worth noting that for many countries and continents, government agencies already have chosen projections to be the standard for mapping spatial data.
For instance, a standard for Europe, which is used by Eurostat (the statistical agency of the European Union), is the Lambert Azimuthal Equal-Area projection shown in Figure \@ref(fig:crs-conic-planar)(b).
If the area of interest has such a standard, it is recommended to use it, because it can be safely assumed that this standard is a proper projection, and moreover, it makes cooperation and communication with other parties easier.
However, be aware of the limitations that this particular projection may have, and that there may be better alternatives out there.
**Map projection properties**
\index{map projection: properties}
The type of application is important for the choice of a map projection.
However, it would be quite tedious to list all possible applications and provide projection recommendations for each of them.
Instead, we focus on four map projection properties.
The key step is to find out which of these properties are useful or even required for the target application.
The four properties are listed in the following table.
```{r crs-properties, echo=FALSE}
crs_prop_name = c("Conformal", "Equal area", "Equidistant", "Azimuthal")
crs_prop_attr = c("Local angle (shape)", "Area", "Distance", "Direction")
crs_prop_appl = c("Navigation, climate", "Statistics", "Geology", "Geology")
crs_prop_cyl = c("Mercator", "Gall-Peters, Eckert IV", "Equirectangular", "none")
crs_prop_conic = c("Lambert conformal conic", "Albers conic", "Equidistant conic", "none")
crs_prop_planar = c("Stereographic", "Lambert azimuthal equal-area", "Azimuthal equidistant", "Stereographic, Lambert azimuthal equal-area")
crs_prop_interrupted = c("Myriahedral", "Goode homolosine, Myriahedral", "none", "none")
tb = tibble(Property = crs_prop_name, Preserves = crs_prop_attr, Applications = crs_prop_appl, `Examples (cyclindrical)` = crs_prop_cyl, `Examples (conic)`=crs_prop_conic, `Examples (planar)`=crs_prop_planar, `Examples (interrupted)`=crs_prop_interrupted)
tb2 = as.data.frame(t(rbind(names(tb), as.matrix(tb, dimnames = NULL))))
tb2 = tb2[, -1]
knitr::kable(tb2,
row.names = TRUE,
col.names = NULL,
caption = "Map projection properties.",
caption.short = "Map projection properties.",
booktabs = TRUE) %>%
kableExtra::kable_styling(font_size = 12) %>%
column_spec(1, width = "14em", bold = TRUE) %>%
column_spec(c(3,5), width = "14em")
```
\index{conformal map projections}
A *conformal* projection means that local angles are preserved.
In practice, that means that for instance a map of a crossroad preserves the angles between the roads.
Therefore, this property is required for navigational purposes.
As a consequence that local angles are preserved, local shapes are also preserved.
That means that an small island will be drawn on a map in its true shape, as seen from the sky perpendicular above it.
The Web Mercator shown in Figure \@ref(fig:crs-05) satisfies this property; the closer an area is to one of the poles, the more it is enlarged, but since this is done in both dimensions (latitude and longitude), local shapes are preserved.
\index{equal-area map projections}
A map projection is called *equal-area* if the areas are proportional to the true areas.
This is strongly recommended for maps that show statistics in order to prevent perceptual bias.
Figure \@ref(fig:crs-bias) shows two World maps of population density per country, one in the Web Mercator projection and the other in Eckert IV projection.
The perception of World population is different in these maps; in (a) the vast lands on low-populated areas seem to be Canada, Greenland, and Russia, whereas in (b) also North Africa and Australia emerge as vast low-populated areas.
```{r crs-bias, echo=FALSE, message=FALSE, out.width="100%", fig.asp=0.45, fig.cap="The Robinson projection, which is pseudo-cylindrical."}
library(grid)
World = World[World$iso_a3 != "ATA", ]
tm1 = tm_shape(World, projection = 3857) + tm_polygons("pop_est_dens", style ="log10", title = expression("Population per " * km^2), legend.is.portrait = F) + tm_layout(frame = FALSE, legend.frame = TRUE, inner.margins = c(.1, .02, .02, .02), legend.position = c(.4, .1), scale = 0.7, asp = 1)
tm2 = tm_shape(World, projection = "+proj=eck4") + tm_polygons("pop_est_dens", style ="log10", title = expression("Population per " * km^2), legend.is.portrait = F) + tm_layout(frame = FALSE, legend.frame = TRUE, inner.margins = c(.1, .02, .02, .02), legend.position = c(.4, .1), scale = 0.7, asp = 1)
tmap_arrange(tm1, tm2)
grid.text("(a) Web Mercator is not equal-area", x = .25, y = .08, gp=gpar(cex = 0.8, col = "#777777"))
grid.text("(b) Eckert IV is equal-area", x = .75, y = .08, gp=gpar(cex = 0.8, col = "#777777"))
```
\index{equidistant map projections}
\index{azimuthal map projections}
The other two map projection properties are related to one central point on the map.
A map projection is called *equidistant* if the distances to any other point in the map are preserved, and *azimuthal* if the directions to any other point are preserved.
These properties are in particular useful in the field of geology.
One example is a seismic map around the epicenter of a recent earthquake, where it is important to how far and in which direction the vibrations are spreading.
\index{compromise map projections}
A map projection can satisfy at most two of these properties.
Many map projection do not satisfy any property but are intended as a compromise.
An example is the Robinson projection, shown in Figure \@ref(fig:crs-robin).
**Area of interest**
\index{map projection: area of interest}
The next aspect that is important for the choice of a map projection is the area of interest.
In general, the larger the area, the more concessions have to be made, since the larger the area, the more difficult it is to make a two-dimensional projection.
The following table provides recommendations of map projection types based on the area size and on the latitude of the area.
```{r crs-recommendations, echo=FALSE}
knitr::kable(tibble(`View` = c("World", "Hemisphere", "Continent or smaller"),
`Low latitude (equator)` = c("Pseudo-cylindrical", "Azimuthal", "Cylindrical or azimuthal"),
`Mid latitude`= c("Pseudo-cylindrical", "Azimuthal", "Conic or azimuthal"),
`High latitude (poles)` = c("Pseudo-cylindrical", "Azimuthal", "Azimuthal")),
caption = "Recommended projections.",
caption.short = "Recommended projections.",
booktabs = TRUE) %>%
kableExtra::kable_styling(font_size = 12) %>%
kableExtra::column_spec(1, width = "10em")
```
For World maps, pseudo-cylindrical map projections, such as the Robinson projection (Figure \@ref(fig:crs-robin)) and the Eckert IV projection (Figure \@ref(fig:crs-bias)(b)) are very popular because they have less distortion other map projections.
For areas that cover a half of the sphere, i.e. a hemisphere, azimuthal map projections are recommended.
There are four hemispheres that are often used: the Northern and Southern Hemisphere, with respectively the North and South Pole as center, the Western Hemisphere consisting of the Americas, and the Eastern Hemisphere, which includes the other continents.
However, other hemispheres are often used implicitly, such as a hemisphere centered on Europe used in the Lambert Azimuthal Equal-Area projection shown in Figure \@ref(fig:crs-conic-planar)(b).
For areas with the size of a continent or country, the azimuthal map projection type can be used when centered on the area of interest.
In particular, the Lambert Azimuthal Equal-Area projection when equal area is required, and the Azimuthal Equidistant projection when preserving distances is important.
Alternatively, cylindrical and conic map projection types can be used for areas at low and mid latitudes respectively.
Another alternative is to use a UTM projection.
However, this is only recommended when the target area spans less than 6 degrees longitude.<!--and do not cross the UTM zone lines-->
### CRS in R
\index{coordinate reference system (CRS)}
\index{PROJ}
Coordinate Reference Systems (CRSs) are implemented in the software library [**PROJ**](https://proj.org/).
With implementation, we mean specifying a CRS and transforming coordinates from one CRS to another.
**PROJ** is used by every popular software application for spatial data, in particular **ArcGIS**, **QGIS**, and **GRASS GIS**, and also by many programming languages, including R.
The **sf** package integrates the **PROJ** functions into R.
A CRS is represented in R by an object of class `crs`, which can be retrieved or set with the function `st_crs` (from the **sf** package).
In the following example, a `crs` object is created from an EPSG code, in this case 3035, the Lambert Azimuthal Equal-Area projection for Europe.
```{r crs-new}
library(sf)
# CRS Lambert Azimuthal Equal-Area projection
st_crs(3035)
```
\index{crs objects}
A `crs` object is represented by Well Known Text (WKT).
It includes a specification of the used datum as well as information how to transform it into other CRSs.
Understanding the exact content of the WTK is not important for most users, since it is not needed to write a WKT yourself.
\index{crs objects}
\index{EPSG}
\index{proj4}
A `crs` object can be created in several ways:
<!--to improve in the future-->
1. The first is with an EPSG number as user input specification as shown above. <!--it can be also some other "provider", e.g. "ESRI:37001"-->
2. The second is also with a user input specification, but with a so-called *proj4* character string.
The *proj4* character string for the LAEA projection is `"+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +units=m +no_defs"`.
However, *proj4* character strings should be used with caution since they often lack important CRS information regarding datums and CRS transformations.
Also note that the name *proj4* stands for the **PROJ** library version 4, while the current major version of **PROJ** at the time of writing is already 8.
3. The third way is to provide some WKT definition of the projection. <!--...-->
4. The last way to create a `crs` object is to extract it from an existing spatial data object (e.g., an **sf** or **stars** object) using the `st_crs()` function.
A `crs` object can define a new spatial object's projection or transform an existing spatial object into another projection.
In the example below, we created a new object, `waterfalls`, with names and coordinates of three famous waterfalls.
Next, we converted it into a spatial object of the `sf` class, `waterfalls_sf()` with `st_as_sf()`.
We can see that our object's coordinate reference system is not defined with the `st_crs()` function.
```{r crs-use}
# create a data.frame of three famous waterfalls
waterfalls = data.frame(name = c("Iguazu Falls", "Niagara Falls", "Victoria Falls"),
lat = c(-25.686785, 43.092461, -17.931805),
lon = c(-54.444981, -79.047150, 25.825558))
# create sf object (without specifying the crs)
waterfalls_sf = st_as_sf(waterfalls, coords = c("lon", "lat"))
# extract crs (not defined yet)
st_crs(waterfalls_sf)
```
This function also allows us to specify CRS of our object - in this example, coordinates of our object are in the WGS84 coordinate system, and thus we can use the EPSG code of 4326.
We can also confirmed that our operation was successful also using `st_crs()`.
```{r}
# specify crs
st_crs(waterfalls_sf) = 4326
# extract crs
st_crs(waterfalls_sf)