sf
library(sf)
library(tmap)
library(dplyr)
library(magrittr)
library(tidyr)
map_mode <- "plot"
tmap_mode(map_mode)
The sf
package supports a wide range of spatial
operations both on individual layers, and between layers. These are
discussed in this document, for your reference, and may be particularly
useful when it comes to the mini-project later in the semester.
I have split operations into those which are performed between two datasets, and those which transform an existing dataset by some geometric method.
I will use three small datasets to demonstrate the various operations. These are loaded as follows:
## Reading layer `points' from data source
## `/Users/osullid3/Documents/teaching/Geog315/slides/spatial-data-wrangling/points.gpkg'
## using driver `GPKG'
## Simple feature collection with 575 features and 7 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 1746724 ymin: 5426053 xmax: 1749660 ymax: 5429090
## Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
## Reading layer `lines' from data source
## `/Users/osullid3/Documents/teaching/Geog315/slides/spatial-data-wrangling/lines.gpkg'
## using driver `GPKG'
## Simple feature collection with 1664 features and 2 fields
## Geometry type: MULTILINESTRING
## Dimension: XY
## Bounding box: xmin: 1746331 ymin: 5425842 xmax: 1749730 ymax: 5429519
## Projected CRS: NZGD2000_New_Zealand_Transverse_Mercator_2000
## Reading layer `polys' from data source
## `/Users/osullid3/Documents/teaching/Geog315/slides/spatial-data-wrangling/polys.gpkg'
## using driver `GPKG'
## Simple feature collection with 406 features and 2 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 1746199 ymin: 5425271 xmax: 1749916 ymax: 5430532
## Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
And make a map to see what we are dealing with
tm_shape(mblocks) +
tm_polygons(col = "pop2013", alpha = 0.5) +
tm_shape(streets) +
tm_lines(col = 'purple') +
tm_shape(crashes) +
tm_dots()
These data are:
All for an area around 1.6km from Te Herenga Waka - Victoria University.
The spatial operation most closely related to data table operations is a spatial join. This operation depends on the spatial relations between two layers, which are called binary (spatial) predicates. So we need to look at the predicates before considering join operations.
The binary predicates determine whether or not various spatial
conditions are met between two datasets. The binary predicates available
in sf
are listed here. These vary in their exact
interpretation depending on whether the layers involved are points or
lines or polygons, but an illustration of them for polygons is shown
below, to give you the idea.
These are as discussed in the paper
Egenhofer MJ & Franzosa RD 1991 Point-set topological spatial relations International Journal of Geographical Information Systems 5(2) 161-174.
This wikipedia article provides a good overview of the underlying concepts.
The various binary spatial predicate functions in sf return for two datasets which geometries in the first dataset are related to which in the second based on the relation specified. For example, if we want to know which points in the crashes dataset are in which meshblocks, we would do
## Sparse geometry binary predicate list of length 575, where the
## predicate was `within'
## first 10 elements:
## 1: 123
## 2: 73
## 3: 267
## 4: 264
## 5: 91
## 6: 292
## 7: 68
## 8: 49
## 9: 213
## 10: 74
The list output is which of the polygons in the mblocks
dataset the first 10 of the points in the crashes
data
layer is within. If we do the equivalent test in reverse
## Sparse geometry binary predicate list of length 406, where the
## predicate was `contains'
## first 10 elements:
## 1: (empty)
## 2: (empty)
## 3: (empty)
## 4: (empty)
## 5: 326
## 6: 118, 298, 478, 505
## 7: (empty)
## 8: 108, 134, 221, 295, 348, 451, 535
## 9: 143, 216, 439
## 10: (empty)
we get which points are contained by each of the polygons. In this
case the first four polygons in mblocks
contain no points
from crashes
, the next one contains crash 326, the next one
contains several crashes, and so on.
The results we see here show only the first few rows of the results.
They are also in a ‘sparse’ format showing only the relations which are
TRUE
. If we request the complete results by setting the
option sparse
to FALSE
then we see a matrix
which shows for every combination of object in the first dataset (the
rows) which objects in the second dataset (the columns) satisfies the
required spatial relationship.
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [7,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [10,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Usually the sparse form (the default) is less of a pain to deal with! In that form, the result is a list and extracting the information it contains requires some additional processing.
As an example, we can use the st_contains()
operation to
count points in polygons. For example, perhaps we want to count the
number of crashes in different areas, then the lengths()
function applied to the result of st_contains()
gives us
that information.
num_crashes <- mblocks %>%
st_contains(crashes) %>%
lengths()
mblocks.n <- mblocks %>%
mutate(n = num_crashes)
tm_shape(mblocks.n) +
tm_polygons(col = "n")
If we want the information about which particular objects are in the
result list from applying the spatial predicate, then we use the
unlist()
function to get at it. So for example, a list of
the polygons that contain a crash is provided by
## [1] 123 73 267 264 91 292 68 49 213 74 195 76 11 153 271 194 186 144
## [19] 81 64 103 167 118 197 45 87 381 85 113 72 111 188 93 312 201 156
## [37] 94 184 172 13 146 245 229 405 193 333 226 334 154 147 102 282 216 145
## [55] 171 136 347 86 183 162 203 21 138 51 190 173 289 39 307 379 199 165
## [73] 8 99 101 310 255 6 191 364 248 254 204 284 151 97 225 9 137 205
## [91] 273 25 131 269 202 108 305 122 116 209 373 170 198 239 250 125 276 82
## [109] 132 65 398 100 187 371 90 345 189 71 325 401 326 363 327 182 200 150
## [127] 180 88 168 281 110 238 83 303 389 37 77 119 221 24 215 78 291 120
## [145] 275 141 96 158 357 84 224 5 404 374 67 356 128 35 149 126 69 117
## [163] 252 376 92 306 338 115 290 349 28 181 34 247 311 228 63 265 366 139
## [181] 40 142 155 328 105 62 382 75 251 324 208 152
The two key spatial operations between two datasets are spatial
filter (st_filter
) and spatial join
(st_join
).
A spatial filter operation is like the non-spatial
filter
function in dplyr
and will pick out
only those rows in the data that meet some criterion, where the
criterion is a spatial one, based on the relationship to some other
dataset. So for example, we can do a selection of the polygons that
contain one or more points like this
mblocks %>%
st_filter(crashes) %>%
tm_shape() +
tm_polygons() +
tm_shape(crashes) +
tm_dots(size = 0.01)
st_filter
applies some spatial predicate to the
relationship between two layers to perform the filtering. By default the
predicate is st_intersects
which in most cases will pick
out where two spatial layers coincide with one another.
As a different example, consider a situation where we have selected only one polygon in the dataset
mb100 <- mblocks %>%
slice(100) # this selects just row number 100 (we could pick any)
tm_shape(mblocks) +
tm_polygons() +
tm_shape(mb100) +
tm_polygons(col = 'red')
Now we can get the neighbouring polygons like this, by specifying an appropriate predicate:
nmblocks <- mblocks %>%
st_filter(mb100, predicate = st_touches)
tm_shape(mblocks) +
tm_polygons() +
tm_shape(nmblocks) +
tm_polygons(col = 'orange') +
tm_shape(mb100) +
tm_polygons(col = "red")
Note that the initially selected polygon is considered to ‘touch’ itself. It doesn’t arise in this simple example, but this capability can be used to perform complicated multistep spatial manipulations of datasets.
A spatial join not only applies some spatial predicate to the relation between two layers, but will append the data from the second layer to the data table of the first.
This is an effective way of spatially merging datasets for further
analysis. An example of st_join
is in the main
lab materials. The trickiest aspect to spatial joins is when more
than one object meets the join criterion. For example, if we are joining
data from points to polygons then many points may intersect with each
polygon:
## Simple feature collection with 789 features and 9 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 1746199 ymin: 5425271 xmax: 1749916 ymax: 5430532
## Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
## First 10 features:
## MB2013 pop2013 CRASH_SEV MULTI_VEH LOCN1 LOCN2
## 1 2142900 9 <NA> <NA> <NA> <NA>
## 2 2124200 3 <NA> <NA> <NA> <NA>
## 3 2128800 177 <NA> <NA> <NA> <NA>
## 4 2119900 6 <NA> <NA> <NA> <NA>
## 5 2122600 0 N Multi vehicle MOLESWORTH ST LAMBTON QUAY
## 6 2131400 150 N Multi vehicle DIXON ST VICTORIA ST
## 6.1 2131400 150 N Multi vehicle VICTORIA ST FELTEX LANE
## 6.2 2131400 150 N Multi vehicle VICTORIA ST FELTEX LANE
## 6.3 2131400 150 N Multi vehicle DIXON ST WILLIS ST
## 7 2133407 36 <NA> <NA> <NA> <NA>
## fatalities serious minor geom
## 1 NA NA NA POLYGON ((1749174 5426376, ...
## 2 NA NA NA POLYGON ((1748753 5428642, ...
## 3 NA NA NA POLYGON ((1748281 5427565, ...
## 4 NA NA NA POLYGON ((1748645 5429096, ...
## 5 0 0 0 POLYGON ((1749061 5428876, ...
## 6 0 0 0 POLYGON ((1748521 5427439, ...
## 6.1 0 0 0 POLYGON ((1748521 5427439, ...
## 6.2 0 0 0 POLYGON ((1748521 5427439, ...
## 6.3 0 0 0 POLYGON ((1748521 5427439, ...
## 7 NA NA NA POLYGON ((1748813 5427143, ...
There are only 406 polygons in the mblocks
dataset, but
the joined dataset has 789!
This is because a duplicate entry is made each time a crash point
occurs inside a meshblock. In effect the table we get as output is a
table of the ‘spatial joins’ between the two original datasets, where a
join is a point inside a polygon. Notice also that where a polygon
contains no points we get <NA>
entries in the table,
because no data was found in the crashes
layer that
intersected with the mblocks
layer, but the polygon is
retained anyway.
If we simply don’t want the <NA>
values we can
specify an ‘inner join’ so that only records where a match was found are
retained. We do this by overriding the default ‘left join’ behaviour,
which prioritises retaining all rows in the left hand (i.e. first)
table.
## Simple feature collection with 575 features and 9 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 1746575 ymin: 5425845 xmax: 1749846 ymax: 5429182
## Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
## First 10 features:
## MB2013 pop2013 CRASH_SEV MULTI_VEH LOCN1 LOCN2
## 5 2122600 0 N Multi vehicle MOLESWORTH ST LAMBTON QUAY
## 6 2131400 150 N Multi vehicle DIXON ST VICTORIA ST
## 6.1 2131400 150 N Multi vehicle VICTORIA ST FELTEX LANE
## 6.2 2131400 150 N Multi vehicle VICTORIA ST FELTEX LANE
## 6.3 2131400 150 N Multi vehicle DIXON ST WILLIS ST
## 8 2134800 177 N Multi vehicle SH 1N TARANAKI ST
## 8.1 2134800 177 N Multi vehicle TARANAKI ST ABEL SMITH ST
## 8.2 2134800 177 N Multi vehicle TARANAKI ST ARTHUR ST
## 8.3 2134800 177 N Multi vehicle SH 1N TARANAKI ST
## 8.4 2134800 177 N Multi vehicle SH 1N TARANAKI ST
## fatalities serious minor geom
## 5 0 0 0 POLYGON ((1749061 5428876, ...
## 6 0 0 0 POLYGON ((1748521 5427439, ...
## 6.1 0 0 0 POLYGON ((1748521 5427439, ...
## 6.2 0 0 0 POLYGON ((1748521 5427439, ...
## 6.3 0 0 0 POLYGON ((1748521 5427439, ...
## 8 0 0 0 POLYGON ((1748721 5426755, ...
## 8.1 0 0 0 POLYGON ((1748721 5426755, ...
## 8.2 0 0 0 POLYGON ((1748721 5426755, ...
## 8.3 0 0 0 POLYGON ((1748721 5426755, ...
## 8.4 0 0 0 POLYGON ((1748721 5426755, ...
This still leaves the question of how to combine the multiple pieces
of information that are now associated with each polygon derived from
multiple crashes. There is no one solution here as it will depend on the
interpretation of the data what makes sense. Usually we have to group
the joined data based on a unique identifier from the original (left
hand) data table, and then use summarise
to calculate
appropriate new variables. For example
crash_counts <- mblocks %>%
st_join(crashes) %>%
group_by(MB2013) %>% # this variable uniquely identifies each row
summarise(num_crashes = n(), # this counts the number of matching cases
fatalities = sum(fatalities),
serious = sum(serious),
minor = sum(minor),
pop2013 = first(pop2013)) %>%
select(MB2013, pop2013, num_crashes, fatalities, serious, minor)
as_tibble(crash_counts)
## # A tibble: 406 × 7
## MB2013 pop2013 num_crashes fatalities serious minor geom
## <chr> <int> <int> <int> <int> <int> <POLYGON [m]>
## 1 21019… 0 1 NA NA NA ((1748508 5430325, 17485…
## 2 21024… 3 1 NA NA NA ((1747656 5429203, 17476…
## 3 21025… 198 1 NA NA NA ((1747332 5429467, 17473…
## 4 21028… 174 1 NA NA NA ((1747086 5429464, 17470…
## 5 21030… 21 1 NA NA NA ((1747462 5429071, 17474…
## 6 21030… 51 1 NA NA NA ((1747558 5429004, 17475…
## 7 21055… 0 1 NA NA NA ((1746865 5428490, 17468…
## 8 21061… 39 2 0 0 2 ((1746734 5428512, 17467…
## 9 21111… 132 3 0 0 1 ((1746722 5428375, 17467…
## 10 21112… 243 1 NA NA NA ((1746568 5428275, 17465…
## # ℹ 396 more rows
The biggest problem this leaves us with is a lot of NA
values. The best way to deal with these is to use the
tidyr::replace_na()
function in the
summarise()
calculation to replace NA values with an
appropriate alternative. For example, we can do
crash_counts <- mblocks %>%
st_join(crashes) %>%
group_by(MB2013) %>%
summarise(num_crashes = n(),
fatalities = replace_na(sum(fatalities), 0),
serious = replace_na(sum(serious), 0),
minor = replace_na(sum(minor), 0),
pop2013 = first(pop2013)) %>%
select(MB2013, pop2013, num_crashes, fatalities, serious, minor)
as_tibble(crash_counts)
## # A tibble: 406 × 7
## MB2013 pop2013 num_crashes fatalities serious minor geom
## <chr> <int> <int> <int> <int> <int> <POLYGON [m]>
## 1 21019… 0 1 0 0 0 ((1748508 5430325, 17485…
## 2 21024… 3 1 0 0 0 ((1747656 5429203, 17476…
## 3 21025… 198 1 0 0 0 ((1747332 5429467, 17473…
## 4 21028… 174 1 0 0 0 ((1747086 5429464, 17470…
## 5 21030… 21 1 0 0 0 ((1747462 5429071, 17474…
## 6 21030… 51 1 0 0 0 ((1747558 5429004, 17475…
## 7 21055… 0 1 0 0 0 ((1746865 5428490, 17468…
## 8 21061… 39 2 0 0 2 ((1746734 5428512, 17467…
## 9 21111… 132 3 0 0 1 ((1746722 5428375, 17467…
## 10 21112… 243 1 0 0 0 ((1746568 5428275, 17465…
## # ℹ 396 more rows
Finally, you may want to clip one layer with another. This is done
using the st_intersection
function. Here’s how it works,
using the convex hull of a set of points (we’ll see what the convex hull
is in a moment).
# Clipping the polygons by the convex hull of the points
hull <- crashes %>%
st_union() %>%
st_convex_hull()
mblocks %>%
st_intersection(hull) %>%
plot()
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
st_buffer
Buffering expands spatial objects by some specified distance out from
the objects. You have to specify a distance with the dist
parameter.
crashes.buf <- crashes %>%
st_buffer(dist = 50)
mblocks.buf <- mblocks %>%
st_buffer(dist = -20)
streets.buf <- streets %>%
st_buffer(dist = 20)
This function preserves any variables associated with the dataset in the buffered dataset. A set of points will become a set of circles when buffered, whereas lines become ‘lozenge’ shapes, and polygons become larger, smoothed versions of themselves.
Buffering is frequently used to explore topics like the spatial relationship between (for example) liquor outlets and schools, or accessibility to public transport and so on.
## tmap mode set to plotting
m1 <- tm_shape(mblocks.buf) +
tm_polygons(alpha = 0.75)
m2 <- tm_shape(streets.buf) +
tm_polygons(alpha = 0.75)
m3 <- tm_shape(crashes.buf) +
tm_polygons(alpha = 0.75)
tmap_arrange(m1, m2, m3, nrow = 1)
## Warning: The shape mblocks.buf contains empty units.
## tmap mode set to plotting
If dist
is negative (as it is here for the meshblocks)
it will shrink the geometry in on itself. This only makes sense for
polygons, and if the shrinking is too large it may cause errors or
warnings (as small polygons collapse to nothing or turn
inside-out…).
st_centroid
The centroid of a geometry is a point at its geometric centre.
## Warning: st_centroid assumes attributes are constant over geometries
The centroid is not guaranteed to be inside the geometry. There is at least one example of this in the example data - and most likely more.
If you need a representative point guaranteed to be inside a polygon,
then use st_point_on_surface()
instead, although be aware
that the location of this point is not guaranteed to be the same every
time your run the function!
st_convex_hull
The convex hull of a set of points is the shape you would
get if you stretched an elastic band around the outer edge of the set of
points that define the object. If we extract a single polygon from the
mblocks
dataset, that’s the easiest way to illustrate
this.
Now plot this and its convex hull
tm_shape(mb) +
tm_polygons() +
tm_shape(mb %>% st_convex_hull()) +
tm_polygons(col = 'blue', alpha = 0.35, border.col = 'red')
The convex hull of a geometry is a nice simplified summary of its location. It is particularly useful for representing a collection of points. However, if you apply it to a point dataset, you will just get a collection of points, because the function is applied to each point individually.
To create the convex hull of the whole set of points, you have to
first merge them into a multipoint with the
st_union
operation:
Some operations only make sense when applied to a collection of
geometries, not one object at a time. Two in particular that are widely
used are st_voronoi()
and
st_triangulate()
.
st_voronoi
The Voronoi region associated with a geometry is the region of space closer to that geometry than to any other geometry in a set of geometries. This is easiest to understand for points.
Unfortunately, the st_voronoi
operation applied to a
point dataset behaves rather unexpectedly, so a series of
transformations are required to convert it to a useful Voronoi polygon
dataset:
voronoi <- crashes %>%
st_union() %>% # combine points into a multipoint
st_voronoi() %>%
st_cast() %>% # no idea why this is required!
st_as_sf() # convert to a sf dataset
And now let’s plot them along with the originating points.
By default the function adds a large buffer region to the points before doing the Voronoi calculation. You can reduce this effect by intersecting the result with a bounding box or convex hull:
crashes.hull <- crashes %>%
st_union() %>%
st_convex_hull() %>%
st_buffer(dist = 100)
voronoi %>%
st_intersection(crashes.hull) %>%
tm_shape() +
tm_polygons() +
tm_shape(crashes) +
tm_dots()
The Voronoi polygons of a set of points can be considered ‘service areas’ of the points. Imagine the points are (say) bus stops, then each polygon is the area closer to a given bus stop than to any other bus stop, so other things being equal, it is the ‘service area’ of that bus stop.
Of course… the areas are determined without reference to movement on street networks or hills or anything else, so they are too simplistic for application in an unprocessed form. Alternative methods involve calculating distances over networks not just using straight point-to-point calculations.
An interesting example of the simple form of Voronoi, but applied to true great circle distances on Earth surface is this one by Jason Davies for world airports.
st_triangulate
The triangulation of a set of points connects the points together into a mesh of triangles. Similar to the Voronoi operation, some annoying additional conversions are required.
tri <- crashes %>%
st_union() %>% # combine potnts into a multipoint
st_triangulate() %>%
st_cast() %>% # no idea again!
st_as_sf() # convert to a sf dataset
tm_shape(tri) +
tm_polygons(alpha = 0.5) +
tm_shape(crashes) +
tm_dots()
The triangulation, the Voronoi polygons, and the convex hull are all closely related. The convex hull of the points is also the outer perimeter of the triangulation.
An alternative way to summarise the locations of a collection of
points is called the alphashape which is based on removing
edges from the triangulation until the remaining triangles more closely
fit the shape of the set of points. However alphashapes are not
supported by any functions in the sf
package. An R
package that supports alphashapes is alphahull
but it is
not directly compatible with sf
. If you want to see what
this looks likes try running the following code: