This document is a ‘cheatsheet’ for working with raster datasets in
R and introduces the terra
package, which is currently the
preferred option for handling such data. We first load some
libraries (including some vector libraries too).
Reading a dataset is pretty simple, especially if it is a simple
dataset. I have put a bunch of GeoTIFFs in this zip file, which you should
download and unzip to a folder on your computer. Then read the digital
elevation model dataset dem.tif
into a variable with the
rast()
function, like this:
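The code chunk itself is not shown in this rendering; assuming the files were unzipped to the working directory, it would look something like:

```r
library(terra)

# read the DEM into a SpatRaster (path assumes the unzipped
# folder is the current working directory)
dem <- rast("dem.tif")
```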
Confusingly, this doesn’t tell you anything about what it just did, so you have to ask:
## class : SpatRaster
## dimensions : 242, 240, 1 (nrow, ncol, nlyr)
## resolution : 100, 100 (x, y)
## extent : 1735056, 1759056, 5419610, 5443810 (xmin, xmax, ymin, ymax)
## coord. ref. : NZGD2000 / New Zealand Transverse Mercator 2000 (with axis order normalized for visualization)
## source : dem.tif
## name : dem
## min value : 0.000
## max value : 518.799
This tells us a few things about the dataset, including its dimensions (in numbers of cells), its resolution (in units of the projection), its extent, and its projection. It also tells us the minimum and maximum values in the data.
As usual in R we can get a summary of the statistics of the dataset:
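The chunk that produced the output below was presumably a plain call to summary():

```r
summary(dem)
```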
## dem
## Min. : 0.00
## 1st Qu.: 82.07
## Median :149.17
## Mean :156.96
## 3rd Qu.:219.22
## Max. :518.80
## NA's :23790
What are all those NA values? Well… the best bet is to plot the data to make sense of them:
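A quick look with the base plot() method (one plausible version of the missing chunk):

```r
plot(dem)
```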
This shows us right away that a lot of the rectangular area of the raster is taken up by sea, and since it is an elevation model, we might expect the sea areas to be recorded as NA values (which they are).
tmap can consume rasters too!

tmap knows about raster data and can map it quite
happily. The same options, such as the number of classes
(n), the style of classification
(e.g. quantile), and the colour palette
to use in the
mapping, also work for rasters. Here, because you might want to use it,
I’ve also shown how to use R’s built-in terrain colour
palette.
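The chunk is not shown; a sketch using tmap v3 syntax and R's built-in terrain.colors() palette might be (the class count and style are assumptions):

```r
tm_shape(dem) +
  tm_raster(n = 10, style = "quantile",
            palette = terrain.colors(10),  # R's built-in terrain palette
            title = "Elevation (m)")
```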
tmap can also map raster layers in combination with
non-raster layers. For example, read in the welly.gpkg file
also found in the provided zip file, and add it as an outline around the
raster:
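Judging by the output below, the file is read with sf::st_read(); adding the outline is then just another tm_shape() layer. A sketch:

```r
welly <- st_read("welly.gpkg")

tm_shape(dem) +
  tm_raster(palette = terrain.colors(10), title = "Elevation (m)") +
  tm_shape(welly) +
  tm_borders()
```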
## Reading layer `welly' from data source
## `/Users/osullid3/Documents/teaching/Geog315/slides/raster-cheatsheet/welly.gpkg'
## using driver `GPKG'
## Simple feature collection with 79 features and 64 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 1735078 ymin: 5419585 xmax: 1759042 ymax: 5443764
## Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000 (with axis order normalized for visualization)
There are a bunch of raster datasets in this folder. We can read
them into a raster stack by making a directory listing of them and
supplying that to the rast() function.
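A plausible version of the missing chunk, assuming all the GeoTIFFs sit in the current folder:

```r
# list all the .tif files in the folder and
# stack them into one multi-layer SpatRaster
layers <- rast(list.files(pattern = "[.]tif$"))
```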
Again, you have to ask to get more information
## class : SpatRaster
## dimensions : 242, 240, 11 (nrow, ncol, nlyr)
## resolution : 100, 100 (x, y)
## extent : 1735056, 1759056, 5419610, 5443810 (xmin, xmax, ymin, ymax)
## coord. ref. : NZGD2000 / New Zealand Transverse Mercator 2000 (with axis order normalized for visualization)
## sources : age.tif
## deficit.tif
## dem.tif
## ... and 8 more source(s)
## names : age, deficit, dem, mas, mat, r2pet, ...
## min values : 0, 0.0000, 0.000, 138, 100, 24, ...
## max values : 2, 133.0221, 518.799, 143, 131, 41, ...
If we want to know the names of all the layers, we can use the
names() function:
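That is presumably just:

```r
names(layers)
```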
## [1] "age" "deficit" "dem" "mas" "mat" "r2pet" "rain"
## [8] "slope" "sseas" "tseas" "vpd"
And plot() will make a panel of maps, one for each layer (this
might be a bit slow).
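For example:

```r
plot(layers)
```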
tmap
can also deal with multiple layers, although it
will apply the same classification to all of them unless you override
this behaviour:
tm_shape(layers) +
tm_raster(title = names(layers)) +
tm_layout(legend.position = c("RIGHT", "BOTTOM")) +
tm_facets(free.scales = TRUE) # different scales on each layer
There might be some complaining about fitting the legends in on this
plot, or about projections (this is to do with recent changes in how
projections are handled by the PROJ tools; see this
post, but you can ignore it for now).
To examine a single layer in the stack, just use the $
notation:
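For example (a plausible version of the missing chunk):

```r
plot(layers$dem)
```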
The layers now included in the layers dataset are:

The layers in the stack were assembled from LENZ
data developed by Manaaki Whenua Landcare Research, and available here. Their interpretations are
roughly as follows (this information is from the disdat
package).
Layer name | Explanation | Units |
---|---|---|
age | Soil age | 3 classes (0 to 2): <2,000 yr, 2,000 yr to post-glacial (approx. 30,000 yr), and pre-glacial |
deficit | Annual water deficit | mm (higher is greater deficit) |
dem | Elevation | metres |
mas | Mean annual solar radiation | MJ/m2/day |
mat | Mean annual temperature | degrees C * 10 |
r2pet | Average monthly ratio of rainfall to potential evapotranspiration | (ratio) |
rain | Annual precipitation | mm |
slope | Slope | degrees |
sseas | Solar radiation seasonality | dimensionless |
tseas | Temperature seasonality | degrees C |
vpd | Mean October vapour pressure deficit at 9 a.m. | kPa |
terra will also allow you to do operations between
layers, to perform raster calculations. For example, we can easily do
this (nonsensical) calculation:
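The expression itself is not shown in this rendering; a hypothetical calculation consistent with the output below might be something like the following (the exact combination of layers is a guess):

```r
# hypothetical: any arithmetic on aligned layers works like this
nonsense <- layers$mas + layers$mat - layers$dem * layers$slope
nonsense
```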
## class : SpatRaster
## dimensions : 242, 240, 1 (nrow, ncol, nlyr)
## resolution : 100, 100 (x, y)
## extent : 1735056, 1759056, 5419610, 5443810 (xmin, xmax, ymin, ymax)
## coord. ref. : NZGD2000 / New Zealand Transverse Mercator 2000 (with axis order normalized for visualization)
## source(s) : memory
## name : mas
## min value : -14228.81
## max value : 274.00
This makes a new layer by mathematically combining the values in the original layers. When combined in this way, raster layers must be aligned with one another and in the same projected coordinate system; the values at the same location in each layer contribute to the result at the corresponding location in the output.
This means that we can use raster calculations of this type to perform overlay as described in the next section.
If we recode each layer by whether or not it is greater than some threshold, then we can logically combine them to find regions that satisfy a desired combination of criteria. Say we wanted places below 250m elevation with slope under 5 degrees:
below_250m <- layers$dem < 250
flat <- layers$slope < 5
result <- below_250m & flat
names(result) <- c("flat_under_250m") # auto-assigned name is unhelpful
As usual, make some maps:
m1 <- tm_shape(below_250m) +
tm_raster(palette = c("white", "black"),
style = "cat", title = "dem<250m") +
tm_shape(welly) +
tm_borders(col = "darkgrey", lwd = 0.35)
m2 <- tm_shape(flat) +
tm_raster(palette = c("white", "black"),
style = "cat", title = "slope<5deg") +
tm_shape(welly) +
tm_borders(col = "darkgrey", lwd = 0.35)
m3 <- tm_shape(result) +
tm_raster(palette = c("white", "black"),
style = "cat", title = "Flat below 250m") +
tm_shape(welly) +
tm_borders(col = "darkgrey", lwd = 0.35)
tmap_arrange(m1, m2, m3, nrow = 1)
There is nothing to stop us performing all these operations in a single step, once we are used to how it works:
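For example, the whole overlay above can be written as one expression:

```r
# recode and combine in a single step
result <- (layers$dem < 250) & (layers$slope < 5)
names(result) <- "flat_under_250m"
```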
This makes it relatively easy to flexibly perform complex suitability
evaluations using terra
raster calculation operations.
The simple calculations in the previous sections are known as local operations, because they apply across the values at the same location in each layer. Two other kinds of calculation are possible: focal and zonal.
We can apply functions in focal windows within a single layer, performing calculations based on the values in a window centred on each location across a raster. For example, we might want to determine the maximum value of the raster cells in a 3x3 window.

A 3x3 window is the default. If you want a larger window, specify its
size with the w parameter (which must be an odd number).
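A sketch of the likely chunks, using terra's focal() function:

```r
# maximum elevation in the default 3x3 window
max_3x3 <- focal(layers$dem, fun = max)

# the same calculation in a larger 7x7 window
max_7x7 <- focal(layers$dem, w = 7, fun = max)
```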
Notice how a larger window produces a more ‘smoothed’ or generalised result.
Again, we can flexibly combine such operations. For example, a measure of relief (or roughness) is the local difference between the maximum and minimum values of a surface:
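A sketch of such a relief calculation (the window size here is an assumption):

```r
# local range: focal max minus focal min
relief <- focal(layers$dem, w = 3, fun = max) -
  focal(layers$dem, w = 3, fun = min)
names(relief) <- "relief"
```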
You can also perform calculations based on a set of zones. If you
want the results in a raster, then both inputs must be rasters, which
means we have to do quite a lot of fiddling around with the data. First
we make a zones dataset from some polygons, converting first to
terra’s SpatVector format, then using
rasterize() to select one attribute from the zones as an
identifier for zones. The raster we provide to the
rasterize() function is a ‘template’ for the number of cells
and their resolution.
welly_zones <- welly %>%
as("SpatVector") %>%
rasterize(relief, field = "ID")
tm_shape(welly_zones) +
tm_raster(n = 79, palette = "Set1", legend.show = FALSE) +
tm_shape(welly) +
tm_borders()
We can’t easily map the zones, because there are a large number of unique values.
Once we have zones we can perform a zonal calculation,
which applies a function across all the
raster cells in each zone. We may need to change the NA values in the
input raster layer to zeros (or some other defined value) so that
NA values in a zone don’t prevent calculation of a result for
that zone.
relief[is.na(relief)] <- 0
zonal_relief <- relief %>%
zonal(welly_zones, fun = mean, as.raster = TRUE) %>%
mask(relief)
The result of this operation is a raster where the value in each cell is the mean relief of all the cells in the zone it is in.
tm_shape(zonal_relief) +
tm_raster(n = 10, palette = "RdYlBu", title = "Zonal mean relief") +
tm_shape(welly) +
tm_borders()
In many situations, it is more appropriate to output the results from zonal calculations as a table of values which can be attached to the original vector data.
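To get a table instead, leave as.raster at its default of FALSE. A sketch, reusing the zones raster built above:

```r
# returns a data.frame with one row per zone, which can be
# joined back to the welly polygons by the ID field
zonal(relief, welly_zones, fun = mean)
```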
terra
provides an array of commonly used surface
analysis methods. These are a better option than figuring out how to
calculate them yourself using focal functions.
For example, we can calculate slopes from an elevation model using
the terrain() function:
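A sketch (the unit choice is an assumption):

```r
slope <- terrain(layers$dem, v = "slope", unit = "degrees")
```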
The terrain() function offers several options, which you can find out
more about in the help using ?terrain. A common use is for
calculating hillshade, which can be done as in the example below.
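terra's shade() function builds a hillshade from slope and aspect layers, both in radians; the illumination angle and direction here are assumptions:

```r
slope_rad  <- terrain(layers$dem, v = "slope",  unit = "radians")
aspect_rad <- terrain(layers$dem, v = "aspect", unit = "radians")

# sun at 45 degrees elevation, from the northwest
hillshade <- shade(slope_rad, aspect_rad, angle = 45, direction = 315)
```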
In the examples for zonal calculations, it was necessary to combine raster and vector data and translate between them.
This is a common requirement, and can sometimes be challenging. A few
important examples are shown here. The key to success is realising that
terra has its own vector data format,
SpatVector, and terra functions want vector data in
this format in order to work. To convert sf vector data to
SpatVector use as("SpatVector"), and to convert
from SpatVector to sf use
st_as_sf().
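For example, a round trip between the two formats:

```r
welly_sv <- as(welly, "SpatVector")  # sf -> terra
welly_sf <- st_as_sf(welly_sv)       # terra -> sf
```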
For example, say we have a set of points, which we want values from a raster layer attached to.
First load the points. Here we have some sampled plants. They have to
be projected to match the target raster, and we also restrict them only
to the study area using st_filter()
.
## Reading layer `tradescantia' from data source
## `/Users/osullid3/Documents/teaching/Geog315/slides/raster-cheatsheet/tradescantia.gpkg'
## using driver `GPKG'
## Simple feature collection with 1635 features and 3 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -176.5629 ymin: -46.9931 xmax: 178.4418 ymax: -34.7562
## Geodetic CRS: WGS 84
To extract the values of a particular raster layer at this point set, we
have to convert it to SpatVector points, then run
terra::extract (we have to specify terra::
because there is another function called extract in
tidyr).
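A sketch of the extraction; the result name mean_t is reused in the mapping code further on:

```r
mean_t <- terra::extract(layers$mat, as(plants, "SpatVector"))
```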
## # A tibble: 120 × 2
## ID mat
## <dbl> <dbl>
## 1 1 125.
## 2 2 119
## 3 3 126.
## 4 4 114.
## 5 5 117
## 6 6 130
## 7 7 121
## 8 8 123.
## 9 9 125
## 10 10 120.
## # ℹ 110 more rows
The result is just a table. If we want to map this, we have to add it back to the original spatial data:
plants <- plants %>%
bind_cols(mean_t)
tm_shape(hillshade) +
tm_raster(palette = "Greys", alpha = 0.5, style = "cont",
legend.show = FALSE) +
tm_shape(plants) +
tm_bubbles(col = "mat", palette = "-RdBu", alpha = 0.5,
title.col = "Mean Temp (x10 C)")
We can apply terra::extract
to a raster stack to get a
multi-attribute table all in one go:
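Something like:

```r
terra::extract(layers, as(plants, "SpatVector")) %>%
  as_tibble()
```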
## # A tibble: 120 × 12
## ID age deficit dem mas mat r2pet rain slope sseas tseas vpd
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 2 62.3 81.9 142 125. 29 1182. 25.0 44 55 35
## 2 2 2 26.6 170. 141 119 35 1206. 7.64 44 47 29
## 3 3 2 48.9 74.0 141 126. 32.1 1194. 24.0 44 57 31
## 4 4 2 14 271. 139 114. 39 1172. 17.7 44 47 25
## 5 5 2 25.4 201. 141 117 36 1153. 6.01 44 45 28
## 6 6 2 99.0 8.21 143 130 26 1116. 1 45 60 36
## 7 7 2 49.6 141. 142 121 32 1134. 14.6 44 48 31
## 8 8 2 41.4 116. 140 123. 35 1159. 15 44 50 28
## 9 9 2 45.6 83.6 141 125 33 1190. 13.0 44 55 31
## 10 10 2 27.0 167. 140 120. 36 1197. 15.6 44 47 28
## # ℹ 110 more rows
This is very useful in statistical model building (which we will get to in a week or two).