Paul Klee’s Polyphony 1932, Kunstmuseum, Basel
image source commons.wikimedia.org

From overlay to regression

Overlay in GIS

A general perspective on overlay

The basics of regression models

Orange cranberry cake by
Helen Fletcher on flickr.com

GIS layers

One layer per phenomenon

Much practical GIS is concerned with the totality of relationships between layers

Particularly the question: “What are all the factors present at this location?”

Most often expressed through overlay analysis

image Ian McHarg's Design with Nature 1969
source suzanneodonovan.wordpress.com

Overlay has a long history; see Steinitz C, Parker P and Jordan L. 1976. Hand-drawn overlays: Their history and prospective uses. Landscape Architecture (September) 444–455.

Makara turbines from Hawkins Hill

Applications

Site suitability analysis (development, habitat, facility location)

Risk assessment (fire, landslide, earthquake, flood)

Resource evaluation (solar energy, wind energy, agriculture)

...and so on...

image source gitta.info

Overlay two ways

Polygon overlay

Create new layer by intersecting all input layers

New polygons ‘inherit’ all attributes of input polygons

Use all attributes to identify regions of interest

In R, use `st_intersection` from the sf package
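A minimal sketch of polygon overlay, assuming the sf package is installed. The two square layers, the `square` helper and the attribute names `landuse` and `soil` are all hypothetical, made up for illustration.

```r
library(sf)

# Hypothetical helper: an axis-aligned square polygon from its lower-left corner
square <- function(x0, y0, size) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
    c(x0, y0 + size), c(x0, y0)
  )))
}

# Two single-polygon layers, each carrying one attribute
# (agr = "constant" declares the attribute holds everywhere in the polygon)
landuse <- st_sf(landuse = "forest", geometry = st_sfc(square(0, 0, 2)), agr = "constant")
soil    <- st_sf(soil = "clay", geometry = st_sfc(square(1, 1, 2)), agr = "constant")

# The overlay: a new polygon where the inputs overlap, inheriting both attributes
overlay <- st_intersection(landuse, soil)
overlay
```

The resulting layer can then be filtered on any combination of the inherited attributes to pick out regions of interest.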

Overlay of factors indicative of low-quality housing, Richmond, Virginia, 1934,
in Overlay (in GIS) article by Ola Ahlqvist in International Encyclopedia of Human Geography

Raster overlay

image source Figure 6.4 in O'Sullivan D and Perry GLW, 2013 Spatial Simulation: Exploring Pattern and Process Wiley.

Overlay becomes a calculation at each cell

Using values at that location across all input layers

An example of map algebra

In R use
`r3 <- r1 + r2`
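As a rough illustration of the cell-by-cell calculation, the sketch below uses plain base R matrices as stand-ins for two rasters on the same grid. The values are made up; with a raster package such as terra the same `r1 + r2` expression should apply to raster objects directly.

```r
# Two tiny 'rasters' as matrices on the same 2 x 3 grid (made-up values)
r1 <- matrix(c(1, 2, 3,
               4, 5, 6), nrow = 2, byrow = TRUE)
r2 <- matrix(c(10, 20, 30,
               40, 50, 60), nrow = 2, byrow = TRUE)

# Map algebra: the sum is computed independently at each cell
r3 <- r1 + r2
r3
```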

image source gis.stackexchange.com

Points to note

Ensuring all layers are projected correctly is critical

Raster overlay limited to lowest resolution input

Slivers issue for polygon method

Polygon method best with small number of layers

Raster / grid method scales better to large number of layers

aerial view of the Arc de Triomphe, Paris
image source yatzer.com

Key is that there are many ways to combine layers

Binary
black and white

Input layers are yes/no, true/false, good/bad

Outcome of interest usually all true or all false

Summation admits
shades of grey

Input layers are scores from low to high values in some fixed range

Outcome is sum of all inputs, from low to high

Layers effectively considered of equal importance
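The binary and summation approaches can be sketched with small base R matrices standing in for raster layers. All layer names and values here are hypothetical.

```r
# Binary inputs on a 2 x 2 grid: TRUE where a cell is acceptable for that factor
slope_ok <- matrix(c(TRUE, TRUE, FALSE, TRUE), nrow = 2)
soil_ok  <- matrix(c(TRUE, FALSE, FALSE, TRUE), nrow = 2)
road_ok  <- matrix(c(TRUE, TRUE, TRUE, FALSE), nrow = 2)

# Binary overlay: a cell is suitable only where every layer is TRUE
suitable <- slope_ok & soil_ok & road_ok

# Score inputs on a common 1-5 scale for the same grid
slope_score <- matrix(c(5, 3, 1, 4), nrow = 2)
soil_score  <- matrix(c(4, 1, 2, 5), nrow = 2)
road_score  <- matrix(c(5, 4, 3, 1), nrow = 2)

# Summation overlay: every layer contributes equally to the total
total <- slope_score + soil_score + road_score
```

The binary result keeps only cells that pass every test, while the summed result ranks all cells and lets a high score on one layer offset a low score on another.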

image source commons.wikimedia.org

Weighted overlay

Each layer on a different scale reflecting relative importance, or

All layers on same scale, weights set to reflect relative importance

Outcome is weighted sum

Important to investigate impact of changing weights
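A small sketch of weighted overlay and of checking its sensitivity to the weights, again using base R matrices as stand-ins for rasters. The layers, the `weighted_overlay` helper and both sets of weights are hypothetical.

```r
# Made-up score layers on a common 1-5 scale, same 2 x 2 grid
slope <- matrix(c(5, 3, 1, 4), nrow = 2)
soil  <- matrix(c(4, 1, 2, 5), nrow = 2)
road  <- matrix(c(5, 4, 3, 1), nrow = 2)

# Hypothetical helper: weighted sum of the three layers
weighted_overlay <- function(w) w[1] * slope + w[2] * soil + w[3] * road

# Two candidate weightings, each summing to 1
y_a <- weighted_overlay(c(0.5, 0.3, 0.2))
y_b <- weighted_overlay(c(0.2, 0.3, 0.5))

# Sensitivity check: rank cells from best (1) to worst under each weighting
rank_a <- rank(-y_a)
rank_b <- rank(-y_b)
```

With these made-up numbers the best cell is the same under both weightings, but the second- and third-ranked cells swap places, which is the kind of effect worth checking before trusting any one set of weights.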

image source flickr by Tony Hisgett

Making overlay more
evidence-based

In overlay we need weights to associate
with each layer; where to get them?

Inexpert opinion (gut feel?
truthiness? ‘seems about right’?)

Expert opinion

Weights-of-evidence overlay

A principled approach to determining relative importance of factors

No convenient one-stop tool, but easy enough to calculate

Weights tell us relative probability of occurrence in each cover type or region

Weights are converted to log form and summed

Calculating the weights

For each factor, the weight is events per unit area inside versus outside the area where the factor applies

\[ w_X = \frac{n_{\mathrm{in}\,X}/\mathrm{Area}_{X}}{n_{\mathrm{not\,in}\,X}/\mathrm{Area}_{\mathrm{not}\,X}} \]

                  
                    Region     n  area in_density out_density weight  log_w
                    <chr>  <int> <dbl>      <dbl>       <dbl>  <dbl>  <dbl>
                  1 A        109 0.891      122.         25.1  4.87   0.688
                  2 B         82 0.921       89.0       121.   0.735 -0.134
                  3 C         38 0.525       72.4       115.   0.628 -0.202
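The calculation behind the table above can be sketched in base R. The `n`, `area` and `out_density` values are copied from the printed table (so `out_density` is already rounded and the results agree only to about two or three significant figures). Using base 10 for the log is an inference from the printed `log_w` values.

```r
# Values copied from the table above (out_density as printed, so rounded)
woe <- data.frame(
  Region      = c("A", "B", "C"),
  n           = c(109, 82, 38),
  area        = c(0.891, 0.921, 0.525),
  out_density = c(25.1, 121, 115)
)

# Events per unit area inside each region
woe$in_density <- woe$n / woe$area

# Weight: density inside relative to density outside
woe$weight <- woe$in_density / woe$out_density

# Log form; base 10 appears to match the printed log_w column
woe$log_w <- log10(woe$weight)

woe
```

A weight above 1 (positive log weight) marks a region where events are denser than elsewhere, and a weight below 1 (negative log weight) marks one where they are sparser.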

Overlay in overview

Overlay involves deriving some output layer $y$ as a function $f$ of a collection of input layers $X=\{x_i\}$

Mathematically: $y=f(X)$

For weighted overlay: $y=w_0+w_1x_1+w_2x_2+\ldots+w_nx_n$

Francis Galton’s illustration of correlation, 1875
image source commons.wikimedia.org

Regression & friends

Regression is a standard statistical approach to relating variables to one another

Based on a statistical model

We can use regression to determine $f(X)$ in overlay

Structure of regression

Dependent variable (the thing we are modelling)

Independent variables (the explanatory factors)

Regression works best when the variables are well behaved (roughly normally distributed, no extreme outliers, etc.), so exploring the data first is really important, especially for outlier detection

The result

\[ y=b_0+b_1x_1+\ldots+b_ix_i+\ldots+b_px_p+\epsilon \]

where $y$ is the dependent variable, $x_i$ are the independent variables, $b_i$ are the regression coefficients, and $\epsilon$ is the model error

This mirrors overlay’s
\[ y=w_0+w_1x_1+w_2x_2+\ldots+w_nx_n \]
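To make the parallel concrete, here is a minimal base R sketch that estimates the coefficients with `lm` and then applies them like overlay weights. The data are simulated, and the 'true' coefficients (1, 2, -0.5) are arbitrary values chosen for illustration.

```r
set.seed(42)  # reproducible simulated data

# Simulated 'layers' sampled at 500 locations (made-up data)
n  <- 500
x1 <- runif(n)
x2 <- runif(n)

# Relationship used to generate y, plus random error (arbitrary choice)
y <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 0.1)

# Ordinary least squares fit of y = b0 + b1*x1 + b2*x2 + error
fit <- lm(y ~ x1 + x2)
b <- coef(fit)
b

# Apply the fitted coefficients, like overlay weights, at two new locations
new_layers <- data.frame(x1 = c(0.2, 0.8), x2 = c(0.5, 0.1))
y_hat <- predict(fit, newdata = new_layers)
y_hat
```

The fitted `b` values should land close to the coefficients used to simulate the data, which is the sense in which regression derives the weights from evidence.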

Summary

Overlay is the most characteristic GIS method

Vector and raster approaches are possible

Overlay can be rather subjective, particularly in the choice of weights

Regression relates a dependent variable, $y$, to a collection of independent variables, $X=\{x_i\}$

Regression coefficients $b_i$ offer an evidence-based way to set the weights in overlay