part of a huge mosaic map of the world by
Chris Chamberlain, see this article for more

Classification and clustering

Think of thematic mapping

We classify values into groups for mapping

We do this one attribute at a time

Classification or symbolization

Cases are classified into a small number of bins or buckets

# Some classification options

## Equal interval

## Quantiles

## Fisher (also others)
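As a rough illustration of the first two options, the sketch below computes interior break points using only the Python standard library. The function names and the sample values are made up for this example. Fisher-style breaks need an optimisation step and are not shown.

```python
import statistics

def equal_interval_breaks(values, n_classes):
    """Interior break points of n_classes equal-width bins from min to max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes)]

def quantile_breaks(values, n_classes):
    """Interior break points that put roughly equal numbers of cases in each class."""
    return statistics.quantiles(values, n=n_classes, method="inclusive")

# Hypothetical, strongly skewed attribute values for ten areas
values = [1, 2, 2, 3, 4, 5, 7, 9, 40, 100]

print("equal interval:", equal_interval_breaks(values, 4))
print("quantiles:     ", quantile_breaks(values, 4))
```

With skewed data like this, equal intervals leave most areas in the lowest class, while quantiles spread the areas evenly across classes.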

image from The World in Infographics:
Animal Kingdom by Jon Richards

Classification is fundamental to knowledge

Things are complicated and multidimensional

Classification can make things easier to grasp

Classification (grouping) simplifies

Another way to think about it

source ‘Monopoly editions’ web page

source Twelve Mile Circle blog

An aside

Wellington Monopoly is weird

see stuff.co.nz

# Key idea: classification (can be) based on data values

How it works

Classification is about putting things in buckets

For one variable all we do is decide break points

Think of this as segmenting a one-dimensional space (a number line) in which each polygon is located by the value attached to it
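One way to picture the bucket idea in code: given sorted break points, a binary search finds which segment of the number line a value falls in. This is a minimal sketch; the break points, area names, and the `classify` helper are hypothetical.

```python
from bisect import bisect_right

def classify(value, breaks):
    """Return the 0-based class index of value, given sorted interior breaks.

    A value exactly equal to a break goes into the higher class.
    """
    return bisect_right(breaks, value)

breaks = [10, 20, 30]                          # three breaks, so four classes
areas = {"A": 4, "B": 10, "C": 27, "D": 55}    # hypothetical polygon values

classes = {name: classify(v, breaks) for name, v in areas.items()}
print(classes)
```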

Add a variable

This idea on a map

Bivariate choropleths

image from joshuastevens.net, see that page for more
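Adding a second variable can be sketched as classifying each variable separately and pairing the two class indices, which gives one cell of the bivariate legend grid. The break points, variable meanings, and area names below are assumptions for illustration only.

```python
from bisect import bisect_right

def bivariate_class(x, y, x_breaks, y_breaks):
    """Return (x_class, y_class): one cell in the two-dimensional legend."""
    return bisect_right(x_breaks, x), bisect_right(y_breaks, y)

x_breaks = [20, 40]          # two breaks, so three classes for variable x
y_breaks = [30000, 60000]    # two breaks, so three classes for variable y

# Hypothetical areas with an (x, y) pair of attribute values each
areas = {"A": (15, 25000), "B": (35, 70000), "C": (50, 45000)}

cells = {name: bivariate_class(x, y, x_breaks, y_breaks)
         for name, (x, y) in areas.items()}
print(cells)
```

Three classes per variable give a 3 x 3 legend with nine possible cells, which is about as many as a map reader can be expected to tell apart.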

‘Letting the data speak’

Segmenting each variable separately is not satisfactory

It misses interdependencies among variables

Better to use statistical clustering methods

These work off differences and similarities among data points to find ‘natural’ clusters in data
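A common way to measure difference between data points is the distance between their attribute vectors. The sketch below assumes Euclidean distance on standardised (z-score) attributes, which is one reasonable choice among several; the attribute names and numbers are invented.

```python
import math
import statistics

def standardise(column):
    """Rescale a column to mean 0 and standard deviation 1."""
    mean = statistics.mean(column)
    sd = statistics.pstdev(column)
    return [(v - mean) / sd for v in column]

# Hypothetical attributes for four areas
income = [30000, 32000, 80000, 82000]
pct_renting = [60, 58, 20, 25]

# One standardised attribute vector per area
rows = list(zip(standardise(income), standardise(pct_renting)))

d_similar = math.dist(rows[0], rows[1])     # two alike areas
d_different = math.dist(rows[0], rows[2])   # two unlike areas
print(d_similar, d_different)
```

Without standardising, the income column would dominate the distance simply because its numbers are larger.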

K-means clustering

image source naftaliharris.com

Another nice demo is here at kkevsterrr.github.io by Kevin Bock

# Features of k-means

## It is non-deterministic

## You specify the number of classes (_k_) ahead of time

## It will _always_ find _k_ clusters

## Taken together: this means analyst interpretation is critical
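These features can be seen in a bare-bones version of the algorithm. The following is a minimal sketch of the usual assign-then-update loop, not a production implementation; the `kmeans` function, its parameters, and the sample points are all hypothetical.

```python
import random

def kmeans(points, k, seed=None, max_iter=100):
    """Plain k-means on a list of coordinate tuples; returns (labels, centres)."""
    rng = random.Random(seed)
    # Random starting centres: this is where the non-determinism comes from
    centres = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            for p in points
        ]
        if new_labels == labels:   # nothing changed, so we have converged
            break
        labels = new_labels
        # Update step: move each centre to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:            # an empty cluster keeps its old centre
                centres[c] = tuple(sum(dim) / len(members)
                                   for dim in zip(*members))
    return labels, centres

# Two visually obvious groups of four points each
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]

for seed in (1, 2, 3):
    labels, _ = kmeans(points, k=3, seed=seed)
    print(seed, labels)
```

The data contain two obvious groups, yet asking for `k=3` still returns three centres, and different seeds may produce different partitions. Fixing the seed makes a run repeatable, but it does not make the result any more 'correct'.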

image source stackoverflow.com

Alternatives to k-means

Hierarchical clustering

DBSCAN family of methods

These are all unsupervised methods and primitive examples of machine learning
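To give a flavour of one alternative, here is a small sketch of agglomerative hierarchical clustering: start with every point in its own cluster and repeatedly merge the two closest clusters. It assumes single linkage (distance between the closest pair of members), which is only one of several linkage rules; the function name and sample points are hypothetical.

```python
import math

def single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a list of indices into points.
    """
    clusters = [[i] for i in range(len(points))]   # every point starts alone
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)    # b > a, so index a is still valid
        clusters[a].extend(merged)
    return clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (25, 25)]
result = single_linkage(points, 3)
print(result)
```

Unlike the k-means sketch, there is no random starting point here: the same data and linkage rule always give the same merges. The analyst still has to decide where to stop merging.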

source houzz.com

Summary

Single-variable choropleth maps are limited for understanding combined factors

Clustering offers a different approach

Interpretation of clusters found is critical

We’ll look at two practical examples in the next lecture