Star Trek TNG’s Data

Dealing with data

Data and its discontents

(Too) tidy data

Spatial data

image of Rodin’s Le Penseur cropped from
commons.wikimedia.org by Douglas O'Brien

What is/are data?

Not ‘given’ facts

Not information

Not knowledge

Not wisdom

image source boingboing.net

One way to think about data

Data → Information → Knowledge

but this ignores

Data ← Information ← Knowledge

image at mitpress.mit.edu

Better ways to think about data

D’Ignazio and Klein’s Data Feminism is persuasive

Lauren Klein suggests in this interview
“data is pretty much anything that has been systematically collected”

Systematicity implies the wielding of power

Anyhoo...

Enter data science

Reinforces (false) notions of neutrality

What gets counted counts

image source nasa.gov

From a geographical perspective

The “god trick” (Haraway)

Official statistics, e.g. census


My abandoned attempt to keep track of the ANZ COVID numbers

In sum:

Data are messy, complicated, power-laden, inherently biased

No matter what anyone tries to tell you

# About half (possibly way more) of all data science

## is tidying and organising data

Proof 1

source Bureau of Made Up Statistics

# Proof 2

## “It is often said that 80% of data analysis is spent on the process of cleaning and preparing the data.” ([Wickham 2014](http://dx.doi.org/10.18637/jss.v059.i10), page 1, citing Dasu & Johnson 2003)

# One way or another...

# It’s a lot

## So learning how to do it effectively is important

## A large part of the draw of the *R* `tidyverse` is the promise of making these tasks easier

Tidy data

“In tidy data:

  1. Each variable forms a column.
  2. Each observation forms a row.
  3. Each type of observational unit forms a table.”

Wickham 2014, page 4

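|      | Var1 | Var2 | ... | Varp |
|------|------|------|-----|------|
| Obs1 |      |      |     |      |
| Obs2 |      |      |     |      |
| ...  | ...  | ...  | ... | ...  |
| Obsn |      |      |     |      |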
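To make the three rules concrete, here is a minimal sketch of reshaping an untidy 'wide' table into a tidy one. It is written in plain Python rather than *R*, and the table, the city names, the counts and the helper `to_tidy` are all made up for illustration. In the `tidyverse`, `tidyr::pivot_longer()` does a similar job.

```python
# Hypothetical "wide" table: one row per city, one column per year.
# The year columns hide a variable (year) inside the column names.
wide = [
    {"city": "A", "2019": 10, "2020": 12},
    {"city": "B", "2019": 7, "2020": 9},
]


def to_tidy(rows, id_col, names_to, values_to):
    """Reshape so each output row is one observation (one city in one year)."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue  # the identifier is repeated on every output row
            tidy.append({id_col: row[id_col], names_to: key, values_to: value})
    return tidy


tidy = to_tidy(wide, id_col="city", names_to="year", values_to="count")
for record in tidy:
    print(record)
```

Each variable (`city`, `year`, `count`) is now a column and each city-year observation is a row, which is the shape most tidy tools expect.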

Spatial data

More made up statistics, but... almost all spatial data are tidy

So... many tidyverse tools can be applied to spatial data

image source commons.wikimedia.org from Snow, John. On the Mode of Communication of Cholera, 2nd Ed, John Churchill, New Burlington Street, London, England, 1855.

screenshot from Atari Battlezone game (1981)
see a video of the game here

How this works (1)

Entity-attribute model

Representation by a geometric object, often called a ‘geometry’

Each entity in the world is stored as an object with associated attributes
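One way to picture the entity-attribute model is as a table with one row per entity, attribute columns, and one extra column holding the geometry. The sketch below is plain Python with invented features and coordinates; the `Feature` class and `n_vertices` helper are hypothetical, not any library's API. *R*'s `sf` package takes a broadly similar approach, storing geometries in a column of a data frame.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str        # attribute
    kind: str        # attribute
    geometry: tuple  # (geometry type, list of (x, y) coordinates)


# Made-up entities: one point and one line.
features = [
    Feature("pump", "water supply", ("Point", [(2.0, 3.0)])),
    Feature("lane", "road", ("LineString", [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)])),
]


def n_vertices(feature):
    """Count the coordinate pairs that make up a feature's geometry."""
    return len(feature.geometry[1])


# Because each entity is one row, ordinary filtering works on the attributes.
roads = [f.name for f in features if f.kind == "road"]
print(roads)
print([n_vertices(f) for f in features])
```

The geometry travels with the row, so filtering or sorting by attribute never separates an entity from its shape.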

the rural idyll in the Minecraft universe

How this works (2)

Raster model

The world is a grid of cells

Each phenomenon is stored as a measurement in each cell
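A raster can also be read as tidy data if each cell is treated as an observation. The sketch below uses plain Python and a made-up 2 × 3 grid of elevation values; `raster_to_rows` is a hypothetical helper for illustration only.

```python
# Made-up grid of elevation values; row 0 is the top row of the raster.
grid = [
    [5, 6, 7],
    [4, 5, 9],
]


def raster_to_rows(grid, value_name):
    """Turn a grid into one record per cell: row index, column index, value."""
    rows = []
    for r, line in enumerate(grid):
        for c, value in enumerate(line):
            rows.append({"row": r, "col": c, value_name: value})
    return rows


cells = raster_to_rows(grid, "elevation")

# With one row per cell, a question like "where is the highest cell?"
# becomes an ordinary table query.
highest = max(cells, key=lambda cell: cell["elevation"])
print(highest)
```

In practice rasters usually stay in grid form for efficiency; the long form here is only meant to show why the cell view of the world is compatible with tidy data.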

# Summary

## The world is complicated, and so are data, even though data necessarily simplify things

## ‘Tidy’ data are easier to work with

## Spatial data are tidy if we consider a world made of objects or of cells