Day 36 - Geospatial Data

Fall 2022

Dr. Jared Joseph

December 09, 2022

Overview

Timeline

  • Geospatial Analyses in R

Goal

Learn the tools to do geospatial analyses in R.

Geospatial Analyses in R

Spatial Data in R

Local Data

There are a lot.

  • shp: Shape files (really whole folders)
  • geojson: JSON file with geo info
  • gml: XML with geo info
  • osm: A different falvor of geo xml
  • And more.

tidycensus

NEEDS AND API KEY

  • Ready access to US geospatial data
  • Country, state, county, metro area, and more
  • Can also be used for general census queries

Geospatial Geometry in tidycensus

Geography Definition Available by Available in
“us” United States get_acs(), get_decennial()
“region” Census region get_acs(), get_decennial()
“division” Census division get_acs(), get_decennial()
“state” State or equivalent state get_acs(), get_decennial()
“county” County or equivalent state, county get_acs(), get_decennial()
“county subdivision” County subdivision state, county get_acs(), get_decennial()
“tract” Census tract state, county get_acs(), get_decennial()
“block group” Census block group state, county get_acs(), get_decennial()
“block” Census block state, county get_decennial()
“place” Census-designated place state get_acs(), get_decennial()
“metropolitan statistical area/micropolitan statistical area” Core-based statistical area state get_acs(), get_decennial()
“combined statistical area” Combined statistical area state get_acs(), get_decennial()
“urban area” Census-defined urbanized areas get_acs(), get_decennial()
“congressional district” Congressional district for the year-appropriate Congress state get_acs(), get_decennial()
“school district (elementary)” Elementary school district state get_acs(), get_decennial()
“school district (secondary)” Secondary school district state get_acs(), get_decennial()
“school district (unified)” Unified school district state get_acs(), get_decennial()
“zip code tabulation area” OR “zcta” Zip code tabulation area get_acs(), get_decennial()
“state legislative district (upper chamber)” State senate districts state get_acs(), get_decennial()
“state legislative district (lower chamber)” State house districts state get_acs(), get_decennial()
“voting district” Voting districts (2020 only) state get_decennial()

Sample tidycensus Query

To load in geospatial data using tidycensus we need to specify

  • A geography
  • Possibly the state
  • Our variable(s)
  • What year
  • geometry = TRUE


Here I get the median income by county in MA for 2018.

# load in tidycensus
library(tidycensus)

# get county boundaries for MA
ma_medinc <- get_acs(geography = "county", 
              variables = c(medincome = "B19013_001"), 
              state = "MA", 
              geometry = TRUE,
              year = 2018)

ma_medinc
Simple feature collection with 14 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -73.50814 ymin: 41.23796 xmax: -69.92839 ymax: 42.88659
Geodetic CRS:  NAD83
First 10 features:
   GEOID                             NAME  variable estimate   moe
1  25001 Barnstable County, Massachusetts medincome    70621  1245
2  25013    Hampden County, Massachusetts medincome    53403   860
3  25025    Suffolk County, Massachusetts medincome    64582   805
4  25007      Dukes County, Massachusetts medincome    71224 12123
5  25017  Middlesex County, Massachusetts medincome    97012  1021
6  25005    Bristol County, Massachusetts medincome    66157   880
7  25009      Essex County, Massachusetts medincome    75878  1022
8  25003  Berkshire County, Massachusetts medincome    56674  1559
9  25011   Franklin County, Massachusetts medincome    59522  1827
10 25021    Norfolk County, Massachusetts medincome    99511  1321
                         geometry
1  MULTIPOLYGON (((-70.68698 4...
2  MULTIPOLYGON (((-73.07484 4...
3  MULTIPOLYGON (((-70.93091 4...
4  MULTIPOLYGON (((-70.8071 41...
5  MULTIPOLYGON (((-71.89877 4...
6  MULTIPOLYGON (((-70.83845 4...
7  MULTIPOLYGON (((-70.58029 4...
8  MULTIPOLYGON (((-73.50814 4...
9  MULTIPOLYGON (((-73.02301 4...
10 MULTIPOLYGON (((-70.84466 4...

Basic Maps

With some spatial data loaded in, it is easy enough to plot the data we have spatially.


We could plot using anything included in our spatial dataframe.


Thus, if we had data from elsewhere, as long as our boundaries matched, we could join and plot.

# load in tmap for spatial plotting
library(tmap)

# plot MA with color shading by median income
tm_shape(ma_medinc) +
  tm_polygons(col = "estimate",
              style = "quantile")

Spatial Computations

Let’s say we didn’t have the variables we wanted.


Here is a map of all vehicle crashes in MA in 2022 (so far).


I want to subset these to only look at Hampshire County.

# load sf for spatial data work
library(sf)

# read in crash data
ma_crash = read_sf(
  here::here("week_14/geo/data/2022_Crashes.geojson"))

# plot MA crashes
tm_shape(ma_crash) +
  tm_dots()

Subsetting my Geography - Setup

First I need two geospatial data sources. We have crashes, now I’ll get just Hampshre county data.


Before doing anything with these geo data files, I need to make sure their Coordinate Reference Systems (CRS) match.

# load in jump hampshire county by census tract
hs_tracts <- get_acs(geography = "tract", 
              variables = c(medincome = "B19013_001"), 
              state = "MA", 
              county = "Hampshire",
              geometry = TRUE,
              year = 2018)

# figure out the CRS of each data object
st_crs(hs_tracts) == st_crs(ma_crash)
[1] FALSE
# make them match
ma_crash_transformed = st_transform(ma_crash,
                                    st_crs(hs_tracts))

Subsetting my Geography - Subset

Now that all my data matches, I can actually subset things.


Here I used st_join() to join my two data files, but only keep the points within my Hampshire County polygons.


My left = FALSE is saying this is not a left join, so discard anything not in that overlap.

# filter the points
hs_crashes = st_join(ma_crash_transformed,
                     hs_tracts,
                     join = st_within,
                     left = FALSE)

# plot
tm_shape(hs_crashes) +
  tm_dots()

Geographic Computations

Let’s take things a step further, and do some geo computations.


Specifically, I want to count how many crashes there were within each census tract in Hampshire county.


Here we see Amherst seems to have a lot of crashes.

# get all the crashes within each tract
tract_crashes = st_contains(hs_tracts,
                            hs_crashes)

# count the number of crashes and add to hs_tracts as column
hs_tracts$crash_count = sapply(tract_crashes,
                               length)

# plot crashes per tract
tm_shape(hs_tracts) +
  tm_polygons(col = "crash_count")

For Next Time

Topic

Project Work Time

To-Do

  • Finish Worksheet
  • Work on Project 3