Difficult dialogues:

communicating data analyses effectively

Jo Hardin, Pomona College

8.5.2020

Land Acknowledgement

With our deepest respects to the Tongva and Serrano Peoples, past, present, and emerging.

http://www.laalmanac.com/history/hi05.php

Stanford Policing Data

https://openpolicing.stanford.edu/

Stanford Policing Data

Just under 100 datasets, ~1 TB of data. https://openpolicing.stanford.edu/data/

Why this project?

Engaging questions (for me and for students)
Goldilocks level of data wrangling
Each student can work with a different dataset
Ability (need!) to work with SQL

Step 1: get data

con <- dbConnect(
  MySQL(), host = "XXX", user = "XXX",
  password = "XXX", dbname = "XXX")

raleigh_df <- DBI::dbGetQuery(con, "SELECT * FROM NCraleigh")

Step 2: data viz

raleigh_df %>%
  
  # remove missing data
  filter(!is.na(sex) & !is.na(race)) %>%
  
  # use group_by and summarize to count number of stops per category
  group_by(sex, race) %>%
  summarize(count = n()) %>%
  ungroup() %>%
  
  # find the percentage of age/race stops
  mutate(percentage = round(prop.table(count), digits = 2)) %>%

Step 2: data viz

… continued

  # plot percentages
  ggplot(mapping = aes(x = sex, y = percentage, 
                       fill = race,
                       label = scales::percent(percentage))) +
  geom_bar(position = "dodge", stat = "identity") +
  
  # adjust labels
  geom_text(position = position_dodge(width = .9),    
            vjust = -0.5,   
            size = 3) + 
  scale_y_continuous(labels = scales::percent) +

  # provide labels
  ggtitle("Race & gender breakdown, % out of total")

Step 2: data viz

Step 2: advanced data viz

Accessing, joining, and wrangling all the datasets.

Step 2: advanced data viz

With facet_geo

Step 3: modeling `search`

n.b., all the observations were traffic stops, so we can’t model demographics of who was pulled over (model search instead).

raleigh_search <- glm(formula = search  ~ age * race, 
    family = "binomial", data = raleigh_df, 
    subset = (race %in% c("Black", "white")))

raleigh_search %>% tidy()

## # A tibble: 4 x 5
##   term          estimate std.error statistic  p.value
##   <chr>            <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   -1.90     0.0240      -78.9  0.      
## 2 age           -0.0327   0.000759    -43.1  0.      
## 3 racewhite     -0.785    0.0402      -19.5  5.39e-85
## 4 age:racewhite  0.00200  0.00125       1.60 1.10e- 1

Step 3: data viz of model

Language around sensitive data

$Data Feminism by Catherine D’Ignazio & Lauren F. Kleinhttps://datafeminism.io/$

Data Feminism by Catherine D’Ignazio & Lauren F. Kleinhttps://datafeminism.io/

Bias or Oppression?

Machine Bias ProPublica https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

Language around sensitive data

$Data Feminism by Catherine D’Ignazio & Lauren F. Klein, https://datafeminism.io/$

Data Feminism by Catherine D’Ignazio & Lauren F. Klein, https://datafeminism.io/

Equity vs Equality

Image credit: Interaction Institute for Social Change | Artist: Angus Maguire.

Traffic stops

What are the systems that have created structural racial discrepancies in the US?

Redlining: systematic exclusion of people of color from obtaining mortgages (Federal lending programs)
Home ownership is lower for people of color
Housing discrimination continues today and creates racial differences in where people live and work.

Traffic stops

Our results indicate that police stops and search decisions suffer from persistent racial bias and point to the value of policy interventions to mitigate these disparities.

While the research shows that stops are neither equal nor equitable, a discussion on equity belongs in the conversation around racial discrepancies in traffic stops.

Evidence of racial disparities in traffic stops.

Significance Magazine (August 2020)

Measures for determining racial disparities:

outcomes test (was contraband found?)
veil of darkness (is there a difference just before vs just after sunset?)

GitHub repo: https://github.com/bakuninpr/traffic-stops-and-racial-disparity

Traffic Stop

Motorcycle with Montana plates vs. Van with Colorado plates, traffic stop on Sunday, Aug 2, 2020.

Family with young children handcuffed.

8am weekly zoom calls

The undergraduates who did the complete analysis and wrote the report are Pomona College students: Amber Lee (‘22), Arm Wonghirundacha (‘22), Emma Godfrey (‘21), Ethan Ong (‘21), Ivy Yuan (‘21), Oliver Chang (‘22), and Will Gray (‘22).

Difficult dialogues:

communicating data analyses effectively

Land Acknowledgement

Stanford Policing Data

Stanford Policing Data

Why this project?

Step 1: get data

Step 2: data viz

Step 2: data viz

Step 2: data viz

Step 2: advanced data viz

Step 2: advanced data viz

Step 3: modeling search

Step 3: data viz of model

Language around sensitive data

Bias or Oppression?

Language around sensitive data

Equity vs Equality

Traffic stops

Traffic stops

Significance Magazine (August 2020)

Traffic Stop

8am weekly zoom calls

Thank you!

Step 3: modeling `search`