Jo Hardin, Pomona College
8.5.2020
With our deepest respects to the Tongva and Serrano Peoples, past, present, and emerging.
Just under 100 datasets, ~1 TB of data. https://openpolicing.stanford.edu/data/
raleigh_df %>%
  # remove missing data
  filter(!is.na(sex) & !is.na(race)) %>%
  # use group_by and summarize to count number of stops per category
  group_by(sex, race) %>%
  summarize(count = n()) %>%
  ungroup() %>%
  # find the percentage of sex/race stops
  mutate(percentage = round(prop.table(count), digits = 2)) %>%
  # plot percentages
  ggplot(mapping = aes(x = sex, y = percentage,
                       fill = race,
                       label = scales::percent(percentage))) +
  geom_bar(position = "dodge", stat = "identity") +
  # adjust labels
  geom_text(position = position_dodge(width = .9),
            vjust = -0.5,
            size = 3) +
  scale_y_continuous(labels = scales::percent) +
  # provide title
  ggtitle("Race & gender breakdown, % out of total")
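The key step in the pipeline above is prop.table(), which converts the vector of group counts into proportions of the overall total. A minimal sketch with made-up counts (not the Raleigh data) shows what it does:

```r
# Hypothetical stop counts for four sex/race categories
count <- c(120, 80, 260, 40)

# prop.table() divides each count by the grand total,
# equivalent to count / sum(count)
percentage <- round(prop.table(count), digits = 2)

percentage   # 0.24 0.16 0.52 0.08

# proportions always sum to 1
sum(prop.table(count))   # 1
```

Because the proportions are computed after ungroup(), they are percentages of all stops, not percentages within each sex.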
Accessing, joining, and wrangling all the datasets.
With facet_geo (from the geofacet package)
search
n.b., all the observations were traffic stops, so we can't model demographics of who was pulled over (model search instead).
library(broom)  # for tidy()

raleigh_search <- glm(formula = search ~ age * race,
                      family = "binomial", data = raleigh_df,
                      subset = (race %in% c("Black", "white")))

raleigh_search %>% tidy()
## # A tibble: 4 x 5
##   term           estimate std.error statistic  p.value
##   <chr>             <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    -1.90     0.0240       -78.9 0.
## 2 age            -0.0327   0.000759     -43.1 0.
## 3 racewhite      -0.785    0.0402       -19.5 5.39e-85
## 4 age:racewhite   0.00200  0.00125        1.60 1.10e-1
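Since the model is logistic, exponentiating a coefficient converts it from a change in log-odds to a multiplicative change in the odds of being searched. A quick back-of-the-envelope reading of the estimates printed above (remembering that, with the interaction, the age coefficient applies to the reference level, Black drivers):

```r
# Each additional year of age multiplies the odds of search
# by about 0.97 for Black drivers (the reference level)
exp(-0.0327)   # ~0.968

# At a given age, white drivers have roughly 0.46 times
# the odds of being searched
exp(-0.785)    # ~0.456
```

The interaction term's p-value (~0.11) suggests the age effect does not differ detectably between the two groups in this model.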
Data Feminism by Catherine D’Ignazio & Lauren F. Klein https://datafeminism.io/
Machine Bias ProPublica https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing