Materials for the JSM 2020 session "Teaching with Data for the Public Good"


Me, in a previous life as a University of British Columbia stats prof

- STAT 545 (Exploratory?) Data Analysis grad course
- STAT 540 Statistics for High Dimensional Biology
- Master of Data Science

- Ack, grading! Don’t miss it at all.
- “want to hire?”, unofficial grad rubric
- Misguided litigation re: confidence interval verbiage
- Predict what you’ll see: Either you were right (I win!) or you learn something / correct a mistake (I win!)
- Pre-commit to what would convince you –> harder to move the goalposts

A well-reasoned, informal analysis is much better than a formal statistical analysis that lacks intuition.

“They might prefer imperfect solutions to ill-defined problems than perfect solutions to well-defined non-problems.” Gower discussing Cormack (1971)

- Transform/Visualize vs Model, oh yaasss

In practice vs In class

80/20 vs 20/80 - Tension between using real data and also MAKING SURE that, e.g., missing data comes up a lot. So hard to find the right data and keep it fresh. Bodwin and Tackett are working real data into courses with very different goals. The more specialized the mandate (“regression”), the harder it is to find real-world data, because of the extra constraints?
- Difficulty around simply getting the data out of awkward places/formats and into students’ hands quickly.

- “Goldilocks level of data wrangling” Ha! So hard to get this “just right”.
- “Each student can work with a different dataset” <– neat way to get this w/o impractical explosion of variety
- “Ability (need!) to work with SQL” <– “Teaching-driven personal growth”
- How to communicate: the kind of thing it’s tempting to shy away from because “it’s not statistics”, but learning to communicate is equipping students for the future. Similar to the attitudes re: teaching programming.

The more applied, real-world the course, the more it exposed gaps in what I was a pro at. Not being the expert all the time. Is this more or less risky or rewarding if you already don’t meet the prof stereotype?

Risks vs. rewards of working with, e.g., data on COVID, slave trade, policing. One person’s topical is another person’s lived experience. How do you do this with great empathy and humility? Also important to compare to a realistic baseline: people weren’t 100% happy with the existing “tired” datasets, so you can’t expect to make everyone perfectly happy with new “real-world” datasets either.