Teaching with Data for the Public Good


Materials for the JSM 2020 session "Teaching with Data for the Public Good"

View the Project on GitHub mine-cetinkaya-rundel/teach-data-public-good

JSM 2020 - Invited session

Wed, 8/5/2020
1:00 PM - 2:50 PM ET
Click here to join session

Session abstract

The importance of using real data in teaching data science and statistics is undeniable. Using real data also presents an opportunity for us educators to bring significant questions with social implications into the classroom. However, finding real datasets that map onto specific topics, concepts, and learning goals is not always easy. Each of the speakers in this session will present a case study they use in their teaching that features a “data for the public good” element and covers specific phases of the data analysis cycle, including data import, tidy, transform, visualize, model, and communicate. The talks will be followed by a discussion on teaching with data that are not only real but also relatable and significant, and on the technical and pedagogical challenges associated with this goal. Materials from the session will be made available as a public repository for others to easily adapt to their classrooms.

Inside-Out Statistics: Teaching Evidence-Based Reasoning in Introductory Courses [Slides]

Kelly Bodwin (California Polytechnic State University)

A classic introductory statistics class typically covers, one by one, a suite of basic statistical tests. In this talk, I argue for an “inside-out” approach to introductory statistics, in which evidence-based reasoning is the focal point rather than specific procedures. I will share my experiences in teaching a 10-week Introductory Statistics course in which the first 5 weeks of material did not touch upon any formal statistical tests. I offer for discussion three conceptual colloquialisms on which the course was built: “Imaginary Results”, “Expectations vs. Reality”, and “Are You Convinced?”. In addition, I will share how creative coding tasks were incorporated into the course to promote conceptual understanding.

Kelly Bodwin is an Assistant Professor of Statistics at Cal Poly San Luis Obispo. She teaches statistics courses at a variety of levels, usually focused on or incorporating computing in R. Her current research interests are in developing R tools for education, applications in the Digital Humanities, and methodologies for high-dimensional clustering.

Who’s Underrepresented? Modeling Undercounts in the U.S. Census [Slides]

Maria Tackett (Duke University)

In this talk, we discuss a learning module about missing data using the United States Census. The Census is a massive data collection project conducted every ten years to obtain a snapshot of the people who live in the country. There are groups of people, however, who are regularly undercounted and thus underrepresented in the data. Because data from the Census is used for important functions such as apportioning seats in the U.S. House of Representatives, it is important to understand the limitations of the data and the potential societal implications. Two models, the Demographic Analysis (DA) and the Dual-Systems Estimates (DSE), have been developed to measure undercounts in the Census. We will discuss a lesson for an undergraduate regression analysis course where students examine the effectiveness of these models and develop their own using publicly available data. We describe the learning outcomes from this module and how they connect to the data analysis cycle presented in R for Data Science. We conclude with the potential challenges and strategies for implementing this lesson in a course.
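As a rough illustration of the idea behind dual-system estimation, the sketch below uses the simplest capture-recapture (Lincoln-Petersen) form: the census count and an independent follow-up survey are matched, and the share of surveyed people who were also found in the census is used to scale up the census count. This is a simplified sketch, not the Census Bureau's actual procedure or the material from the talk; the function names and all numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def dual_system_estimate(census_count, survey_count, matched_count):
    """Lincoln-Petersen style estimate of the true population size.

    census_count:  people counted in the census (first "capture")
    survey_count:  people counted in an independent follow-up survey
    matched_count: people found in both the census and the survey
    """
    if matched_count <= 0:
        raise ValueError("matched_count must be positive")
    return census_count * survey_count / matched_count


def net_undercount_rate(census_count, estimated_population):
    """Share of the estimated population that the census missed."""
    return (estimated_population - census_count) / estimated_population


# Hypothetical toy numbers, not real Census figures.
census_count = 900
survey_count = 100
matched_count = 90

estimate = dual_system_estimate(census_count, survey_count, matched_count)
rate = net_undercount_rate(census_count, estimate)

print(f"Estimated population: {estimate:.0f}")
print(f"Net undercount rate: {rate:.1%}")
```

In this toy case, 90 of the 100 surveyed people were also in the census, so the census is assumed to have reached about 90% of the population. The estimate relies on the two counts being independent and on accurate matching, which is exactly the kind of limitation students can be asked to question.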

Maria Tackett is an Assistant Professor of the Practice in the Department of Statistical Science at Duke University. Her current research focuses on understanding the factors that impact students’ sense of community and self-efficacy in undergraduate math and statistics courses. Prior to coming to Duke in 2018, Maria earned a Ph.D. in Statistics from the University of Virginia and worked as a statistician in industry.

Difficult Dialogues: Communicating Data Analyses Effectively [Slides]

Jo Hardin (Pomona College)

Publicly available data from the Stanford Computational Policy Lab on racial profiling in “stop data” (the information gathered when police officers make discretionary stops) is used for a lab that can be modified to fit a variety of levels for a statistics or data science classroom. We work through the Policy Lab’s published report and provide ideas for new statistical insight into the data. The Policy Lab has compiled datasets from state patrol reports of most states as well as local police stop data in dozens of cities. The data will be downloaded directly from the Policy Lab’s repository, and through complete analyses, students are required to model, visualize, and communicate their results effectively.

Jo Hardin is a statistician at Pomona College. Her research focuses on applications to large biological datasets, and she is active in the statistics and data science education community. This summer, she and her colleagues have put together a series of blog posts about (teaching) ethics in data science at teachdatascience.com.

Discussion [Comments]

Jenny Bryan (RStudio)

Jenny Bryan is a software engineer at RStudio. As part of the tidyverse team, she develops open source packages to make data science faster, easier, and more fun. She is also the founder of the Master of Data Science Program at the University of British Columbia, where she is now an adjunct professor.

This session was organised by Mine Çetinkaya-Rundel (University of Edinburgh, RStudio, Duke University) and sponsored by the Section on Statistics and Data Science Education and the International Association for Statistical Education.