Teaching Data Science, Reproducibly


Materials for the "Teaching Data Science, Reproducibly" workshop at ICOTS 2018

View the Project on GitHub mine-cetinkaya-rundel/teach-data-sci-icots2018

Success in data science and statistics depends on developing both analytical and computational skills. As statistics educators, we are more familiar and comfortable with teaching the former, but the latter is becoming increasingly important. The goal of this workshop is to equip educators with concrete information on content and infrastructure for painlessly introducing modern computation into a data science and/or statistics curriculum. In addition to gaining technical knowledge, participants will discuss the decisions that go into choosing infrastructure and developing curriculum. Workshop attendees will work through several exercises from existing courses and get first-hand experience with the relevant toolchains and techniques, including R/RStudio, literate programming with R Markdown, and collaboration, version control, and automated feedback with git/GitHub.

This is a two-part workshop at ICOTS 2018, presented by Dr. Mine Cetinkaya-Rundel (Duke University + RStudio) and Dr. Colin Rundel (Duke University).

This workshop is aimed at participants interested in the role of computing in a Statistics or Data Science curriculum. This includes faculty who are designing new courses or programs, as well as those who want to add a computational component to an existing course or improve one.

Part 1 will introduce teaching data science and statistics courses using R and RStudio, and Part 2 will focus on best practices for configuring and deploying infrastructure to support these tools, along with a version control system, in the classroom.

Please bring your own laptop.


Both parts will take place in room 2-AVS (Level 2, AV Study Room) in Terrsa Hall.

Part 1 - Saturday, July 7

Time Activity
09:30 - 09:45 Welcome
09:45 - 10:15 Getting started
  :heavy_check_mark: RStudio Cloud
  :heavy_check_mark: UN Votes: Rmd + Output
10:15 - 10:30 Computing with R and RStudio
10:30 - 10:45 Break :tea:
10:45 - 11:30 Literate programming with R Markdown
  :heavy_check_mark: World Cup: Rmd + Codebook
11:30 - 11:45 Data analysis with the tidyverse
11:45 - 12:00 Break :tea:
12:00 - 13:30 Version control with Git and GitHub
  :heavy_check_mark: GitHub.com
  :heavy_check_mark: Demo GitHub repository

Part 2 - Sunday, July 8

Time Activity
09:30 - 10:00 A first course in data science
10:00 - 11:00 Computing infrastructure with RStudio Cloud
  :heavy_check_mark: New workspace in RStudio Cloud
11:00 - 11:15 Break :tea:
11:15 - 12:45 Course management with GitHub (part 1)
  :heavy_check_mark: Workshop Organization
12:45 - 12:45 Break :tea:
12:45 - 13:30 Course management with GitHub (part 2)