class: center, middle, inverse, title-slide # 04
reproducible computing ## 👯 with R Markdown, Git, and GitHub
🔗
bit.ly/teach-ds-wsc
###
dr. mine çetinkaya-rundel
dr. colin rundel

### 23 june 2021

---

class: middle, inverse

# Reproducibility in the classroom

---

## Reproducibility checklist

- Are the tables and figures reproducible from the code and data?
- Does the code actually do what you think it does?
- In addition to what was done, is it clear *why* it was done? (e.g., how were parameter/settings chosen?)
- Can the code be used for other data?
- Can you extend the code to do other things?

---

## Ambitious goal + many other concerns

We need an environment where

- data, analysis, and results are tightly connected, or better yet, inseparable
- reproducibility is built in
  + the original data remains untouched
  + all data manipulations and analyses are inherently documented
- documentation is human readable and syntax is minimal

---

## Roadmap

1. Scriptability `\(\rightarrow\)` R
2. Literate programming `\(\rightarrow\)` R Markdown
3. Version control `\(\rightarrow\)` Git / GitHub

---

## Why R Markdown?

- **Reproducibility:** Train new analysts whose only workflow is a reproducible one
- **Pedagogy:**
  - Code + output + prose together
  - Syntax highlighting FTW!
- **Efficiency:** Consistent formatting -> easier grading

---

## Tips for starting with R Markdown

- Minimal YAML
- Minimal chunk options
- Use well scaffolded R Markdown documents
- Encourage students to knit early and often
- *New*: Use the visual editor!

---

## Why Git + GitHub?

- **Version control:** Lots of mistakes along the way, need ability to revert
- **Collaboration:** Platform that removes barriers to well documented collaboration
- **Accountability:** Transparent commit history
- **Early introduction:**
  - Mastery takes time, earlier start the better
  - Marketability

---

## Goals for version control with Git / GitHub

- Centralize the distribution and collection of student assignments
- Enable students to work collaboratively
- Force students to use git & GitHub
  - Version control is a best practice for reproducible research
  - Widely used in industry
- Publish / share work

---

class: middle, inverse

# GitHub as your Learning Management System

---

## Basic structure on GitHub

- 1 Organization / class
- 1 repo / (student or team) / assignment
- Student and team repos private by default

---

## Setting up a course

1. Create course organization on GitHub (https://github.com/organizations/new)
1. Request education discount for organization (https://education.github.com/discount_requests/teacher_application)
1. Invite students to organization
1. Create assignment(s)
1. Collect assignment(s)
1. Grade assignment(s)

---

## Demo

- What does a course organization look like?
- What does a starter repo look like?
- What does submitted student work look like?
- How do we facilitate creating student repositories and other repeated interactions?

---

## 📦 ghclass

### Tools for managing GitHub class organization accounts

- Made for instructors who use GitHub for class management, e.g. students submit assignments via GitHub repos
- The package assumes that you’re an R user, and you probably teach R as well, though that’s not a requirement since this package is all about setting up repositories with the right permissions, not what your students put in those repositories.
- The package is still under active development and is not currently on CRAN but can be installed from GitHub using:

```r
devtools::install_github("rundel/ghclass")
library(ghclass)
```

---

## Using ghclass to distribute an assignment

.small[
```r
org_create_assignment(
  org = "statprog-s1-2020",               # Class github organization
  user = roster$github,                   # Student github usernames
  repo = roster$hw1,                      # Students' assign repo
  team = roster$hw1,                      # Students' assign team
  source_repo = "statprog-s1-2020/hw01",  # Template repository
  private = TRUE                          # Repository privacy
)
```

```
#> ✓ Mirrored repo 'statprog-s1-2020/hw01' to repo 'statprog-s1-2020/hw01-team01'.
...
#> ✓ Created team 'hw01-team01' in org 'statprog-s1-2020'.
...
#> ✓ Added user 'jane_doe' to team 'hw01-team01'.
#> ✓ Added user 'john_doe' to team 'hw01-team01'.
...
#> ✓ Team 'hw01-team01' given 'push' access to repo 'statprog-s1-2020/hw01-team01'
```
]

---

## Options for giving feedback on GitHub

- Use the GitHub UI to add issues to each student's repo

- Clone student repos locally, add feedback to code / notebook, push back to GitHub

.small[
```r
hw01_repos <- org_repos(org = "statprog-s1-2020", "hw01_")
local_repo_clone(hw01_repos, local_path = "hw01")
# Make changes
local_repo_commit(repo_dir = "hw01", message = "Feedback")
local_repo_push(repo_dir = "hw01")
```
]

- Use the `issue_create()` function to post issues to all (or some) repos at once

---

## Get big picture stats for an assignment

.small[
```r
org_repo_stats("statprog-s1-2020", filter = "hw01_", branch = "main")
```

```
## # A tibble: 45 x 10
##    repo     private branch commits last_update         open_issues closed_issues
##    <chr>    <lgl>   <chr>    <int> <dttm>                    <int>         <int>
##  1 statpro… TRUE    main        26 2020-10-08 12:31:31           1             0
##  2 statpro… TRUE    main        29 2020-10-08 15:30:51           1             0
##  3 statpro… TRUE    main        99 2020-10-09 15:59:31           1             0
##  4 statpro… TRUE    main        63 2020-10-09 14:24:57           1             0
##  5 statpro… TRUE    main        31 2020-10-08 15:57:40           1             0
##  6 statpro… TRUE    main        27 2020-10-09 14:40:16           1             0
##  7 statpro… TRUE    main        55 2020-10-09 05:38:47           1             0
##  8 statpro… TRUE    main        37 2020-10-09 14:19:34           1             0
##  9 statpro… TRUE    main        24 2020-10-09 12:34:39           1             0
## 10 statpro… TRUE    main        39 2020-10-08 16:32:11           1             0
## # … with 35 more rows, and 3 more variables: open_prs <int>, merged_prs <int>,
## #   closed_prs <int>
```
]

---

## Peer review

- Once an assignment is completed you can let other students/teams into a repository and they can provide peer review.
- Peer review is an incredibly effective learning experience for both the reviewers and the reviewees; however, it requires coordination and sufficient time carved out of the course schedule.
- Tip: Do not rely solely on peer review for feedback, as some reviewers might be less diligent than others. Teams reviewing teams, as opposed to individuals reviewing individuals, can partially address this.

See the [Peer review with ghclass (dev) vignette](https://rundel.github.io/ghclass-dev/) for more.

---

## Git + GitHub lessons learned

- If you plan on using git in class, start on day one, don't wait until the "right time"
- First assignment should be individual, not team based, to avoid merge conflicts
- Students need to remember to pull before starting work
- Impossible (?) to avoid shell intervention every once in a while
- Remind students that future projects should go on GitHub, with PI / supervisor approval
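
---

## Pull before starting work: a sketch

To make the "pull before starting work" lesson concrete, here is a minimal shell sketch that could be shown to students. It assumes `git` is installed. A local bare repository stands in for GitHub, and the teammates `alice` and `bob`, the file `hw01.Rmd`, and the helper `commit_as` are hypothetical names chosen only for illustration.

```shell
set -eu

# A throwaway directory; a local bare repository stands in for GitHub.
work=$(mktemp -d)
cd "$work"
git init --quiet --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: append one line of work and commit it as the named person.
commit_as() {
  echo "$2" >> hw01.Rmd
  git add hw01.Rmd
  git -c user.name="$1" -c user.email="$1@example.com" commit --quiet -m "$2"
}

# Seed the remote with a starter file, much as a template repo would.
git init --quiet seed
cd seed
git symbolic-ref HEAD refs/heads/main
commit_as instructor "starter"
git push --quiet ../remote.git main
cd "$work"

# Two teammates clone the same team repository.
git clone --quiet remote.git alice
git clone --quiet remote.git bob

# Alice finishes her part first and pushes it.
(cd alice && commit_as alice "alice: part 1" && git push --quiet origin main)

# Bob pulls before starting work, so his commit builds on Alice's.
cd bob
git pull --quiet --ff-only origin main
commit_as bob "bob: part 2"
git push --quiet origin main

# The shared history now holds all three commits.
commits=$(git rev-list --count origin/main)
echo "commits on origin/main: $commits"
```

Because Bob pulls first, his push simply fast-forwards the shared branch. Had he committed without pulling, the push would be rejected until he integrated Alice's commit.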