Duke University + Posit, PBC
Illustration of stars from the cover of The Little Prince.
Photo of the pop-up version of the book The Little Prince, open to a page that shows the Little Prince sitting on top of a mountain.
multiple outputs
accessibility checks
HTML
data-hello.qmd
::: {.chapterintro data-latex=""}
Scientists seek to answer questions using rigorous methods and careful observations.
These observations -- collected from the likes of field notes, surveys, and experiments -- form the backbone of a statistical investigation and are called **data**.
Statistics is the study of how best to collect, analyze, and draw conclusions from data.
In this first chapter, we focus on both the properties of data and on the collection of data.
:::
With SCSS for HTML:
With TeX for PDF:
ims-style.tex
\newenvironment{mdframedwithfootChapterintro}
{
  \savenotes
  \begin{mdframed}[%
    topline=true, bottomline=true, linecolor=oiB, linewidth=1.4pt,
    rightline=false, leftline=false,
    backgroundcolor=oiLB]
  \renewcommand{\thempfootnote}{\arabic{footnote}}
}
{
  \end{mdframed}
  \spewnotes
}
\newenvironment{chapterintro}{
  \vspace{4mm}
  \begin{mdframedwithfootChapterintro}
  \begin{minipage}[t]{0.10\textwidth}
  {$\:$ \\ \setkeys{Gin}{width=2.5em,keepaspectratio}\includegraphics{images/_icons/chapterintro.png}}
  \end{minipage}
  \hfill
  \begin{minipage}[t]{0.90\textwidth}
  \setlength{\parskip}{1em}
  \large
}{\end{minipage}
  \end{mdframedwithfootChapterintro}
  \vspace{4mm}
}
_quarto.yml
format:
  html:
    theme:
      light: [cosmo, scss/ims-style.scss]
      dark: [cosmo, scss/ims-style-dark.scss]
    code-link: true
    mainfont: Atkinson Hyperlegible
    monofont: Source Code Pro
    author-meta: "Mine Çetinkaya-Rundel and Johanna Hardin"
    lightbox:
      match: auto
      loop: false
    fig-dpi: 300
    fig-show: hold
    fig-align: center
  pdf:
    include-in-header: latex/ims-style.tex
    include-after-body: latex/after-body.tex
    documentclass: book
    classoption:
      - 10pt
      - openany
    pdf-engine: xelatex
    biblio-style: apalike
    keep-tex: true
    block-headings: false
    top-level-division: chapter
    fig-dpi: 300
    fig-show: hold
    fig-pos: H
    tbl-pos: H
    fig-align: center
    toc: true
    toc-depth: 2
HTML
HTML - Light
HTML - Dark
ims-style-dark.scss
Painstakingly add `\clearpage`, which qmd \(\rightarrow\) LaTeX will process and qmd \(\rightarrow\) HTML will ignore:
data-hello.qmd
These two summary statistics are useful in looking for differences in the groups, and we are in for a surprise: an additional 8% of patients in the treatment group had a stroke!
This is important for two reasons.
First, it is contrary to what doctors expected, which was that stents would *reduce* the rate of strokes.
Second, it leads to a statistical question: do the data show a "real" difference between the groups?
\clearpage
This second question is subtle.
Suppose you flip a coin 100 times.
While the chance a coin lands heads in any given coin flip is 50%, we probably won't observe exactly 50 heads.
This type of variation is part of almost any type of data generating process.
and another…
data-hello.qmd
To answer these questions, data must be collected, such as the `county` dataset shown in @tbl-county-df.
Examining \index{summary statistic}**summary statistics** can provide numerical insights about the specifics of each of these questions.
Alternatively, graphs can be used to visually explore the data, potentially providing more insight than a summary statistic.
\clearpage
\index{scatterplot}**Scatterplots** are one type of graph used to study the relationship between two numerical variables.
@fig-county-multi-unit-homeownership displays the relationship between the variables `homeownership` and `multi_unit`, which is the percent of housing units that are in multi-unit structures (e.g., apartments, condos).
Each point on the plot represents a single county.
and another…
By building on things qmd \(\rightarrow\) HTML will happily ignore and qmd \(\rightarrow\) LaTeX will process: `\index{}`
data-hello.qmd
We can compute summary statistics from the table to give us a
better idea of how the impact of the stent treatment differed
between the two groups.
A **summary statistic** is a single number summarizing data
from a sample.\index{summary statistic}
For instance, the primary results of the study after 1 year
could be described by two summary statistics: the proportion
of people who had a stroke in the treatment and control groups.
typst for styling
TODAY
One source + 2 style files \(\downarrow\) 2 outputs
FUTURE
One source + 1 style file \(\downarrow\) 2 outputs
typst for tables
TODAY
data-hello.qmd
FUTURE
fig-alt
data-hello.qmd
#| label: fig-county-multi-unit-homeownership
#| ...
#| fig-alt: A scatterplot of homeownership (on the y-axis) versus the percent of
#| housing units that are in multi-unit structures (on the x-axis) for US
#| counties. The observation from Chattahoochee County, Georgia
#| is highlighted as having a multi-unit rate of 39.4% and a
#| homeownership rate of 31.3%.
ggplot(county, aes(x = multi_unit, y = homeownership)) +
  geom_point(alpha = 0.3, fill = IMSCOL["black", "full"], shape = 21) +
  ...
`fig-alt`s?
Load packages:
Find cells that have `ggplot()` but not `fig-alt`:
Get labels of cells without `fig-alt`:
# pak::pak("rundel/parsermd") # need dev version
library(parsermd)
library(here)
missing_fig_alt <- parse_qmd(here::here("fig-alt-check/data-hello.qmd")) |>
  rmd_select(
    has_type("rmd_chunk") &
      has_code("ggplot\\(") &
      !has_option("fig-alt")
  )
rmd_node_label(missing_fig_alt)
[1] "fig-county-multi-unit-homeownership"
Get contents of cells without `fig-alt`:
# pak::pak("rundel/parsermd") # need dev version
library(parsermd)
library(here)
missing_fig_alt <- parse_qmd(here::here("fig-alt-check/data-hello.qmd")) |>
  rmd_select(
    has_type("rmd_chunk") &
      has_code("ggplot\\(") &
      !has_option("fig-alt")
  )
as_document(missing_fig_alt)
[1] "```{r}"
[2] "#| label: fig-county-multi-unit-homeownership"
[3] "#| fig-cap: A scatterplot of homeownership versus the percent of housing units that are"
[4] "#| in multi-unit structures for US counties. The highlighted dot represents Chattahoochee"
[5] "#| County, Georgia, which has a multi-unit rate of 39.4% and a homeownership rate of"
[6] "#| 31.3%."
[7] "ggplot(county, aes(x = multi_unit, y = homeownership)) +"
[8] " geom_point(alpha = 0.3, fill = IMSCOL[\"black\", \"full\"], shape = 21) +"
[9] " labs("
[10] " x = \"Percent of housing units in that are multi-unit structures\","
[11] " y = \"Homeownership rate\""
[12] " ) +"
[13] " geom_point("
[14] " data = county |> filter(name == \"Chattahoochee County\"),"
[15] " size = 3, stroke = 2, color = IMSCOL[\"red\", \"full\"], shape = 1"
[16] " ) +"
[17] " geom_text("
[18] " data = county |> filter(name == \"Chattahoochee County\"),"
[19] " label = \"Chattahoochee County\", fontface = \"italic\","
[20] " nudge_x = 21, nudge_y = -5, color = IMSCOL[\"red\", \"full\"]"
[21] " ) +"
[22] " guides(color = FALSE) +"
[23] " geom_segment("
[24] " data = county |> filter(name == \"Chattahoochee County\"),"
[25] " aes("
[26] " x = 0, y = homeownership, xend = multi_unit, yend = homeownership,"
[27] " color = IMSCOL[\"red\", \"full\"]"
[28] " ), linetype = \"dashed\""
[29] " ) +"
[30] " geom_segment("
[31] " data = county |> filter(name == \"Chattahoochee County\"),"
[32] " aes("
[33] " x = multi_unit, y = 0, xend = multi_unit, yend = homeownership,"
[34] " color = IMSCOL[\"red\", \"full\"]"
[35] " ), linetype = \"dashed\""
[36] " ) +"
[37] " scale_x_continuous(labels = percent_format(scale = 1)) +"
[38] " scale_y_continuous(labels = percent_format(scale = 1))"
[39] "```"
[40] ""
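The check above depends on the development version of parsermd. Where that is not available, the same idea can be roughed out in base R. The sketch below is only an approximation under stated assumptions: `find_missing_fig_alt()` is a hypothetical helper (not part of parsermd), it matches chunk fences and `#|` options with simple regular expressions, and it only sees options written in the `#| key: value` style.

```r
# Hypothetical helper (not part of parsermd): flag R chunks that call
# ggplot() but carry no `#| fig-alt:` option, using base R only.
find_missing_fig_alt <- function(lines) {
  # Opening fences look like ```{r ...}; closing fences are bare backticks
  starts <- grep("^```\\{r", lines)
  ends <- grep("^```[[:space:]]*$", lines)
  missing <- character(0)
  for (s in starts) {
    e <- ends[ends > s][1]
    if (is.na(e)) next # unterminated chunk: skip rather than guess
    chunk <- lines[s:e]
    has_plot <- any(grepl("ggplot\\(", chunk))
    has_alt <- any(grepl("^#\\|[[:space:]]*fig-alt:", chunk))
    if (has_plot && !has_alt) {
      label_line <- grep("^#\\|[[:space:]]*label:", chunk, value = TRUE)
      label <- if (length(label_line) > 0) {
        sub("^#\\|[[:space:]]*label:[[:space:]]*", "", label_line[1])
      } else {
        "(unlabelled)"
      }
      missing <- c(missing, label)
    }
  }
  missing
}

# Toy input standing in for the lines of a .qmd file
qmd <- c(
  "```{r}",
  "#| label: fig-with-alt",
  "#| fig-alt: A scatterplot.",
  "ggplot(df, aes(x, y)) + geom_point()",
  "```",
  "",
  "```{r}",
  "#| label: fig-without-alt",
  "ggplot(df, aes(x, y)) + geom_point()",
  "```"
)
find_missing_fig_alt(qmd)
#> [1] "fig-without-alt"
```

On a real chapter, the input would come from `readLines()` on the `.qmd` file. A parser-based approach such as the one shown above is likely to be more robust, since regular expressions can be fooled by fences inside strings or nested blocks.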
leveraging R
GitHub actions
_common.R
Leverage your R knowledge to achieve consistent output:
_common.R
set.seed(1014)
knitr::opts_chunk$set(
  comment = "#>",
  collapse = TRUE,
  fig.retina = 2,
  fig.width = 6,
  fig.asp = 2/3,
  fig.show = "hold"
)
options(
  dplyr.print_min = 6,
  dplyr.print_max = 6,
  pillar.max_footer_lines = 2,
  pillar.min_chars = 15,
  stringr.view_n = 6,
  cli.num_colors = 0,
  cli.hyperlink = FALSE,
  pillar.bold = TRUE,
  width = 77 # 80 - 3 for #> comment
)
ggplot2::theme_set(ggplot2::theme_gray(12))
_common.R
Use your R function writing skills to avoid duplication:
_common.R
# use results: "asis" when setting a status for a chapter
status <- function(type) {
  status <- switch(type,
    polishing = "should be readable but is currently undergoing final polishing",
    restructuring = "is undergoing heavy restructuring and may be confusing or incomplete",
    drafting = "is currently a dumping ground for ideas, and we don't recommend reading it",
    complete = "is largely complete and just needs final proof reading",
    stop("Invalid `type`", call. = FALSE)
  )
  class <- switch(type,
    polishing = "note",
    restructuring = "important",
    drafting = "important",
    complete = "note"
  )
  cat(paste0(
    "\n",
    ":::: status\n",
    "::: callout-", class, " \n",
    "You are reading the work-in-progress second edition of R for Data Science. ",
    "This chapter ", status, ". ",
    "You can find the complete first edition at <https://r4ds.had.co.nz>.\n",
    ":::\n",
    "::::\n"
  ))
}
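The function above leans on one detail of `switch()`: an unnamed final argument acts as the fallback, so an unrecognised `type` raises an error instead of silently returning `NULL`. The sketch below isolates that pattern; `status_class()` is a hypothetical, trimmed-down helper written for illustration, not part of the original `_common.R`.

```r
# Hypothetical mini-version of the pattern: the unnamed final argument of
# switch() is the fallback, so an unknown type raises an error.
status_class <- function(type) {
  switch(type,
    polishing = "note",
    complete = "note",
    restructuring = "important",
    drafting = "important",
    stop("Invalid `type`", call. = FALSE)
  )
}

status_class("drafting")
#> [1] "important"

# Unknown types fail loudly; capture the message to inspect it
result <- tryCatch(status_class("typo"), error = function(e) conditionMessage(e))
result
#> [1] "Invalid `type`"
```

In a chapter, the full `status()` function would be called from a chunk with `results: "asis"`, as its leading comment notes, so that the printed callout markup is rendered rather than shown as console output.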
_common.R
Use your R function writing skills to avoid duplication:
announcement
_quarto.yml
website:
  announcement:
    icon: cone-striped
    dismissable: true
    content: |
      "You are reading the work-in-progress second edition of
      R for Data Science. This chapter **is currently a dumping
      ground for ideas, and we don't recommend reading it**.
      You can find the complete first edition at
      <https://r4ds.had.co.nz>."
    type: primary
    position: below-navbar
freeze
Whenever faced with a problem, some people say “Let’s use regular expressions.” Now, they have two problems.
Whenever faced with a problem, some people say “Let’s use ~~regular expressions~~ GitHub actions.” Now, they have ~~two~~ so many more problems.
multiple languages
multiple environments
.qmd
Each being executed with their own engine:
notebooks/authoring-py.qmd
freeze
r-wasm/quarto-live
Photo of the pop-up version of the book The Little Prince, open to a page that shows the Little Prince sitting on top of a mountain, in front of Seattle skyline view.
thank you!
🔗 bit.ly/books-conf24
mine-cetinkaya-rundel/quarto-books-conf24