Reproducible,
dynamic,
and elegant
books with Quarto

Mine Çetinkaya-Rundel

Duke University + Posit, PBC

“Making” books

Illustration of stars from the cover of The Little Prince.

Photo of the pop-up version of the book The Little Prince, open to a page that shows the Little Prince sitting on top of a mountain.

The books

Cover of the OpenIntro textbook Introduction to Modern Statistics, 2nd Edition.

Illustration of a red star from the cover of The Little Prince.

Cover of the book R for Data Science, 2nd Edition.

Illustration of a red star from the cover of The Little Prince.

Mockup of cover of the book Quarto - The Definitive Guide.

Illustration of a red star from the cover of The Little Prince.

Illustration of stars from the cover of The Little Prince.

Cover of the OpenIntro textbook Introduction to Modern Statistics, 2nd Edition.

Illustration of a red star from the cover of The Little Prince.

Illustration of a gold star from the cover of The Little Prince. multiple outputs

Illustration of a gold star from the cover of The Little Prince. accessibility checks

Two outputs

HTML

Screenshot of introduction to Chapter 1 of Introduction to Modern Statistics in HTML in light mode.

PDF

Screenshot of introduction to Chapter 1 of Introduction to Modern Statistics in PDF.

From one source

data-hello.qmd
::: {.chapterintro data-latex=""}
Scientists seek to answer questions using rigorous methods and careful observations.
These observations -- collected from the likes of field notes, surveys, and experiments -- form the backbone of a statistical investigation and are called **data**.
Statistics is the study of how best to collect, analyze, and draw conclusions from data.
In this first chapter, we focus on both the properties of data and on the collection of data.
:::

With the help of meticulous styling

With SCSS for HTML:

ims-style.scss
.chapterintro {
  padding: 1em 1em 1em 4em;
  margin-bottom: 10px;
  background: #d5e6ef 5px center/3em no-repeat;
  border-top: 3px solid #569BBD;
  border-bottom: 3px solid #569BBD;
  background-image: url("images/_icons/chapterintro.png");
  background-position: 0.5em 1.5em;
}

With the help of meticulous styling

With TeX for PDF:

ims-style.tex
\newenvironment{mdframedwithfootChapterintro}
{   
    \savenotes
    \begin{mdframed}[%
    topline=true, bottomline=true, linecolor=oiB, linewidth=1.4pt,
    rightline=false, leftline=false,
    backgroundcolor=oiLB]
    \renewcommand{\thempfootnote}{\arabic{footnote}}
    }
{
    \end{mdframed}
    \spewnotes
}

\newenvironment{chapterintro}{
    \vspace{4mm}
    \begin{mdframedwithfootChapterintro}
    \begin{minipage}[t]{0.10\textwidth}
    {$\:$ \\ \setkeys{Gin}{width=2.5em,keepaspectratio}\includegraphics{images/_icons/chapterintro.png}}
    \end{minipage}
    \hfill
    \begin{minipage}[t]{0.90\textwidth}
    \setlength{\parskip}{1em}
    \large
    }{\end{minipage}
    \end{mdframedwithfootChapterintro}
    \vspace{4mm}
}

_quarto.yml

_quarto.yml
format:
  html:
    theme:
      light: [cosmo, scss/ims-style.scss]
      dark: [cosmo, scss/ims-style-dark.scss]
    code-link: true
    mainfont: Atkinson Hyperlegible
    monofont: Source Code Pro
    author-meta: "Mine Çetinkaya-Rundel and Johanna Hardin"
    lightbox: 
      match: auto
      loop: false
    fig-dpi: 300
    fig-show: hold
    fig-align: center
  pdf:
    include-in-header: latex/ims-style.tex
    include-after-body: latex/after-body.tex
    documentclass: book
    classoption: 
      - 10pt
      - openany
    pdf-engine: xelatex
    biblio-style: apalike
    keep-tex: true
    block-headings: false
    top-level-division: chapter
    fig-dpi: 300
    fig-show: hold
    fig-pos: H
    tbl-pos: H
    fig-align: center
    toc: true
    toc-depth: 2

Three outputs

HTML - Light

Screenshot of introduction to Chapter 1 of Introduction to Modern Statistics in HTML in light mode.

HTML - Dark

Screenshot of introduction to Chapter 1 of Introduction to Modern Statistics in HTML in dark mode.

PDF

Screenshot of introduction to Chapter 1 of Introduction to Modern Statistics in PDF.

With even more meticulous styling

Screenshot of introduction to Chapter 1 of Introduction to Modern Statistics in HTML in dark mode.

ims-style-dark.scss
$body-bg: #222;

.chapterintro {
  padding: 1em 1em 1em 4em;
  margin-bottom: 10px;
  background: lighten($body-bg, 10%) 5px center/3em no-repeat;
  border-top: 3px solid #569BBD;
  border-bottom: 3px solid #569BBD;
  background-image: url("images/_icons/chapterintro.png");
  background-position: 0.5em 1.5em;
}
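The dark-mode rule above derives the box color from the page background instead of hard-coding it. Sass's lighten($body-bg, 10%) raises the color's HSL lightness by 10 percentage points, so the #222 page (about 13% lightness) gets a chapter-intro box at about 23% lightness. Here is a minimal Python re-creation using the standard-library colorsys module, handy for previewing such shades outside Sass. The hex_to_rgb and lighten helpers are hypothetical, and rounding may differ from Sass by one unit per channel.

```python
import colorsys


def hex_to_rgb(hex_color: str) -> tuple[float, float, float]:
    """Parse #rgb or #rrggbb into (r, g, b) floats in [0, 1]."""
    digits = hex_color.lstrip("#")
    if len(digits) == 3:  # shorthand such as #222 means #222222
        digits = "".join(ch * 2 for ch in digits)
    r, g, b = (int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return r, g, b


def lighten(hex_color: str, amount: float) -> str:
    """Raise HSL lightness by `amount` (0.10 for 10%), clamped at 1, like Sass lighten()."""
    r, g, b = hex_to_rgb(hex_color)
    hue, light, sat = colorsys.rgb_to_hls(r, g, b)  # colorsys uses H, L, S order
    light = min(1.0, light + amount)
    r, g, b = colorsys.hls_to_rgb(hue, light, sat)
    return "#" + "".join(f"{round(c * 255):02x}" for c in (r, g, b))
```

For example, lighten("#222", 0.10) gives a gray of roughly #3c3c3c. Deriving the shade from $body-bg means a later change to the dark background carries through to the callout box automatically.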

Unfortunately, it’s not all magic…

Illustration of a gold star from the cover of The Little Prince.

Painstakingly add \clearpage commands, which qmd → LaTeX will process and qmd → HTML will ignore:


data-hello.qmd
These two summary statistics are useful in looking for differences in the groups, and we are in for a surprise: an additional 8% of patients in the treatment group had a stroke!
This is important for two reasons.
First, it is contrary to what doctors expected, which was that stents would *reduce* the rate of strokes.
Second, it leads to a statistical question: do the data show a "real" difference between the groups?

\clearpage

This second question is subtle.
Suppose you flip a coin 100 times.
While the chance a coin lands heads in any given coin flip is 50%, we probably won't observe exactly 50 heads.
This type of variation is part of almost any type of data generating process.

Unfortunately, it’s not all magic…

and another…


data-hello.qmd
To answer these questions, data must be collected, such as the `county` dataset shown in @tbl-county-df.
Examining \index{summary statistic}**summary statistics** can provide numerical insights about the specifics of each of these questions.
Alternatively, graphs can be used to visually explore the data, potentially providing more insight than a summary statistic.

\clearpage

\index{scatterplot}**Scatterplots** are one type of graph used to study the relationship between two numerical variables.
@fig-county-multi-unit-homeownership displays the relationship between the variables `homeownership` and `multi_unit`, which is the percent of housing units that are in multi-unit structures (e.g., apartments, condos).
Each point on the plot represents a single county.

Unfortunately, it’s not all magic…

Illustration of a gold star from the cover of The Little Prince.

and another…


data-hello.qmd

\clearpage

## Exercises {#sec-chp1-exercises}

Answers to odd-numbered exercises can be found in [Appendix -@sec-exercise-solutions-01].

Bring back the magic

Illustration of a gold star from the cover of The Little Prince.

By building on things that qmd → HTML will happily ignore and qmd → LaTeX will process: \index{}


In three components

  1. \index{} tags in the .qmd source:
data-hello.qmd
We can compute summary statistics from the table to give us a better idea of how the impact of the stent treatment differed between the two groups.
A **summary statistic** is a single number summarizing data from a sample.\index{summary statistic}
For instance, the primary results of the study after 1 year could be described by two summary statistics: the proportion of people who had a stroke in the treatment and control groups.
  2. A .tex file, appended to the end of the LaTeX body at render time, that prints the index:
after-body.tex
\backmatter
\printindex
  3. A reference to that file in _quarto.yml:
_quarto.yml
format:
  html:
    ...
  pdf:
    include-in-header: latex/ims-style.tex
    include-after-body: latex/after-body.tex
    ...
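Every distinct \index{} term becomes its own entry in the printed index, so small drift across chapters (for example "summary statistic" in one and "summary statistics" in another) silently produces two entries. The following is a rough Python audit. It assumes plain \index{term} tags with no nested braces. The names index_terms, project_index_terms, and near_duplicates, along with the singular/plural heuristic, are hypothetical and not part of Quarto or LaTeX.

```python
import re
from collections import Counter, defaultdict
from pathlib import Path

# Matches \index{term}; assumes the term itself contains no braces.
INDEX_RE = re.compile(r"\\index\{([^{}]*)\}")


def index_terms(text: str) -> Counter:
    """Count each \\index{} term in a block of .qmd source."""
    return Counter(m.group(1) for m in INDEX_RE.finditer(text))


def project_index_terms(root: Path) -> Counter:
    """Aggregate index terms across every .qmd file under root."""
    counts = Counter()
    for qmd in sorted(root.rglob("*.qmd")):
        counts.update(index_terms(qmd.read_text(encoding="utf-8")))
    return counts


def near_duplicates(terms) -> list[list[str]]:
    """Group terms that coincide after lowercasing and dropping one trailing 's'."""
    groups = defaultdict(set)
    for term in terms:
        key = term.lower()
        if key.endswith("s"):
            key = key[:-1]
        groups[key].add(term)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Running near_duplicates(project_index_terms(Path("."))) from the book's root would list the term groups worth unifying before the PDF is built.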

Looking forward to Typst for styling

Illustration of a blue star from the cover of The Little Prince.

TODAY

One source
+
2 style files
↓
2 outputs

FUTURE

One source
+
1 style file
↓
2 outputs

Looking forward to Typst for tables

Illustration of a blue star from the cover of The Little Prince.

TODAY

data-hello.qmd
county |>
  select(name, state, pop2017, pop_change, unemployment_rate, median_edu) |>
  slice_head(n = 6) |>
  kableExtra::kbl(
    linesep = "", 
    booktabs = TRUE,
    format.args = list(big.mark = ",")
  ) |>
  kableExtra::kable_styling(
    bootstrap_options = c("striped", "condensed"),
    latex_options = c("striped")
  )

FUTURE

data-hello.qmd
county |>
  select(name, state, pop2017, pop_change, unemployment_rate, median_edu) |>
  slice_head(n = 6) |>
  gt::gt()

Accessibility: fig-alt

data-hello.qmd
#| label: fig-county-multi-unit-homeownership
#| ...
#| fig-alt: A scatterplot of homeownership (on the y-axis) versus the percent of
#|   housing units that are in multi-unit structures (on the x-axis) for US
#|   counties. The observation from Chattahoochee County, Georgia
#|   is highlighted as having a multi-unit rate of 39.4% and a
#|   homeownership rate of 31.3%.
ggplot(county, aes(x = multi_unit, y = homeownership)) +
  geom_point(alpha = 0.3, fill = IMSCOL["black", "full"], shape = 21) +
  ...

A scatterplot of homeownership (on the y-axis) versus the percent of housing units that are in multi-unit structures (on the x-axis) for US counties. The observation from Chattahoochee County, Georgia is highlighted as having a multi-unit rate of 39.4% and a homeownership rate of 31.3%.

Do all my figures have fig-alts?

Results for searching for the ggplot keyword in the GitHub interface in the repo for Introduction to Modern Statistics. The search finds 46 files that contain this text.

Do all my figures have fig-alts?

Results for searching for the ggplot keyword in Positron in the folder for Introduction to Modern Statistics. The search finds 44 files that contain this text, with over 400 mentions across them.

Checking for missing fig-alts

Load packages:

# pak::pak("rundel/parsermd") # need dev version
library(parsermd)
library(here)

Find cells that have ggplot() but not fig-alt:

missing_fig_alt <- here::here("fig-alt-check/data-hello.qmd") |>
  parse_qmd() |> 
  rmd_select(
    has_type("rmd_chunk") & 
    has_code("ggplot\\(") & 
    !has_option("fig-alt")
  )

Get labels of cells without fig-alt:

rmd_node_label(missing_fig_alt)
[1] "fig-county-multi-unit-homeownership"

Get contents of cells without fig-alt:

as_document(missing_fig_alt)
 [1] "```{r}"                                                                                     
 [2] "#| label: fig-county-multi-unit-homeownership"                                              
 [3] "#| fig-cap: A scatterplot of homeownership versus the percent of housing units that are"    
 [4] "#|   in multi-unit structures for US counties. The highlighted dot represents Chattahoochee"
 [5] "#|   County, Georgia, which has a multi-unit rate of 39.4% and a homeownership rate of"     
 [6] "#|   31.3%."                                                                                
 [7] "ggplot(county, aes(x = multi_unit, y = homeownership)) +"                                   
 [8] "  geom_point(alpha = 0.3, fill = IMSCOL[\"black\", \"full\"], shape = 21) +"                
 [9] "  labs("                                                                                    
[10] "    x = \"Percent of housing units in that are multi-unit structures\","                    
[11] "    y = \"Homeownership rate\""                                                             
[12] "  ) +"                                                                                      
[13] "  geom_point("                                                                              
[14] "    data = county |> filter(name == \"Chattahoochee County\"),"                             
[15] "    size = 3, stroke = 2, color = IMSCOL[\"red\", \"full\"], shape = 1"                     
[16] "  ) +"                                                                                      
[17] "  geom_text("                                                                               
[18] "    data = county |> filter(name == \"Chattahoochee County\"),"                             
[19] "    label = \"Chattahoochee County\", fontface = \"italic\","                               
[20] "    nudge_x = 21, nudge_y = -5, color = IMSCOL[\"red\", \"full\"]"                          
[21] "  ) +"                                                                                      
[22] "  guides(color = FALSE) +"                                                                  
[23] "  geom_segment("                                                                            
[24] "    data = county |> filter(name == \"Chattahoochee County\"),"                             
[25] "    aes("                                                                                   
[26] "      x = 0, y = homeownership, xend = multi_unit, yend = homeownership,"                   
[27] "      color = IMSCOL[\"red\", \"full\"]"                                                    
[28] "    ), linetype = \"dashed\""                                                               
[29] "  ) +"                                                                                      
[30] "  geom_segment("                                                                            
[31] "    data = county |> filter(name == \"Chattahoochee County\"),"                             
[32] "    aes("                                                                                   
[33] "      x = multi_unit, y = 0, xend = multi_unit, yend = homeownership,"                      
[34] "      color = IMSCOL[\"red\", \"full\"]"                                                    
[35] "    ), linetype = \"dashed\""                                                               
[36] "  ) +"                                                                                      
[37] "  scale_x_continuous(labels = percent_format(scale = 1)) +"                                 
[38] "  scale_y_continuous(labels = percent_format(scale = 1))"                                   
[39] "```"                                                                                        
[40] ""                                                                                           
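parsermd answers this question with a real parser. As a dependency-free cross-check (with the usual caveats about regular expressions), roughly the same query can be written in Python, the deck's other language. This sketch assumes R cells open with a ```{r} fence and close with ``` at the start of a line, and that cell options use the #| key: value syntax. The names cells_missing_fig_alt and project_report are hypothetical, and nested or indented fences will confuse it.

```python
import re
from pathlib import Path

# An R cell: opening fence like ```{r} or ```{r, echo=FALSE}, body, closing ``` at line start.
CELL_RE = re.compile(r"^```\{r[^}]*\}[ \t]*\n(.*?)^```", re.MULTILINE | re.DOTALL)
LABEL_RE = re.compile(r"^#\|[ \t]*label:[ \t]*(\S+)", re.MULTILINE)
FIG_ALT_RE = re.compile(r"^#\|[ \t]*fig-alt:", re.MULTILINE)


def cells_missing_fig_alt(qmd_text: str) -> list[str]:
    """Labels of R cells that call ggplot( but set no fig-alt option."""
    missing = []
    for cell in CELL_RE.finditer(qmd_text):
        body = cell.group(1)
        if "ggplot(" in body and not FIG_ALT_RE.search(body):
            label = LABEL_RE.search(body)
            missing.append(label.group(1) if label else "<unlabeled>")
    return missing


def project_report(root: Path) -> dict[str, list[str]]:
    """Map each .qmd file under root to its cells missing fig-alt, omitting clean files."""
    report = {}
    for qmd in sorted(root.rglob("*.qmd")):
        missing = cells_missing_fig_alt(qmd.read_text(encoding="utf-8"))
        if missing:
            report[str(qmd.relative_to(root))] = missing
    return report
```

Pointing project_report at the book folder gives one dictionary covering all the files that a keyword search only counts.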

Illustration of stars from the cover of The Little Prince.

Cover of the book R for Data Science, 2nd Edition.

Illustration of a red star from the cover of The Little Prince.

Illustration of a gold star from the cover of The Little Prince. leveraging R

Illustration of a gold star from the cover of The Little Prince. GitHub Actions

Set global options with _common.R

Leverage your R knowledge to achieve consistent output:

_common.R
set.seed(1014)

knitr::opts_chunk$set(
  comment = "#>",
  collapse = TRUE,
  fig.retina = 2,
  fig.width = 6,
  fig.asp = 2/3,
  fig.show = "hold"
)

options(
  dplyr.print_min = 6,
  dplyr.print_max = 6,
  pillar.max_footer_lines = 2,
  pillar.min_chars = 15,
  stringr.view_n = 6,
  cli.num_colors = 0,
  cli.hyperlink = FALSE,
  pillar.bold = TRUE,
  width = 77 # 80 - 3 for #> comment
)

ggplot2::theme_set(ggplot2::theme_gray(12))

Set status with _common.R

Use your R function writing skills to avoid duplication:

_common.R
# use results: "asis" when setting a status for a chapter
status <- function(type) {
  status <- switch(type,
    polishing = "should be readable but is currently undergoing final polishing",
    restructuring = "is undergoing heavy restructuring and may be confusing or incomplete",
    drafting = "is currently a dumping ground for ideas, and we don't recommend reading it",
    complete = "is largely complete and just needs final proof reading",
    stop("Invalid `type`", call. = FALSE)
  )

  class <- switch(type,
    polishing = "note",
    restructuring = "important",
    drafting = "important",
    complete = "note"
  )

  cat(paste0(
    "\n",
    ":::: status\n",
    "::: callout-", class, " \n",
    "You are reading the work-in-progress second edition of R for Data Science. ",
    "This chapter ", status, ". ",
    "You can find the complete first edition at <https://r4ds.had.co.nz>.\n",
    ":::\n",
    "::::\n"
  ))
}

Then call it from a chapter, in a cell with results: "asis" so the printed callout is treated as markdown:


EDA.qmd
#| results: "asis"
#| echo: false
source("_common.R")
status("complete")
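One design note on status(): the message and the callout class are chosen by two parallel switch() calls, so a new status added to one but not the other would fail only at render time. Keeping both in a single lookup table avoids that. Here is a Python sketch of that variant (the deck's other language). It returns the markdown instead of printing it so that it can be tested. STATUSES and status_callout are hypothetical names, and the message text is copied from the R version.

```python
# Message and callout class live together, so they cannot drift apart.
STATUSES = {
    "polishing": ("should be readable but is currently undergoing final polishing", "note"),
    "restructuring": ("is undergoing heavy restructuring and may be confusing or incomplete", "important"),
    "drafting": ("is currently a dumping ground for ideas, and we don't recommend reading it", "important"),
    "complete": ("is largely complete and just needs final proof reading", "note"),
}


def status_callout(kind: str) -> str:
    """Return the callout markdown for a chapter status instead of printing it."""
    try:
        message, callout = STATUSES[kind]
    except KeyError:
        raise ValueError(f"Invalid type: {kind!r}") from None
    return (
        "\n:::: status\n"
        f"::: callout-{callout}\n"
        "You are reading the work-in-progress second edition of R for Data Science. "
        f"This chapter {message}. "
        "You can find the complete first edition at <https://r4ds.had.co.nz>.\n"
        ":::\n"
        "::::\n"
    )
```

In a Python-engine chapter, printing the returned string from a cell marked #| output: asis should play the same role as the R cell above.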

Today’s solution: announcement


_quarto.yml
website:
  announcement: 
    icon: cone-striped
    dismissable: true
    content: |
      You are reading the work-in-progress second edition of 
      R for Data Science. This chapter **is currently a dumping 
      ground for ideas, and we don't recommend reading it**. 
      You can find the complete first edition at 
      <https://r4ds.had.co.nz>.
    type: primary
    position: below-navbar

Keeping things in check daily

Screenshot of GitHub Action failure email.

Leveraging GitHub Actions

  • Avoid freeze
  • Set daily checks
.github/workflows/build_book.yaml
on:
  push:
    branches: main
  pull_request:
    branches: main
  schedule:
    # run every day at 11 PM UTC (GitHub cron schedules use UTC)
    - cron: '0 23 * * *'





Whenever faced with a problem, some people say “Let’s use regular expressions.” Now, they have two problems.





Whenever faced with a problem, some people say “Let’s use GitHub Actions.” Now, they have so many more problems.

Don’t reinvent the wheel!

Screenshot of GitHub action for rendering and deploying R for Data Science book from its GitHub repo.

Screenshot of GitHub action for rendering a Quarto document from the quarto-actions repo.

Illustration of stars from the cover of The Little Prince.

Mockup of cover of the book Quarto - The Definitive Guide.

Illustration of a red star from the cover of The Little Prince.

Illustration of a gold star from the cover of The Little Prince. multiple languages

Illustration of a gold star from the cover of The Little Prince. multiple environments

Two languages in one .qmd

Each being executed with their own engine:

authoring.qmd
## Code cells

::: panel-tabset
### R

{{< embed notebooks/authoring-r.qmd#plot echo=true >}}

### Python

{{< embed notebooks/authoring-python.qmd#plot echo=true >}}
:::

From two source notebooks

notebooks/authoring-r.qmd
---
title: "Authoring - R"
---

## Markdown text

Hello.

## Code cells

```{r}
#| label: add
1 + 1
```

```{r}
#| label: plot
df <- data.frame(x = 1:8, y = 3:10)
m <- lm(y ~ x, data = df)
plot(df$x, df$y)
abline(m)
```
notebooks/authoring-py.qmd
---
title: Authoring - Python
---

## Markdown text

Hello.

## Code cells

```{python}
#| label: add
1 + 1
```

```{python}
#| label: plot
import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([1, 8])
ypoints = np.array([3, 10])

plt.plot(xpoints, ypoints)
plt.show()
```

Two recognizable outputs on a single page

GIF of going between tabs of output that is the result of the code in the previous slide. One tab contains a plot made with R and the other with Python.

Productivity with freeze



_quarto.yml
execute:
  freeze: auto

Safeguarding your sanity with freeze


_quarto.yml
execute:
  freeze: auto
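With freeze: auto, Quarto re-executes a document's code only when that document's source changes, and otherwise reuses the results stored in the project's _freeze directory. That is what keeps a long book render (and its author) sane. The underlying idea, a content hash compared against a stored record, can be sketched in Python. This is not how Quarto lays out _freeze. The functions source_hash, is_stale, and mark_frozen and the JSON record format are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def source_hash(source: Path) -> str:
    """Hash the raw bytes of a source file."""
    return hashlib.sha256(source.read_bytes()).hexdigest()


def is_stale(source: Path, freeze_dir: Path) -> bool:
    """True if the source has no stored hash or its stored hash differs."""
    record = freeze_dir / f"{source.name}.json"
    if not record.exists():
        return True
    return json.loads(record.read_text())["hash"] != source_hash(source)


def mark_frozen(source: Path, freeze_dir: Path) -> None:
    """Record the current hash; call only after execution succeeds."""
    freeze_dir.mkdir(parents=True, exist_ok=True)
    record = freeze_dir / f"{source.name}.json"
    record.write_text(json.dumps({"hash": source_hash(source)}))
```

Writing the record only after a successful run means a failed execution leaves the chapter marked stale, so the next render tries it again.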

“Making” books,
that are not just pretty,
but also functional…

Illustration of a blue star from the cover of The Little Prince. r-wasm/quarto-live

Photo of the pop-up version of the book The Little Prince, open to a page that shows the Little Prince sitting on top of a mountain, in front of Seattle skyline view.







thank you!


🔗 bit.ly/books-conf24

mine-cetinkaya-rundel/quarto-books-conf24