library(tidyverse)
#> ── Attaching core tidyverse packages ───────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr 1.1.2 ✔ readr 2.1.4
#> ✔ forcats 1.0.0 ✔ stringr 1.5.0
#> ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
#> ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
#> ✔ purrr 1.0.1
#> ── Conflicts ─────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors6 Data tidying
Prerequisites
6.2.1 Exercises
In each of
table1,table2, andtable3, each observation represents a country. Intable1,countryis the country name,yearis the year of data collection,casesis the number of people with the disease in that year, andpopulationis the number of people in each country in that year. Intable2,countryandyearare the same as intable1,typeis the type of number, andcountis the number of observations (eithercasesorpopulationdepending ontype). Finally, intable3,countryandyearare again the same as intable1, andrateis the rate of disease (cases divided by population).-
For
table2, we need to reshape the data have a column for cases and a column for population and then divide the two to calculate the rate. A possible approach is shown below.table2 |> pivot_wider( names_from = type, values_from = count ) |> mutate(rate = cases / population * 10000) #> # A tibble: 6 × 5 #> country year cases population rate #> <chr> <dbl> <dbl> <dbl> <dbl> #> 1 Afghanistan 1999 745 19987071 0.373 #> 2 Afghanistan 2000 2666 20595360 1.29 #> 3 Brazil 1999 37737 172006362 2.19 #> 4 Brazil 2000 80488 174504898 4.61 #> 5 China 1999 212258 1272915272 1.67 #> 6 China 2000 213766 1280428583 1.67For
table3, we need to separate cases and population into their own columns and then divide them. A possible approach is shown below.table3 |> separate_wider_delim( cols = rate, delim = "/", names = c("cases", "population"), ) |> mutate( cases = as.numeric(cases), population = as.numeric(population), rate = cases / population * 10000 ) #> # A tibble: 6 × 5 #> country year cases population rate #> <chr> <dbl> <dbl> <dbl> <dbl> #> 1 Afghanistan 1999 745 19987071 0.373 #> 2 Afghanistan 2000 2666 20595360 1.29 #> 3 Brazil 1999 37737 172006362 2.19 #> 4 Brazil 2000 80488 174504898 4.61 #> 5 China 1999 212258 1272915272 1.67 #> 6 China 2000 213766 1280428583 1.67