Shannon Pileggi
Workshop materials have been adapted from the 2020 RStudio What They Forgot To Teach You About R Workshop.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).
R installed? Pretty recent?
Current version 4.2.0
RStudio installed?
I’m on 2022.02.3+492
Have these packages?
tidyverse (includes purrr); repurrrsive
Jenny Bryan purrr tutorial
https://jennybc.github.io/purrr-tutorial/
Charlotte Wickham purrr tutorial
https://github.com/cwickham/purrr-tutorial
Jenny Bryan row-oriented workflows workshop
https://github.com/jennybc/row-oriented-workflows
Advanced R by Hadley Wickham, Ch 9 Functionals
https://adv-r.hadley.nz/functionals.html
The Joy of Functional Programming (for Data Science)
webinar by Hadley Wickham
https://www.youtube.com/watch?v=bzUmK0Y07ck
2014+ magrittr pipe %>%
2021+ (R ≥ 4.1.0) native R pipe |>
2022 Isabella Velásquez Understanding the native R pipe |> https://ivelasq.rbind.io/blog/understanding-the-r-pipe/
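A minimal sketch of the two pipes side by side. The vector `x` is made up for illustration; for simple calls like these, both pipes should give the same result.

```r
library(magrittr)

x <- c(1, 4, 9)  # illustrative data, not from the workshop

# magrittr pipe
total_magrittr <- x %>% sqrt() %>% sum()

# native pipe (requires R >= 4.1.0)
total_native <- x |> sqrt() |> sum()

identical(total_magrittr, total_native)
```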
dplyr::select()
tells R explicitly to use the function select from the package dplyr
can help to avoid name conflicts (e.g., MASS::select())
does not require library(dplyr)
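A small sketch of the namespace syntax, assuming dplyr is installed but not attached. The built-in `mtcars` data set is used only for illustration.

```r
# No library(dplyr) call: the package::function form finds select() directly
cars_small <- dplyr::select(mtcars, mpg, cyl)

names(cars_small)
```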
# A tibble: 1,704 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
1 Afghanistan Asia 1952 28.8 8425333 779.
2 Afghanistan Asia 1957 30.3 9240934 821.
3 Afghanistan Asia 1962 32.0 10267083 853.
4 Afghanistan Asia 1967 34.0 11537966 836.
5 Afghanistan Asia 1972 36.1 13079460 740.
6 Afghanistan Asia 1977 38.4 14880372 786.
7 Afghanistan Asia 1982 39.9 12881816 978.
8 Afghanistan Asia 1987 40.8 13867957 852.
9 Afghanistan Asia 1992 41.7 16317921 649.
10 Afghanistan Asia 1997 41.8 22227415 635.
# … with 1,694 more rows
Hans Rosling discusses Gapminder data
https://www.youtube.com/watch?v=hVimVzgtD6w
What am I doing? Are there mistakes?
africa <- gapminder[gapminder$continent == "Africa", ]
africa_mm <- max(africa$lifeExp) - min(africa$lifeExp)
americas <- gapminder[gapminder$continent == "Americas", ]
americas_mm <- max(americas$lifeExp) - min(americas$lifeExp)
asia <- gapminder[gapminder$continent == "Asia", ]
asia_mm <- max(asia$lifeExp) - min(africa$lifeExp)
europe <- gapminder[gapminder$continent == "Europe", ]
europe_mm <- max(europe$lifeExp) - min(europe$lifeExp)
oceania <- gapminder[gapminder$continent == "Oceania", ]
oceania_mm <- max(europe$lifeExp) - min(oceania$lifeExp)
cbind(
continent = c("Africa", "Asias", "Europe", "Oceania"),
max_minus_min = c(africa_mm, americas_mm, asia_mm, europe_mm, oceania_mm)
)
What are the drawbacks of this code?
How would you do it instead?
group_by approach
# A tibble: 5 × 2
continent max_minus_min
<fct> <dbl>
1 Africa 52.8
2 Americas 43.1
3 Asia 53.8
4 Europe 38.2
5 Oceania 12.1
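The code behind the group_by approach is not shown here. One way to produce a table of this shape, as a sketch assuming the dplyr and gapminder packages, is:

```r
library(dplyr)
library(gapminder)

# One row per continent: range of life expectancy (max minus min)
range_by_continent <- gapminder |>
  group_by(continent) |>
  summarize(max_minus_min = max(lifeExp) - min(lifeExp))

range_by_continent
```

Compared with the copy-and-paste version, there is only one place to make a mistake, and the continent labels come from the data rather than being typed by hand.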
previous approach
continent max_minus_min
[1,] "Africa" "52.843"
[2,] "Asias" "43.074"
[3,] "Europe" "59.004"
[4,] "Oceania" "38.172"
[5,] "Africa" "12.637"
[1] "The 2017 RStudio Conference was in Orlando."
[2] "The 2018 RStudio Conference was in San Diego."
[3] "The 2019 RStudio Conference was in Austin."
[4] "The 2020 RStudio Conference was in San Francisco."
[5] "The 2021 RStudio Conference was in remote."
Can you think of other ways to do this?
[1] "The 2017 RStudio Conference was in Orlando."
[2] "The 2018 RStudio Conference was in San Diego."
[3] "The 2019 RStudio Conference was in Austin."
[4] "The 2020 RStudio Conference was in San Francisco."
[5] "The 2021 RStudio Conference was in remote."
Some R functions are vectorized.
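The code that produced the conference sentences is not shown here. A sketch of two possible approaches, with the variable names `year` and `location` assumed for illustration:

```r
year <- 2017:2021
location <- c("Orlando", "San Diego", "Austin", "San Francisco", "remote")

# Option 1: an explicit for loop, one sentence at a time
conf_loop <- character(length(year))
for (i in seq_along(year)) {
  conf_loop[i] <- paste0("The ", year[i], " RStudio Conference was in ", location[i], ".")
}

# Option 2: paste0() is vectorized, so no loop is needed
conf_vec <- paste0("The ", year, " RStudio Conference was in ", location, ".")

identical(conf_loop, conf_vec)
```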
purrr
purrr enhances R’s functional programming toolkit
a “core” package in the tidyverse meta-package
purrr vs apply
purrr is an alternative to “apply” functions
purrr::map() ≈ base::lapply()
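A quick sketch of the correspondence, using a small made-up list `x`. For a plain function like `sum`, both calls are expected to return the same named list.

```r
library(purrr)

x <- list(a = 1:3, b = 4:6)  # illustrative data

res_map    <- map(x, sum)
res_lapply <- lapply(x, sum)

all.equal(res_map, res_lapply)
```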
got_chars
sw_people, sw_species, etc.
How many elements are in got_chars?
Who is the 9th person listed in got_chars? What information is given for this person?
What is the difference between got_chars[9] and got_chars[[9]]?
[ returns a smaller list; [[ returns contents in the list
x
x[i]
x[[i]]
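A small base R sketch of the difference, using a made-up list `x`:

```r
x <- list(a = 1:3, b = "salt", c = TRUE)  # illustrative data

smaller_list <- x[1]    # still a list, of length 1
contents     <- x[[1]]  # the integer vector stored in the first slot

str(smaller_list)
str(contents)
```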
purrr::map(.x, .f, ...)
for every element of .x
do .f
.x = minis
map(minis, antennate)
Advanced R: Ch. 9 Functionals
purrr::map(.x, .f, ...)
purrr::map() is a nice way to write a for loop.
Someone has to write a for loop. It doesn’t have to be you.
~ Jenny Bryan
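To make the comparison concrete, here is a sketch of the same task written both ways. The list `x` is made up for illustration.

```r
library(purrr)

x <- list(1:3, 4:10, 11:12)  # illustrative data

# for loop: pre-allocate the output, then fill it element by element
out_loop <- vector("list", length(x))
for (i in seq_along(x)) {
  out_loop[[i]] <- length(x[[i]])
}

# map(): the loop is written for you
out_map <- map(x, length)

identical(out_loop, out_map)
```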
How many aliases does each GoT character have?
map(got_chars, .f = 🤷)
Workflow:
map() to do for all.
$url
[1] "https://www.anapioficeandfire.com/api/characters/1303"
$id
[1] 1303
$name
[1] "Daenerys Targaryen"
$gender
[1] "Female"
$culture
[1] "Valyrian"
$born
[1] "In 284 AC, at Dragonstone"
$died
[1] ""
$alive
[1] TRUE
$titles
[1] "Queen of the Andals and the Rhoynar and the First Men, Lord of the Seven Kingdoms"
[2] "Khaleesi of the Great Grass Sea"
[3] "Breaker of Shackles/Chains"
[4] "Queen of Meereen"
[5] "Princess of Dragonstone"
$aliases
[1] "Dany" "Daenerys Stormborn"
[3] "The Unburnt" "Mother of Dragons"
[5] "Mother" "Mhysa"
[7] "The Silver Queen" "Silver Lady"
[9] "Dragonmother" "The Dragon Queen"
[11] "The Mad King's daughter"
$father
[1] ""
$mother
[1] ""
$spouse
[1] "https://www.anapioficeandfire.com/api/characters/1346"
$allegiances
[1] "House Targaryen of King's Landing"
$books
[1] "A Feast for Crows"
$povBooks
[1] "A Game of Thrones" "A Clash of Kings" "A Storm of Swords"
[4] "A Dance with Dragons"
$tvSeries
[1] "Season 1" "Season 2" "Season 3" "Season 4" "Season 5" "Season 6"
$playedBy
[1] "Emilia Clarke"
[1] "Dany" "Daenerys Stormborn"
[3] "The Unburnt" "Mother of Dragons"
[5] "Mother" "Mhysa"
[7] "The Silver Queen" "Silver Lady"
[9] "Dragonmother" "The Dragon Queen"
[11] "The Mad King's daughter"
.x <- got_chars[[?]]
length(.x[["aliases"]])
.x is a pronoun, like “it”
.x means “the current element”
.x <- got_chars[[?]]
length(.x[["aliases"]])
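A sketch of the workflow end to end, assuming the repurrrsive package. Element 9 stands in for the `?` above; any index would do.

```r
library(purrr)
library(repurrrsive)

# Do it for one element
.x <- got_chars[[9]]
length(.x[["aliases"]])

# Drop the same recipe into map() to do it for all
n_aliases <- map(got_chars, ~ length(.x[["aliases"]]))
```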
[[1]]
[1] 4
[[2]]
[1] 11
[[3]]
[1] 1
[[4]]
[1] 1
[[5]]
[1] 1
[[6]]
[1] 1
[[7]]
[1] 1
[[8]]
[1] 1
[[9]]
[1] 11
[[10]]
[1] 5
[[11]]
[1] 16
[[12]]
[1] 1
[[13]]
[1] 2
[[14]]
[1] 5
[[15]]
[1] 3
[[16]]
[1] 3
[[17]]
[1] 3
[[18]]
[1] 5
[[19]]
[1] 0
[[20]]
[1] 3
[[21]]
[1] 4
[[22]]
[1] 1
[[23]]
[1] 8
[[24]]
[1] 2
[[25]]
[1] 1
[[26]]
[1] 5
[[27]]
[1] 1
[[28]]
[1] 4
[[29]]
[1] 7
[[30]]
[1] 3
~ is a shortcut for anonymous functions supported in purrr
Three ways of specifying anonymous functions:
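The list itself is not shown here. A sketch of three commonly used forms, which should be interchangeable for this task (the `\(x)` form needs R ≥ 4.1.0):

```r
library(purrr)
library(repurrrsive)

# 1. full anonymous function
n1 <- map(got_chars, function(x) length(x[["aliases"]]))

# 2. base R lambda shorthand (R >= 4.1.0)
n2 <- map(got_chars, \(x) length(x[["aliases"]]))

# 3. purrr formula shortcut, with .x as the current element
n3 <- map(got_chars, ~ length(.x[["aliases"]]))

identical(n1, n2) && identical(n2, n3)
```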
How many ___ does each character have?
Characters | Items |
---|---|
got_chars | titles, allegiances |
sw_people | vehicles, starships |
map(got_chars, ~ length(.x[["aliases"]]))
[1] 4 11 1 1 1 1 1 1 11 5 16 1 2 5 3 3 3 5 0 3 4 1 8 2 1
[26] 5 1 4 7 3
map_lgl()
map_int()
map_dbl()
map_chr()
returns an atomic vector of the specified type
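For example, a sketch of the alias count with `map_int()` in place of `map()`, assuming the repurrrsive package:

```r
library(purrr)
library(repurrrsive)

# Same recipe as before, but the result is an integer vector, not a list
n_aliases <- map_int(got_chars, ~ length(.x[["aliases"]]))

n_aliases
```

If any element did not produce a single integer, `map_int()` would raise an error rather than silently returning a list.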
Replace map() with type-specific map().
# What's each character's name?
map(got_chars, ~.x[["name"]])
map(sw_people, ~.x[["name"]])
# What color is each SW character's hair?
map(sw_people, ~ .x[["hair_color"]])
# Is the GoT character alive?
map(got_chars, ~ .x[["alive"]])
# Is the SW character female?
map(sw_people, ~ .x[["gender"]] == "female")
# How heavy is each SW character?
map(sw_people, ~ .x[["mass"]])
for every element of .x
do .f
.f specification & shortcuts
.x = minis
map(minis, "pants")
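The same shortcut on real data, as a sketch assuming the repurrrsive package: a string extracts by name and an integer extracts by position. In `got_chars`, `name` is the third element of each character.

```r
library(purrr)
library(repurrrsive)

by_name     <- map_chr(got_chars, "name")  # shortcut for ~ .x[["name"]]
by_position <- map_chr(got_chars, 3)       # shortcut for ~ .x[[3]]

identical(by_name, by_position)
```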
Explore a GoT or SW list and find a new element to look at.
Extract it across the whole list with name and position shortcuts for .f.
Use map_TYPE() to get an atomic vector as output.
I’m using map_TYPE() but some individual elements aren’t of length 1.
They are absent or have length > 1.
Specify a .default value.
You can’t make an atomic vector.*
Get happy with a list or list-column.
Or pick one element, e.g., the first.
* You can, if you are willing to flatten().
.default value
[[1]]
[1] NA
[[2]]
[1] NA
[[3]]
[1] NA
[[4]]
[1] NA
[[5]]
[1] "http://swapi.co/api/people/1/" "http://swapi.co/api/people/18/"
[[6]]
[1] NA
[[7]]
[1] NA
[[8]]
[1] "http://swapi.co/api/people/13/"
[[9]]
[1] NA
[[10]]
[1] NA
[[11]]
[1] NA
[[12]]
[1] NA
[[13]]
[1] "http://swapi.co/api/people/1/" "http://swapi.co/api/people/5/"
[[14]]
[1] NA
[[15]]
[1] NA
[[16]]
[1] NA
[[17]]
[1] NA
[[18]]
[1] NA
[[19]]
[1] "http://swapi.co/api/people/10/" "http://swapi.co/api/people/32/"
[[20]]
[1] "http://swapi.co/api/people/44/"
[[21]]
[1] "http://swapi.co/api/people/11/"
[[22]]
[1] "http://swapi.co/api/people/70/"
[[23]]
[1] "http://swapi.co/api/people/11/"
[[24]]
[1] NA
[[25]]
[1] NA
[[26]]
[1] "http://swapi.co/api/people/79/"
[[27]]
[1] NA
[[28]]
[1] NA
[[29]]
[1] NA
[[30]]
[1] NA
[[31]]
[1] NA
[[32]]
[1] NA
[[33]]
[1] NA
[[34]]
[1] NA
[[35]]
[1] NA
[[36]]
[1] NA
[[37]]
[1] "http://swapi.co/api/people/67/"
[[38]]
[1] NA
[[39]]
[1] NA
[1] NA NA
[3] NA NA
[5] "http://swapi.co/api/people/1/" NA
[7] NA "http://swapi.co/api/people/13/"
[9] NA NA
[11] NA NA
[13] "http://swapi.co/api/people/1/" NA
[15] NA NA
[17] NA NA
[19] "http://swapi.co/api/people/10/" "http://swapi.co/api/people/44/"
[21] "http://swapi.co/api/people/11/" "http://swapi.co/api/people/70/"
[23] "http://swapi.co/api/people/11/" NA
[25] NA "http://swapi.co/api/people/79/"
[27] NA NA
[29] NA NA
[31] NA NA
[33] NA NA
[35] NA NA
[37] "http://swapi.co/api/people/67/" NA
[39] NA
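The calls behind these two outputs are not shown here. A sketch that gives results of this shape, under the assumption that the data is `sw_vehicles` from repurrrsive and the element is `pilots`:

```r
library(purrr)
library(repurrrsive)

# Absent element: fill the gap with a .default value and keep a list
pilots <- map(sw_vehicles, "pilots", .default = NA)

# Length > 1: pick the first pilot only, so an atomic vector is possible
first_pilot <- map_chr(sw_vehicles, list("pilots", 1), .default = NA_character_)
```

`NA_character_` is used as the default in the second call so that every element is already a string, whatever the purrr version's coercion rules are.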
.f shortcuts
[1] TRUE TRUE TRUE FALSE TRUE FALSE FALSE TRUE TRUE TRUE TRUE FALSE
[13] TRUE TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE
[25] FALSE TRUE FALSE FALSE TRUE TRUE
[1] "Theon Greyjoy" "Tyrion Lannister" "Victarion Greyjoy"
# create a named list
# (assumes got_names holds the character names, e.g. got_names <- map_chr(got_chars, "name"))
got_chars_named <- set_names(got_chars, got_names)
str(got_chars_named[1:3], max.level = 1)
List of 3
$ Theon Greyjoy :List of 18
$ Tyrion Lannister :List of 18
$ Victarion Greyjoy:List of 18
Names propagate in purrr pipelines.
Set them early and enjoy!
With tibble::enframe(), a named list converts to a data frame with names & list-column.
allegiances <- map(got_chars_named, "allegiances")
tibble::enframe(allegiances, value = "allegiances")
# A tibble: 30 × 2
name allegiances
<chr> <list>
1 Theon Greyjoy <chr [1]>
2 Tyrion Lannister <chr [1]>
3 Victarion Greyjoy <chr [1]>
4 Will <NULL>
5 Areo Hotah <chr [1]>
6 Chett <NULL>
7 Cressen <NULL>
8 Arianne Martell <chr [1]>
9 Daenerys Targaryen <chr [1]>
10 Davos Seaworth <chr [2]>
# … with 20 more rows
For more on list columns, see rstudio::conf(2018) Data Rectangling by Jenny Bryan
https://www.rstudio.com/resources/rstudioconf-2018/data-rectangling/
Set list names for a happier life.
There are many ways to specify .f.
Create a named copy of a GoT or SW list with set_names(). Find an element with tricky presence/absence or length. Extract it many ways.
Extraction methods:
by name
by position
by list("name", pos)
by c(pos, pos)
use .default for missing data
use map_TYPE() to coerce output to atomic vector
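A sketch of the extraction methods on one element, assuming the repurrrsive package. The choice of `titles` is only for illustration; it is the ninth element of each GoT character.

```r
library(purrr)
library(repurrrsive)

got_chars_named <- set_names(got_chars, map_chr(got_chars, "name"))

by_name <- map(got_chars_named, "titles")  # by name
by_pos  <- map(got_chars_named, 9)         # by position

# by list("name", pos): first title of each character
first_title <- map_chr(got_chars_named, list("titles", 1),
                       .default = NA_character_)

# by c(pos, pos): same extraction, positions only
first_title_pos <- map_chr(got_chars_named, c(9, 1),
                           .default = NA_character_)

identical(first_title, first_title_pos)
```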
Finish? Try one of these:
$`Theon Greyjoy`
[1] "A Game of Thrones" "A Storm of Swords" "A Feast for Crows"
$`Tyrion Lannister`
[1] "A Feast for Crows" "The World of Ice and Fire"
Theon Greyjoy
"A Game of Thrones, A Storm of Swords, A Feast for Crows"
Tyrion Lannister
"A Feast for Crows, The World of Ice and Fire"
map(.x, .f, ...)
$`Theon Greyjoy`
[1] "A Game of Thrones" "A Storm of Swords" "A Feast for Crows"
$`Tyrion Lannister`
[1] "A Feast for Crows" "The World of Ice and Fire"
Theon Greyjoy
"A Game of Thrones, A Storm of Swords, A Feast for Crows"
Tyrion Lannister
"A Feast for Crows, The World of Ice and Fire"
.f.
https://jennybc.github.io/purrr-tutorial/ls03_map-function-syntax.html#load_packages
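The calls that collapse each character's books into one string are not shown here. A sketch of three equivalent ways to specify `.f` for that task, assuming the repurrrsive package:

```r
library(purrr)
library(repurrrsive)

got_chars_named <- set_names(got_chars, map_chr(got_chars, "name"))
books <- map(got_chars_named[1:2], "books")

# 1. existing function, extra arguments passed through ...
via_dots <- map_chr(books, paste, collapse = ", ")

# 2. anonymous function
via_function <- map_chr(books, function(x) paste(x, collapse = ", "))

# 3. formula shortcut
via_formula <- map_chr(books, ~ paste(.x, collapse = ", "))
```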
countries <- c("Argentina", "Brazil")
gap_small <- gapminder |>
filter(country %in% countries & year > 2000)
gap_small
# A tibble: 4 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
1 Argentina Americas 2002 74.3 38331121 8798.
2 Argentina Americas 2007 75.3 40301927 12779.
3 Brazil Americas 2002 71.0 179914212 8131.
4 Brazil Americas 2007 72.4 190010647 9066.
map_dfr() rowbinds a list of data frames
# A tibble: 4 × 6
country continent year lifeExp pop gdpPercap
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Argentina Americas 2002 74.3 38331121 8798.
2 Argentina Americas 2007 75.3 40301927 12779.
3 Brazil Americas 2002 71.0 179914212 8131.
4 Brazil Americas 2007 72.4 190010647 9066.
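The call that produced this tibble is not shown here, and its column types suggest it was built from a list of rows. As a separate illustrative sketch, `map_dfr()` can build the same four rows by filtering once per country and row-binding the pieces. This assumes dplyr and gapminder are installed.

```r
library(purrr)
library(dplyr)
library(gapminder)

countries <- c("Argentina", "Brazil")

# One small data frame per country, row-bound into a single data frame
gap_bound <- map_dfr(countries, ~ filter(gapminder, country == .x, year > 2000))

gap_bound
```

In purrr 1.0.0 and later, `map_dfr()` is superseded; `map()` followed by `list_rbind()` is the suggested replacement.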
map_dfr() smush
https://twitter.com/asmae_toumi/status/1364407122268729347
.y = hair
.x = minis
map2(minis, hair, enhair)
.y = weapons
.x = minis
map2(minis, weapons, arm)
minis |>
map2(hair, enhair) |>
map2(weapons, arm)
map2() iterates over two vectors in parallel.
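A runnable sketch of `map2()` without the Lego figures, reusing the conference example. The vectors `year` and `location` are assumed names; `.x` refers to the first input and `.y` to the second.

```r
library(purrr)

year <- 2017:2021
location <- c("Orlando", "San Diego", "Austin", "San Francisco", "remote")

# .x walks along year, .y walks along location, in step
conf <- map2_chr(year, location,
                 ~ paste0("The ", .x, " RStudio Conference was in ", .y, "."))

conf
```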
df <- tibble(pants, torso, head)
embody <- function(pants, torso, head)
insert(insert(pants, torso), head)
pmap(df, embody)
pmap(): supply a list to iterate over any number of arguments in parallel.
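A runnable sketch of `pmap()` with three inputs. The list `params` and the function `add3()` are made up for illustration; the list names are matched to the function's argument names.

```r
library(purrr)

params <- list(
  x = c(1, 2, 3),
  y = c(10, 20, 30),
  z = c(100, 200, 300)
)

# Hypothetical helper: adds its three arguments
add3 <- function(x, y, z) x + y + z

# Calls add3(1, 10, 100), add3(2, 20, 200), add3(3, 30, 300)
totals <- pmap_dbl(params, add3)

totals
```

A data frame is a list of columns, so it can be passed to `pmap()` in the same way, as in `pmap(df, embody)` above.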
map_dfr(minis, `[`,
  c("pants", "torso", "head"))
Photo from Andriyko Podilnyk on unsplash