Iterating well with purrr

Shannon Pileggi

Introduction

Shannon Pileggi

pipinghotdata.com

@PipingHotData

linkedin.com/in/shannon-m-pileggi/

github.com/shannonpileggi

shannon@pipinghotdata.com

Acknowledgements


Workshop materials have been adapted from the 2020 RStudio What They Forgot To Teach You About R Workshop.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).

Checklist


R installed? Pretty recent?

     Current version 4.2.0

RStudio installed?

     I’m on 2022.02.3+492

Have these packages?

     tidyverse (includes purrr); repurrrsive

Additional resources

Syntax aside

Pipes

  • 2014+ magrittr pipe %>%

  • 2021+ (R ≥ 4.1.0) native R pipe |>

Isabella Velásquez (2022), Understanding the native R pipe |>: https://ivelasq.rbind.io/blog/understanding-the-r-pipe/

whatever(arg1, arg2, arg3, ...)

arg1 |>
  whatever(arg2, arg3, ...)

mean(0:10)

0:10 |> 
  mean()

R for Data Science: Ch 18 Pipes

Namespacing

dplyr::select()

  • tells R explicitly to use the function select from the package dplyr

  • can help to avoid name conflicts (e.g., MASS::select())

  • does not require library(dplyr)

library(dplyr)

select(mtcars, mpg, cyl) 

mtcars |>  
  select(mpg, cyl) 
# library(dplyr) not needed

dplyr::select(mtcars, mpg, cyl) 

mtcars |>  
  dplyr::select(mpg, cyl) 

Iterating without purrr

Gapminder example

library(gapminder)
library(tidyverse)
gapminder
# A tibble: 1,704 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# … with 1,694 more rows

Gapminder life expectancy

What am I doing? Are there mistakes?

africa <- gapminder[gapminder$continent == "Africa", ]
africa_mm <- max(africa$lifeExp) - min(africa$lifeExp)

americas <- gapminder[gapminder$continent == "Americas", ]
americas_mm <- max(americas$lifeExp) - min(americas$lifeExp)

asia <- gapminder[gapminder$continent == "Asia", ]
asia_mm <- max(asia$lifeExp) - min(africa$lifeExp)

europe <- gapminder[gapminder$continent == "Europe", ]
europe_mm <- max(europe$lifeExp) - min(europe$lifeExp)

oceania <- gapminder[gapminder$continent == "Oceania", ]
oceania_mm <- max(europe$lifeExp) - min(oceania$lifeExp)

cbind(
  continent = c("Africa", "Asias", "Europe", "Oceania"),
  max_minus_min = c(africa_mm, americas_mm, asia_mm, europe_mm, oceania_mm)
  )

Discussion

  1. What are the drawbacks of this code?

  2. How would you do it instead?


An alternative solution

gapminder |> 
 group_by(continent) |> 
 summarize(max_minus_min = max(lifeExp) - min(lifeExp))

group_by approach

# A tibble: 5 × 2
  continent max_minus_min
  <fct>             <dbl>
1 Africa             52.8
2 Americas           43.1
3 Asia               53.8
4 Europe             38.2
5 Oceania            12.1

previous approach

     continent max_minus_min
[1,] "Africa"  "52.843"     
[2,] "Asias"   "43.074"     
[3,] "Europe"  "59.004"     
[4,] "Oceania" "38.172"     
[5,] "Africa"  "12.637"     

More iteration

year <- 2017:2021
location <- c("Orlando", "San Diego", "Austin", "San Francisco", "remote")

conf <- rep_len("", length(year))
for (i in seq_along(conf)) {
 conf[i] <- paste0("The ", year[i], " RStudio Conference was in ", location[i], ".")
}
conf
[1] "The 2017 RStudio Conference was in Orlando."      
[2] "The 2018 RStudio Conference was in San Diego."    
[3] "The 2019 RStudio Conference was in Austin."       
[4] "The 2020 RStudio Conference was in San Francisco."
[5] "The 2021 RStudio Conference was in remote."       

Can you think of other ways to do this?


More iteration, cont.

year <- 2017:2021
location <- c("Orlando", "San Diego", "Austin", "San Francisco", "remote")

paste0("The ", year, " RStudio Conference was in ", location, ".")
[1] "The 2017 RStudio Conference was in Orlando."      
[2] "The 2018 RStudio Conference was in San Diego."    
[3] "The 2019 RStudio Conference was in Austin."       
[4] "The 2020 RStudio Conference was in San Francisco."
[5] "The 2021 RStudio Conference was in remote."       

glue::glue("The {year} RStudio Conference was in {location}.")
The 2017 RStudio Conference was in Orlando.
The 2018 RStudio Conference was in San Diego.
The 2019 RStudio Conference was in Austin.
The 2020 RStudio Conference was in San Francisco.
The 2021 RStudio Conference was in remote.

Some R functions are vectorized.
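A minimal base R sketch of what "vectorized" means. The vector `words` is made up for illustration and is not part of the workshop data. Each call below operates on the whole vector at once, so no explicit loop is needed.

```r
# Hypothetical example data, not from the workshop
words <- c("purrr", "map", "walk")

# nchar() and toupper() are vectorized: one call handles every element
n_letters <- nchar(words)
shouting  <- toupper(words)

# Arithmetic is vectorized too
squares <- (1:3)^2

n_letters
shouting
squares
```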

Introducing purrr

But what if you really need to iterate?


purrr

https://purrr.tidyverse.org/

  • purrr enhances R’s functional programming toolkit

  • a “core” package in the tidyverse meta-package


install.packages("tidyverse") # <-- install purrr + much more
library(tidyverse)            # <-- loads purrr + much more


install.packages("purrr")     # <-- installs only purrr
library(purrr)                # <-- loads only purrr

purrr vs apply

  • purrr is an alternative to “apply” functions

  • purrr::map() ≈ base::lapply()
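A small sketch of the comparison, assuming purrr is installed. The list `x` is a made-up input. For a plain function such as sum(), both calls apply it to every element and return a list.

```r
library(purrr)

# Hypothetical input list, made up for illustration
x <- list(a = 1:3, b = 4:6)

# Both apply a function to every element and return a list
via_lapply <- lapply(x, sum)
via_map    <- map(x, sum)

identical(via_lapply, via_map)
```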

Data

library(purrr)
library(repurrrsive)
help(package = "repurrrsive")

got_chars

sw_people, sw_species, etc.

Get comfortable with lists

Working with lists

Live coding

  1. How many elements are in got_chars?

  2. Who is the 9th person listed in got_chars? What information is given for this person?

  3. What is the difference between got_chars[9] and got_chars[[9]]?

List exploration

str(x, list.len = ?, max.level = ?)

x[i]

x[[i]]

str(x[[i]], ...)

View(x) # in RStudio

Subsetting lists

[ returns a smaller list; [[ returns contents in the list
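A minimal base R sketch of the difference, using a made-up list `x` rather than the workshop data.

```r
# Hypothetical list, made up for illustration
x <- list(a = 1:3, b = c("u", "v"))

sub_list <- x["b"]    # single bracket: still a list, of length 1
contents <- x[["b"]]  # double bracket: the character vector stored inside

class(sub_list)
class(contents)
length(sub_list)
length(contents)
```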

Another list analogy

x

x[i]

x[[i]]

Iterating with purrr



purrr::map(.x, .f, ...)


for every element of .x do .f


.x = minis


map(minis, antennate)

for every element of .x do .f

purrr::map(.x, .f, ...)

.x <- SOME VECTOR OR LIST
out <- vector(mode = "list", length = length(.x))
for (i in seq_along(out)) {
 out[[i]] <- .f(.x[[i]])
}
out


purrr::map() is a nice way to write a for loop.
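To make the loop concrete, here is a sketch with a made-up input list and sum() standing in for .f (both are assumptions chosen for illustration). It runs the for loop and map() side by side and compares the results.

```r
library(purrr)

# Hypothetical input and function, chosen for illustration
.x <- list(1:3, 4:6, 7:9)
.f <- sum

# The for loop, made concrete
out <- vector(mode = "list", length = length(.x))
for (i in seq_along(out)) {
  out[[i]] <- .f(.x[[i]])
}

# The same computation with map()
out_map <- map(.x, .f)

identical(out, out_map)
```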


Someone has to write a for loop. It doesn’t have to be you.

   ~ Jenny Bryan

Workflow demonstration

How many aliases does each GoT character have?

map(got_chars, .f = 🤷)


Workflow:

  1. Do it for one element.
  2. Find the general recipe.
  3. Drop into map() to do for all.

1. Do it for one element

got_chars[[9]]
$url
[1] "https://www.anapioficeandfire.com/api/characters/1303"

$id
[1] 1303

$name
[1] "Daenerys Targaryen"

$gender
[1] "Female"

$culture
[1] "Valyrian"

$born
[1] "In 284 AC, at Dragonstone"

$died
[1] ""

$alive
[1] TRUE

$titles
[1] "Queen of the Andals and the Rhoynar and the First Men, Lord of the Seven Kingdoms"
[2] "Khaleesi of the Great Grass Sea"                                                  
[3] "Breaker of Shackles/Chains"                                                       
[4] "Queen of Meereen"                                                                 
[5] "Princess of Dragonstone"                                                          

$aliases
 [1] "Dany"                    "Daenerys Stormborn"     
 [3] "The Unburnt"             "Mother of Dragons"      
 [5] "Mother"                  "Mhysa"                  
 [7] "The Silver Queen"        "Silver Lady"            
 [9] "Dragonmother"            "The Dragon Queen"       
[11] "The Mad King's daughter"

$father
[1] ""

$mother
[1] ""

$spouse
[1] "https://www.anapioficeandfire.com/api/characters/1346"

$allegiances
[1] "House Targaryen of King's Landing"

$books
[1] "A Feast for Crows"

$povBooks
[1] "A Game of Thrones"    "A Clash of Kings"     "A Storm of Swords"   
[4] "A Dance with Dragons"

$tvSeries
[1] "Season 1" "Season 2" "Season 3" "Season 4" "Season 5" "Season 6"

$playedBy
[1] "Emilia Clarke"

1. Do it for one element

got_chars[[9]][["aliases"]]
 [1] "Dany"                    "Daenerys Stormborn"     
 [3] "The Unburnt"             "Mother of Dragons"      
 [5] "Mother"                  "Mhysa"                  
 [7] "The Silver Queen"        "Silver Lady"            
 [9] "Dragonmother"            "The Dragon Queen"       
[11] "The Mad King's daughter"


length(got_chars[[9]][["aliases"]])
[1] 11

1. Do it for one element, again

# Daenerys
got_chars[[9]]
got_chars[[9]][["aliases"]]
length(got_chars[[9]][["aliases"]])


# Asha
got_chars[[13]]
got_chars[[13]][["aliases"]]
length(got_chars[[13]][["aliases"]])

2. Find the general recipe.

# Daenerys
got_chars[[9]]
got_chars[[9]][["aliases"]]
length(got_chars[[9]][["aliases"]])


.x <- got_chars[[?]]

length(.x[["aliases"]])


  • .x is a pronoun, like “it”
  • .x means “the current element”

3. Drop into map() to do for all.

.x <- got_chars[[?]]

length(.x[["aliases"]])

map(got_chars, ~ length(.x[["aliases"]]))
[[1]]
[1] 4

[[2]]
[1] 11

[[3]]
[1] 1

[[4]]
[1] 1

[[5]]
[1] 1

[[6]]
[1] 1

[[7]]
[1] 1

[[8]]
[1] 1

[[9]]
[1] 11

[[10]]
[1] 5

[[11]]
[1] 16

[[12]]
[1] 1

[[13]]
[1] 2

[[14]]
[1] 5

[[15]]
[1] 3

[[16]]
[1] 3

[[17]]
[1] 3

[[18]]
[1] 5

[[19]]
[1] 0

[[20]]
[1] 3

[[21]]
[1] 4

[[22]]
[1] 1

[[23]]
[1] 8

[[24]]
[1] 2

[[25]]
[1] 1

[[26]]
[1] 5

[[27]]
[1] 1

[[28]]
[1] 4

[[29]]
[1] 7

[[30]]
[1] 3

Anonymous functions

map(got_chars, ~ length(.x[["aliases"]]))


~ is a shortcut for anonymous functions supported in purrr


Three ways of specifying anonymous functions:

map(got_chars,           ~ length(.x[["aliases"]])) # supported in purrr
map(got_chars, function(x) length( x[["aliases"]])) # supported in base R
map(got_chars,        \(x) length( x[["aliases"]])) # supported in R >= 4.1.0

Your turn

How many ___ does each character have?

Characters   Items
got_chars    titles, allegiances
sw_people    vehicles, starships


map(got_chars, ~ length(.x[["aliases"]]))


Type specific map variants

map_int(got_chars, ~ length(.x[["aliases"]]))
 [1]  4 11  1  1  1  1  1  1 11  5 16  1  2  5  3  3  3  5  0  3  4  1  8  2  1
[26]  5  1  4  7  3


map_lgl()

map_int()

map_dbl()

map_chr()


returns an atomic vector of the specified type
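A short sketch of the four variants on a made-up list of records (`people` and its fields are hypothetical, not workshop data). The last call shows that a type mismatch is an error rather than a silent coercion.

```r
library(purrr)

# Hypothetical records, made up for illustration
people <- list(
  list(name = "Ann", age = 31L, height = 1.70, member = TRUE),
  list(name = "Bob", age = 45L, height = 1.82, member = FALSE)
)

names_chr <- map_chr(people, ~ .x[["name"]])    # character vector
ages_int  <- map_int(people, ~ .x[["age"]])     # integer vector
heights   <- map_dbl(people, ~ .x[["height"]])  # double vector
members   <- map_lgl(people, ~ .x[["member"]])  # logical vector

# Asking for integers from character data fails
bad <- try(map_int(people, ~ .x[["name"]]), silent = TRUE)
inherits(bad, "try-error")
```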

Your turn

Replace map() with type-specific map().

# What's each character's name?
map(got_chars, ~.x[["name"]])
map(sw_people, ~.x[["name"]])

# What color is each SW character's hair?
map(sw_people, ~ .x[["hair_color"]])

# Is the GoT character alive?
map(got_chars, ~ .x[["alive"]])

# Is the SW character female?
map(sw_people, ~ .x[["gender"]] == "female")

# How heavy is each SW character?
map(sw_people, ~ .x[["mass"]])

More purrr

Review #1


Lists can be awkward.

Lists can be necessary.

Get to know your list.

Review #2

purrr::map(.x, .f, ...)

for every element of .x do .f


map_int(got_chars, ~ length(.x[["aliases"]]))

quick anonymous functions via formula


map_lgl(sw_people, ~ .x[["gender"]] == "female")
map_int(got_chars, ~ length(.x[["aliases"]]))
map_chr(got_chars, ~ .x[["name"]])

type specific map variants

We extract by name a lot


# What's each character's name?
map(got_chars, ~.x[["name"]])

# What color is each SW character's hair?
map(sw_people, ~ .x[["hair_color"]])

# Is the GoT character alive?
map(got_chars, ~ .x[["alive"]])

# How heavy is each SW character?
map(sw_people, ~ .x[["mass"]])

.f specification & shortcuts


get_name <- function(x){ x[["name"]] }
map_chr(got_chars, get_name)


map_chr(got_chars, ~ .x[["name"]])


map_chr(got_chars, "name")


map_chr(got_chars, 3)

.f accepts

named functions


anonymous functions


a name


a position


.x = minis


map(minis, "pants")
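The four ways of specifying .f can be compared on a tiny made-up list (`pets` and `get_species()` are hypothetical names used only for this sketch). All four calls should return the same character vector.

```r
library(purrr)

# Hypothetical records, made up for illustration
pets <- list(
  list(name = "Rex",  species = "dog"),
  list(name = "Tama", species = "cat")
)

get_species <- function(x) x[["species"]]

by_named_fn <- map_chr(pets, get_species)        # a named function
by_formula  <- map_chr(pets, ~ .x[["species"]])  # an anonymous function
by_name     <- map_chr(pets, "species")          # a name
by_position <- map_chr(pets, 2)                  # a position

by_name
```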

Your turn

  1. Explore a GoT or SW list and find a new element to look at.

  2. Extract it across the whole list with name and position shortcuts for .f.

  3. Use map_TYPE() to get an atomic vector as output.

# GoT
map_??(got_chars, ??)

# Star Wars
map_??(sw_people, ??)
map_??(sw_vehicles, ??)
map_??(sw_species, ??)
# etc.

Common problem


I’m using map_TYPE() but some individual elements aren’t of length 1.


They are absent or have length > 1.

Solutions


Missing elements?

Specify a .default value.


Elements of length > 1?

You can’t make an atomic vector.*

Get happy with a list or list-column.

Or pick one element, e.g., the first.


.default value

map(sw_vehicles, "pilots", .default = NA)
[[1]]
[1] NA

[[2]]
[1] NA

[[3]]
[1] NA

[[4]]
[1] NA

[[5]]
[1] "http://swapi.co/api/people/1/"  "http://swapi.co/api/people/18/"

[[6]]
[1] NA

[[7]]
[1] NA

[[8]]
[1] "http://swapi.co/api/people/13/"

[[9]]
[1] NA

[[10]]
[1] NA

[[11]]
[1] NA

[[12]]
[1] NA

[[13]]
[1] "http://swapi.co/api/people/1/" "http://swapi.co/api/people/5/"

[[14]]
[1] NA

[[15]]
[1] NA

[[16]]
[1] NA

[[17]]
[1] NA

[[18]]
[1] NA

[[19]]
[1] "http://swapi.co/api/people/10/" "http://swapi.co/api/people/32/"

[[20]]
[1] "http://swapi.co/api/people/44/"

[[21]]
[1] "http://swapi.co/api/people/11/"

[[22]]
[1] "http://swapi.co/api/people/70/"

[[23]]
[1] "http://swapi.co/api/people/11/"

[[24]]
[1] NA

[[25]]
[1] NA

[[26]]
[1] "http://swapi.co/api/people/79/"

[[27]]
[1] NA

[[28]]
[1] NA

[[29]]
[1] NA

[[30]]
[1] NA

[[31]]
[1] NA

[[32]]
[1] NA

[[33]]
[1] NA

[[34]]
[1] NA

[[35]]
[1] NA

[[36]]
[1] NA

[[37]]
[1] "http://swapi.co/api/people/67/"

[[38]]
[1] NA

[[39]]
[1] NA

select first element

map_chr(sw_vehicles, list("pilots", 1), .default = NA)
 [1] NA                               NA                              
 [3] NA                               NA                              
 [5] "http://swapi.co/api/people/1/"  NA                              
 [7] NA                               "http://swapi.co/api/people/13/"
 [9] NA                               NA                              
[11] NA                               NA                              
[13] "http://swapi.co/api/people/1/"  NA                              
[15] NA                               NA                              
[17] NA                               NA                              
[19] "http://swapi.co/api/people/10/" "http://swapi.co/api/people/44/"
[21] "http://swapi.co/api/people/11/" "http://swapi.co/api/people/70/"
[23] "http://swapi.co/api/people/11/" NA                              
[25] NA                               "http://swapi.co/api/people/79/"
[27] NA                               NA                              
[29] NA                               NA                              
[31] NA                               NA                              
[33] NA                               NA                              
[35] NA                               NA                              
[37] "http://swapi.co/api/people/67/" NA                              
[39] NA                              

more .f shortcuts




map(got_chars, c(14, 1)) 


map(sw_vehicles, list("pilots", 1))

.f accepts


vector of positions


list of names and positions
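A self-contained sketch of both shortcuts, using a made-up list `ships` in which one record has no "pilots" element. It passes NA_character_ as the .default so the fallback already has the type map_chr() expects.

```r
library(purrr)

# Hypothetical records; the second has no "pilots" element
ships <- list(
  list(name = "A", pilots = c("p1", "p2")),
  list(name = "B"),
  list(name = "C", pilots = "p3")
)

# list("pilots", 1): go into "pilots", then take its first element
first_pilot <- map_chr(ships, list("pilots", 1), .default = NA_character_)

# c(2, 1): second element of each record, then its first element
first_by_pos <- map_chr(ships, c(2, 1), .default = NA_character_)

first_pilot
```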

Another challenge


# create readable output
map_lgl(got_chars, "alive")
 [1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE
[13]  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
[25] FALSE  TRUE FALSE FALSE  TRUE  TRUE


Who? 🙄

Named lists >>> unnamed lists

# retrieve names of GoT characters
got_names <- map_chr(got_chars, "name")
got_names[1:3]
[1] "Theon Greyjoy"     "Tyrion Lannister"  "Victarion Greyjoy"

# create a named list
got_chars_named <- set_names(got_chars, got_names)
str(got_chars_named[1:3], max.level = 1)
List of 3
 $ Theon Greyjoy    :List of 18
 $ Tyrion Lannister :List of 18
 $ Victarion Greyjoy:List of 18

# create readable output
map_lgl(got_chars_named, "alive")[1:3]
    Theon Greyjoy  Tyrion Lannister Victarion Greyjoy 
             TRUE              TRUE              TRUE 


Names propagate in purrr pipelines.

Set them early and enjoy!

Example: name propagation

With tibble::enframe(), a named list converts to a data frame with names & list-column.

allegiances <- map(got_chars_named, "allegiances")
tibble::enframe(allegiances, value = "allegiances")
# A tibble: 30 × 2
   name               allegiances
   <chr>              <list>     
 1 Theon Greyjoy      <chr [1]>  
 2 Tyrion Lannister   <chr [1]>  
 3 Victarion Greyjoy  <chr [1]>  
 4 Will               <NULL>     
 5 Areo Hotah         <chr [1]>  
 6 Chett              <NULL>     
 7 Cressen            <NULL>     
 8 Arianne Martell    <chr [1]>  
 9 Daenerys Targaryen <chr [1]>  
10 Davos Seaworth     <chr [2]>  
# … with 20 more rows

Review #3

got_names <- map_chr(got_chars, "name")
got_chars_named <- set_names(got_chars, got_names)

Set list names for a happier life.

map_chr(got_chars, "name")
map_chr(got_chars, 3)
map_chr(got_chars, c(3, 1))
map_chr(got_chars, list("name", 1))
map_chr(got_chars, ~ .x[["name"]])

There are many ways to specify .f.

map(sw_vehicles, "pilots", .default = NA)
map_chr(sw_vehicles, list("pilots", 1), .default = NA)

.default is useful for missing things.

Your turn

Create a named copy of a GoT or SW list with set_names(). Find an element with tricky presence/absence or length. Extract it many ways.

Extraction methods:

  • by name

  • by position

  • by list("name", pos)

  • by c(pos, pos)

  • use .default for missing data

  • use map_TYPE() to coerce output to atomic vector

Finish? Try one of these:

  1. Which SW film has the most characters?
  2. Which SW species has the most possible eye colors?
  3. Which GoT character has the most allegiances? Aliases? Titles?
  4. Which GoT character has been played by multiple actors?

Inspiration for your future purrr work

Additional arguments


books <- map(got_chars_named, "books") 
books[1:2]
$`Theon Greyjoy`
[1] "A Game of Thrones" "A Storm of Swords" "A Feast for Crows"

$`Tyrion Lannister`
[1] "A Feast for Crows"         "The World of Ice and Fire"


map_chr(books[1:2], ~ paste(.x, collapse = ", ")) 
                                            Theon Greyjoy 
"A Game of Thrones, A Storm of Swords, A Feast for Crows" 
                                         Tyrion Lannister 
           "A Feast for Crows, The World of Ice and Fire" 

map(.x, .f, ...)


books <- map(got_chars_named, "books") 
books[1:2]
$`Theon Greyjoy`
[1] "A Game of Thrones" "A Storm of Swords" "A Feast for Crows"

$`Tyrion Lannister`
[1] "A Feast for Crows"         "The World of Ice and Fire"


map_chr(books[1:2], paste, collapse = ", ")
                                            Theon Greyjoy 
"A Game of Thrones, A Storm of Swords, A Feast for Crows" 
                                         Tyrion Lannister 
           "A Feast for Crows, The World of Ice and Fire" 

Passing arguments

# map(.x, .f, ...)
map_chr(books[1:2], paste, collapse = ", ")
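Another sketch of the same idea with a different function. The list `scores` is made up for illustration. The argument na.rm = TRUE is forwarded through ... to mean() on every call, and the formula version gives the same result.

```r
library(purrr)

# Hypothetical measurements with a missing value
scores <- list(a = c(1, 2, NA), b = c(4, 6))

# na.rm = TRUE is passed along to mean() for each element
via_dots    <- map_dbl(scores, mean, na.rm = TRUE)
via_formula <- map_dbl(scores, ~ mean(.x, na.rm = TRUE))

via_dots
```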


So, yes, there are many ways to specify .f.

map(got_chars, ~ length(.x[["aliases"]]))
map_chr(got_chars, "name")
map_chr(books, paste, collapse = ", ")
map(sw_vehicles, list("pilots", 1))

Walk

countries <- c("Argentina", "Brazil")
gap_small <- gapminder  |> 
 filter(country %in% countries & year > 2000)
gap_small
# A tibble: 4 × 6
  country   continent  year lifeExp       pop gdpPercap
  <fct>     <fct>     <int>   <dbl>     <int>     <dbl>
1 Argentina Americas   2002    74.3  38331121     8798.
2 Argentina Americas   2007    75.3  40301927    12779.
3 Brazil    Americas   2002    71.0 179914212     8131.
4 Brazil    Americas   2007    72.4 190010647     9066.

write_one <- function(x) {
 filename <- paste0(x, ".csv")
 dataset <- filter(gap_small, country == x)
 write_csv(dataset, filename)
}

walk(countries, write_one)
list.files(pattern = "\\.csv$")  # pattern is a regular expression, not a glob

#> [1] "Argentina.csv" "Brazil.csv" 

walk() is map() but is called for its side effects; it returns its input invisibly instead of a list of results
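A small sketch of that behavior with a made-up input vector. The side effect here is printing. walk() hands back its input invisibly, which is what lets it sit in the middle of a pipeline.

```r
library(purrr)

# Hypothetical input
x <- c("a", "b")

# walk() calls the function once per element for its side effect
returned <- walk(x, ~ cat("processing", .x, "\n"))

# The value handed back is the original input, not a list of results
identical(returned, x)
```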

map_dfr() rowbinds a list of data frames


csv_files <- list.files(pattern = "\\.csv$")
csv_files

#> [1] "Argentina.csv" "Brazil.csv" 


map_dfr(csv_files, ~ read_csv(.x))

# A tibble: 4 × 6
  country   continent  year lifeExp       pop gdpPercap
  <chr>     <chr>     <dbl>   <dbl>     <dbl>     <dbl>
1 Argentina Americas   2002    74.3  38331121     8798.
2 Argentina Americas   2007    75.3  40301927    12779.
3 Brazil    Americas   2002    71.0 179914212     8131.
4 Brazil    Americas   2007    72.4 190010647     9066.

map_dfr() smush


mapping over 2 or more things in parallel


.y = hair

.x = minis


map2(minis, hair, enhair)


.y = weapons

.x = minis


map2(minis, weapons, arm)

minis |>
   map2(hair, enhair) |>
   map2(weapons, arm)

map2()

iterates over two vectors in parallel.
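As a runnable counterpart, the conference sentences from earlier can be rebuilt with map2_chr(). This is only a sketch: paste0() is already vectorized, so map2 is not needed here, but it shows how .x and .y are paired element by element.

```r
library(purrr)

year <- 2017:2021
location <- c("Orlando", "San Diego", "Austin", "San Francisco", "remote")

# .x is the current year, .y the current location
conf <- map2_chr(
  year, location,
  ~ paste0("The ", .x, " RStudio Conference was in ", .y, ".")
)
conf
```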

df <- tibble(pants, torso, head)

embody <- function(pants, torso, head)
   insert(insert(pants, torso), head)

pmap(df, embody)

pmap()

supply a list to iterate over any number of arguments in parallel.
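A runnable sketch of the same pattern, with a made-up data frame and a hypothetical function `describe()`. A data frame is a list of columns, so pmap() walks it row by row and matches columns to arguments by name.

```r
library(purrr)

# Hypothetical data frame; column names match the function's argument names
df <- data.frame(
  year = 2019:2020,
  location = c("Austin", "San Francisco"),
  stringsAsFactors = FALSE
)

describe <- function(year, location) {
  paste0(location, " (", year, ")")
}

# One call to describe() per row
labels <- pmap_chr(df, describe)
labels
```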

map_dfr(minis, `[`,
   c("pants", "torso", "head"))

For more

rstd.io/row-work

Map guide

Go forth,

and explore the world of purrr!