Hooked on Data - Going Off the Map: Exploring purrr’s Other Functions

I recently completed Colin Fay’s excellent DataCamp course, Intermediate Functional Programming with purrr (full disclosure: I work at DataCamp, but part of why I joined was that I was a big fan of the short, interactive course format). Although I’ve used the purrr package before, there were a lot of functions in this course that were new to me. I wrote this post to demystify purrr a bit for those who find it overwhelming and to illustrate some of its lesser-known functions. Most of these functions are covered in Colin’s course, though I added a few I found on the purrr cheatsheet.

Introduction

purrr is a package for functional programming in R. If you’re familiar with it, it’s probably because of the map*() functions. And if you’ve been a little bit intimidated by them, I’m right there with you. You’ll often see purrr used with nested lists or dataframes, like in this (modified) example from one of the last lessons in Jenny Bryan’s purrr tutorial:

library(dplyr)
library(purrr)
library(gapminder)
library(tidyr)

gapminder %>%
  group_by(country) %>%
  nest() %>%  
  mutate(fit = map(data, ~ lm(lifeExp ~ year, data = .x))) %>%
  mutate(rsq = map_dbl(fit, ~ summary(.x)[["r.squared"]])) %>%
  arrange(rsq)

# A tibble: 142 × 4
# Groups:   country [142]
   country          data              fit       rsq
   <fct>            <list>            <list>  <dbl>
 1 Rwanda           <tibble [12 × 5]> <lm>   0.0172
 2 Botswana         <tibble [12 × 5]> <lm>   0.0340
 3 Zimbabwe         <tibble [12 × 5]> <lm>   0.0562
 4 Zambia           <tibble [12 × 5]> <lm>   0.0598
 5 Swaziland        <tibble [12 × 5]> <lm>   0.0682
 6 Lesotho          <tibble [12 × 5]> <lm>   0.0849
 7 Cote d'Ivoire    <tibble [12 × 5]> <lm>   0.283 
 8 South Africa     <tibble [12 × 5]> <lm>   0.312 
 9 Uganda           <tibble [12 × 5]> <lm>   0.342 
10 Congo, Dem. Rep. <tibble [12 × 5]> <lm>   0.348 
# … with 132 more rows
# ℹ Use `print(n = ...)` to see more rows

If you’ve generally worked with plain-old table data or vectors (like I have), that code might look pretty intimidating.

But I am here to tell you: purrr can make your life easier even if you never write code like this. Certainly, knowing how to work with complicated nested lists and dataframes is very useful - it can simplify code you’ve written, and your data may arrive in that format (for example, JSON data is often represented as nested lists or dataframes in R). But even if all you ever work with is “simple” lists, dataframes, and vectors, you’ll be glad to know a bit of purrr.

A brief introduction to map()

map() lets you take a list or vector and apply a function to each element. If you’ve been using R for a while, you might be familiar with the apply functions, including sapply() and lapply(). map() does essentially the same thing, but offers several advantages, most importantly consistency of output and helpers that let you write more concise code (see the first Stack Overflow answer by Hadley Wickham here).
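To make "consistency of output" a bit more concrete, here is a small sketch of one well-known case: an empty input. The variable names are just placeholders for illustration. With sapply(), the type of the result depends on the input, while the typed map functions always return the type they promise.

```r
library(purrr)

# A hypothetical empty input, e.g. a filter that matched nothing
empty <- list()

# sapply() cannot guess a type for empty input, so it falls back to a list
sapply_empty <- sapply(empty, length)

# map_int() returns an integer vector no matter what, even an empty one
map_empty <- map_int(empty, length)

class(sapply_empty)  # "list"
class(map_empty)     # "integer"
```

That predictability matters most inside functions and pipelines, where you can't inspect every intermediate result by hand.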

Let’s look at an example, where we take a vector of four numbers and round each of them.

my_vector <- c(1.0212, 2.483, 3.189, 4.5938)

map(my_vector, round)

[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 5

The result is a list (hence the double brackets), which is always the case with map(). But you can return a dataframe or different types of vectors instead by using the appropriate map_*() function. In this case, let’s return a vector of type double¹:

map_dbl(my_vector, round)

[1] 1 2 3 5
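The other typed variants follow the same pattern. Here is a quick sketch using a made-up list (example_list is not from the course, just something small to experiment with) and three base R functions:

```r
library(purrr)

# A hypothetical list used only for illustration
example_list <- list(a = 1:3, b = letters, c = c(2.5, 3))

# Each variant returns an atomic vector of the type in its name
classes <- map_chr(example_list, class)       # character vector
lengths <- map_int(example_list, length)      # integer vector
numeric <- map_lgl(example_list, is.numeric)  # logical vector

classes
lengths
numeric
```

If the function returns something that doesn't fit the requested type, the typed variants throw an error rather than silently converting, which is usually what you want.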

You can also use map with an “anonymous” function, a function that doesn’t have a name. The formula is a ~ followed by what you want to do to each element, with .x representing the element (. also works). We can make an anonymous function to add ten to each element of our vector:

map_dbl(my_vector, ~ .x + 10)

[1] 11.0212 12.4830 13.1890 14.5938
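If the formula shorthand feels a little magical, it may help to see it next to the spelled-out versions. This is a sketch of three ways to write the same anonymous function; the last one uses base R's backslash shorthand, which I'm assuming is available (it needs R 4.1 or later).

```r
library(purrr)

my_vector <- c(1.0212, 2.483, 3.189, 4.5938)

# purrr's formula shorthand
formula_style <- map_dbl(my_vector, ~ .x + 10)

# A regular anonymous function
function_style <- map_dbl(my_vector, function(x) x + 10)

# Base R lambda shorthand (assumes R >= 4.1)
lambda_style <- map_dbl(my_vector, \(x) x + 10)

# All three give the same result
identical(formula_style, function_style)
identical(function_style, lambda_style)
```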

map() can get much (much) more complicated, with nested lists and multiple inputs and arguments, but even knowing this basic use case can help you! If you do want to dive in more, check out chapter 21 of R for Data Science, Jenny Bryan’s purrr tutorials, Auriel Fournier’s Foundations of Functional Programming with purrr course, and chapters 3 and 4 of Writing Functions in R by Charlotte and Hadley Wickham on DataCamp.

Beyond map()

While map*() is great, it can still take a while to wrap your head around. But purrr offers dozens of useful functions that you can start using right away to streamline your workflow, even if you don’t use map(). Let’s check out a few. I’ll separate them into two types: those that create new functions and those that modify a list/vector.

Modifying and summarizing vectors/lists

keep() and discard()

keep() and discard() … keep and discard elements of a list or vector based on a predicate function. A predicate function is a function that returns TRUE or FALSE. So is.factor() is a predicate function, because it always returns TRUE or FALSE, while round() is not.

For example, we can keep() all elements of our vector that are less than 3 with the following code:

keep(my_vector, ~ .x < 3)

[1] 1.0212 2.4830

Similarly, we could discard() all elements less than 3:

discard(my_vector, ~ .x < 3)

[1] 3.1890 4.5938
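The predicate doesn't have to be a formula. As a small sketch, here is the same idea with a named predicate function and a made-up list called things:

```r
library(purrr)

# A hypothetical mixed list used only for illustration
things <- list("a", 1, TRUE, 2.5)

# keep() retains the elements where the predicate is TRUE
numbers_only <- keep(things, is.numeric)

# discard() drops them instead
without_numbers <- discard(things, is.numeric)

str(numbers_only)     # the two numbers, 1 and 2.5
str(without_numbers)  # "a" and TRUE
```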

map_if()

What if you’re not sure about the types of every element in your list, and you want to apply a function that needs the input to be of a certain type? For example, let’s say we wanted to add 10 to every element of a list.

mixed_list <- list("happy", 2L, 4.39)

add_ten <- function(n) {
  n + 10
}

map(mixed_list, add_ten)

Error in n + 10: non-numeric argument to binary operator

We get an error since we’re trying to add 10 to “happy”, which isn’t numeric!

This is where map_if() comes in handy. Just like dplyr’s mutate_if(), select_if(), and summarise_if(), you add a condition and the function will only apply to those columns (or list elements) where the condition is met. Here, the condition is that the element needs to be numeric. Let’s try again with map_if():

map_if(mixed_list, is.numeric, add_ten)

[[1]]
[1] "happy"

[[2]]
[1] 12

[[3]]
[1] 14.39

Alright! We see it skipped over “happy”, preserving it as is, and added ten to the two numeric elements.
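If you'd rather do something with the elements that fail the condition instead of leaving them untouched, map_if() also takes an .else function. This is a sketch that assumes a purrr version with the .else argument (0.3.0 or later, as far as I know); here the non-numeric elements become NA.

```r
library(purrr)

mixed_list <- list("happy", 2L, 4.39)

# Add ten where the element is numeric, return NA everywhere else
# (.else is assumed to be available in your version of purrr)
with_fallback <- map_if(
  mixed_list,
  is.numeric,
  ~ .x + 10,
  .else = ~ NA_real_
)

with_fallback
```

Because every element is now a single number (or NA), the result can be flattened to a plain vector if you need one.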

every() and some()

Sometimes you have a giant list and want to know whether each element meets a condition, like being numeric. You can use every(), which will check if every element of a list satisfies a predicate function:

every(mixed_list, is.numeric)

[1] FALSE

Since “happy”, the first element of mixed_list, isn’t numeric, the list doesn’t pass the test.

If we want to be less strict and just check whether at least one of the elements satisfies a predicate function, we can use some() instead:

some(mixed_list, is.numeric)

[1] TRUE

In this case, since some elements of mixed_list were numeric, we got TRUE.
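One way to think about these two: for a predicate that always returns TRUE or FALSE, every() and some() give the same answers as base R's all() and any() applied to a logical vector. A quick sketch of that equivalence:

```r
library(purrr)

mixed_list <- list("happy", 2L, 4.39)

# One logical value per element
is_num <- map_lgl(mixed_list, is.numeric)

# every() matches all(), some() matches any()
every_matches_all <- all(is_num) == every(mixed_list, is.numeric)
some_matches_any <- any(is_num) == some(mixed_list, is.numeric)

every_matches_all
some_matches_any
```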

Modifying functions

purrr includes adverbs - functions that take a function and return a modified version (just as an adverb modifies a verb). Let’s check out a few!

negate()

negate() … negates a predicate function (aren’t the purrr function names great?). For example, let’s say you want to check which elements of a list were not null. This is how you would do it with map_lgl (which returns a logical vector rather than a list):

lst <- list("a", 3, 22, NULL, "q", NULL)

map_lgl(lst, ~ !is.null(.))

[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE

This works, but it’s not super easy to read. Instead, we can make an is_not_null() function using negate():

is_not_null <- negate(is.null)

map_lgl(lst, is_not_null)

[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE

Voila!
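A negated predicate also works anywhere a predicate is expected, so it combines nicely with keep(). As a sketch, this keeps only the non-NULL elements of the same list:

```r
library(purrr)

lst <- list("a", 3, 22, NULL, "q", NULL)

# keep() the elements for which is.null() is FALSE
non_null <- keep(lst, negate(is.null))

length(non_null)  # 4
```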

partial()

You probably have a couple of functions where you almost always use an extra argument, like mean() with na.rm = TRUE or round() with digits = 1. You can use partial() to create a new function where those are always specified, saving you some repetitive typing!

mean(c(10, NA, 5, 7))

[1] NA

mean(c(10, NA, 5, 7), na.rm = TRUE)

[1] 7.333333

my_mean <- partial(mean, na.rm = TRUE)

my_mean(c(10, NA, 5, 7))

[1] 7.333333

my_round <- partial(round, digits = 1)

my_round(10.484)

[1] 10.5
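Partial functions are especially handy as the function you hand to map(). Here is a small sketch with a made-up list of measurements, some of which contain missing values:

```r
library(purrr)

# Hypothetical data used only for illustration
measurements <- list(c(1, NA, 3), c(NA, 4), c(5, 6))

my_mean <- partial(mean, na.rm = TRUE)

# Each mean ignores the NAs, with no extra arguments needed in the map call
means <- map_dbl(measurements, my_mean)

means  # 2.0 4.0 5.5
```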

safely() and possibly()

We saw earlier that map_if() could be used where we have a condition we want to be met before applying a function. In our case, we used it to avoid an error, but you could also use it to meet a condition like the number being negative or greater than a threshold. But what if you don’t know where errors could come from but want to handle them? This is where safely() and possibly() come in.

If you’re not interested in what the error is, you should use possibly(). In addition to the function it’s wrapping around, you need to specify the argument otherwise, which is what you want to return if there is an error. Let’s take a look:

possibly_add_ten <- possibly(add_ten, otherwise = "I'm not numeric!")

map(mixed_list, possibly_add_ten)

[[1]]
[1] "I'm not numeric!"

[[2]]
[1] 12

[[3]]
[1] 14.39

Side-note - I find the double-brackets confusing. We’ve got them since our list isn’t named. Let’s set the name of each element with purrr’s set_names(), giving our elements the very creative names a, b, and c.

mixed_list <- set_names(mixed_list, c("a", "b", "c"))

mixed_list

$a
[1] "happy"

$b
[1] 2

$c
[1] 4.39

Well … mildly easier to read at least.
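One more possibly() trick: if the otherwise value has the same type as a successful result, you can go straight to a typed map function. This sketch swaps in NA for the string used earlier, which is my own choice for illustration:

```r
library(purrr)

add_ten <- function(n) {
  n + 10
}

mixed_list <- list("happy", 2L, 4.39)

# Return a numeric NA on error so every result is a single double
add_ten_or_na <- possibly(add_ten, otherwise = NA_real_)

results <- map_dbl(mixed_list, add_ten_or_na)

results  # NA 12.00 14.39
```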

On the other hand, sometimes you do want to know what the error is. If that’s the case, you can use safely() instead:

map(mixed_list, safely(add_ten))

$a
$a$result
NULL

$a$error
<simpleError in n + 10: non-numeric argument to binary operator>


$b
$b$result
[1] 12

$b$error
NULL


$c
$c$result
[1] 14.39

$c$error
NULL

safely() returns a list of lists. Each element from the original list has two entries: result and error. One is always NULL - if there was an error, result is NULL and error is the error object; if there wasn’t an error, result is the result and error is NULL. If you want to get back all the errors or results, you can use another handy feature of map(). If you give map() a string as the second argument, it will extract the element with that name from each element of the list.

safely_add_ten <- safely(add_ten)

mixed_list %>%
  map(safely_add_ten) %>%
  map("error")

$a
<simpleError in n + 10: non-numeric argument to binary operator>

$b
NULL

$c
NULL

Ah, no more list of lists. That was scary. I promise I won’t do it again.
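For the curious, here is a sketch of one way to use the safely() output in practice: flag which elements succeeded, keep only their results, and pull the message text out of the error with base R's conditionMessage(). The names outcomes and succeeded are just placeholders.

```r
library(purrr)

add_ten <- function(n) {
  n + 10
}

mixed_list <- list(a = "happy", b = 2L, c = 4.39)

outcomes <- map(mixed_list, safely(add_ten))

# TRUE where no error was recorded
succeeded <- map_lgl(outcomes, ~ is.null(.x$error))

# Results for the successful elements only
good_results <- map(outcomes, "result")[succeeded]

# The error is a condition object; this extracts its message as a string
error_text <- conditionMessage(outcomes$a$error)

succeeded
good_results
error_text
```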

compact()

In the previous example, we probably wouldn’t be interested in the instances where errors were NULL. We could use discard(is.null), but purrr actually provides a function just for the purpose of getting rid of NULLs: compact().

mixed_list %>%
  map(safely_add_ten) %>%
  map("error") %>%
  compact()

$a
<simpleError in n + 10: non-numeric argument to binary operator>
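compact() isn't tied to safely() in any way. As a sketch on a plain made-up list, it drops NULLs and, as far as I can tell from its behavior, other empty elements such as zero-length vectors:

```r
library(purrr)

# A hypothetical list with a NULL and an empty character vector
messy <- list(1, NULL, character(0), "a")

tidy <- compact(messy)

length(tidy)  # 2
```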

compose()

compose() lets you string together multiple functions. Let’s say you want to add_ten(), take the log, and then round a vector of numbers. You could do either of these:

c(1, 20, 500) %>%
  add_ten() %>%
  log() %>%
  round()

[1] 2 3 6

round(log(add_ten(c(1, 20, 500))))

[1] 2 3 6

But you could also make a new function using compose(). You give compose() the functions to execute in order from right to left (the same order as in the nested call above):

add_ten_log_and_round <- compose(round, log, add_ten)

add_ten_log_and_round(c(1, 20, 500))

[1] 2 3 6

compose() is great for simplifying your code if you’re going to use a sequence of functions again and again.
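If reading right to left feels backwards, compose() can also run the functions left to right. This is a sketch that assumes a purrr version with the .dir argument (0.3.0 or later, as far as I know):

```r
library(purrr)

add_ten <- function(n) {
  n + 10
}

# Default: functions are applied from right to left
backward <- compose(round, log, add_ten)

# Same pipeline, written in the order the steps happen
# (.dir is assumed to be available in your version of purrr)
forward <- compose(add_ten, log, round, .dir = "forward")

same_result <- identical(backward(c(1, 20, 500)), forward(c(1, 20, 500)))

same_result  # TRUE
```

The forward version reads like the pipe version of the code, which some people find easier to scan.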

What if we wanted to round to the nearest tenth instead? You can combine compose() with partial()!

round_tenth <- partial(round, digits = 1)

add_ten_log_and_round_tenth <- compose(round_tenth, log, add_ten)

add_ten_log_and_round_tenth(c(1, 20, 500))

[1] 2.4 3.4 6.2

Conclusion

There’s so much to learn in the wonderful world of purrr! I’m a big believer that just knowing something exists gets you a lot of the way there. Maybe you can’t think of a use for partial() right now, but you’ll run into a problem a few months down the line where it would be handy. You don’t need to remember the exact syntax; all you need to know is what to Google for and that you don’t need to spend hours trying to find some other solution or resign yourself to writing repetitive code.

One thing I didn’t show in this post was how all of these functions can fit together in an analysis. For that, I’ll recommend Colin’s course again. He also has a series of six blog posts illustrating different purrr functions, including some I didn’t cover here; check them out!

Footnotes

¹ Now, this is a bit of a silly example. Because round() is vectorized (it works not just on a single value but a vector of them), you could have just done this instead:
```
round(my_vector)
```
```
[1] 1 2 3 5
```
This won’t work, however, if your input is a list instead of a vector:
```
vec_as_list <- list(my_vector)

round(vec_as_list)
```
```
Error in round(vec_as_list): non-numeric argument to mathematical function
```
But map() still will:
```
map(vec_as_list, round)
```
```
[[1]]
[1] 1 2 3 5
```
Generally, you don’t need map() when you’re working with a simple vector and a vectorized function. map() shines where you have either a) a non-vectorized function or b) a more complicated data structure (including a list).