I recently completed Colin Fay’s excellent DataCamp course, Intermediate Functional Programming with purrr (full disclosure: I work at DataCamp, but part of why I joined was that I was a big fan of the short, interactive course format). Although I’ve used the `purrr` package before, there were a lot of functions in this course that were new to me. I wrote this post to hopefully demystify `purrr` a bit for those who find it overwhelming and illustrate some of its lesser known functions. Most of these functions are covered in Colin’s course, though I added a few I found on the `purrr` cheatsheet.

## Introduction

`purrr` is a package for functional programming in R. If you’re familiar with it, it’s probably because of the `map*()` functions. And if you’ve been a little bit intimidated by them, I’m right there with you. You’ll often see `purrr` used with nested lists or dataframes, like in this (modified) example from one of the last lessons in Jenny Bryan’s `purrr` tutorial:

```
library(dplyr)
library(purrr)
library(gapminder)
library(tidyr)
```

```
gapminder %>%
  group_by(country) %>%
  nest() %>%
  mutate(fit = map(data, ~ lm(lifeExp ~ year, data = .x))) %>%
  mutate(rsq = map_dbl(fit, ~ summary(.x)[["r.squared"]])) %>%
  arrange(rsq)
```

```
# A tibble: 142 × 4
# Groups: country [142]
country data fit rsq
<fct> <list> <list> <dbl>
1 Rwanda <tibble [12 × 5]> <lm> 0.0172
2 Botswana <tibble [12 × 5]> <lm> 0.0340
3 Zimbabwe <tibble [12 × 5]> <lm> 0.0562
4 Zambia <tibble [12 × 5]> <lm> 0.0598
5 Swaziland <tibble [12 × 5]> <lm> 0.0682
6 Lesotho <tibble [12 × 5]> <lm> 0.0849
7 Cote d'Ivoire <tibble [12 × 5]> <lm> 0.283
8 South Africa <tibble [12 × 5]> <lm> 0.312
9 Uganda <tibble [12 × 5]> <lm> 0.342
10 Congo, Dem. Rep. <tibble [12 × 5]> <lm> 0.348
# … with 132 more rows
# ℹ Use `print(n = ...)` to see more rows
```

If you’ve generally worked with plain-old table data or vectors (like I have), that code might look pretty intimidating.

But I am here to tell you: `purrr` can make your life easier even if you never write code like this. Certainly, knowing how to work with complicated nested lists and dataframes is very useful - it can simplify code you’ve written, and your data may arrive in that format (for example, JSON data is often represented as nested lists or dataframes in R). But even if all you ever work with is “simple” lists, dataframes, and vectors, you’ll be glad to know a bit of `purrr`.

## A brief introduction to map()

`map()` lets you take a list or vector and apply a function to each element. If you’ve been using R for a while, you might be familiar with the apply functions, including `sapply()` and `lapply()`. `map()` does essentially the same thing, but offers several advantages, most importantly consistency of output and helpers that let you write more concise code (see the first Stack Overflow answer by Hadley Wickham here).

Let’s look at an example, where we take a vector of four numbers and round each of them.

```
my_vector <- c(1.0212, 2.483, 3.189, 4.5938)

map(my_vector, round)
```

```
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] 5
```

The result is a list (hence the double brackets), which is always the case with `map()`. But you can return a dataframe or different types of vectors instead by using the appropriate `map_*()` function. In this case, let’s return a vector of type `double`^{1}:

`map_dbl(my_vector, round)`

`[1] 1 2 3 5`
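The other typed variants follow the same pattern. Here’s a minimal sketch of three of them; the variable names (`labels`, `is_big`, `n_chars`) are just ones I made up for illustration:

```r
library(purrr)

my_vector <- c(1.0212, 2.483, 3.189, 4.5938)

# map_chr() returns a character vector
labels <- map_chr(my_vector, ~ paste0("value_", round(.x)))

# map_lgl() returns a logical vector
is_big <- map_lgl(my_vector, ~ .x > 3)

# map_int() returns an integer vector, so the function must return integers
n_chars <- map_int(c("a", "bb", "ccc"), nchar)
```

If the function returns something that doesn’t fit the requested type, the typed variants throw an error rather than silently changing the output type, which is a big part of the “consistency of output” advantage.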

You can also use `map` with an “anonymous” function, a function that doesn’t have a name. The formula is a `~` followed by what you want to do to each element, with `.x` representing the element (`.` also works). We can make an anonymous function to add ten to each element of our vector:

`map_dbl(my_vector, ~ .x + 10)`

`[1] 11.0212 12.4830 13.1890 14.5938`
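In case the formula shorthand feels too magical, here’s a small sketch showing that it’s equivalent to writing out a regular anonymous function or passing a named one. The helper `add_ten_to()` is hypothetical, defined only for this comparison:

```r
library(purrr)

my_vector <- c(1.0212, 2.483, 3.189, 4.5938)

# Formula shorthand: .x stands for the current element
with_formula <- map_dbl(my_vector, ~ .x + 10)

# A regular anonymous function does the same thing
with_function <- map_dbl(my_vector, function(x) x + 10)

# So does a named function (hypothetical helper, for illustration only)
add_ten_to <- function(x) x + 10
with_named <- map_dbl(my_vector, add_ten_to)
```

All three return the same vector, so pick whichever reads best to you.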

`map()` can get much (much) more complicated, with nested lists and multiple inputs and arguments, but even knowing this basic use case can help you! If you do want to dive in more, check out chapter 21 of R for Data Science, Jenny Bryan’s purrr tutorials, Auriel Fournier’s Foundations of Functional Programming with purrr course, and chapters 3 and 4 of Writing Functions in R by Charlotte and Hadley Wickham on DataCamp.

## Beyond map()

While `map*()` is great, it can still take a while to wrap your head around. But `purrr` offers dozens of useful functions that you can start using right away to streamline your workflow, even if you don’t use `map()`. Let’s check out a few. I’ll separate them into two types: those that create new functions and those that modify a list/vector.

### Modifying and summarizing vectors/lists

#### keep() and discard()

`keep()` and `discard()` … keep and discard elements of a list or vector based on a *predicate function*. A predicate function is a function that returns `TRUE` or `FALSE`. So `is.factor()` is a predicate function, because it always returns `TRUE` or `FALSE`, while `round()` is not.

For example, we can `keep()` all elements of our vector that are less than 3 with the following code:

`keep(my_vector, ~ .x < 3)`

`[1] 1.0212 2.4830`

Similarly, we could `discard()` all elements less than 3:

`discard(my_vector, ~ .x < 3)`

`[1] 3.1890 4.5938`
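You don’t have to write a formula for the predicate; an existing predicate function works too. Here’s a quick sketch on a named list (the list `mixed` and its element names are made up for this example):

```r
library(purrr)

# A small named list with mixed types (illustrative data)
mixed <- list(a = "happy", b = 2L, c = 4.39, d = TRUE)

# Keep only the numeric elements
numbers <- keep(mixed, is.numeric)

# Discard the numeric elements, leaving everything else
others <- discard(mixed, is.numeric)
```

Both functions preserve the names of the elements they return, so `numbers` has the names `b` and `c`, and `others` has `a` and `d`.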

#### map_if()

What if you’re not sure about the types of every element in your list, and you want to apply a function that needs the input to be of a certain type? For example, let’s say we wanted to add 10 to every element of a list.

```
mixed_list <- list("happy", 2L, 4.39)

add_ten <- function(n) {
  n + 10
}
```

`map(mixed_list, add_ten)`

`Error in n + 10: non-numeric argument to binary operator`

We get an error since we’re trying to add 10 to “happy”, which isn’t numeric!

This is where `map_if()` comes in handy. Just like `mutate_if()`, `select_if()`, and `summarise_if()`, you add a condition and the function will only apply to those columns (or list elements) where the condition is met. Here, we know that condition is that the element needs to be numeric. Let’s try again with `map_if()`:

`map_if(mixed_list, is.numeric, add_ten)`

```
[[1]]
[1] "happy"
[[2]]
[1] 12
[[3]]
[1] 14.39
```

Alright! We see it skipped over “happy”, preserving it as is, and added ten to the two numeric elements.
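If you’d rather replace the non-matching elements than leave them untouched, `map_if()` also takes an `.else` argument (in reasonably recent versions of `purrr`). A minimal sketch, where returning `NA_real_` is just my choice of placeholder:

```r
library(purrr)

mixed_list <- list("happy", 2L, 4.39)

# Add ten to numeric elements; replace everything else with NA
result <- map_if(
  mixed_list,
  is.numeric,
  ~ .x + 10,
  .else = ~ NA_real_
)
```

Now every element of `result` is numeric, which makes it easier to simplify into a vector afterwards.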

#### every() and some()

Sometimes you have a giant list and want to know whether each element meets a condition, like being numeric. You can use `every()`, which will check if every element of a list satisfies a predicate function:

`every(mixed_list, is.numeric)`

`[1] FALSE`

Since “happy”, the first element of `mixed_list`, isn’t numeric, the list doesn’t pass the test.

If we want to be less strict and just check whether **at least one** of the elements satisfies a predicate function, we can use `some()` instead:

`some(mixed_list, is.numeric)`

`[1] TRUE`

In this case, since some elements of `mixed_list` were numeric, we got `TRUE`.
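Both functions also accept formula predicates, which is handy for quick threshold checks. A small sketch with made-up exam scores:

```r
library(purrr)

# Illustrative data
scores <- c(72, 88, 95, 61)

# Did everyone score at least 60?
all_passed <- every(scores, ~ .x >= 60)

# Did at least one person score 90 or more?
any_top <- some(scores, ~ .x >= 90)

# Did everyone score 90 or more?
all_top <- every(scores, ~ .x >= 90)
```

Here `all_passed` and `any_top` are `TRUE`, while `all_top` is `FALSE`.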

### Modifying functions

`purrr` includes **adverbs** - functions that take a function and return a modified version (just as an adverb modifies a verb). Let’s check out a few!

#### negate()

`negate()` … negates a predicate function (aren’t the `purrr` function names great?). For example, let’s say you want to check which elements of a list were not null. This is how you would do it with `map_lgl` (which returns a logical vector rather than a list):

`lst <- list("a", 3, 22, NULL, "q", NULL)`

`map_lgl(lst, ~ !is.null(.))`

`[1] TRUE TRUE TRUE FALSE TRUE FALSE`

This works, but it’s not super easy to read. Instead, we can make an `is_not_null()` function using `negate()`:

```
is_not_null <- negate(is.null)

map_lgl(lst, is_not_null)
```

`[1] TRUE TRUE TRUE FALSE TRUE FALSE`

Voila!
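A negated predicate also plugs straight into `keep()` and `discard()`. Here’s a sketch that drops the `NULL` elements using the same idea:

```r
library(purrr)

lst <- list("a", 3, 22, NULL, "q", NULL)

# Build the negated predicate once, then reuse it
is_not_null <- negate(is.null)

# Keep only the elements that are not NULL
non_null <- keep(lst, is_not_null)
```

`non_null` is now a list of the four non-`NULL` elements, in their original order.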

#### partial()

You probably have a couple functions where you almost always use an extra argument, like `mean()` with `na.rm = TRUE` or `round()` with `digits = 1`. You can use `partial()` to create a new function where those are always specified, saving you some repetitive typing!

`mean(c(10, NA, 5, 7))`

`[1] NA`

`mean(c(10, NA, 5, 7), na.rm = TRUE)`

`[1] 7.333333`

```
my_mean <- partial(mean, na.rm = TRUE)

my_mean(c(10, NA, 5, 7))
```

`[1] 7.333333`

```
my_round <- partial(round, digits = 1)

my_round(10.484)
```

`[1] 10.5`
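Partially applied functions are especially nice to pass to `map*()`, since you don’t need to write an anonymous function just to set one argument. A minimal sketch, with made-up readings from three hypothetical sites:

```r
library(purrr)

my_mean <- partial(mean, na.rm = TRUE)

# Illustrative data: each site has some missing readings
readings <- list(
  site_a = c(10, NA, 5, 7),
  site_b = c(NA, 2, 4),
  site_c = c(1, 2, 3)
)

# One mean per site, ignoring NAs, returned as a named numeric vector
site_means <- map_dbl(readings, my_mean)
```

The names of the list carry over to the result, so you can look up `site_means["site_b"]` directly.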

#### safely() and possibly()

We saw earlier that `map_if()` could be used where we have a condition we want to be met before applying a function. In our case, we used it to avoid an error, but you could also use it to meet a condition like the number being negative or greater than a threshold. But what if you don’t know where errors could come from but want to handle them? This is where `safely()` and `possibly()` come in.

If you’re not interested in what the error is, you should use `possibly()`. In addition to the function it’s wrapping around, you need to specify the argument `otherwise`, which is what you want to return if there is an error. Let’s take a look:

```
possibly_add_ten <- possibly(add_ten, otherwise = "I'm not numeric!")

map(mixed_list, possibly_add_ten)
```

```
[[1]]
[1] "I'm not numeric!"
[[2]]
[1] 12
[[3]]
[1] 14.39
```
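One pattern I find useful: if you set `otherwise` to a missing value of the right type, you can use a typed `map_*()` function and get a plain vector back. A sketch, assuming `NA_real_` is an acceptable stand-in for failures in your analysis:

```r
library(purrr)

add_ten <- function(n) {
  n + 10
}

mixed_list <- list("happy", 2L, 4.39)

# Return NA instead of failing, so every result is a single double
add_ten_or_na <- possibly(add_ten, otherwise = NA_real_)

out <- map_dbl(mixed_list, add_ten_or_na)
```

`out` is a numeric vector with `NA` in the first position, which you can then filter or summarize like any other vector.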

Side-note - I find the double-brackets confusing. We’ve got them since our list isn’t named. Let’s set the name of each element with `purrr`’s `set_names()`, giving our elements the very creative names `a`, `b`, and `c`.

```
mixed_list <- set_names(mixed_list, c("a", "b", "c"))

mixed_list
```

```
$a
[1] "happy"
$b
[1] 2
$c
[1] 4.39
```

Well … mildly easier to read at least.

On the other hand, sometimes you do want to know what the error is. If that’s the case, you can use `safely()` instead:

`map(mixed_list, safely(add_ten))`

```
$a
$a$result
NULL
$a$error
<simpleError in n + 10: non-numeric argument to binary operator>
$b
$b$result
[1] 12
$b$error
NULL
$c
$c$result
[1] 14.39
$c$error
NULL
```

`safely()` returns a list of lists. Each element from the original list has two entries: `result` and `error`. One is always `NULL` - if there was an error, `result` is `NULL` and `error` is the error object (which contains the error message); if there wasn’t an error, `result` is the result and `error` is `NULL`. If you want to get back all the errors or results, you can use another handy feature of `map()`. If you give `map()` a string as the second argument, for each element of the list, it will return the element inside of it with that name.

```
safely_add_ten <- safely(add_ten)

mixed_list %>%
  map(safely_add_ten) %>%
  map("error")
```

```
$a
<simpleError in n + 10: non-numeric argument to binary operator>
$b
NULL
$c
NULL
```

Ah, no more list of lists. That was scary. I promise I won’t do it again.
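If you want the successful results and the error messages side by side, here’s one way you might split them. This is only a sketch: the names `outcomes`, `failed`, `results`, and `messages` are mine, and `conditionMessage()` is the base R function for pulling the text out of an error object.

```r
library(purrr)

add_ten <- function(n) {
  n + 10
}

safely_add_ten <- safely(add_ten)
mixed_list <- list(a = "happy", b = 2L, c = 4.39)

outcomes <- map(mixed_list, safely_add_ten)

# TRUE for every element where an error was captured
failed <- map_lgl(outcomes, ~ !is.null(.x$error))

# Results for the elements that worked
results <- map(outcomes[!failed], "result")

# Plain-text error messages for the elements that failed
messages <- map_chr(outcomes[failed], ~ conditionMessage(.x$error))
```

Because the list is named, both `results` and `messages` tell you exactly which elements succeeded and which didn’t.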

#### compact()

In the previous example, we probably wouldn’t be interested in the instances where errors were `NULL`. We could use `discard(is.null)`, but `purrr` actually provides a function just for the purpose of getting rid of `NULL`s: `compact()`.

```
mixed_list %>%
  map(safely_add_ten) %>%
  map("error") %>%
  compact()
```

```
$a
<simpleError in n + 10: non-numeric argument to binary operator>
```
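One thing worth knowing is that `compact()` drops every empty element, not only `NULL`s, so zero-length vectors and empty lists go too. A small sketch with made-up data:

```r
library(purrr)

# Illustrative list with a NULL, an empty character vector, and an empty list
responses <- list(1, NULL, character(0), "a", list())

# compact() removes all elements of length zero
cleaned <- compact(responses)
```

`cleaned` contains just `1` and `"a"`. If you only want to remove `NULL`s and keep other empty elements, `discard(responses, is.null)` is the safer choice.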

#### compose()

`compose()` lets you string together multiple functions. Let’s say you want to `add_ten()`, take the log, and then round a vector of numbers. You could do either of these:

```
c(1, 20, 500) %>%
add_ten() %>%
log() %>%
round()
```

`[1] 2 3 6`

`round(log(add_ten(c(1, 20, 500))))`

`[1] 2 3 6`

But you could also make a new function using `compose()`. You give `compose()` functions to execute in order from *right to left* (just like we have written above):

```
add_ten_log_and_round <- compose(round, log, add_ten)

add_ten_log_and_round(c(1, 20, 500))
```

`[1] 2 3 6`

`compose()` is great for simplifying your code if you’re going to use a sequence of functions again and again.

What if we wanted to round to the nearest tenth instead? You can combine `compose()` with `partial()`!

```
round_tenth <- partial(round, digits = 1)

add_ten_log_and_round_tenth <- compose(round_tenth, log, add_ten)

add_ten_log_and_round_tenth(c(1, 20, 500))
```

`[1] 2.4 3.4 6.2`
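If right-to-left feels backwards compared to a pipe, recent versions of `purrr` let you flip the order with the `.dir` argument. A sketch comparing the two (the names `backward` and `forward` are just for this example):

```r
library(purrr)

add_ten <- function(n) {
  n + 10
}

# Default: functions are applied from right to left
backward <- compose(round, log, add_ten)

# Same pipeline, written in the order the steps happen
forward <- compose(add_ten, log, round, .dir = "forward")

backward(c(1, 20, 500))
forward(c(1, 20, 500))
```

Both calls give the same result; `.dir = "forward"` just lets the definition read in the same order as the equivalent `%>%` chain.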

## Conclusion

There’s so much to learn in the wonderful world of `purrr`! I’m a big believer that just knowing something exists gets you a lot of the way there. Maybe you can’t think of a use for `partial()` right now, but you’ll run into a problem a few months down the line where it would be handy. You don’t need to remember the exact syntax; all you need to know is what to Google for and that you don’t need to spend hours trying to find some other solution or resign yourself to writing repetitive code.

One thing I didn’t show in this post was how all of these functions can fit together in an analysis. For that, I’ll recommend Colin’s course again. He also has a series of six blog posts illustrating different `purrr` functions, including some I didn’t cover here; check them out!

## Footnotes

Now, this is a bit of a silly example. Because `round()` is vectorized (it works not just on a single value but a vector of them), you could have just done this instead: `round(my_vector)`

`[1] 1 2 3 5`

This won’t work, however, if your input is a list instead of a vector:

`vec_as_list <- list(my_vector)`

`round(vec_as_list)`

`Error in round(vec_as_list): non-numeric argument to mathematical function`

But `map()` still will: `map(vec_as_list, round)`

`[[1]] [1] 1 2 3 5`

Generally, you don’t need `map` when you’re working with a simple vector and a vectorized function. `map` shines where you have either a) a non-vectorized function or b) a more complicated data structure (including a list).↩︎