A Short Introduction on How to Use Purrr in Conjunction with Broom

Fri, Jan 14, 2022 3-minute read

In this blog post, I walk through a basic use of the purrr package together with the broom package. The post was motivated by the README file (link) of purrr. Let's load the packages first, and then I will flesh it out.

library(purrr)
library(broom)
library(dplyr)
library(tidyr)

Here I copy the code from the link mentioned above.

mtcars %>%
  split(.$cyl) %>% # from base R
  map(~ lm(mpg ~ wt, data = .)) %>%
  map(summary) %>%
  map_dbl("r.squared")
##         4         6         8 
## 0.5086326 0.4645102 0.4229655

mtcars is a famous data set in the R community. Looking at the code, there are a few things we can change. First off, it uses split(), which comes from base R, and so does summary(). I think this is a great opportunity to illustrate how to combine purrr::map() with the broom package to reach the same result as the output above.

I. Instead of using split(), we use nest() from tidyr:

mtcars %>%
  nest(data = -cyl) # named form; the unnamed nest(-cyl) warns in tidyr >= 1.0.0
## # A tibble: 3 x 2
##     cyl data              
##   <dbl> <list>            
## 1     6 <tibble [7 x 10]> 
## 2     4 <tibble [11 x 10]>
## 3     8 <tibble [14 x 10]>
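
To make the nested structure more concrete, here is a small sketch that pulls one of the nested tibbles back out of the list column. The object name nested is my own choice for illustration, and the code uses the named form nest(data = -cyl), which tidyr 1.0.0 and later expect.

```r
library(dplyr)
library(tidyr)

# One row per cyl value; the remaining columns are packed into `data`
nested <- mtcars %>%
  nest(data = -cyl)

# `data` is a list column, so [[ ]] extracts a single tibble
first_group <- nested$data[[1]]

nested$cyl[1]      # the cyl value this group belongs to
dim(first_group)   # rows and columns of the nested tibble
names(first_group) # every mtcars column except cyl
```

Each element of data is an ordinary tibble, which is why it can be passed straight to lm() in the next step.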

II. Using map() to fit a model on each element of the data column, then glance() on each model:

mtcars %>%
  nest(data = -cyl) %>%
  mutate(model = map(data, ~lm(mpg ~ wt, data = .)),
         glanced = map(model, glance))
## # A tibble: 3 x 4
##     cyl data               model  glanced          
##   <dbl> <list>             <list> <list>           
## 1     6 <tibble [7 x 10]>  <lm>   <tibble [1 x 12]>
## 2     4 <tibble [11 x 10]> <lm>   <tibble [1 x 12]>
## 3     8 <tibble [14 x 10]> <lm>   <tibble [1 x 12]>
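
Each element of glanced is a one-row tibble of model-level statistics. As a minimal sketch, the code below fits a single model on the whole of mtcars (not split by cyl) just to look at what glance() returns. The names fit and g are mine, and the exact set of columns may differ between broom versions.

```r
library(broom)

# A single model on the full data set, only to inspect glance()
fit <- lm(mpg ~ wt, data = mtcars)

# glance() returns a one-row tibble of model-level summaries
g <- glance(fit)

names(g)    # available statistics, r.squared among them
g$r.squared # the value we extract per group later on
```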

There is another way to do it:

mtcars %>%
  nest(data = -cyl) %>%
  mutate(model = map(data, ~lm(.$mpg ~ .$wt)),
         glanced = map(model, glance))
## # A tibble: 3 x 4
##     cyl data               model  glanced          
##   <dbl> <list>             <list> <list>           
## 1     6 <tibble [7 x 10]>  <lm>   <tibble [1 x 12]>
## 2     4 <tibble [11 x 10]> <lm>   <tibble [1 x 12]>
## 3     8 <tibble [14 x 10]> <lm>   <tibble [1 x 12]>

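The dot . is a pronoun that refers to the element currently being mapped over, which here is each nested tibble in the data column. Note that with lm(.$mpg ~ .$wt) the slope coefficient is labelled .$wt rather than wt, so the data = . form is usually easier to work with afterwards.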
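
If the dot feels too terse, the same step can be written with a named function. This is only a sketch: fit_wt and models are hypothetical names I introduce for illustration, and the code uses the named form nest(data = -cyl).

```r
library(purrr)
library(dplyr)
library(tidyr)

# Hypothetical helper: fit mpg ~ wt on one nested tibble
fit_wt <- function(df) lm(mpg ~ wt, data = df)

models <- mtcars %>%
  nest(data = -cyl) %>%
  # map() calls fit_wt() once per element of the `data` list column
  mutate(model = map(data, fit_wt))

# Inside a ~ formula, .x is an equivalent spelling of the dot:
# map(data, ~ lm(mpg ~ wt, data = .x))
```

A named function makes it explicit that map() hands over one tibble at a time, and it can be reused or tested on its own.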

III. unnest() the glanced column, then pivot_wider()

mtcars %>%
  nest(data = -cyl) %>%
  mutate(model = map(data, ~lm(mpg ~ wt, data = .)),
         glanced = map(model, glance)) %>%
  unnest(glanced) %>%
  select(cyl, r.squared) %>%
  pivot_wider(names_from = "cyl", values_from = "r.squared") 
## # A tibble: 1 x 3
##     `6`   `4`   `8`
##   <dbl> <dbl> <dbl>
## 1 0.465 0.509 0.423
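
As a quick check, the sketch below compares the two routes numerically. It is one possible way to do it, not part of the README: the names readme and nested are mine, the r.squared column is pulled out with map_dbl() instead of unnest(), and arrange(cyl) sorts the groups so that they line up with the order split() produces.

```r
library(purrr)
library(broom)
library(dplyr)
library(tidyr)

# Route 1: the README version, a named numeric vector (names 4, 6, 8)
readme <- mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .)) %>%
  map(summary) %>%
  map_dbl("r.squared")

# Route 2: nest + glance, sorted by cyl to match split()'s ordering
nested <- mtcars %>%
  nest(data = -cyl) %>%
  mutate(model = map(data, ~ lm(mpg ~ wt, data = .x)),
         r.squared = map_dbl(model, ~ glance(.x)$r.squared)) %>%
  arrange(cyl)

# Compare the values after dropping the names from the README vector
all.equal(unname(readme), nested$r.squared)
```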

Here we have the same R-squared values as the purrr README, with the columns in the order in which each cyl value first appears in mtcars rather than sorted. Both packages are fantastic, and they have made my data science life much easier, especially purrr. Truth be told, though, purrr is not easy to learn at the beginning. If you want to learn more about either package, I recommend checking out their documentation and GitHub repos.