A Short Introduction to Using purrr in Conjunction with broom
In this blog post, I will walk through a basic use of the purrr package in conjunction with the broom package. The post was motivated by the README file (link) of purrr. Let's load the packages first, and then I will flesh out the idea.
library(purrr)
library(broom)
library(dplyr)
library(tidyr)
Here I copy the code from the purrr README mentioned above.
mtcars %>%
split(.$cyl) %>% # from base R
map(~ lm(mpg ~ wt, data = .)) %>%
map(summary) %>%
map_dbl("r.squared")
## 4 6 8
## 0.5086326 0.4645102 0.4229655
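The last step, map_dbl("r.squared"), uses a purrr shorthand: when you pass a string instead of a function, map_dbl() extracts the element with that name from each list item and requires each result to be a single double. A minimal sketch on a hand-made list (the list `fits` and its values are invented for illustration):

```r
library(purrr)

# Hypothetical summary-like objects; the numbers are made up for illustration
fits <- list(
  a = list(r.squared = 0.5, sigma = 2),
  b = list(r.squared = 0.25, sigma = 3)
)

# A string acts as an extractor: take the element named "r.squared" from each item
r2 <- map_dbl(fits, "r.squared")
# r2 is a named numeric vector: a = 0.50, b = 0.25

# The same thing written long-hand with an explicit function
r2_explicit <- map_dbl(fits, function(x) x[["r.squared"]])
```

Both forms keep the names of the input list, which is why the README output above is labelled 4, 6 and 8.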
mtcars is a well-known data set in the R community. Looking at this code, there are a few things we could change. First, it uses split(), which comes from base R, and so does summary(). This is a good opportunity to illustrate how to combine purrr::map() with the broom package to reach the same result as the output above.
I. Instead of using split(), we use nest() from tidyr:
mtcars %>%
nest(data = -cyl) # every column except cyl goes into a list-column named data
## # A tibble: 3 x 2
## cyl data
## <dbl> <list>
## 1 6 <tibble [7 x 10]>
## 2 4 <tibble [11 x 10]>
## 3 8 <tibble [14 x 10]>
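To see what nest() actually produced, the sketch below inspects the result directly. It uses the `nest(data = -cyl)` form, which is how tidyr 1.0 and later expect the nested columns to be named; the object name `nested` is just a local choice.

```r
library(dplyr)
library(tidyr)

# Nest every column except cyl into a list-column called "data"
nested <- mtcars %>%
  nest(data = -cyl)

# One row per cylinder group, in order of first appearance in mtcars (6, 4, 8)
nested$cyl

# Each element of the list-column is an ordinary tibble with that group's rows
sapply(nested$data, nrow)

# ...and the 10 remaining columns (cyl itself stays outside)
ncol(nested$data[[1]])
```

Keeping each group as a tibble inside a column is what lets the next step attach models and model summaries as further columns of the same table.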
II. Use map() to fit a model to each tibble in the data column, then glance() at each model:
mtcars %>%
nest(data = -cyl) %>%
mutate(model = map(data, ~lm(mpg ~ wt, data = .)),
glanced = map(model, glance))
## # A tibble: 3 x 4
## cyl data model glanced
## <dbl> <list> <list> <list>
## 1 6 <tibble [7 x 10]> <lm> <tibble [1 x 12]>
## 2 4 <tibble [11 x 10]> <lm> <tibble [1 x 12]>
## 3 8 <tibble [14 x 10]> <lm> <tibble [1 x 12]>
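Each `<tibble [1 x 12]>` in the glanced column is broom's one-row model summary. A minimal sketch on a single model, fitted outside the pipeline, contrasts glance() with broom's tidy(), which returns one row per coefficient instead (the object names are local choices):

```r
library(broom)

# Fit one model on the full data to inspect what broom returns
fit <- lm(mpg ~ wt, data = mtcars)

# glance(): a single row of model-level statistics (r.squared, adj.r.squared, sigma, ...)
g <- glance(fit)

# tidy(): one row per coefficient (term, estimate, std.error, statistic, p.value)
td <- tidy(fit)

# glance()'s r.squared matches what base R's summary() reports
all.equal(g$r.squared, summary(fit)$r.squared)
```

Because glance() always returns exactly one row per model, unnesting it later yields one row per cylinder group.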
There is another way to do it:
mtcars %>%
nest(data = -cyl) %>%
mutate(model = map(data, ~lm(.$mpg ~ .$wt)),
glanced = map(model, glance))
## # A tibble: 3 x 4
## cyl data model glanced
## <dbl> <list> <list> <list>
## 1 6 <tibble [7 x 10]> <lm> <tibble [1 x 12]>
## 2 4 <tibble [11 x 10]> <lm> <tibble [1 x 12]>
## 3 8 <tibble [14 x 10]> <lm> <tibble [1 x 12]>
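Inside a formula lambda such as ~lm(mpg ~ wt, data = .), the dot . is a pronoun for the current element being mapped over, which here is each tibble in the data column.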
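To make the pronoun concrete, the sketch below writes the same mapper three ways: the formula lambda with `data = .`, the `.$` variant, and an ordinary anonymous function whose argument `df` (a name chosen here for illustration) plays the role of the dot. All three fit the same regressions; one visible difference is that the `.$` version labels the slope coefficient `.$wt` rather than `wt`.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# List of per-cylinder tibbles, taken from the nested data column
groups <- mtcars %>%
  nest(data = -cyl) %>%
  pull(data)

# 1. Formula lambda: "." stands for the current tibble
fits_dot <- map(groups, ~ lm(mpg ~ wt, data = .))
# 2. Formula lambda that pulls the columns out with .$
fits_dollar <- map(groups, ~ lm(.$mpg ~ .$wt))
# 3. Ordinary anonymous function; df takes the place of "."
fits_fun <- map(groups, function(df) lm(mpg ~ wt, data = df))

# Same regressions, so the R-squared values agree across all three
r2 <- data.frame(
  dot    = map_dbl(fits_dot, ~ glance(.)$r.squared),
  dollar = map_dbl(fits_dollar, ~ glance(.)$r.squared),
  fun    = map_dbl(fits_fun, ~ glance(.)$r.squared)
)

# Coefficient names differ: "wt" for versions 1 and 3, ".$wt" for version 2
names(coef(fits_dot[[1]]))
names(coef(fits_dollar[[1]]))
```

The `data = .` form is usually the more convenient one, since the coefficient names stay clean when the models are later passed to tidy().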
III. unnest() the glanced column, then pivot_wider():
mtcars %>%
nest(data = -cyl) %>%
mutate(model = map(data, ~lm(mpg ~ wt, data = .)),
glanced = map(model, glance)) %>%
unnest(glanced) %>%
select(cyl, r.squared) %>%
pivot_wider(names_from = "cyl", values_from = "r.squared")
## # A tibble: 1 x 3
## `6` `4` `8`
## <dbl> <dbl> <dbl>
## 1 0.465 0.509 0.423
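As a quick check, the sketch below pulls r.squared straight out of each glance() with map_dbl() (a variant of step III that skips unnest() and select()) and compares it with the README pipeline from the start of the post. The object names `r2_nested`, `r2_readme` and `agree` are local choices.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Variant: extract r.squared with map_dbl() instead of unnest() + select()
r2_nested <- mtcars %>%
  nest(data = -cyl) %>%
  mutate(model = map(data, ~ lm(mpg ~ wt, data = .)),
         r.squared = map_dbl(model, ~ glance(.)$r.squared)) %>%
  select(cyl, r.squared)

# The README pipeline from the start of the post
r2_readme <- mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .)) %>%
  map(summary) %>%
  map_dbl("r.squared")

# Reorder the nested result to the README's cylinder order (4, 6, 8) and compare
agree <- isTRUE(all.equal(
  r2_nested$r.squared[match(names(r2_readme), r2_nested$cyl)],
  unname(r2_readme)
))
agree
```

The match() step is needed because nest() keeps groups in order of first appearance (6, 4, 8), while split() sorts them.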
Here we have the same R-squared values as the purrr README, although the columns appear in order of first appearance in mtcars (6, 4, 8) rather than sorted. Both packages are fantastic and have made my data science life much easier, especially purrr. But truth be told, purrr is not easy to learn at the beginning. If you want to learn more about either package, I recommend checking out their documentation and GitHub repositories.