Chopped Data Visualization
Mon, Feb 7, 2022
3-minute read
In this blog post, I will analyze a dataset about the TV show Chopped. Nothing do I know about the show, but this is a great opportunity to dive into a dataset that I do not know any background information. Let’s see what we can find from it. Here is the source of data (link).
library(tidyverse)
library(lubridate)
library(tidytext)
library(widyr)
library(ggraph)
theme_set(theme_light())
chopped <- read_tsv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-08-25/chopped.tsv') %>%
mutate(air_date = mdy(air_date)) %>%
filter(episode_rating > 6)
chopped
## # A tibble: 463 x 21
## season season_episode series_episode episode_rating episode_name
## <dbl> <dbl> <dbl> <dbl> <chr>
## 1 1 1 1 9.2 Octopus, Duck, Animal Cr~
## 2 1 2 2 8.8 Tofu, Blueberries, Oyste~
## 3 1 3 3 8.9 Avocado, Tahini, Bran Fl~
## 4 1 4 4 8.5 Banana, Collard Greens, ~
## 5 1 5 5 8.8 Yucca, Watermelon, Torti~
## 6 1 6 6 8.5 Canned Peaches, Rice Cak~
## 7 1 7 7 8.8 Quail, Arctic Char, Beer
## 8 1 8 8 9 Coconut, Calamari, Donuts
## 9 1 9 9 8.9 Mac & Cheese, Cola, Bacon
## 10 1 10 10 8.8 String Cheese, Jicama, G~
## # ... with 453 more rows, and 16 more variables: episode_notes <chr>,
## # air_date <date>, judge1 <chr>, judge2 <chr>, judge3 <chr>, appetizer <chr>,
## # entree <chr>, dessert <chr>, contestant1 <chr>, contestant1_info <chr>,
## # contestant2 <chr>, contestant2_info <chr>, contestant3 <chr>,
## # contestant3_info <chr>, contestant4 <chr>, contestant4_info <chr>
chopped %>%
ggplot(aes(air_date, episode_rating)) +
geom_point(aes(color = factor(season)),
show.legend = F,
size = 2) +
geom_smooth(method = "loess",
se = F,
size = 2) +
scale_x_date(date_breaks = "6 months",
date_labels = "%Y-%m") +
labs(x = "air date",
y = "episode rating",
title = "Episode Ratings with the Air Date",
subtitle = "Colors represent the season")
It seems like the overall episode rating stayed roughly the same all the time, although there was a little dip in the middle somewhere.
chopped %>%
ggplot(aes(episode_rating)) +
geom_histogram(binwidth = 0.1)
Some ideas in this blog post are inspired by David Robinson’s code.
ingredients <- chopped %>%
select(season, season_episode, episode_rating, appetizer:dessert) %>%
pivot_longer(cols = c(appetizer:dessert), names_to = "course", values_to = "ingredient") %>%
separate_rows(ingredient, sep = ",\\s") %>%
mutate(course = factor(course, levels = c("appetizer", "entree", "dessert")))
Popular ingredients:
ingredients %>%
count(course, ingredient, sort = T) %>%
group_by(course) %>%
slice_max(n, n = 20) %>%
ungroup() %>%
mutate(ingredient = reorder_within(ingredient, n, course)) %>%
ggplot(aes(n, ingredient, fill = ingredient)) +
geom_col(show.legend = F) +
scale_y_reordered() +
facet_wrap(~course, scales = "free_y") +
labs(x = "# of times ingredient appears",
y = NULL,
title = "Which are the most popular ingredients?")
How ingredients are correlated within each season?
set.seed(2022)
ingredients %>%
add_count(ingredient) %>%
filter(n > 10) %>%
pairwise_cor(ingredient, season, sort = T) %>%
head(100) %>%
ggraph(layout = "fr") +
geom_edge_link(aes(color = correlation, width = correlation), alpha = 0.5) +
geom_node_point() +
geom_node_text(aes(label = name), hjust = 1, vjust = 1, check_overlap = T, size = 5) +
theme_void() +
guides(color = "none", edge_color = "none") +
labs(edge_color = "correlation",
title = "How are ingredients correlated within each season?") +
theme(plot.title = element_text(size = 18))
new_ingredients <- ingredients %>%
add_count(ingredient) %>%
filter(n > 5) %>%
group_by(ingredient) %>%
summarize(first_season = min(season),
avg_season = mean(season),
last_season = max(season)) %>%
slice(c(1:7), tail(row_number(), n = 7))
ingredients %>%
semi_join(new_ingredients, by = "ingredient") %>%
mutate(ingredient = fct_reorder(ingredient, season, median)) %>%
ggplot(aes(season, ingredient, fill = ingredient)) +
geom_boxplot(show.legend = F)
Interestingly, almond butter was used in the first 15 seasons or so, but then it was abandoned until almost the Season 39.