US Spending on Kids Data Visualization

Mon, Feb 14, 2022 5-minute read

This blog post analyzes how much money each state in the U.S. spent on kids through various programs between 1997 and 2016 (20-year period). As usual, the data comes from TidyTuesday.

library(tidyverse)
library(geofacet)
library(scales)
library(tidytext)
theme_set(theme_light())
kids <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-15/kids.csv') %>%
  filter(inf_adj > 0) %>%
  mutate(variable = str_to_title(str_replace(variable, "_", " "))) %>%
  left_join(tibble(state = state.name, state_abb = state.abb), by = "state") %>%
  mutate(state_abb = if_else(is.na(state_abb), "DC", state_abb),
         across(raw:inf_adj_perchild, ~ . * 1000))

Overview of the state governments’ spending on kids:

kids %>% 
  filter(!is.na(inf_adj),
         fct_lump(variable, n = 6, w = inf_adj)!= "Other") %>%
  mutate(variable = fct_reorder(variable, -inf_adj, sum)) %>%
  ggplot(aes(year, inf_adj, color = variable)) +
  geom_line(size = 1) +
  scale_y_continuous(labels = dollar) +
  facet_geo(~state_abb, scales = "free_y") +
  theme(axis.text.x = element_text(angle = 90),
        strip.text = element_text(size = 15, face = "bold"),
        plot.title = element_text(size = 18)) +
  labs(y = "adjusted spending",
       color = NULL,
       title = "State-wise Government Spending on 6 Major Categories on Kids (1997 - 2016)")

The spending on some categories (e.g,Pk12ed) have risen up significantly across all states, yet others remain relatively the same level throughout the years.

kids %>%
  filter(fct_lump(variable, n = 6, w = inf_adj_perchild) != "Other") %>%
  mutate(variable = fct_reorder(variable, -inf_adj_perchild, sum)) %>%
  ggplot(aes(year, inf_adj_perchild, color = variable)) +
  geom_line(size = 0.7) +
  scale_y_continuous(labels = dollar) +
  facet_geo(~state_abb) +
  theme(axis.text.x = element_text(angle = 90),
        strip.text = element_text(size = 15, face = "bold"),
        plot.title = element_text(size = 18)) +
  labs(y = "adjusted spending per child",
       color = NULL,
       title = "State-wise Government Spending on 6 Major Categories on Kids (1997 - 2016)")

Although total spending has been rising, the adjusted invenstement on children per captia remains stable. One possible reason is there were more children in the later time interval than the earlier.

Comparing 1997 and 2016 spending:

kids %>%
  filter(year %in% c(1997, 2016)) %>%
  select(1, 2,3,5,7) %>%
  pivot_wider(names_from = "year",
              values_from = "inf_adj") %>%
  mutate(ratio = (`2016` - `1997`)/`1997`) %>%
  group_by(state, state_abb) %>%
  slice_max(ratio, n = 5) %>%
  ungroup() %>%
  mutate(variable = reorder_within(variable, ratio, state_abb)) %>%
  ggplot(aes(ratio, variable, fill = variable)) +
  geom_col(show.legend = F) +
  scale_y_reordered() +
  scale_x_continuous(labels = percent) +
  facet_geo(~state_abb, scales = "free") +
  labs(x = "2016 to 1997 ratio",
       y = "spending sector",
       title = "State-wise Top 5 Largest Increase Among the 20 Year Period") +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 18),
        strip.text = element_text(size = 15))

Here we get the number of kids each year in each state:

kids_count <- kids %>%
  mutate(num_of_kids = ceiling(inf_adj/inf_adj_perchild)) %>%
  distinct(year, state_abb, .keep_all = T) %>%
  select(state_abb, year, num_of_kids)

kids_count
## # A tibble: 1,020 x 3
##    state_abb  year num_of_kids
##    <chr>     <dbl>       <dbl>
##  1 AL         1997     1187268
##  2 AK         1997      196883
##  3 AZ         1997     1303319
##  4 AR         1997      718408
##  5 CA         1997     9558726
##  6 CO         1997     1084928
##  7 CT         1997      854740
##  8 DE         1997      196901
##  9 DC         1997      127063
## 10 FL         1997     3681284
## # ... with 1,010 more rows

Total number of kids across the U.S.:

kids_count %>%
  group_by(year) %>%
  summarize(num_of_kids_year = sum(num_of_kids)) %>%
  ggplot(aes(year, num_of_kids_year)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(labels = comma) +
  labs(y = "# of kids",
       title = "Yearly # of Kids in the U.S.")

Total number of kids across all states:

kids_count %>%
  ggplot(aes(year, num_of_kids, color = state_abb)) +
  geom_line() +
  geom_point() +
  facet_geo(~state_abb, scales = "free_y") +
  theme(axis.text.x = element_text(angle = 90),
        legend.position = "none",
        plot.title = element_text(size = 18),
        strip.text = element_text(size = 15)) +
  labs(y = "# of kids",
       title = "State-wise # of Kids") 

Kids’ growth rate:

kids_count %>%
  group_by(state_abb) %>%
  mutate(previous_num_of_kids = lag(num_of_kids),
         growth_rate = (num_of_kids - previous_num_of_kids)/previous_num_of_kids) %>% 
  filter(!is.na(growth_rate)) %>%
  ungroup() %>%
  ggplot(aes(year, growth_rate, color = state_abb)) +
  geom_line(size = 1) +
  scale_y_continuous(labels = percent) +
  facet_geo(~state_abb) +
  theme(axis.text.x = element_text(angle = 90),
        legend.position = "none",
        plot.title = element_text(size = 18),
        strip.text = element_text(size = 15)) +
  labs(y = "rate of kids' growth",
       title = "State-wise Yearly Growth Rate for Kids",
       subtitle = "Growth rate compares the current year to the previous year") 

All states remain relatively stable on kids’ growth rate. What is worth noting is that the state of LA had a sudden drop on kids’ growth rate. I guess the reason was the Katrina Hurricane.

Let’s look at Ohio in particular:

ohio <- kids %>%
  filter(state == "Ohio")

ohio
## # A tibble: 428 x 7
##    state variable       year          raw     inf_adj inf_adj_perchild state_abb
##    <chr> <chr>         <dbl>        <dbl>       <dbl>            <dbl> <chr>    
##  1 Ohio  Pk12ed         1997 10213866000      1.46e10           4764.  OH       
##  2 Ohio  Highered       1997  1706351000      2.43e 9            796.  OH       
##  3 Ohio  Edsubs         1997   553903000      7.90e 8            258.  OH       
##  4 Ohio  Edservs        1997   219720000      3.13e 8            102.  OH       
##  5 Ohio  Pell           1997   216774138      3.09e 8            101.  OH       
##  6 Ohio  Headstartpriv  1997   134497015.     1.92e 8             62.7 OH       
##  7 Ohio  Tanfbasic      1997   697067991      9.94e 8            325.  OH       
##  8 Ohio  Othercashserv  1997  2146527000      3.06e 9           1001.  OH       
##  9 Ohio  Snap           1997   351871375      5.02e 8            164.  OH       
## 10 Ohio  Socsec         1997   474286127.     6.76e 8            221.  OH       
## # ... with 418 more rows
ohio %>%
  mutate(variable = fct_reorder(variable, inf_adj, sum)) %>%
  ggplot(aes(year, variable, fill = inf_adj)) +
  geom_tile() +
  scale_fill_gradient(low = "red",
                      high = "green",
                      trans = "log10",
                      labels = dollar) +
  theme(panel.grid = element_blank()) +
  labs(y = "spending sector",
       fill = "adjusted spending",
       title = "Ohio's Spendings on Kids")