Teacher-student Ratio and WDI Package Exploration

Thu, Oct 28, 2021

The dataset comes from TidyTuesday and covers the teacher-student ratio (recorded as the number of students per teacher) in each country. It is also a good opportunity to try the WDI package, which provides GDP, population, and other key indicators for each country.

library(tidyverse)
library(tidytext)
library(scales)
library(WDI)
theme_set(theme_bw())
ratio <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-07/student_teacher_ratio.csv")

ratio
## # A tibble: 5,189 x 8
##    edulit_ind indicator   country_code country     year student_ratio flag_codes
##    <chr>      <chr>       <chr>        <chr>      <dbl>         <dbl> <chr>     
##  1 PTRHC_2    Lower Seco~ MRT          Mauritania  2013         56.6  <NA>      
##  2 PTRHC_2    Lower Seco~ MRT          Mauritania  2014         51.9  <NA>      
##  3 PTRHC_2    Lower Seco~ MRT          Mauritania  2015         53.2  <NA>      
##  4 PTRHC_2    Lower Seco~ MRT          Mauritania  2016         38.2  <NA>      
##  5 PTRHC_1    Primary Ed~ COD          Democrati~  2012         34.7  <NA>      
##  6 PTRHC_1    Primary Ed~ COD          Democrati~  2013         37.1  <NA>      
##  7 PTRHC_1    Primary Ed~ COD          Democrati~  2014         35.3  <NA>      
##  8 PTRHC_1    Primary Ed~ COD          Democrati~  2015         33.2  <NA>      
##  9 PTRHC_3    Upper Seco~ SYR          Syrian Ar~  2013          8.47 <NA>      
## 10 PTRHC_02   Pre-Primar~ GNQ          Equatoria~  2012         17.5  <NA>      
## # ... with 5,179 more rows, and 1 more variable: flags <chr>

Top 10 countries with highest and lowest teacher-student ratio

ratio %>%
  filter(!is.na(student_ratio)) %>%
  group_by(year) %>%
  arrange(desc(student_ratio)) %>%
  slice(c(1:10, seq(n() - 9, n()))) %>%  # first 10 and last 10 rows per year
  ungroup() %>% 
  mutate(country = reorder_within(country, student_ratio, year)) %>%
  ggplot(aes(student_ratio, country, fill = indicator)) +
  geom_col() +
  facet_wrap(~year, scales = "free_y") +
  scale_y_reordered() +
  labs(x = "teacher-student ratio (students per teacher)",
       y = NULL,
       title = "Top 10 Countries with Highest and Lowest Teacher-Student Ratio") +
  theme(strip.text = element_text(size = 15, face = "bold"),
        axis.title = element_text(size = 14),
        axis.text = element_text(size = 11),
        plot.title = element_text(size = 18))
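
The row selection in the pipeline above can also be written without index arithmetic. The following is only a sketch on made-up values, assuming dplyr 1.0.0 or later for slice_max() and slice_min(); the helper name top_and_bottom() is hypothetical and not part of the original analysis.

```r
library(dplyr)

# Made-up values purely for illustration; not taken from the TidyTuesday data
toy <- tibble(
  year = rep(c(2015, 2016), each = 5),
  country = rep(c("A", "B", "C", "D", "E"), times = 2),
  student_ratio = c(10, 20, 30, 40, 50, 15, 25, 35, 45, 55)
)

# Hypothetical helper: the n largest and n smallest ratios within each year
top_and_bottom <- function(df, n) {
  grouped <- group_by(df, year)
  bind_rows(
    slice_max(grouped, student_ratio, n = n, with_ties = FALSE),
    slice_min(grouped, student_ratio, n = n, with_ties = FALSE)
  ) %>%
    ungroup()
}

extremes <- top_and_bottom(toy, n = 2)
extremes
```

With the real data, ratio would take the place of toy and n would be 10. A year with fewer than 2 * n rows would contribute some rows twice, so a distinct() call may be needed in that case.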

The WDI Package Exploration

The package revolves around two functions: WDIsearch() finds the indicator code that matches a search term, and WDI() downloads the data for the chosen indicator codes.
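
As a small illustration of the search step, the sketch below looks up the names behind the four indicator codes used later in this post. It assumes that WDIsearch() accepts a field argument and, by default, searches the indicator list bundled with the package; the exact return type (matrix or data frame) may differ between package versions, hence the as.data.frame() call.

```r
library(WDI)

# Indicator codes used later in this post
codes <- c("SP.POP.TOTL", "NY.GDP.PCAP.KD", "SE.ADT.LITR.ZS", "SE.XPD.TOTL.GD.ZS")

# WDIsearch() treats its first argument as a regular expression,
# so escape the dots and anchor the pattern to match whole codes only
escaped <- gsub(".", "\\.", codes, fixed = TRUE)
pattern <- paste0("^(", paste(escaped, collapse = "|"), ")$")

# Search the indicator codes rather than the (default) indicator names
lookup <- as.data.frame(WDIsearch(pattern, field = "indicator"))
lookup
```

The resulting table pairs each code with its descriptive name, which is handy for checking that a code means what you expect before passing it to WDI().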

# WDIsearch("public.*education") %>%
#   as_tibble() %>%
#   arrange(str_length(name)) %>%
#   View()
# 
# WDI(indicator = c('SE.XPD.TOTL.GD.ZS'), start = 2016, end = 2016, extra = T)
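# Indicators: total population, GDP per capita (constant US$),
# adult literacy rate (%), government expenditure on education (% of GDP)
joined_2016 <- WDI(indicator = c('SP.POP.TOTL', 'NY.GDP.PCAP.KD', 'SE.ADT.LITR.ZS', 'SE.XPD.TOTL.GD.ZS'), 
                   start = 2016, end = 2016, extra = TRUE) %>%
  as_tibble() %>%
  rename(country_code = "iso3c") %>%
  # keep only the join key and the indicators; country and year come from ratio
  select(country_code, NY.GDP.PCAP.KD, SP.POP.TOTL, SE.ADT.LITR.ZS, SE.XPD.TOTL.GD.ZS) %>%
  inner_join(ratio %>% filter(year == 2016), by = "country_code")
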
joined_2016 %>%
  ggplot(aes(student_ratio, NY.GDP.PCAP.KD, color = country)) +
  geom_point() +
  geom_text(aes(label = country), vjust = 1, hjust = 1, check_overlap = T) +
  facet_wrap(~indicator) +
  scale_y_log10() +
  scale_x_log10() +
  theme(
    legend.position = "none",
    strip.text = element_text(size = 13, face = "bold"),
    plot.title = element_text(size = 18)
  ) +
  labs(x = "teacher-student ratio",
       y = "GDP per capita",
       title = "Teacher-Student Ratio by Education Level and GDP Per Capita")

Not surprisingly, the teacher-student ratio is negatively correlated with GDP per capita. That is to say, affluent countries tend to have fewer students per teacher.
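
One way to put a number on this relationship is a rank correlation per education level. The sketch below is not part of the original analysis: it runs on made-up values, and the column names simply mirror those in joined_2016, which could be substituted for toy. Spearman's correlation is used here because it depends only on ranks and is therefore unaffected by the log scales in the plot.

```r
library(dplyr)

# Made-up values purely for illustration; substitute joined_2016 in practice
toy <- tibble(
  indicator = rep(c("Primary Education", "Upper Secondary Education"), each = 4),
  student_ratio = c(40, 30, 20, 10, 35, 25, 15, 12),
  NY.GDP.PCAP.KD = c(500, 2000, 10000, 40000, 800, 3000, 12000, 45000)
)

cor_by_level <- toy %>%
  # drop rows where either variable is missing before correlating
  filter(!is.na(student_ratio), !is.na(NY.GDP.PCAP.KD)) %>%
  group_by(indicator) %>%
  summarize(
    n_countries = n(),
    spearman = cor(student_ratio, NY.GDP.PCAP.KD, method = "spearman"),
    .groups = "drop"
  )

cor_by_level
```

A negative spearman value for an education level would be consistent with the pattern in the plot; values near zero would suggest the visual trend is weak for that level.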

joined_2016 %>%
  ggplot(aes(student_ratio, SP.POP.TOTL, color = country)) +
  geom_point() +
  geom_text(aes(label = country), vjust = 1, hjust = 1, check_overlap = T) +
  facet_wrap(~indicator) +
  scale_y_log10() +
  scale_x_log10() +
  theme(
    legend.position = "none",
    strip.text = element_text(size = 13, face = "bold"),
    plot.title = element_text(size = 18)
  ) +
  labs(x = "teacher-student ratio",
       y = "population total",
       title = "Teacher-Student Ratio by Education Level and Total Population")

Total population and the teacher-student ratio show a positive trend: more populous countries tend to have more students per teacher.

joined_2016 %>%
  ggplot(aes(student_ratio, SE.XPD.TOTL.GD.ZS, color = country)) +
  geom_point() +
  geom_text(aes(label = country), vjust = 1, hjust = 1, check_overlap = T) +
  facet_wrap(~indicator) +
  scale_y_log10(label = percent_format(scale = 1)) +
  scale_x_log10() +
  theme(
    legend.position = "none",
    strip.text = element_text(size = 13, face = "bold"),
    plot.title = element_text(size = 18)
  ) +
  labs(x = "teacher-student ratio",
       y = "government expenditure on education (% of GDP)",
       title = "Teacher-Student Ratio by Education Level and Government Expenditure on Education (% of GDP)")