Player Data for the 2018 FIFA World Cup

Official PDF

FIFA has made several official player lists available, conveniently changing the format each time. For this exercise, I use the one from early June. The tabulizer package makes extracting information from tables included in a PDF document relatively easy. (The other, later version of the official PDF is here. Strangely, the weight variable has been dropped.)

    suppressMessages(library(tidyverse))
    library(stringr)
    suppressMessages(library(lubridate))
    suppressMessages(library(cowplot))

    # Note that I set warnings to FALSE because of some annoying (and intermittent)
    # issues with RJavaTools.

    library(tabulizer)
    url <- "https://github.com/davidkane9/wc18/raw/master/fifa_player_list_1.pdf"
    out <- extract_tables(url, output = "data.frame")

We now have a 32-element list, each item a data frame of information about the 23 players on each team. Let's combine this information into a single tidy tibble.

    # Note how bind_rows() makes it very easy to combine a list of compatible
    # dataframes.

    pdf_data <- bind_rows(out) %>%…
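To make the bind_rows() step concrete without downloading the PDF, here is a minimal sketch on toy data. The list `toy`, its column names, and the `table_index` column are hypothetical stand-ins, not the actual structure returned by extract_tables(); the only point is to show how a list of compatible data frames collapses into one table, with the optional `.id` argument recording which list element each row came from.

```r
library(dplyr)

# Hypothetical stand-in for the list produced by extract_tables():
# two small data frames that share the same columns.
toy <- list(
  data.frame(Team = "A", Player = c("P1", "P2"), Height = c(180, 175),
             stringsAsFactors = FALSE),
  data.frame(Team = "B", Player = c("P3", "P4"), Height = c(190, 168),
             stringsAsFactors = FALSE)
)

# Stack the data frames row-wise. `.id` adds a column identifying the
# list element each row originated from (a running index for unnamed lists).
combined <- bind_rows(toy, .id = "table_index")

print(combined)
```

With the real data, the same call would be applied to `out`; whether the extra index column is useful depends on whether each table already carries a team identifier.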
