R bloggers

Player Data for the 2018 FIFA World Cup


Official PDF

FIFA has made several official player lists available, conveniently changing the format each time. For this exercise, I use the one from early June. The tabulizer package makes extracting information from tables included in a PDF document relatively easy. (The other (later) version of the official PDF is here. Strangely, the weight variable has been dropped.)

    suppressMessages(library(tidyverse))
    library(stringr)
    suppressMessages(library(lubridate))
    suppressMessages(library(cowplot))

    # Note that I set warnings to FALSE because of some annoying (and intermittent)
    # issues with RJavaTools.

    library(tabulizer)
    url <- "https://github.com/davidkane9/wc18/raw/master/fifa_player_list_1.pdf"
    out <- extract_tables(url, output = "data.frame")

We now have a 32 element list, each item a data frame of information about the 23 players on each team. Let's combine this information into a single tidy tibble.

    # Note how bind_rows() makes it very easy to combine a list of compatible
    # dataframes.

    pdf_data <- bind_rows(out) %>%…
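
As a rough illustration of the bind_rows() step, here is a minimal sketch on toy data. The list `toy`, its column names, and its values are made up for the example and only stand in for the list that extract_tables() returns; the `.id` argument is an optional extra that is not part of the excerpt above.

```r
library(dplyr)

# Hypothetical stand-in for the list of per-team data frames:
# two tiny data frames with the same columns (values are invented).
toy <- list(
  A = data.frame(player = c("p1", "p2"), height = c(180, 175)),
  B = data.frame(player = c("p3", "p4"), height = c(190, 168))
)

# bind_rows() stacks the list elements into one data frame.
# With .id, the list names are recorded in a new "group" column,
# so each row keeps track of which element it came from.
combined <- bind_rows(toy, .id = "group") %>%
  as_tibble()

print(combined)
```

If the list has no names, `.id` falls back to the position of each element, which may still be enough to tell the source tables apart.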
Original Post: Player Data for the 2018 FIFA World Cup