advanced tips, R

Scraping a website with 5 lines of R code

In what is rapidly becoming a series (cool things you can do with R in a tweet), Julia Silge (@juliasilge) demonstrates, in a tweet from January 12, 2018, scraping the list of members of the US House of Representatives on Wikipedia in just 5 R statements.

Since Twitter munges the URL in the third line when you cut-and-paste, here’s a plain-text version of Julia’s code:

    library(rvest)
    library(tidyverse)

    h <- read_html("")

    reps <- h %>%
      html_node("#mw-content-text > div > table:nth-child(18)") %>%
      html_table()

    reps <- reps[,c(1:2,4:9)] %>%
      as_tibble()

And sure enough, here’s what the reps object looks like in the RStudio viewer:

As Julia notes, it’s not perfect, but you’re still 95% of the way there to gathering data from a page intended for human rather…
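If you want to try the same pipeline without depending on a live page, here is a minimal sketch that runs it against a tiny inline HTML document. The HTML, the `#content` id, the `members` class, and the district, member, and party values are all made up for illustration; they are not taken from the Wikipedia page. The steps are the same as in Julia's code: read the HTML, pick one table with a CSS selector, convert it with `html_table()`, drop a column by position, and turn the result into a tibble.

```r
library(rvest)   # read_html(), html_node(), html_table(); also re-exports %>%
library(tibble)  # as_tibble()

# Hypothetical stand-in for the real page: a literal HTML string.
page <- read_html('
<html><body>
  <div id="content">
    <p>Intro text</p>
    <table class="members">
      <tr><th>District</th><th>Member</th><th>Photo</th><th>Party</th></tr>
      <tr><td>District 1</td><td>Member A</td><td></td><td>Party X</td></tr>
      <tr><td>District 2</td><td>Member B</td><td></td><td>Party Y</td></tr>
    </table>
  </div>
</body></html>')

members <- page %>%
  html_node("#content > table.members") %>%  # first node matching the selector
  html_table()                               # parse the <table> into a data frame

# Drop the empty third column by position, mirroring reps[,c(1:2,4:9)].
members <- members[, c(1:2, 4)] %>%
  as_tibble()

print(members)
```

Selecting the table by a class or id, where the page offers one, tends to be less fragile than a positional selector such as `table:nth-child(18)`, which can stop matching as soon as the page layout changes.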
Original Post: Scraping a website with 5 lines of R code