R bloggers

SQL Saturday statistics – Web Scraping with R and SQL Server


I wanted to check a simple query: how many times has a particular topic been presented, and by how many different presenters? It sounds interesting, and tackling it should not be difficult, although the final numbers may vary because some text analysis is involved. The first step is some web scraping to get the information from the SQLSaturday web page. With R/Python integration in SQL Server, reading the information from the website is a fairly straightforward task:

    EXEC sp_execute_external_script
         @language = N'R'
        ,@script = N'
            library(rvest)
            library(XML)
            library(dplyr)

            #URL to schedule
            url_schedule <- "http://www.sqlsaturday.com/687/Sessions/Schedule.aspx"

            #Read HTML
            webpage <- read_html(url_schedule)

            # Event schedule
            schedule_info <- html_nodes(webpage, ".session-schedule-cell-info") # OK

            # Extracting HTML content
            ht <- html_text(schedule_info)

            df <- data.frame(data=ht)

            #create empty DF
            df_res <- data.frame(title=c(), speaker=c())

            for (i in 1:nrow(df)){
                #print(df[i])
                if (i %% 2 != 0) #odd flow…
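
The excerpt above is cut off at the odd/even check, so the rest of the loop is not shown here. As a rough illustration only, the sketch below shows one way the alternating entries could be paired up and counted in base R. It assumes that odd positions hold session titles and even positions hold speaker names, which the excerpt does not confirm; the sample values in `ht` and the `stats` data frame are hypothetical and stand in for the scraped text.

```r
# Hypothetical stand-in for the scraped text; ASSUMED to alternate
# title, speaker, title, speaker, ... (not confirmed by the excerpt)
ht <- c("Intro to R in SQL Server", "Speaker A",
        "Query Store basics",       "Speaker B",
        "Intro to R in SQL Server", "Speaker C")

# Odd positions are taken as titles, even positions as speakers (assumption)
idx_odd <- seq(1, length(ht), by = 2)
df_res <- data.frame(title   = ht[idx_odd],
                     speaker = ht[idx_odd + 1],
                     stringsAsFactors = FALSE)

# Per title: how many sessions, and how many distinct speakers
sessions <- table(df_res$title)
speakers <- tapply(df_res$speaker, df_res$title,
                   function(s) length(unique(s)))

stats <- data.frame(title    = names(sessions),
                    sessions = as.integer(sessions),
                    speakers = as.integer(speakers[names(sessions)]),
                    stringsAsFactors = FALSE)
print(stats)
```

Indexing the vector directly avoids growing a data frame row by row inside a loop. Exact title matching is only a starting point: titles that differ slightly in wording count as different topics, which is where the text analysis mentioned above would come in.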
Original Post: SQL Saturday statistics – Web Scraping with R and SQL Server