R bloggers

Analysis of the Renert – Part 2: Data Processing

This is part 2 of a three-part blog post. This post takes the data that we scraped in part 1 and prepares it for further analysis, so it is quite technical. If you're only interested in the results of the analysis, skip to part 3!

First, let's load the data that we prepared in part 1. Let's start with the full text:

    library("tidyverse")
    library("tidytext")

    renert = readRDS("renert_full.rds")

I want to study word frequencies. For this, I will use a function from the tidytext package called unnest_tokens(), which breaks the text down into tokens. Here each token is a word, which makes it possible to count how often each word occurs. So, let's unnest the tokens:

    renert = renert %>%
      unnest_tokens(word, text)

We still need to do some cleaning before continuing. In Luxembourgish, "the" is written d' for…
Original Post: Analysis of the Renert – Part 2: Data Processing