Speed up R with Parallel Programming in the Cloud

This past weekend I attended the R User Day at Data Day Texas in Austin. It was a great event, mainly because so many awesome people from the R community came to give some really interesting talks. Lucy D’Agostino McGowan has kindly provided a list of the talks and links to slides, and I thoroughly recommend checking it out: you’re sure to find a talk (or two, or five or ten) that interests you. My own talk was on Speeding up R with Parallel Programming in the Cloud, where I talked about using the doAzureParallel package to launch clusters for use with the foreach function, and using aztk to launch Spark clusters for use with the sparklyr package. I’ve embedded the slides below: it’s not quite the same without the demos (and sadly there was no video recording), but I’ve…
Original Post: Speed up R with Parallel Programming in the Cloud

Speed up simulations in R with doAzureParallel

I’m a big fan of using R to simulate data. When I’m trying to understand a data set, my first step is sometimes to simulate data from a model and compare the results to the data, before I go down the path of fitting an analytical model directly. Simulations are easy to code in R, but they can sometimes take a while to run, especially if there are a bunch of parameters you want to explore, which in turn requires a bunch of simulations. As Patrick Williams (@unbalancedDad) tweeted on January 24, 2018: “When your PC only has four core processors and you’re parallel processing across three of them. #DataScience #RStats #Waiting #Multitasking” In this post, I’ll provide a simple example of running multiple simulations in R, and show how you can speed up the process by running the simulations in parallel:…
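As a minimal sketch of the pattern, here is a parallel simulation loop with foreach. This example registers a local backend with doParallel so it runs on any machine; with doAzureParallel you would register an Azure cluster as the backend in the same way, and the foreach call itself would not change.

```r
# Minimal sketch: run simulations in parallel with foreach.
# doParallel registers a local 3-core backend here; doAzureParallel
# would register a cloud cluster instead, with no change to the loop.
library(foreach)
library(doParallel)

cl <- makeCluster(3)       # use 3 of the machine's cores
registerDoParallel(cl)

# Run 100 independent simulations; .combine = c collects the results
# into a single numeric vector
results <- foreach(i = 1:100, .combine = c) %dopar% {
  mean(rnorm(1000, mean = 5, sd = 2))
}

stopCluster(cl)
summary(results)
```

Each iteration of the loop runs on whichever worker is free, so the wall-clock time scales down roughly with the number of cores (or cloud nodes) in the registered backend.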
Original Post: Speed up simulations in R with doAzureParallel

Applications now open for R Consortium funding for R user groups and small conferences

If you’re an organizer of an R-focused meetup group, or are planning a community-led R conference, the 2018 R Consortium R User Group Support Program is now accepting applications for sponsorship. The 2017 program funded 76 user groups and 3 small conferences, and the program is expanding further in 2018. User groups now also receive a complimentary Meetup.com Pro account, and the grant levels for small conferences have increased.  This is a fantastic program to support the R community, and supports the R Consortium’s mission to foster the continued growth of R community and the data science ecosystem. Funding for these programs comes from R Consortium members, so if you’d like to see more such programs consider asking your employer to join. R Consortium: The 2018 R Consortium R User Group Support Program is Underway
Original Post: Applications now open for R Consortium funding for R user groups and small conferences

Scraping a website with 5 lines of R code

In what is rapidly becoming a series (cool things you can do with R in a tweet), Julia Silge demonstrated on January 12, 2018 how to scrape the list of members of the US House of Representatives from Wikipedia in just 5 R statements. Since Twitter munges the URL in the third line when you cut and paste, here’s a plain-text version of Julia’s code:

library(rvest)
library(tidyverse)
h <- read_html("https://en.wikipedia.org/wiki/Current_members_of_the_United_States_House_of_Representatives")
reps <- h %>%
  html_node("#mw-content-text > div > table:nth-child(18)") %>%
  html_table()
reps <- reps[,c(1:2,4:9)] %>%
  as_tibble()

And sure enough, here’s what the reps object looks like in the RStudio viewer: as Julia notes it’s not perfect, but you’re still 95% of the way there to gathering data from a page intended for humans rather…
Original Post: Scraping a website with 5 lines of R code

Visualize your Strava routes with R

Strava is a fitness app that records your activities, including the routes of your walks, rides and runs. The service also provides an API that allows you to extract all of your data for analysis. University of Melbourne research fellow Marcus Volz created an R package to download and visualize Strava data, and created a chart to visualize all of his runs over six years as a small multiple. Inspired by his work (and the availability of the R package he created), others also visualized bike rides, activity calendars, and aggregated route maps with elevation data. (You can see several examples in the Twitter moment embedded below.) If you’d like to download your own Strava data, all you need is a Strava access token, a recent version of R (3.4.3 or later) and the strava package found on GitHub.
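As a rough sketch of the workflow, assuming the function names from the README of the GitHub-hosted strava package (treat these names and the file path as assumptions and check the package documentation before relying on them):

```r
# Sketch: visualize Strava routes with the strava package from GitHub.
# Function names (process_data, plot_facets) are assumptions based on
# the package README; the data path is a placeholder.
# devtools::install_github("marcusvolz/strava")
library(strava)
library(ggplot2)

# Process a folder of GPX activity files exported from Strava
data <- process_data("~/strava-export/activities")

# Draw every route as one panel in a small-multiple grid,
# in the style of Marcus Volz's six-years-of-runs chart
plot_facets(data)
ggsave("strava-facets.png", width = 20, height = 20, units = "cm")
```

The small-multiple layout works well here because each route only needs its shape, not its absolute location, to be recognizable.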
Original Post: Visualize your Strava routes with R

Because it's Friday: Principles and Values

Most companies publish mission and vision statements, and some also publish a detailed list of principles that underlie the company ethos. But what makes a good collection of principles, and does writing them down really matter? At the recent Monktoberfest conference, Bryan Cantrill argued that yes, they do matter, mostly by way of some really egregious counterexamples. That’s all from the blog for this week. We’ll be back on Monday — have a great weekend!
Original Post: Because it's Friday: Principles and Values

Registration and talk proposals now open for useR!2018

Registration is now open for useR! 2018, the official R user conference to be held in Brisbane, Australia July 10-13. If you haven’t been to a useR! conference before, it’s a fantastic opportunity to meet and mingle with other R users from around the world, see talks on R packages and applications, and attend tutorials for deep dives on R-related topics. This year’s conference will also feature keynotes from Jenny Bryan, Steph De Silva, Heike Hofmann, Thomas Lin Pedersen, Roger Peng and Bill Venables. It’s my favourite conference of the year, and I’m particularly looking forward to this one. This video from last year’s conference in Brussels (a sell-out with over 1,100 attendees) will give you a sense of what a useR! conference is like: The useR! conference is brought to you by the R Foundation and is 100% community-led.…
Original Post: Registration and talk proposals now open for useR!2018

Microsoft R Open 3.4.3 now available

Microsoft R Open (MRO), Microsoft’s enhanced distribution of open source R, has been upgraded to version 3.4.3 and is now available for download for Windows, Mac, and Linux. This update upgrades the R language engine to the latest R (version 3.4.3) and updates the bundled packages (specifically: checkpoint, curl, doParallel, foreach, and iterators) to new versions. MRO is 100% compatible with all R packages. MRO 3.4.3 points to a fixed CRAN snapshot taken on January 1, 2018, and you can see some highlights of new packages released since the prior version of MRO on the Spotlights page. As always, you can use the built-in checkpoint package to access packages from an earlier date (for reproducibility) or a later date (to access new and updated packages). MRO 3.4.3 is based on R 3.4.3, a minor update to the R engine (you can see the detailed list…
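To make the checkpoint mechanism concrete, here is a minimal sketch: one call pins your package library to a CRAN snapshot date (the date shown is the January 1, 2018 snapshot that MRO 3.4.3 points to by default).

```r
# Pin package installation and loading to a dated CRAN snapshot.
# Use an earlier date for reproducibility, or a later one to pick up
# newer package versions than the MRO default.
library(checkpoint)
checkpoint("2018-01-01")   # the snapshot MRO 3.4.3 points to

# library() calls after this resolve against that snapshot's
# package versions, in a project-specific library
library(foreach)
```

Because everyone running the same script with the same checkpoint date gets the same package versions, results are reproducible across machines and over time.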
Original Post: Microsoft R Open 3.4.3 now available

A simple way to set up a SparklyR cluster on Azure

The SparklyR package from RStudio provides a high-level interface to Spark from R. This means you can create R objects that point to data frames stored in the Spark cluster and apply some familiar R paradigms (like dplyr) to the data, all the while leveraging Spark’s distributed architecture without having to worry about memory limitations in R. You can also access the distributed machine-learning algorithms included in Spark directly from R functions.  If you don’t happen to have a cluster of Spark-enabled machines set up in a nearby well-ventilated closet, you can easily set one up in your favorite cloud service. For Azure, one option is to launch a Spark cluster in HDInsight, which also includes the extensions of Microsoft ML Server. While this service recently had a significant price reduction, it’s still more expensive than running a “vanilla” Spark-and-R…
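As a short sketch of that workflow once a cluster exists: the snippet below uses a local Spark connection so it stands alone, but against a real cluster you would pass that cluster's master URL (a placeholder assumption here) to spark_connect.

```r
# Sketch of the sparklyr workflow: connect, copy data into Spark,
# then use familiar dplyr verbs that run on the cluster.
# master = "local" keeps the example self-contained; a real cluster
# would supply its own master URL.
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# Copy an R data frame into Spark; mtcars_tbl is a reference to the
# remote table, not an in-memory copy, so R's memory limits don't apply
mtcars_tbl <- copy_to(sc, mtcars, "mtcars")

# dplyr verbs are translated to Spark SQL and executed remotely;
# collect() brings only the small summarised result back into R
mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg)) %>%
  collect()

# Spark's distributed machine-learning algorithms via ml_* functions
fit <- ml_linear_regression(mtcars_tbl, mpg ~ wt + cyl)

spark_disconnect(sc)
```

The key point is that only the final, aggregated results cross back into the R session; all heavy computation stays on the Spark side.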
Original Post: A simple way to set up a SparklyR cluster on Azure