Obstacles to performance in parallel programming

Making your code run faster is often the primary goal when using parallel programming techniques in R, but sometimes the effort of converting your code to use a parallel framework leads only to disappointment, at least initially. Norman Matloff, author of Parallel Computing for Data Science: With Examples in R, C++ and CUDA, has shared chapter 2 of that book online, and it describes some of the issues that can lead to poor performance. They include: communications overhead, particularly an issue with fine-grained parallelism consisting of a very large number of relatively small tasks; load balance, where the computing resources aren’t contributing equally to the problem; impacts from use of RAM and virtual memory, such as cache misses and page faults; network effects, such as latency and bandwidth, that impact performance and communication overhead; interprocess conflicts and thread scheduling; data access and…
Original Post: Obstacles to performance in parallel programming
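
To make the communications-overhead point concrete, here is a minimal sketch using R's built-in parallel package. It is not taken from Matloff's chapter: the toy task, the input size and the two-worker cluster are all assumptions chosen for illustration. It runs the same trivial computation once as many tiny tasks and once as one chunk per worker.

```r
library(parallel)  # ships with base R

x <- 1:2000                  # toy input; the size is an arbitrary choice
square <- function(v) v^2    # deliberately trivial task (hypothetical)

cl <- makeCluster(2)         # two local worker processes (an assumption)

# Fine-grained: one task, and one round trip to a worker, per element.
t_fine <- system.time(
  res_fine <- unlist(clusterApply(cl, x, square))
)

# Coarse-grained: one chunk per worker, so only length(cl) round trips.
chunks <- clusterSplit(cl, x)
t_coarse <- system.time(
  res_coarse <- unlist(clusterApply(cl, chunks, square))
)

stopCluster(cl)

# Both strategies compute the same result; only the overhead differs.
print(c(fine = t_fine[["elapsed"]], coarse = t_coarse[["elapsed"]]))
```

On most machines the fine-grained version should be noticeably slower even though it does the same arithmetic, because nearly all of its time goes into shipping tasks and results between processes. The exact timings depend on the hardware, so none are quoted here.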

Starting a Rmarkdown Blog with Bookdown + Hugo + Github

Finally, after 24 hours of failed attempts, I got my favourite Hugo theme up and running with RStudio and Blogdown. All the steps I followed are detailed in my new Blogdown entry, which is also a GitHub repo. After exploring some alternatives, like Shirin’s (with Jekyll) and Amber Thomas’s advice (which involved Git skills beyond my basic abilities), I was able to install Yihui’s hugo-lithium-theme in a new repository. However, I wanted to explore other blog templates hosted on GitHub, like: The first three themes are currently linked in the blogdown documentation as being the simplest and easiest to set up for inexperienced blog programmers, but I hope the list will grow in the following months. For those who are willing to experiment, the complete list is here. Finally I chose the hugo-tranquilpeak theme, by Thibaud Leprêtre, for which…
Original Post: Starting a Rmarkdown Blog with Bookdown + Hugo + Github

GoTr – R wrapper for An API of Ice And Fire

By Ava Yang. It’s Game of Thrones time again as the battle for Westeros is heating up. There are tons of ideas, ingredients and interesting analyses out there and I was craving my own flavour. So step zero, where is the data? Jenny Bryan’s purrr tutorial introduced the list got_chars, representing character information from the first five books, which does not seem like much fun beyond exercising list-manipulation muscles. However, it led me to An API of Ice and Fire, the world’s greatest source for quantified and structured data from the universe of Ice and Fire, including the HBO series Game of Thrones. I decided to create my own API functions, or better, an R package (inspired by the famous rwar package). The API resources cover 3 types of endpoint: Books, Characters and Houses. GoTr pulls data in JSON format…
Original Post: GoTr – R wrapper for An API of Ice And Fire

Oil leakage… those old BMW’s are bad :-)

Introduction: My first car was a 13-year-old Mitsubishi Colt; I paid 3000 Dutch guilders for it. I can still remember a friend who did not want me to park this car in front of his house because of possible oil leakage. Can you get an idea of which cars are likely to leak oil? Well, with open car data from the Dutch RDW you can. RDW is the Netherlands Vehicle Authority in the mobility chain. RDW Data: There are many data sets that you can download. I have used the following: Observed Defects. This set contains 22 million records on observed defects at car level (license plate number). Cars in The Netherlands have to be checked yearly, and the findings of each check are submitted to RDW. Basic car details. This set contains 9 million records, they are all the cars…
Original Post: Oil leakage… those old BMW’s are bad 🙂

RcppArmadillo 0.7.960.1.0

The bi-monthly RcppArmadillo release is out with a new version 0.7.960.1.0, which is now on CRAN and will get to Debian in due course. And it is a big one. Lots of nice upstream changes from Armadillo, and lots of work on our end as the Google Summer of Code project by Binxiang Ni, plus a few smaller enhancements; see below for details. Armadillo is a powerful and expressive C++ template library for linear algebra aiming towards a good balance between speed and ease of use, with a syntax deliberately close to Matlab. RcppArmadillo integrates this library with the R environment and language, and is widely used by (currently) 379 other packages on CRAN, an increase of 49 since the last CRAN release in June! Changes in this release relative to the previous CRAN release are as follows: Changes in…
Original Post: RcppArmadillo 0.7.960.1.0

2017 App Update

As you may have noticed, we have made a few changes to our apps for the 2017 season to bring you a smoother and quicker experience while also adding more advanced and customizable views. Most visibly, we moved the apps to Shiny so we can continue to build on our use of R and add new features and improvements throughout the season. We expect the apps to handle high traffic better this season, both during draft season and at peak times. In addition to the ability to create and save custom settings, you can also choose the columns you view in our Projections tool. We have also added more advanced metrics such as weekly VOR and Projected Points Per Dollar (ROI) for those of you in auction leagues. With a free account, you’ll be able to create and save one custom setting. If…
Original Post: 2017 App Update

I made a 3D movie with ggplot2 once – here’s how I did it

Some time ago (last year actually 😳) I had a blast developing a feature for ggforce which had been on my mind for far too long than its limited utility warranted. The idea was to showcase the new facetting extension powers I’d added to ggplot2 by making a facetting function that created a stereoscopic pair of plots that would simulate 3D. To procrastinate and show off I made a little animated video with the feature and posted it on Twitter, promising I’d write about it someday. Now (again, one year later), I think the world is finally ready to see what went through my R console to make that little animation. While I’ve been very timely with this blog post, the feature is still not available on CRAN, so you’ll need to install the GitHub version of ggforce to follow along. devtools::install_github('thomasp85/ggforce') Setup The goal is to create a spinning…
Original Post: I made a 3D movie with ggplot2 once – here’s how I did it

RStudio Server Pro is ready for BigQuery on the Google Cloud Platform

RStudio is excited to announce the availability of RStudio Server Pro on the Google Cloud Platform. RStudio Server Pro GCP is identical to RStudio Server Pro, but with additional convenience for data scientists, including pre-installation of multiple versions of R, common systems libraries, and the BigQuery package for R. RStudio Server Pro GCP adapts to your unique circumstances. It allows you to choose different GCP computing instances for RStudio Server Pro no matter how large, whenever a project requires it (hourly pricing). If the enhanced security, support for multiple R versions and multiple sessions, and commercially licensed and supported features of RStudio Server Pro appeal to you, please give RStudio Server Pro for GCP a try. Below are some useful links to get you started:
Original Post: RStudio Server Pro is ready for BigQuery on the Google Cloud Platform

20 years of the R Core Group

The first “official” version of R, version 1.0.0, was released on February 29, 2000. But the R Project had already been underway for several years before then. Sharing this tweet, from yesterday, from R Core member Peter Dalgaard: “It was twenty years ago today, Ross Ihaka got the band to play…. #rstats pic.twitter.com/msSpPz2kyA” (Peter Dalgaard, @pdalgd, August 16, 2017). Twenty years ago, on August 16 1997, the R Core Group was formed. Before that date, the committers to R were the project’s founders Ross Ihaka and Robert Gentleman, along with Luke Tierney, Heiner Schwarte and Paul Murrell. The email above was the invitation for Kurt Hornik, Peter Dalgaard and Thomas Lumley to join as well. With the sole exception of Schwarte, all of the above remain members of the R Core Group, which has since expanded to 21 members.…
Original Post: 20 years of the R Core Group

Probability functions beginner

In this set of exercises, we are going to explore some of the probability functions in R with practical applications. Basic probability knowledge is required. Note: We are going to use random number functions and random process functions in R, such as runif. A problem with these functions is that every time you run them you will obtain a different value. To make your results reproducible, you can specify the value of the seed using set.seed(n), where n is any number of your choice, before calling a random function. (If you are not familiar with seeds, think of them as the tracking number of your random numbers.) For this set of exercises we will use set.seed(1); don’t forget to specify it before every random exercise. Answers to the exercises are available here. If you obtained a different (correct) answer than those listed on the solutions page, please…
Original Post: Probability functions beginner
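
The note about seeds can be checked with a minimal sketch like the one below. The variable names are just for illustration, and the only assumption is the seed value 1 that the exercises themselves suggest.

```r
# Draw three uniform random numbers after fixing the seed.
set.seed(1)
first <- runif(3)

# Drawing again WITHOUT resetting the seed continues the random stream,
# so these values differ from the first draw.
second <- runif(3)

# Resetting the same seed restarts the stream: the first draw is reproduced.
set.seed(1)
again <- runif(3)

print(identical(first, again))   # same seed, same numbers
print(identical(first, second))  # no reset, different numbers
```

This is why the exercises ask you to call set.seed(1) before every random exercise rather than once at the top of the session: each call to a random function moves the stream forward.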