Advisory on Multiple Assignment dplyr::mutate() on Databases

I currently advise R dplyr users to take care when using multiple assignment dplyr::mutate() commands on databases.

(image: Kingroyos, Creative Commons Attribution-Share Alike 3.0 Unported License)

In this note I exhibit a troublesome example, and a systematic solution. First let’s set up dplyr, our database, and some example data.

    library("dplyr")
    ##
    ## Attaching package: ‘dplyr’
    ## The following objects are masked from ‘package:stats’:
    ##
    ##     filter, lag
    ## The following objects are masked from ‘package:base’:
    ##
    ##     intersect, setdiff, setequal, union
    packageVersion("dplyr")
    ## [1] ‘0.7.4’
    packageVersion("dbplyr")
    ## [1] ‘1.2.0’
    db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
    d <- dplyr::copy_to(
      db,
      data.frame(xorig = 1:5, yorig = sin(1:5)),
      "d")

Now suppose somewhere in one of your projects somebody (maybe not even you) has written code that looks somewhat like the following.

    d %>%
      mutate(
        delta = 0,
        x0 = xorig + delta,
        y0 = yorig…
Original Post: Advisory on Multiple Assignment dplyr::mutate() on Databases

ggplot2 Time Series Heatmaps: revisited in the tidyverse

I revisited my previous post on creating beautiful time series calendar heatmaps in ggplot, moving the code into the tidyverse. To obtain the following example, simply use the following code. I hope the commented code is self-explanatory. Enjoy 🙂
Original Post: ggplot2 Time Series Heatmaps: revisited in the tidyverse

R Weekly Bulletin Vol – XIV

This week’s R bulletin covers some interesting ways to list functions and to list files, and illustrates the use of the double colon operator. We will also cover functions like path.package and rank. Hope you like this R weekly bulletin. Enjoy reading!

Shortcut Keys
1. New document – Ctrl+Shift+N
2. Close active document – Ctrl+W
3. Close all open documents – Ctrl+Shift+W

Problem Solving Ideas

How to list functions from an R package

We can view the functions from a particular R package by using the jwutil package. Install the package and use its lsf function. The syntax of the function is given as:

    lsf(pkg)

where pkg is a character string containing the package name. The function returns a character vector of function names in the given package.

Example:

    library(jwutil)
    library(rowr)
    lsf("rowr")

How to list files with a particular…
Original Post: R Weekly Bulletin Vol – XIV

Who wants to work at Google?

In this tutorial, we will explore the open roles at Google and try to see what common attributes Google is looking for in future employees. This dataset is a compilation of job descriptions of 1200+ open roles at Google offices across the world. It is available for download from the Kaggle website, and contains text information about the job location, title, department, minimum and preferred qualifications, and responsibilities of each position. You can download the dataset here, and run the code on the Kaggle site itself here. Using this dataset we will try to answer the following questions: Where are the open roles? Which departments have the most openings? What are the minimum and preferred educational qualifications needed to get hired at Google? How much experience is needed? What categories of roles are the most in demand? Step 1 – Data Preparation and Cleaning:…
Original Post: Who wants to work at Google?

Rcpp 0.12.15: Numerous tweaks and enhancements

The fifteenth release in the 0.12.* series of Rcpp landed on CRAN today after just a few days of gestation in incoming/. This release follows the 0.12.0 release from July 2015, the 0.12.1 release in September 2015, the 0.12.2 release in November 2015, the 0.12.3 release in January 2016, the 0.12.4 release in March 2016, the 0.12.5 release in May 2016, the 0.12.6 release in July 2016, the 0.12.7 release in September 2016, the 0.12.8 release in November 2016, the 0.12.9 release in January 2017, the 0.12.10 release in March 2017, the 0.12.11 release in May 2017, the 0.12.12 release in July 2017, the 0.12.13 release in late September 2017, and the 0.12.14 release in November 2017, making it the nineteenth release at the steady and predictable bi-monthly release frequency. Rcpp has become the most popular way of enhancing GNU R with C or…
Original Post: Rcpp 0.12.15: Numerous tweaks and enhancements

Winter solstice challenge #3: the winner is Bianca Kramer!

Part of the winning submission in the category ‘best tool’.

A bit later than intended, but I am pleased to announce the winner of the Winter solstice challenge: Bianca Kramer! Of course, she was the only contender, but her solution is awesome! In fact, I am surprised no one took her tool, ran it on their own data and just submitted that (which was perfectly well within the scope of the challenge).

Best Tool: Bianca Kramer
The best tool (see the code snippet on the right) uses R and a few R packages (rorcid, rjson, httpcache) and services like ORCID and CrossRef (and the I4OC project), and the (also awesome) project. The code is available on GitHub.

Highest Open Knowledge Score: Bianca Kramer
I did not check the self-reported score of 54%, but since no one challenged her, Bianca wins this category too.…
Original Post: Winter solstice challenge #3: the winner is Bianca Kramer!

Version 2.2.2 Released

ggtern version 2.2.2 has just been submitted to CRAN, and it includes a number of new features. This time around, I have adapted the hexbin geometry (and stat), and additionally created an almost equivalent geometry which operates on a triangular mesh rather than a hexagonal mesh. There are some subtle differences which give some added functionality, and together these will provide an additional level of richness to ternary diagrams produced with ggtern when the data set is large and the points themselves start to lose their meaning from visual clutter.

Ternary Hexbin

Firstly, let’s look at the ternary hexbin, which, as the name suggests, has the capability to bin points in a regular hexagonal grid to produce a pseudo-surface. Now in the original ggplot version, this geometry is somewhat limiting since it only performs a ‘count’ on the number of…
Original Post: Version 2.2.2 Released

Data Driven DIY

Statisfix – Which fixing should I buy?

I have a bathroom cabinet to put up. It needs to go onto a tiled plasterboard (drywall) wall. Because of the tiles, I can’t use the fixings I normally use to keep heavy objects fixed to the wall. And bog standard rawlplugs aren’t going to do the job. So what should I buy?

YouTube to the rescue – more specifically, this fine chap at Ultimate Handyman. Not only does he demonstrate how to use the fixings, but also produced this strangely mesmerising strength test showing how much weight the fixings support before the plasterboard gives out. As well as the strength of the fixing, I need to consider the price of the fixings, and also, the size of the hole required (which in turn, will also impact the overall cost of the job if I…
Original Post: Data Driven DIY

Because it's Friday: Principles and Values

Most companies publish mission and vision statements, and some also publish a detailed list of principles that underlie the company ethos. But what makes a good collection of principles, and does writing them down really matter? At the recent Monktoberfest conference, Bryan Cantrill argued that yes, they do matter, mostly by way of some really egregious counterexamples. That’s all from the blog for this week. We’ll be back on Monday — have a great weekend!
Original Post: Because it's Friday: Principles and Values

The Friday #rstats PuzzleR : 2018-01-19

Peter Meissner (@marvin_dpr) released crossword.r to CRAN today. It’s a spiffy package that makes it dead simple to generate crossword puzzles. He also made a super spiffy javascript library to pair with it, which can turn crossword model output into an interactive puzzle. I thought I’d combine those two creations with a way to highlight new/updated packages from the previous week, cool/useful packages in general, and some R functions that might come in handy. Think of it as a weekly way to get some R information while having a bit of fun! This was a quick, rough creation and I’ll be changing the styles a bit for next Friday’s release, but Peter’s package is so easy to use that I have absolutely no excuse to not keep this a regular feature of the blog. I’ll release a static, ggplot2 solution…
Original Post: The Friday #rstats PuzzleR : 2018-01-19