Personal note on joining the Microsoft Cloud Advocates team

A quick personal note: today is my first day as a member of the Cloud Developer Advocates team at Microsoft! I’ll still be blogging and attending events related to R, and supporting the R community, but now I’ll be doing it as a member of a team dedicated to community outreach. As a bit of background, when I joined Microsoft back in 2015 via the acquisition of Revolution Analytics, I was thrilled to be able to continue my role supporting the R community. Since then, Microsoft as a whole has continued to ramp up its support of open source projects and to interact directly with developers of all stripes (including data scientists!) through various initiatives across the company. (Aside: I knew Microsoft was a big company before I joined, but even then it took me a while to appreciate the scale of the…
Original Post: Personal note on joining the Microsoft Cloud Advocates team

Because it's Friday: 3-D Animation

We’ve had 3-D animation for quite a while now, of course, but what happens when a traditional 2-D animator uses a virtual reality system to draw? When famed Disney animator Glen Keane sketches his most iconic creation — Ariel from The Little Mermaid — using Tilt Brush, the result is surprisingly moving. That’s all from us for this week. We’ll be back with more on Monday, and in the meantime have a great weekend!
Original Post: Because it's Friday: 3-D Animation

In case you missed it: November 2017 roundup

In case you missed them, here are some articles from November of particular interest to R users. R 3.4.3 “Kite Eating Tree” has been released. Several approaches for generating a “Secret Santa” list with R. The “RevoScaleR” package from Microsoft R Server has now been ported to Python. The call for papers for the R/Finance 2018 conference in Chicago is now open. Give thanks to the volunteers behind R. Advice for R user groups from the organizer of R-Ladies Chicago. Use containers to build R clusters for parallel workloads in Azure with the doAzureParallel package. A collection of R scripts for interesting visualizations that fit into a 280-character Tweet. R is featured in a StackOverflow case study at the Microsoft Connect conference. The City of Chicago uses R to forecast water quality and issue beach safety alerts. A collection of…
Original Post: In case you missed it: November 2017 roundup

The British Ecological Society's Guide to Reproducible Science

The British Ecological Society has published a new volume in their Guides to Better Science series: A Guide to Reproducible Code in Ecology and Evolution (pdf). The introduction describes its scope: A Guide to Reproducible Code covers all the basic tools and information you will need to start making your code more reproducible. We focus on R and Python, but many of the tips apply to any programming language. Anna Krystalli introduces some ways to organise files on your computer and to document your workflows. Laura Graham writes about how to make your code more reproducible and readable. François Michonneau explains how to write reproducible reports. Tamora James breaks down the basics of version control. Finally, Mike Croucher describes how to archive your code. We have also included a selection of helpful tips from other scientists. The guide…
Original Post: The British Ecological Society's Guide to Reproducible Science

On the biases in data

Whether we’re developing statistical models, training machine learning recognizers, or developing AI systems, we start with data. And while the suitability of that data set is, lamentably, sometimes measured by its size, it’s always important to reflect on where those data come from. Data are not neutral: the data we choose to use have profound impacts on the resulting systems we develop. A recent article in Microsoft’s AI Blog discusses the inherent biases found in many data sets: “The people who are collecting the datasets decide that, ‘Oh this represents what men and women do, or this represents all human actions or human faces.’ These are types of decisions that are made when we create what are called datasets,” she said. “What is interesting about training datasets is that they will always bear the marks of history, that history will…
Original Post: On the biases in data

Because it's Friday: The Whole of the Moon

As we’ve noted before, the Solar System is a big place. You can watch a voyage from the Sun to Jupiter, and it takes 45 minutes at the speed of light. A scale model of the Solar System, with the Sun the size of a weather balloon, is 3.5 miles across … and that’s not even including Pluto. And this virtual scale model, a browser-based rendition by John Worth with the moon just one pixel in size, is no less impressive. I can’t do it justice here — the site surely holds the record for the widest horizontal scrollbar on any page on the Web — so here’s a little snippet of the Earth-Moon system: While you can use the astrological symbols at the top to jump to the planets, try manually scrolling to get the full effect of the…
Original Post: Because it's Friday: The Whole of the Moon

A case study in messy data analysis: the Australian same-sex marriage survey

Last month the Australian people signaled their approval of legalizing same-sex marriage by a 62%:38% margin in a national survey. (On a personal note, I was elated and relieved by the result: my husband and I have discussed eventually retiring to Australia, and with this decision our marriage would be recognized there.) While fears of a surprise Brexit-like electoral backlash proved unfounded, researchers including R user Miles McBain explored the results for correlations to demographic variables. This process wasn’t as simple as it might have been though: the Australian Bureau of Statistics released the results as a pair of Excel files that violate just about every good practice for sharing data in spreadsheets: Miles shares the R code he used to extract useful data from this spreadsheet as a blog post that makes a great case study in dealing with…
Original Post: A case study in messy data analysis: the Australian same-sex marriage survey

R 3.4.3 released

R 3.4.3 has been released, as announced by the R Core team today. As of this writing, only the source distribution (for those who build R themselves) is available, but binaries for Windows, Mac and Linux should appear on your local CRAN mirror within the next day or so. This is primarily a bug-fix release. It fixes an issue with incorrect time zones on MacOS High Sierra, and some issues with handling Unicode characters. (Incidentally, representing international and special characters is something that R takes great care in handling properly. It’s not an easy task: a 2003 essay by Joel Spolsky describes the minefield that is character representation, and not much has changed since then.) You can check out the complete list of changes here. Whatever your platform, R 3.4.3 should be backwards-compatible with other R versions in the R…
Original Post: R 3.4.3 released

How to generate a Secret Santa list with R

Several recent blog posts have explored the Secret Santa problem and provided solutions in R. This post provides a roundup of various solutions and how they are implemented in R. If you wanted to set up a “Secret Santa” gift exchange at the office, you could put everyone’s name into a hat and have each participant draw a name at random. The problem is that someone might draw their own name, but if that happens you can just reshuffle all the names back into the hat and start the process over. That’s essentially what the R code below, from a blog post by David Selby, does: 🙈🎁 w/ code:”Secret Santa in R” ✏ @TeaStats #rstats — Mara Averick (@dataandme) November 24, 2017 That’s not an entirely satisfying solution (at least to me), with all of the having to check for…
Original Post: How to generate a Secret Santa list with R
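
The "names in a hat" approach described in the excerpt above can be sketched in a few lines of base R. This is a minimal illustration and not the code from David Selby's post: the function name secret_santa and the participant names are hypothetical, and the sketch assumes at least two participants with unique names.

```r
# Illustrative sketch only: 'secret_santa' is a hypothetical helper,
# not the code from the blog post referenced above.
secret_santa <- function(participants) {
  # Assumption: at least two participants, all with distinct names
  stopifnot(length(participants) >= 2,
            anyDuplicated(participants) == 0)
  repeat {
    # Draw every name from the hat: a random permutation of the names
    recipients <- sample(participants)
    # If anyone drew their own name, put all names back and draw again
    if (!any(recipients == participants)) break
  }
  data.frame(giver = participants,
             recipient = recipients,
             stringsAsFactors = FALSE)
}

# Hypothetical participant names, for illustration
set.seed(2017)
draw <- secret_santa(c("Ana", "Ben", "Chloe", "Dev", "Erin"))
print(draw)
```

Because roughly one random shuffle in e (about 2.7) has nobody drawing their own name, the loop usually finishes after a handful of attempts, even for a large office.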