Diversity in the R Community

In the follow-up to the useR! conference in Stanford last year, the Women in R Task force took the opportunity to survey the 900-or-so participants about their backgrounds, experiences and interests. With 455 responses, the recently published results provide an interesting snapshot of the R community (or at least the subset that was able to travel to the US and register before the conference sold out). Among the findings (these are summaries; check the report for the detailed breakdowns): 33% of attendees identified as women; 26% of attendees identified as other than White or Caucasian; and 5% of attendees identified as LGBTQ. The report also includes some interesting demographic analysis of the attendees, including a map of home country distribution. The report also offers recommendations for future conferences, one of which has already been implemented: the useR!2017 conference in Brussels…
Original Post: Diversity in the R Community

Git Gud with Git and R

If you’re doing any kind of in-depth programming in the R language (say, creating a report in Rmarkdown, or developing a package) you might want to consider using a version-control system. And if you collaborate with another person (or a team) on the work, it makes things infinitely easier when it comes to coordinating changes. Amongst other benefits, a version-control system: Saves you from the worry of making irrevocable changes. Instead of keeping multiple versions of files around (are filenames like Report.Rmd; Report2.Rmd; Report-final.Rmd; Report-final-final.Rmd familiar?) you just keep the latest version of the file, knowing that the older versions are accessible should you need them. Keeps a remote backup of your files. If you accidentally delete a critical file, you can retrieve it. If your hard drive crashes, it’s easy to restore the project. Makes it easy to work…
Original Post: Git Gud with Git and R

The fivethirtyeight R package

Andrew Flowers, quantitative editor of FiveThirtyEight.com, announced at last week’s RStudio conference the availability of a new R package containing data and analyses from some of their data journalism features: the fivethirtyeight package. (Andrew’s talk isn’t yet online, but you can see him discuss several of these stories in his UseR!2016 presentation.) While not an official product of the FiveThirtyEight editorial team, it was developed by Albert Y. Kim, Chester Ismay and Jennifer Chunn under their guidance. Their motivation for producing the package was to provide a resource for teaching data science: We are involved in statistics and data science education, in particular at the introductory undergraduate level. As such, we are always looking for data sets that balance being rich enough to answer meaningful questions with, real enough to ensure that there is context, and realistic enough to convey to…
Original Post: The fivethirtyeight R package

Because it's Friday: Code Burn

I was unaware of the work of Jenn Schiffer until recently. At the risk of giving away the joke, she writes satire for coders. Some of her best pieces include: Like any good satire it can be hard to spot, but if you had any doubt check out the byline at the end of each post. Don’t miss the comments, either. The story behind these posts is the subject of an interesting talk delivered by Jenn Schiffer late last year. It’s well worth watching, not least as an insight into the experience of women in the tech industry. And also because it’s very, very funny. That’s all from the blog for this week. We’ll be back on Monday. In the meantime, have a great weekend!
Original Post: Because it's Friday: Code Burn

Education Analytics with R and Cortana Intelligence Suite

By Fang Zhou, Microsoft Data Scientist; Hong Ooi, Microsoft Senior Data Scientist; and Graham Williams, Microsoft Director of Data Science

Education is a relatively late adopter of predictive analytics and machine learning as a management tool. A keen desire for improving educational outcomes for society is now leading universities and governments to perform student predictive analytics to provide better-informed and timely decision making. Student predictive analytics often aims to solve two key problems: (1) predict student academic outcomes so as to better target support; and (2) predict students at risk of dropping out so as to prevent attrition. Education systems face enormous diversity across regions and countries. Two case studies demonstrate the novel and unique landscape for machine learning in the education world. A mixed effects regression model has been developed in conjunction with an Australian education department to measure the influence of…
Original Post: Education Analytics with R and Cortana Intelligence Suite

In case you missed it: December 2016 roundup

In case you missed them, here are some articles from December of particular interest to R users. Power BI now has a gallery of custom visualizations built with R. Chicago’s Department of Public Health uses R to prioritize health inspections at restaurants. A beautiful map of Swiss municipalities combined with a relief map of the mountains, created with R. Using the Azure Interface Tool to parallelize the problem of optimizing an R model across the hyperparameter space. A primer on Bayesian Statistics. Animating Voronoi tessellations in R to create a greeting card. The Linux Data Science Virtual Machine, which includes several R-related components, is available for a free “test drive” on Azure. The new AzureSMR package lets you manage Azure virtual machines, clusters and storage from R. Interactive decision trees in Microsoft R Server. The ompr package provides numerical optimization with…
Original Post: In case you missed it: December 2016 roundup

The anatomy of a useful chart: NOAA's flood forecasts

With thanks to NOAA’s incredible data gathering and forecasting activities, I’ve been obsessed with this chart for the past few days: We used to live near the Napa River where this river gage is located, and still have many friends in the area. We were in the area last weekend, when a “pineapple express” weather event brought an atmospheric river over much of California, with much rain and some flooding in low-lying areas. This was just before the first peak in the chart above, which shows the water level in the Napa River (in blue) along with a NOAA forecast (in purple). I was checking this chart obsessively, as the observed water level approached the “Major Flood” level, and experienced alternate bouts of hope and fear as the forecast skirted above the line from time to time. Relying on this…
Original Post: The anatomy of a useful chart: NOAA's flood forecasts

What can we learn from StackOverflow data?

StackOverflow, the popular Q&A site for programmers, provides useful information to nearly 5 million programmers worldwide with its database of questions and answers, not to mention the additional comments that other programmers provide. (You might be interested in the architecture, based on SQL Server 2016, required to deliver the 8.5 billion pages StackOverflow served last year.) Since its inception, StackOverflow has had a policy of sharing all of this content under a Creative Commons license. This represents a rich trove of unstructured data for analysis, especially given that the database of 13 million questions, 21 million answers and 54 million comments (and growing) is easily accessible via StackExchange Data Explorer, Kaggle and Google BigQuery. Various data scientists have investigated this database, and learned some interesting things about programmers in the process. Here are a few examples, with links to the complete reports. Sara…
Original Post: What can we learn from StackOverflow data?

Because it's Friday: The camera might not lie, but sometimes it fibs

Photography is my favourite art form: it’s more than just capturing a scene in the frame. A good photograph tells a story, chosen and delivered by the photographer. But sometimes that story isn’t exactly what it seems, as shown by these examples of photography tricks. (Yes, that’s one of those clickbaity listicles, but at least it doesn’t make you click “Next” for Every. Single. Entry.) Here’s a great example from the set — this photo set-up: resulted in this lovely wedding picture (by Chris Chambers Photography): That’s all from us for this week. See you back here on Monday, and have a great weekend!
Original Post: Because it's Friday: The camera might not lie, but sometimes it fibs