Diversity in the R Community

As a follow-up to the useR! conference at Stanford last year, the Women in R Task Force took the opportunity to survey the 900-or-so participants about their backgrounds, experiences and interests. With 455 responses, the recently published results provide an interesting snapshot of the R community (or at least of that subset able to travel to the US and to register before the conference sold out). Among the findings (these are summaries; check the report for the detailed breakdowns):
33% of attendees identified as women
26% of attendees identified as other than White or Caucasian
5% of attendees identified as LGBTQ
The report also includes some interesting demographic analysis of the attendees, including a map of home country distribution. It also offers recommendations for future conferences, one of which has already been implemented: the useR!2017 conference in Brussels…

Git Gud with Git and R

If you’re doing any kind of in-depth programming in the R language (say, creating a report in R Markdown, or developing a package), you might want to consider using a version-control system. And if you collaborate with another person (or a team) on the work, version control makes it far easier to coordinate changes. Among other benefits, a version-control system:
Saves you from the worry of making irrevocable changes. Instead of keeping multiple versions of files around (are filenames like Report.Rmd, Report2.Rmd, Report-final.Rmd and Report-final-final.Rmd familiar?) you just keep the latest version of the file, knowing that the older versions are accessible should you need them.
Keeps a remote backup of your files. If you accidentally delete a critical file, you can retrieve it. If your hard drive crashes, it’s easy to restore the project.
Makes it easy to work…

The fivethirtyeight R package

Andrew Flowers, quantitative editor of FiveThirtyEight.com, announced at last week’s RStudio conference the availability of a new R package containing data and analyses from some of their data journalism features: the fivethirtyeight package. (Andrew’s talk isn’t yet online, but you can see him discuss several of these stories in his useR!2016 presentation.) While not an official product of the FiveThirtyEight editorial team, it was developed by Albert Y. Kim, Chester Ismay and Jennifer Chunn under their guidance. Their motivation for producing the package was to provide a resource for teaching data science: We are involved in statistics and data science education, in particular at the introductory undergraduate level. As such, we are always looking for data sets that balance being rich enough to answer meaningful questions with, real enough to ensure that there is context, and realistic enough to convey to…

Introduction to Forecasting with ARIMA in R

ARIMA models are a popular and flexible class of forecasting model that utilize historical information to make predictions. In this tutorial, we walk through an example of examining time series for demand at a bike-sharing service, fitting an ARIMA model, and creating a basic forecast.
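To give a feel for the three steps mentioned (examine the series, fit a model, make a forecast), here is a minimal sketch of the idea. It is not the tutorial's code: it is written in plain Python rather than R, the demand figures are invented for illustration, and the function names (difference, fit_ar1, forecast) are hypothetical. It handles only the simplest case, an ARIMA(1,1,0) model with drift fitted by least squares, whereas real ARIMA software typically uses maximum likelihood and supports moving-average and seasonal terms.

```python
# Hypothetical daily rental counts, invented for illustration only.
demand = [120, 132, 128, 141, 150, 147, 158, 166, 163, 175, 181, 179]


def difference(series):
    """First differences: the 'I' (integrated, d=1) part of ARIMA."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(values):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x = values[:-1]
    y = values[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        # Constant input: no autoregressive signal, fall back to the mean.
        return mean_y, 0.0
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast(series, steps):
    """Forecast `steps` values ahead with an ARIMA(1,1,0)-style model."""
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    level = series[-1]
    last_diff = diffs[-1]
    predictions = []
    for _ in range(steps):
        # Predict the next change, then add it back to the running level.
        last_diff = c + phi * last_diff
        level = level + last_diff
        predictions.append(level)
    return predictions


predictions = forecast(demand, 3)
print([round(p, 1) for p in predictions])
```

Differencing removes the upward trend so that the autoregressive step only has to model how one day's change relates to the previous day's change; the forecast then accumulates the predicted changes back onto the last observed level.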

Education Analytics with R and Cortana Intelligence Suite

By Fang Zhou, Microsoft Data Scientist; Hong Ooi, Microsoft Senior Data Scientist; and Graham Williams, Microsoft Director of Data Science
Education is a relatively late adopter of predictive analytics and machine learning as a management tool. A keen desire to improve educational outcomes for society is now leading universities and governments to perform student predictive analytics in support of better-informed and more timely decision making. Student predictive analytics often aims to solve two key problems: predicting student academic outcomes so as to better target support, and predicting which students are at risk of dropping out so as to prevent attrition. Education systems face enormous diversity across regions and countries. Two case studies demonstrate the novel and unique landscape for machine learning in the education world. A mixed effects regression model has been developed in conjunction with an Australian education department to measure the influence of…

The Most Popular Language For Machine Learning and Data Science Is …

By Jean-Francois Puget, IBM. What programming language should one learn to get a machine learning or data science job? That’s the silver bullet question. It is debated in many forums. I could provide here my own answer to it and explain why, but I’d rather look at some data first. After all, this is what machine learners and data scientists should do: look at data, not opinions. So, let’s look at some data. I will use the trend search available on indeed.com. It looks for occurrences over time of selected terms in job offers. It gives an indication of what skills employers are seeking. Note however that it is not a poll on which skills are effectively in use. It is rather an advanced indicator of how skill popularity evolves (more formally, it is probably close to the first order…

In case you missed it: December 2016 roundup

In case you missed them, here are some articles from December of particular interest to R users.
Power BI now has a gallery of custom visualizations built with R.
Chicago’s Department of Public Health uses R to prioritize health inspections at restaurants.
A beautiful map of Swiss municipalities combined with a relief map of the mountains, created with R.
Using the Azure Interface Tool to parallelize the problem of optimizing an R model across the hyperparameter space.
A primer on Bayesian Statistics.
Animating Voronoi tessellations in R to create a greeting card.
The Linux Data Science Virtual Machine, which includes several R-related components, is available for a free “test drive” on Azure.
The new AzureSMR package lets you manage Azure virtual machines, clusters and storage from R.
Interactive decision trees in Microsoft R Server.
The ompr package provides numerical optimization with…

The anatomy of a useful chart: NOAA's flood forecasts

With thanks to NOAA’s incredible data gathering and forecasting activities, I’ve been obsessed with this chart for the past few days:
We used to live near the Napa River where this river gage is located, and still have many friends in the area. We were visiting last weekend, when a “pineapple express” weather event brought an atmospheric river over much of California, with heavy rain and some flooding in low-lying areas. This was just before the first peak in the chart above, which shows the water level in the Napa River (in blue) along with a NOAA forecast (in purple). I was checking this chart obsessively as the observed water level approached the “Major Flood” level, and experienced alternate bouts of hope and fear as the forecast skirted above the line from time to time. Relying on this…

A Non-comprehensive List of Awesome Things Other People Did in 2016

Editor’s note: For the last few years I have made a list of awesome things that other people did (2015, 2014, 2013). As in previous years, I’m making the list right off the top of my head. If you know of others, you should make your own list or add them to the comments! I have also avoided talking about stuff I worked on or that people here at Hopkins are doing, because this post is supposed to be about other people’s awesome stuff. I write this post because a blog often feels like a place to complain, but we started Simply Stats as a place to be pumped up about the stuff people were doing with data.
Thomas Lin Pedersen created the tweenr package for interpolating graphs in animations. Check out this awesome logo he made with it.…