Because it's Friday: Mario in the Park

I got my first chance to use HoloLens just a couple of weeks ago. It was pretty amazing to see a virtual wind turbine appear in the room with me, and to be able to walk around it and see how it was performing. But here’s a much more fun application of HoloLens: a recreation of the 2-D Super Mario “World 1-1” in the very 3-D world of New York’s Central Park. Sadly, the blocks and platforms are purely virtual, which constrains the real-world player to walking on the ground. But it does make for some amusing antics for the passers-by (although in true New Yorker style, most ignore the random jumping and pointing!). That’s all from us for this week. Have a great weekend: we’ll be back on Monday!
Original Post: Because it's Friday: Mario in the Park

The R community is one of R's best features

R is incredible software for statistics and data science. But while the bits and bytes of software are an essential component of its usefulness, software needs a community to be successful. And that’s an area where R really shines, as Shannon Ellis explains in this lovely rOpenSci blog post. For software, a thriving community offers developers, expertise, collaborators, writers and documentation, testers, agitators (to keep the community and software on track!), and so much more. Shannon provides links where you can find all of this in the R community:
- #rstats hashtag: a responsive, welcoming, and inclusive community of R users to interact with on Twitter
- R-Ladies: a world-wide organization focused on promoting gender diversity within the R community, with more than 30 local chapters
- Local R meetup groups: a Google search may show that there’s one in your…
Original Post: The R community is one of R's best features

Interactive R visuals in Power BI

Power BI has long had the capability to include custom R charts in dashboards and reports. But in sharp contrast to standard Power BI visuals, these R charts were static. While R charts would update when the report data was refreshed or filtered, it wasn’t possible to interact with an R chart on the screen (to display tool-tips, for example). But in the latest update to Power BI, you can create R custom visuals that embed interactive R charts. The example chart in the post was created with the plotly package, but you can also use htmlwidgets or any other R package that creates interactive graphics. The only restriction is that the output must be HTML, which can then be embedded into the Power BI dashboard or report. You can also publish reports including these interactive charts to the online…
Original Post: Interactive R visuals in Power BI

Updated Data Science Virtual Machine for Windows: GPU-enabled with Docker support

The Windows edition of the Data Science Virtual Machine (DSVM), the all-in-one virtual machine image with a wide collection of open-source and Microsoft data science tools, has been updated to the Windows Server 2016 platform. This update brings built-in support for Docker containers and GPU-based deep learning.

GPU-based Deep Learning. While prior editions of the DSVM could access GPU-based capabilities by installing additional components, everything is now configured and ready at launch. The DSVM now includes GPU-enabled builds of popular deep learning frameworks including CNTK, TensorFlow, and MXNet. It also includes Microsoft R Server 9.1, and several machine-learning functions in the MicrosoftML package can take advantage of GPUs. Note that you will need to use an N-series Azure instance to benefit from GPU acceleration, but all of the tools in the DSVM will also work on regular CPU-based instances.

Docker…
Original Post: Updated Data Science Virtual Machine for Windows: GPU-enabled with Docker support

R leads, Python gains in 2017 Burtch Works Survey

For the past four years, recruiting firm Burtch Works has conducted a simple survey of data scientists with just one question: “Which do you prefer to use: SAS, R or Python?” The results for this year’s survey of 1,046 respondents are in:
- R: 40% (2016: 42%)
- SAS: 34% (2016: 39%)
- Python: 26% (2016: 20%)
Compared to last year’s results, Python has gained 6 percentage points, mainly at the expense of SAS. (2016 results do not add to 100% due to rounding.) The trend for the commercial tool compared to the open-source tools is apparent when looking at the data from all four years. (Note: Python has only been an option for respondents in the last two surveys.) Demographic breakdowns, included in the survey analysis linked below, show that SAS remains popular among respondents with more years of experience (more than 16 years),…
Original Post: R leads, Python gains in 2017 Burtch Works Survey
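
The year-over-year changes quoted in the Burtch Works excerpt can be checked with a few lines of Python. This is only an illustrative sketch: the percentages are the ones reported in the excerpt, while the variable and function names are made up for the example.

```python
# Shares reported in the survey excerpt, in percent of respondents.
shares_2016 = {"R": 42, "SAS": 39, "Python": 20}
shares_2017 = {"R": 40, "SAS": 34, "Python": 26}


def point_changes(before, after):
    """Return the change in percentage points for each tool."""
    return {tool: after[tool] - before[tool] for tool in after}


changes = point_changes(shares_2016, shares_2017)

# Combined 2017 share of the two open-source tools.
open_source_2017 = shares_2017["R"] + shares_2017["Python"]

for tool, delta in changes.items():
    print(f"{tool}: {delta:+d} points")
print(f"2016 total: {sum(shares_2016.values())}%")
print(f"Open-source tools combined, 2017: {open_source_2017}%")
```

The 2016 figures sum to 101, which matches the rounding note in the excerpt, and Python’s 6-point gain is close to the combined 7 points lost by SAS and R.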

Using sparklyr with Microsoft R Server

The sparklyr package (by RStudio) provides a high-level interface between R and Apache Spark. Among many other things, it allows you to filter and aggregate data in Spark using the dplyr syntax. In Microsoft R Server 9.1, you can now connect to a Spark session using the sparklyr package as the interface, allowing you to combine the data-preparation capabilities of sparklyr and the data-analysis capabilities of Microsoft R Server in the same environment. In a presentation at the Spark Summit (embedded below, and you can find the slides here), Ali Zaidi shows how to connect to a Spark session from Microsoft R Server, and use the sparklyr package to extract a data set. He then shows how to build predictive models on this data (specifically, a deep neural network and a boosted trees classifier). He also shows how…
Original Post: Using sparklyr with Microsoft R Server

Because it's Friday: Dry Martini Specifications

“Standards are Serious Business” was once the tagline of ANSI, the American National Standards Institute, but this tongue-in-cheek standard (ANSI K100.1-1974, an update to ASA K100.1-1966) is anything but. I mean, I appreciate a dry martini as much as anyone, but this is beyond the pale. The list of standards committee members rather gives the joke away, though. Apparently this was once an active ANSI standard (though according to Wikipedia, no longer), used to promote the benefits of the standards organization. Check out the complete American National Standard Safety Code and Requirements for Dry Martinis for a fun read. That’s all from us for this week. Have a great weekend, and we’ll be back on Monday!
Original Post: Because it's Friday: Dry Martini Specifications

Applications of R at EARL San Francisco 2017

The Mango team held their first instance of the EARL conference series in San Francisco last month, and it was a fantastic showcase of real-world applications of R. This was a smaller version of the EARL conferences in London and Boston, but with that came the opportunity to interact with R users from industry in a more intimate setting. Hopefully Mango will return to the venue again next year, and if so I’ll definitely be back! As always with EARL events, the program featured many interesting presentations of how R is used to implement a data-driven (or data-informed) policy at companies around the world. With a dual-track program I couldn’t attend all of the talks, but here are some of the applications that caught my interest:
- Ricardo Bion (Airbnb): A keynote with an update on data science practice at Airbnb:…
Original Post: Applications of R at EARL San Francisco 2017

Demo: Real-Time Predictions with Microsoft R Server

At the R/Finance conference last month, I demonstrated how to operationalize models developed in Microsoft R Server as web services using the mrsdeploy package. Then, I used that deployed model to generate predictions for loan delinquency, using a Python script as the client. (You can see slides here, and a video of the presentation below.) With Microsoft R Server 9.1, there are now two ways to operationalize models as a web service or as a SQL Server stored procedure:
- Flexible Operationalization: Deploy any R script or function.
- Real-Time Operationalization: Deploy only model objects generated by specific functions in Microsoft R, but generate predictions much more quickly by bypassing the R interpreter.
In the demo, which begins at the 10:00 mark in the video below, you can see a comparison of the two types of deployment. Ultimately, I was able to generate predictions from…
Original Post: Demo: Real-Time Predictions with Microsoft R Server
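
To give a rough idea of what a Python client for a deployed scoring service might look like, here is a minimal sketch that builds (but does not send) a JSON POST request using only the standard library. Everything specific in it is an assumption made for illustration: the URL layout, host and port, service name, version string, bearer-token header, and input field names are placeholders, not the interface used in the demo or documented for mrsdeploy.

```python
import json
import urllib.request


def build_prediction_request(base_url, service, version, token, inputs):
    """Build (but do not send) a JSON POST request for one prediction."""
    # Hypothetical URL layout; adjust to whatever your deployment exposes.
    url = f"{base_url.rstrip('/')}/api/{service}/{version}"
    body = json.dumps(inputs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumes token-based authentication; the token is a placeholder.
            "Authorization": f"Bearer {token}",
        },
    )


request = build_prediction_request(
    "http://localhost:12800",   # placeholder host and port
    "loanPredictService",       # hypothetical service name
    "v1.0.0",                   # hypothetical version
    "PLACEHOLDER_TOKEN",
    {"loanAmount": 25000, "creditScore": 640},  # made-up input fields
)
print(request.get_method(), request.full_url)

# To actually send it against a live service:
# response = urllib.request.urlopen(request, timeout=10)
```

Keeping request construction separate from sending makes the client easy to inspect and test without a running server.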

Studying disease with R: RECON, The R Epidemics Consortium

For almost a year now, a collection of researchers from around the world has been collaborating to develop the next generation of analysis tools for disease outbreak response using R. The R Epidemics Consortium (RECON) creates R packages for handling, visualizing, and analyzing outbreak data using cutting-edge statistical methods, along with general-purpose tools for data cleaning, versioning, and encryption, and system infrastructure. Like rOpenSci, the Epidemics Consortium is focused on developing efficient, reliable, and accessible open-source tools, but with a focus on epidemiology as opposed to science generally. The Epidemics Consortium has already created several useful resources for epidemiology. There are also a large number of additional packages under development. RECON welcomes new members, particularly experienced R developers and public health officers specialized in outbreak response. You can find information on how to join here, and general information about the R Epidemics…
Original Post: Studying disease with R: RECON, The R Epidemics Consortium