Interactive R visuals in Power BI

Power BI has long had the capability to include custom R charts in dashboards and reports. But in sharp contrast to standard Power BI visuals, these R charts were static. While R charts would update when the report data was refreshed or filtered, it wasn’t possible to interact with an R chart on the screen (to display tool-tips, for example). But in the latest update to Power BI, you can create R custom visuals that embed interactive R charts. The example chart in the original post was created with the plotly package, but you can also use htmlwidgets or any other R package that creates interactive graphics. The only restriction is that the output must be HTML, which can then be embedded into the Power BI dashboard or report. You can also publish reports including these interactive charts to the online…
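As a rough illustration (the chart, data, and file name here are hypothetical, not from the original post), this is the kind of plotly code that produces the interactive HTML output a Power BI R custom visual can embed:

```r
library(plotly)

# Example data: car model names become hover tool-tips
d <- data.frame(model = rownames(mtcars), mtcars)

p <- plot_ly(d, x = ~wt, y = ~mpg, color = ~factor(cyl),
             type = "scatter", mode = "markers",
             text = ~model)  # displayed as interactive tool-tips

# The visual's output must be HTML; saveWidget writes the chart
# as a self-contained HTML file
htmlwidgets::saveWidget(p, "chart.html", selfcontained = TRUE)
```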
Original Post: Interactive R visuals in Power BI

Updated Data Science Virtual Machine for Windows: GPU-enabled with Docker support

The Windows edition of the Data Science Virtual Machine (DSVM), the all-in-one virtual machine image with a wide collection of open-source and Microsoft data science tools, has been updated to the Windows Server 2016 platform. This update brings built-in support for Docker containers and GPU-based deep learning. GPU-based deep learning: while prior editions of the DSVM could access GPU-based capabilities by installing additional components, everything is now configured and ready at launch. The DSVM now includes GPU-enabled builds of popular deep learning frameworks including CNTK, TensorFlow, and MXNet. It also includes Microsoft R Server 9.1, and several machine-learning functions in the MicrosoftML package can also take advantage of GPUs. Note that you will need to use an N-series Azure instance to benefit from GPU acceleration, but all of the tools in the DSVM will also work on regular CPU-based instances. Docker…
Original Post: Updated Data Science Virtual Machine for Windows: GPU-enabled with Docker support

R leads, Python gains in 2017 Burtch Works Survey

For the past four years, recruiting firm Burtch Works has conducted a simple survey of data scientists with just one question: “Which do you prefer to use — SAS, R or Python?” The results for this year’s survey of 1,046 respondents are in:

R: 40% (2016: 42%)
SAS: 34% (2016: 39%)
Python: 26% (2016: 20%)

Compared to last year’s results, Python has gained 6 percentage points, mainly at the expense of SAS. (2016 results do not add to 100% due to rounding.) The trend for the commercial tool compared to the open-source tools is apparent in the data from all four years. (Note: Python has only been an option for respondents in the last two surveys.) Demographic breakdowns, included in the survey analysis linked below, show that SAS remains popular with respondents with more years of experience (more than 16 years),…
Original Post: R leads, Python gains in 2017 Burtch Works Survey

Using sparklyr with Microsoft R Server

The sparklyr package (by RStudio) provides a high-level interface between R and Apache Spark. Among many other things, it allows you to filter and aggregate data in Spark using the dplyr syntax. In Microsoft R Server 9.1, you can now connect to a Spark session using the sparklyr package as the interface, allowing you to combine the data-preparation capabilities of sparklyr and the data-analysis capabilities of Microsoft R Server in the same environment. In a presentation at the Spark Summit (embedded below; you can find the slides here), Ali Zaidi shows how to connect to a Spark session from Microsoft R Server, and use the sparklyr package to extract a data set. He then shows how to build predictive models on this data (specifically, a deep neural network and a boosted-trees classifier). He also shows how…
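For a flavor of the sparklyr side of this workflow, here is a minimal sketch (the connection settings and data set are illustrative, not taken from the talk):

```r
library(sparklyr)
library(dplyr)

# Connect to Spark ("local" for testing; a cluster might use "yarn-client")
sc <- spark_connect(master = "local")

# Copy an example data set into Spark (stands in for real data)
flights_tbl <- copy_to(sc, nycflights13::flights, "flights", overwrite = TRUE)

# Data preparation runs inside Spark, expressed with dplyr verbs
delay_summary <- flights_tbl %>%
  filter(!is.na(dep_delay)) %>%
  group_by(carrier) %>%
  summarise(mean_delay = mean(dep_delay), n = n()) %>%
  collect()  # pull only the aggregated result back into R

spark_disconnect(sc)
```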
Original Post: Using sparklyr with Microsoft R Server

Applications of R at EARL San Francisco 2017

The Mango team held their first instance of the EARL conference series in San Francisco last month, and it was a fantastic showcase of real-world applications of R. This was a smaller version of the EARL conferences in London and Boston, but with that came the opportunity to interact with R users from industry in a more intimate setting. Hopefully Mango will return to the venue next year, and if so I’ll definitely be back! As always with EARL events, the program featured many interesting presentations of how R is used to implement data-driven (or data-informed) policy at companies around the world. With a dual-track program I couldn’t attend all of the talks, but here are some of the applications that caught my interest: Ricardo Bion (Airbnb): A keynote with an update on data science practice at Airbnb:…
Original Post: Applications of R at EARL San Francisco 2017

Demo: Real-Time Predictions with Microsoft R Server

At the R/Finance conference last month, I demonstrated how to operationalize models developed in Microsoft R Server as web services using the mrsdeploy package. Then, I used that deployed model to generate predictions for loan delinquency, using a Python script as the client. (You can see the slides here, and a video of the presentation below.) With Microsoft R Server 9.1, there are now two ways to operationalize models as a web service or as a SQL Server stored procedure:

Flexible operationalization: deploy any R script or function.
Real-time operationalization: deploy model objects generated by specific functions in Microsoft R; this generates predictions much more quickly by bypassing the R interpreter.

In the demo, which begins at the 10:00 mark in the video below, you can see a comparison of the two types of deployment. Ultimately, I was able to generate predictions from…
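As a hedged sketch of what the two modes look like with mrsdeploy (the endpoint, credentials, model formula, and loan_data are hypothetical; check the Microsoft R Server documentation for the exact arguments):

```r
library(mrsdeploy)

# Authenticate against the R Server operationalization endpoint
remoteLogin("http://localhost:12800",
            username = "admin", password = "<password>",
            session = FALSE)

# A model trained with a Microsoft R function (loan_data is hypothetical)
model <- rxLogit(delinquent ~ balance + income, data = loan_data)

# Flexible operationalization: publish an arbitrary R function
flexService <- publishService(
  "loanScoreFlex",
  code = function(newdata) rxPredict(model, data = newdata),
  model = model,
  inputs = list(newdata = "data.frame"),
  outputs = list(scores = "data.frame"),
  v = "1.0.0"
)

# Real-time operationalization: publish the model object itself, so
# scoring bypasses the R interpreter
realtimeService <- publishService(
  "loanScoreRealtime",
  model = model,
  serviceType = "Realtime",
  v = "1.0.0"
)
```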
Original Post: Demo: Real-Time Predictions with Microsoft R Server

Studying disease with R: RECON, The R Epidemics Consortium

For almost a year now, a collection of researchers from around the world has been collaborating to develop the next generation of analysis tools for disease outbreak response using R. The R Epidemics Consortium (RECON) creates R packages for handling, visualizing, and analyzing outbreak data using cutting-edge statistical methods, along with general-purpose tools for data cleaning, versioning, and encryption, and system infrastructure. Like rOpenSci, the Epidemics Consortium is focused on developing efficient, reliable, and accessible open-source tools, but with a focus on epidemiology as opposed to science generally. The Epidemics Consortium has already created several useful resources for epidemiology (listed in the original post), and a large number of additional packages are under development. RECON welcomes new members, particularly experienced R developers and public health officers specializing in outbreak response. You can find information on how to join here, and general information about the R Epidemics…
Original Post: Studying disease with R: RECON, The R Epidemics Consortium

Syberia: A development framework for R code in production

Putting R code into production generally involves orchestrating the execution of a series of R scripts. Even if much of the application logic is encoded into R packages, a run-time environment typically involves scripts to ingest and prepare data, run the application logic, validate the results, and operationalize the output. Managing those scripts, especially when working with multiple R versions, can be a pain — and worse, very complex scripts are difficult to understand and reuse for future applications. That’s where Syberia comes in: an open-source framework created by Robert Krzyzanowski and other engineers at the consumer lending company Avant. There, Syberia has been used by more than 30 developers to build a production data modeling system. In fact, building production R systems was the motivating purpose of Syberia: developing classifiers using the Syberia modeling engine follows…
Original Post: Syberia: A development framework for R code in production

Interfacing with APIs using R: the basics

While R (and its package ecosystem) provides a wealth of functions for querying and analyzing data, in our cloud-enabled world there’s now a plethora of online services with APIs you can use to augment R’s capabilities. Many of these APIs use a RESTful interface, which means you will typically send/receive data encoded in the JSON format using HTTP commands. Fortunately, as Steph Locke explains in her most recent R Quick Tip, the process is pretty simple using R:

1. Obtain an authentication key for using the service.
2. Find the URL of the API service you wish to use.
3. Convert your input data to JSON format using toJSON in the jsonlite package.
4. Send your data to the API service using the POST function in the httr package, including your API key using the add_headers function.
5. Extract your results from the API response…
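A minimal sketch of that sequence (the endpoint URL, header name, and payload are hypothetical and will vary by service):

```r
library(httr)
library(jsonlite)

api_url <- "https://api.example.com/v1/analyze"  # hypothetical service URL
api_key <- Sys.getenv("EXAMPLE_API_KEY")         # authentication key

# Step 3: convert the input data to JSON
body_json <- toJSON(list(text = "R makes API calls easy"), auto_unbox = TRUE)

# Step 4: send the data with POST, passing the key via add_headers
# (the header name is service-specific; "api-key" is just an example)
response <- POST(api_url,
                 add_headers("api-key" = api_key,
                             "Content-Type" = "application/json"),
                 body = body_json)

# Step 5: extract the results from the API response
results <- fromJSON(content(response, as = "text", encoding = "UTF-8"))
```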
Original Post: Interfacing with APIs using R: the basics