Speed up simulations in R with doAzureParallel

I’m a big fan of using R to simulate data. When I’m trying to understand a data set, my first step is sometimes to simulate data from a model and compare the results to the data, before I go down the path of fitting an analytical model directly. Simulations are easy to code in R, but they can sometimes take a while to run — especially if there are a bunch of parameters you want to explore, which in turn requires a bunch of simulations.

"When your pc only has four core processors and your parallel processing across three of the. #DataScience #RStats #Waiting #Multitasking pic.twitter.com/iVkkr7ibox" — Patrick Williams (@unbalancedDad), January 24, 2018

In this post, I’ll provide a simple example of running multiple simulations in R, and show how you can speed up the process by running the simulations in parallel:…
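A minimal local sketch of the pattern the post scales out: many simulations driven by foreach, run in parallel across local cores with doParallel. The parameter grid and the simulation body here are illustrative, not taken from the original post.

```r
# Run one simulation per parameter value, in parallel on 3 local cores.
library(foreach)
library(doParallel)

cl <- makeCluster(3)
registerDoParallel(cl)

sim_means <- foreach(mu = seq(0, 2, by = 0.5), .combine = c) %dopar% {
  # one simulation per value of mu: draw data, return a summary statistic
  mean(rnorm(10000, mean = mu))
}

stopCluster(cl)
sim_means  # one estimated mean per value of mu
```

With doAzureParallel, the same foreach loop can instead be registered against an Azure Batch cluster (via makeCluster() on a cluster config and registerDoAzureParallel()), so the code body stays unchanged while the work runs in the cloud.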
Original Post: Speed up simulations in R with doAzureParallel

Microsoft R Open 3.4.3 now available

Microsoft R Open (MRO), Microsoft’s enhanced distribution of open source R, has been upgraded to version 3.4.3 and is now available for download for Windows, Mac, and Linux. This update upgrades the R language engine to the latest R (version 3.4.3) and updates the bundled packages (specifically: checkpoint, curl, doParallel, foreach, and iterators) to new versions. MRO is 100% compatible with all R packages. MRO 3.4.3 points to a fixed CRAN snapshot taken on January 1, 2018, and you can see some highlights of new packages released since the prior version of MRO on the Spotlights page. As always, you can use the built-in checkpoint package to access packages from an earlier date (for reproducibility) or a later date (to access new and updated packages). MRO 3.4.3 is based on R 3.4.3, a minor update to the R engine (you can see the detailed list…
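The checkpoint mechanism mentioned above is a one-liner. A brief sketch (the date is illustrative; 2018-01-01 is the snapshot MRO 3.4.3 defaults to):

```r
# Pin all package installs/loads in this session to a dated CRAN snapshot,
# so the same script installs identical package versions anywhere.
library(checkpoint)
checkpoint("2018-01-01")
```

Choosing an earlier date reproduces an old analysis exactly; choosing a later date pulls in packages newer than the distribution's default snapshot.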
Original Post: Microsoft R Open 3.4.3 now available

A simple way to set up a SparklyR cluster on Azure

The SparklyR package from RStudio provides a high-level interface to Spark from R. This means you can create R objects that point to data frames stored in the Spark cluster and apply some familiar R paradigms (like dplyr) to the data, all the while leveraging Spark’s distributed architecture without having to worry about memory limitations in R. You can also access the distributed machine-learning algorithms included in Spark directly from R functions.  If you don’t happen to have a cluster of Spark-enabled machines set up in a nearby well-ventilated closet, you can easily set one up in your favorite cloud service. For Azure, one option is to launch a Spark cluster in HDInsight, which also includes the extensions of Microsoft ML Server. While this service recently had a significant price reduction, it’s still more expensive than running a “vanilla” Spark-and-R…
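The workflow described above can be sketched in a few lines. This example uses a local Spark instance rather than a cloud cluster, and mtcars as a stand-in data set:

```r
# sparklyr: point R objects at Spark DataFrames, use dplyr verbs,
# and call Spark's distributed ML algorithms from R.
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")  # or your cluster's master URL

mtcars_tbl <- copy_to(sc, mtcars)      # an R object referencing a Spark DataFrame

# Familiar dplyr paradigms, executed by Spark rather than in R's memory
mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg)) %>%
  collect()

# One of Spark's distributed machine-learning algorithms, from an R function
fit <- ml_linear_regression(mtcars_tbl, mpg ~ wt + cyl)

spark_disconnect(sc)
```

Because the heavy lifting happens inside Spark, only the collected summary (not the full data) ever has to fit in R's memory.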
Original Post: A simple way to set up a SparklyR cluster on Azure

Services and tools for building intelligent R applications in the cloud

by Le Zhang (Data Scientist, Microsoft) and Graham Williams (Director of Data Science, Microsoft) As an in-memory application, R is sometimes thought to be constrained in performance or scalability for enterprise-grade applications. But by deploying R in a high-performance cloud environment, and by leveraging the scale of parallel architectures and dedicated big-data technologies, you can build applications using R that provide the necessary computational efficiency, scale, and cost-effectiveness. We identify four application areas, with associated Azure services, that you can use to deploy R in enterprise applications. They cover the tasks required to prototype, build, and operationalize an enterprise-level data science and AI solution. In each of the four areas, there are R packages and tools specifically for accelerating analytics development. Below is a brief introduction to each. Cloud resource management and operation Cloud computing instances…
Original Post: Services and tools for building intelligent R applications in the cloud

After the “Meltdown,” How Can You Protect Your Database?

What Data Scientists should know about the Meltdown and Spectre vulnerabilities and how to protect the potentially affected databases. The most important thing is to prevent outside parties from executing local JavaScript code on your machine.
Original Post: After the “Meltdown,” How Can You Protect Your Database?

R in the Windows Subsystem for Linux

R has been available for Windows since the very beginning, but if you have a Windows machine and want to use R within a Linux ecosystem, that’s easy to do with the new Fall Creators Update (version 1709). If you need access to the gcc toolchain for building R packages, or simply prefer the bash environment, it’s easy to get things up and running. Once you have things set up, you can launch a bash shell and run R at the terminal like you would in any Linux system. And that’s because this is a Linux system: the Windows Subsystem for Linux is a complete Linux distribution running within Windows. This page provides the details on installing Linux on Windows, but here are the basic steps you need and how to get the latest version of R up and running…
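The basic steps, once a WSL Ubuntu shell is available, look roughly like this. This is a sketch assuming Ubuntu 16.04 ("xenial") inside WSL; adjust the repository line to match your distribution's release:

```
# Add CRAN's Ubuntu repository so apt installs a current R,
# not the older version in Ubuntu's default repositories.
sudo apt-get update
echo "deb https://cloud.r-project.org/bin/linux/ubuntu xenial/" | \
  sudo tee -a /etc/apt/sources.list
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
sudo apt-get update
sudo apt-get install -y r-base r-base-dev

R   # launch R at the bash prompt, as on any Linux system
```

Installing r-base-dev alongside r-base brings in the gcc toolchain needed to compile R packages from source.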
Original Post: R in the Windows Subsystem for Linux

How to make Python easier for the R user: revoscalepy

by Siddarth Ramesh, Data Scientist, Microsoft I’m an R programmer. To me, R has been great for data exploration, transformation, statistical modeling, and visualizations. However, there is a huge community of Data Scientists and Analysts who turn to Python for these tasks. Moreover, both R and Python experts exist in most analytics organizations, and it is important for both languages to coexist. Often, this means that R coders will develop a workflow in R but then must redesign and recode it in Python for their production systems. If the coder is lucky, this is easy, and the R model can be exported as a serialized object and read into Python. There are packages that do this, such as pmml. Unfortunately, it is often more challenging, because the production system might demand that the entire end-to-end workflow be built…
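The "lucky" path mentioned above — exporting an R model in a form another system can consume — can be sketched with the pmml package. The model and file name here are illustrative:

```r
# Fit a model in R and export it as PMML, a language-neutral XML format
# that a Python (or any PMML-aware) production system can load and score.
library(pmml)
library(XML)

fit <- lm(mpg ~ wt + cyl, data = mtcars)
saveXML(pmml(fit), file = "mpg_model.pmml")
```

When the production system demands more than a serialized model — the full end-to-end workflow — this is where revoscalepy, with its R-like API, aims to ease the translation.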
Original Post: How to make Python easier for the R user: revoscalepy