Highlights of the Data Science Track at Microsoft Ignite

Original Post: Highlights of the Data Science Track at Microsoft Ignite

How to build an image recognizer in R using just a few images

Microsoft Cognitive Services provides several APIs for image recognition, but if you want to build your own recognizer (or create one that works offline), you can use the new Image Featurizer capabilities of Microsoft R Server. Training an image recognition system from scratch requires a huge number of images: millions and millions of them. Those images are fed into a deep neural network, and during that process the network generates “features” from each image. These features might be versions of the image including just the outlines, or maybe the image with only the green parts. You could further boil those features down into a single number, say the length of the outline or the percentage of the image that is green. With enough of these “features”, you could use them in a traditional machine learning model to classify…
Original Post: How to build an image recognizer in R using just a few images
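
As a toy illustration of boiling an image down to a single number, the sketch below computes the share of green-dominant pixels in a tiny made-up image using base R. The image data, the green_fraction helper, and the "green-dominant" rule are all assumptions for illustration; the Image Featurizer derives its features from a deep neural network rather than from a hand-written rule like this one.

```r
# Tiny 2 x 2 RGB image stored as a height x width x channel array,
# with channel values in [0, 1]. The pixel values are invented.
img <- array(0, dim = c(2, 2, 3))
img[1, 1, ] <- c(0.1, 0.9, 0.1)  # mostly green
img[1, 2, ] <- c(0.9, 0.1, 0.1)  # mostly red
img[2, 1, ] <- c(0.2, 0.8, 0.3)  # mostly green
img[2, 2, ] <- c(0.1, 0.1, 0.9)  # mostly blue

# Hypothetical feature: fraction of pixels whose green channel is
# strictly larger than both the red and the blue channel.
green_fraction <- function(img) {
  r <- img[, , 1]
  g <- img[, , 2]
  b <- img[, , 3]
  mean(g > r & g > b)
}

green_fraction(img)  # 0.5: two of the four pixels are green-dominant
```

A handful of numbers like this one, computed per image, would form one row of the feature table that a conventional classifier is trained on.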

Lessons Learned From Benchmarking Fast Machine Learning Algorithms

Boosted decision trees are responsible for more than half of the winning solutions in machine learning challenges hosted at Kaggle, and require minimal tuning. We evaluate two popular tree boosting software packages, XGBoost and LightGBM, and draw four important lessons.
Original Post: Lessons Learned From Benchmarking Fast Machine Learning Algorithms

dplyrXdf 0.10.0 beta prerelease

I’m happy to announce that version 0.10.0 beta of the dplyrXdf package is now available. You can get it from GitHub: install_github("RevolutionAnalytics/dplyrXdf", build_vignettes=FALSE). This is a major update to dplyrXdf that adds the following features: support for the tidyeval framework that powers the latest version of dplyr; support for Spark and Hadoop clusters and files in HDFS; several utility functions to ease working with files and datasets; and many bugfixes and workarounds for issues with the underlying RevoScaleR functions. This (pre-)release of dplyrXdf requires Microsoft R Server or Client version 8.0 or higher, and dplyr 0.7 or higher. If you’re using R Server, dplyr 0.7 won’t be in the MRAN snapshot that is your default repo, but you can get it from CRAN: install.packages("dplyr", repos="https://cloud.r-project.org"). The tidyeval framework: this completely changes the way in which dplyr handles standard evaluation. Previously, if…
Original Post: dplyrXdf 0.10.0 beta prerelease
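
For readers new to tidyeval, here is a minimal sketch of the style it enables in dplyr 0.7 or higher: a function captures bare column names with enquo() and unquotes them with !!. It runs on an ordinary in-memory data frame (the built-in mtcars) rather than an Xdf file, and group_mean is a hypothetical helper name; this is plain dplyr, not a demonstration of dplyrXdf itself.

```r
library(dplyr)

# Hypothetical helper: mean of one column within groups of another.
# enquo() captures the bare column names passed by the caller;
# !! unquotes them inside the dplyr verbs.
group_mean <- function(df, group_var, value_var) {
  group_var <- enquo(group_var)
  value_var <- enquo(value_var)
  df %>%
    group_by(!!group_var) %>%
    summarise(mean_value = mean(!!value_var))
}

# Column names are passed unquoted, just as in an interactive dplyr call.
result <- group_mean(mtcars, cyl, mpg)
print(result)
```

The result has one row per distinct value of cyl, with the mean of mpg in the mean_value column.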

Tutorial: Deep Learning with R on Azure with Keras and CNTK

by Le Zhang (Data Scientist, Microsoft) and Graham Williams (Director of Data Science, Microsoft). Microsoft’s Cognitive Toolkit (better known as CNTK) is a commercial-grade, open-source framework for deep learning tasks. At present CNTK does not have a native R interface, but it can be accessed through Keras, a high-level API that wraps various deep learning backends, including CNTK, TensorFlow, and Theano, and makes deep neural network construction modular. The latest version of CNTK (2.1) supports Keras. The RStudio team has developed an R interface for Keras, making it possible to run different deep learning backends, including CNTK, from within an R session. This tutorial illustrates how to simply and quickly spin up an Ubuntu-based Azure Data Science Virtual Machine (DSVM) and configure a Keras and CNTK environment. An Azure DSVM is a curated virtual machine image coming with an…
Original Post: Tutorial: Deep Learning with R on Azure with Keras and CNTK

Tutorial: Publish an R function as a SQL Server stored procedure with the sqlrutils package

In SQL Server 2016 and later, you can publish an R function to the database as a stored procedure. This makes it possible to run your R function on the SQL Server itself, which makes the power of that server available for R computations and eliminates the time required to move data to and from the server. It also makes your R function available as a resource to DBAs for use in SQL queries, even if they don’t know the R language. Niels Berglund recently posted a detailed tutorial on using the sqlrutils package to publish an R function as a stored procedure. There are several steps to the process, but ultimately it boils down to calling registerStoredProcedure on your R function (and providing the necessary credentials). If you don’t have a connection (or the credentials) to publish to…
Original Post: Tutorial: Publish an R function as a SQL Server stored procedure with the sqlrutils package

Text categorization with deep learning, in R

Given a short review of a product, like “I couldn’t put it down!”, can you predict what the product is? In that case it’s pretty easy (it’s for a book), but this general problem of text categorization comes up in a lot of natural language analysis problems. In his talk at useR!2017 (shown below), Microsoft data scientist Angus Taylor demonstrates how to build a text categorization model in R. He applies a convolutional neural network (trained using the R interface to the MXNet deep learning platform) to Amazon review data, and creates a small Shiny app to categorize previously unseen reviews. The talk also provides a brief introduction to convolutional neural networks and one-hot encoding, if you haven’t come across those concepts before. The model Angus uses in this example is described in more detail in the blog post, Cloud-Scale Text…
Original Post: Text categorization with deep learning, in R
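
If one-hot encoding is new to you, the base R sketch below shows the idea at the character level: each character of a review becomes a column containing a single 1 in the row for that character. The alphabet, the fixed length of 30, and the one_hot_encode helper are assumptions made for illustration, not details taken from the talk.

```r
# Hypothetical helper: encode a string as an alphabet-by-position 0/1 matrix.
# Characters outside the alphabet, and unused trailing positions, stay all zero.
one_hot_encode <- function(text, alphabet, max_len) {
  chars <- strsplit(tolower(text), "")[[1]]
  chars <- head(chars, max_len)            # truncate overly long input
  m <- matrix(0L, nrow = length(alphabet), ncol = max_len,
              dimnames = list(alphabet, NULL))
  idx <- match(chars, alphabet)            # row index for each character
  for (j in seq_along(chars)) {
    if (!is.na(idx[j])) m[idx[j], j] <- 1L
  }
  m
}

# Assumed alphabet: lowercase letters plus a few punctuation characters.
alphabet <- c(letters, " ", "!", "'")
enc <- one_hot_encode("I couldn't put it down!", alphabet, max_len = 30)

dim(enc)   # 29 rows (alphabet size) by 30 columns (positions)
sum(enc)   # 23: one entry per character of the review
```

A matrix like this has a fixed shape regardless of the review's length, which is what lets a batch of reviews be stacked and fed to a convolutional network.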

Applications in energy, retail and shipping

The Solutions section of the Cortana Intelligence Gallery provides more than two dozen working examples of applying machine learning, data science and artificial intelligence to real-world problems. Each solution provides sample data, scripts for model training and evaluation, and reporting of predictions. You can deploy a complete stack in Azure to implement the solution with the click of a button, or follow instructions to deploy on your own hardware. The internals of each solution are fully documented and open source, so you can easily customize it to your needs. Here’s a brief overview of some solutions that have recently been posted to the Gallery. Click on the links in bold to be taken to the main solution page. Customer Churn Prediction. This solution uses historical customer transaction data to identify new customers that are most likely to churn (switch to a…
Original Post: Applications in energy, retail and shipping

How to use H2O with R on HDInsight

H2O.ai is an open-source AI platform that provides a number of machine-learning algorithms that run on the Spark distributed computing framework. Azure HDInsight is Microsoft’s fully-managed Apache Hadoop platform in the cloud, which makes it easy to spin up and manage Azure clusters of any size. It’s also easy to run H2O on HDInsight: H2O AI Platform is available as an application on HDInsight, which pre-installs everything you need as the cluster is created. You can also drive H2O from R, but the R packages don’t come auto-installed on HDInsight. To make this easy, the Azure HDInsight team has provided a couple of scripts that will install the necessary components on the cluster for you. These include RStudio (to provide an R IDE on the cluster) and the rsparkling package. With these components installed, from R you can: Query data…
Original Post: How to use H2O with R on HDInsight