Saving Snow Leopards with Artificial Intelligence

The snow leopard, the large cat native to the mountain ranges of Central and South Asia, is a highly endangered species. With an estimated 3900-6500 individuals left in the wild, conservation efforts led by the Snow Leopard Trust are focused on preserving this iconic animal. But the snow leopard is an elusive creature: given their range and remote habitat (including the highlands of the Himalayas), they are difficult to study. In order to gather data about the creatures, researchers have used camera traps to capture more than 1 million images. But not all of those images are of snow leopards. It’s a time-consuming process to classify those images as being of snow leopards, their prey, some other animal or nothing at all. To make things even more difficult, snow leopards have excellent camouflage, and can be difficult to spot even by…
Original Post: Saving Snow Leopards with Artificial Intelligence

Tutorial: Azure Data Lake analytics with R

The Azure Data Lake store is an Apache Hadoop file system compatible with HDFS, hosted and managed in the Azure Cloud. You can store and access the data within directly via the API, by connecting the filesystem directly to Azure HDInsight services, or via HDFS-compatible open-source applications. And for data science applications, you can also access the data directly from R, as this tutorial explains. To interface with Azure Data Lake, you’ll use U-SQL, a SQL-like language extensible using C#. The R Extensions for U-SQL allow you to reference an R script from a U-SQL statement, and pass data from Data Lake into the R script. There’s a 500 MB limit for the data passed to R, but the basic idea is that you perform the main data munging tasks in U-SQL, and then pass the prepared data to R for analysis. With this…
Original Post: Tutorial: Azure Data Lake analytics with R

Announcing dplyrXdf 1.0

I’m delighted to announce the release of version 1.0.0 of the dplyrXdf package. dplyrXdf began as a simple (relatively speaking) backend to dplyr for Microsoft Machine Learning Server/Microsoft R Server’s Xdf file format, but has now become a broader suite of tools to ease working with Xdf files. This update to dplyrXdf brings the following new features: support for the new tidyeval framework that powers the current release of dplyr; support for Spark and Hadoop clusters, including integration with the sparklyr package to process Hive tables in Spark; integration with dplyr to process SQL Server tables in-database; simplified handling of parallel processing for grouped data; several utility functions for Xdf and file management; and workarounds for various glitches and unexpected behaviour in MRS and dplyr. Spark, Hadoop and HDFS: New in version 1.0.0 of dplyrXdf is support for Xdf files and datasets stored…
Original Post: Announcing dplyrXdf 1.0

Introducing the Deep Learning Virtual Machine on Azure

A new member has just joined the family of Data Science Virtual Machines on Azure: The Deep Learning Virtual Machine. Like other DSVMs in the family, the Deep Learning VM is a pre-configured environment with all the tools you need for data science and AI development pre-installed. The Deep Learning VM is designed specifically for GPU-enabled instances, and comes with a complete suite of deep learning frameworks including TensorFlow, PyTorch, MXNet, Caffe2 and CNTK. It also comes with example scripts and data sets to get you started on deep learning and AI problems, including: The DLVM along with all the DSVMs also provides a complete suite of data science tools including R, Python, Spark, and much more: There have also been some updates and additions to the tools provided in the entire DSVM family, including: All Data Science Virtual Machines,…
Original Post: Introducing the Deep Learning Virtual Machine on Azure

Create Powerpoint presentations from R with the OfficeR package

For many of us data scientists, whatever the tools we use to conduct research or perform an analysis, our superiors are going to want the results as a Microsoft Office document. Most likely it’s a Word document or a PowerPoint presentation, and it probably has to follow corporate branding guidelines to boot. The OfficeR package, by David Gohel, addresses this problem by allowing you to take a Word or PowerPoint template and programmatically insert text, tables and charts generated by R into the template to create a complete document. (The OfficeR package also represents a leap forward from the similar ReporteRs package: it’s faster, and no longer has a dependency on a Java installation.) At his blog, Len Kiefer puts the OfficeR package through its paces, demonstrating how to create a PowerPoint deck using R. The process is pretty simple:…
Original Post: Create Powerpoint presentations from R with the OfficeR package

Key Takeaways from AI Conference in San Francisco 2017 – Day 2

Highlights and key takeaways from day 2 of AI Conference San Francisco 2017, including current state review, future trends, and top recommendations for AI initiatives.
Original Post: Key Takeaways from AI Conference in San Francisco 2017 – Day 2

Featurizing images: the shallow end of deep learning

by Bob Horton and Vanja Paunic, Microsoft AI and Research Data Group
Training deep learning models from scratch requires large data sets and significant computational resources. Using pre-trained deep neural network models to extract relevant features from images allows us to build classifiers using standard machine learning approaches that work well for relatively small data sets. In this context, a deep learning solution can be thought of as incorporating layers that compute features, followed by layers that map these features to outcomes; here we’ll just map the features to outcomes ourselves. We explore an example of using a pre-trained deep learning image classifier to generate features for use with traditional machine learning approaches to address a problem the original model was never trained on (see the blog post “Image featurization with a pre-trained deep neural network model” for other examples).…
Original Post: Featurizing images: the shallow end of deep learning
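
The featurization idea in the excerpt above (a frozen stage computes features, and a simple classifier maps those features to outcomes) can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions, not the authors' method: `frozen_features` is a hypothetical stand-in for the feature layers of a pre-trained network, the four-pixel "images" are made up, and the classifier is an ordinary logistic regression fitted by gradient descent.

```python
import math


def frozen_features(image):
    # Hypothetical stand-in for a pre-trained network's feature layers:
    # a fixed function that is never updated while the classifier trains.
    # Here it returns the mean brightness of each half of the image.
    half = len(image) // 2
    left = sum(image[:half]) / half
    right = sum(image[half:]) / (len(image) - half)
    return [left, right]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.5, epochs=500):
    # Plain batch gradient descent on the logistic loss.
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(z) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(image, weights, bias):
    # Featurize with the frozen stage, then apply the trained classifier.
    x = frozen_features(image)
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


# Made-up toy data: label 0 is "brighter on the left", 1 is "brighter on the right".
images = [
    [0.9, 0.8, 0.1, 0.2],
    [0.7, 0.9, 0.2, 0.1],
    [0.8, 0.8, 0.3, 0.1],
    [0.1, 0.2, 0.9, 0.8],
    [0.2, 0.1, 0.7, 0.9],
    [0.3, 0.1, 0.8, 0.8],
]
labels = [0, 0, 0, 1, 1, 1]

# Only the small classifier is trained; the feature stage stays fixed.
features = [frozen_features(img) for img in images]
weights, bias = train_logistic(features, labels)
```

Calling `predict(new_image, weights, bias)` then classifies an unseen image. Only `train_logistic` learns anything, which is why this approach can work with relatively small labeled data sets; in practice the stand-in would be replaced by features taken from a real pre-trained model.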

Meet the new Microsoft R Server: Microsoft ML Server 9.2

Microsoft R Server has received a new name and a major update: Microsoft ML Server 9.2 is now available. ML Server provides a scalable production platform for R — and now Python — programs. The basic idea is that a local client can push R or Python code and have it operationalized on the remote server. ML Server is also included with the Data Science Virtual Machine and HDInsight Spark clusters on Azure.  This video gives a high-level overview of the process, or you can also see details of deploying an R model or a Python model as a web service. The related Microsoft Machine Learning Services provides similar capabilities for in-database computations within SQL Server 2017 (now with Python as well as R) and (in preview) the fully-managed Azure SQL Database. ML Services also provides real-time scoring of trained models, with predictions generated…
Original Post: Meet the new Microsoft R Server: Microsoft ML Server 9.2

News Roundup from Microsoft Ignite

It’s been a big day for the team here at Microsoft, with a flurry of announcements from the Ignite conference in Orlando. We’ll provide more in-depth details in the coming days and weeks, but for now here’s a brief roundup of the news related to data science: Microsoft ML Server 9.2 is now available. This is the new name for what used to be called Microsoft R Server, and now also includes support for operationalizing the Python language as well as R. This 2-minute video explains how to deploy models with ML Server (and with this release, real-time scoring is now also supported on Linux). Microsoft R Client 3.4.1, featuring R 3.4.1, was also released today and provides desktop capabilities for R developers with the ability to deploy computations to a production ML Server. The next generation of Azure…
Original Post: News Roundup from Microsoft Ignite

Tutorial: Launch a Spark and R cluster with HDInsight

If you’d like to get started using R with Spark, you’ll need to set up a Spark cluster and install R and all the other necessary software on the nodes. A really easy way to achieve that is to launch an HDInsight cluster on Azure, which is just a managed Spark cluster with some useful extra components. You’ll just need to configure the components you’ll need: in our case R, Microsoft R Server, and RStudio Server. This tutorial explains how to launch an HDInsight cluster for use with R. It explains how to size and launch the cluster, connect to it via SSH, install Microsoft R Server (with R) on each of the nodes, and install RStudio Server community edition to use as an IDE on the edge node. (If you find you need a larger or…
Original Post: Tutorial: Launch a Spark and R cluster with HDInsight