News Roundup from Microsoft Ignite

It’s been a big day for the team here at Microsoft, with a flurry of announcements from the Ignite conference in Orlando. We’ll provide more in-depth details in the coming days and weeks, but for now here’s a brief roundup of the news related to data science: Microsoft ML Server 9.2 is now available. This is the new name for what used to be called Microsoft R Server, and it now supports operationalizing Python as well as R. This 2-minute video explains how to deploy models with ML Server (and with this release, real-time scoring is also supported on Linux). Microsoft R Client 3.4.1, featuring R 3.4.1, was also released today and provides desktop capabilities for R developers with the ability to deploy computations to a production ML Server. The next generation of Azure…
Original Post: News Roundup from Microsoft Ignite

Visualizing High Dimensional Data In Augmented Reality

Often, when data scientists first get a data set, they use a matrix of 2D scatter plots to get a quick overview of its contents. 2D scatter plots show the relationships between pairs of attributes. But for data with lots of attributes, that type of analysis just doesn’t scale.
Original Post: Visualizing High Dimensional Data In Augmented Reality

Custom Level Coding in vtreat

One of the services that the R package vtreat provides is level coding (what we sometimes call impact coding): converting the levels of a categorical variable to a meaningful and concise single numeric variable, rather than coding them as indicator variables (AKA “one-hot encoding”). Level coding can be computationally and statistically preferable to one-hot encoding for variables that have an extremely large number of possible levels. [Figure: Level coding is like measurement: it summarizes categories of individuals into useful numbers. Source: USGS] By default, vtreat level codes to the difference between the conditional means and the grand mean (catN variables) when the outcome is numeric, and to the difference between the conditional log-likelihood and global log-likelihood of the target class (catB variables) when the outcome is categorical. These aren’t the only possible level codings. For example, the ranger package can encode…
Original Post: Custom Level Coding in vtreat
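
To make the numeric-outcome case concrete, here is a minimal sketch of the "conditional mean minus grand mean" idea described in the excerpt. It is written in plain Python rather than R, it is not vtreat's implementation, and it omits any safeguards against overfitting. The function name impact_code and the toy data are assumptions made up for illustration.

```python
from collections import defaultdict


def impact_code(levels, outcomes):
    """Map each level to (mean outcome within that level) - (grand mean).

    Hypothetical helper for illustration only; not part of vtreat.
    """
    if len(levels) != len(outcomes) or not outcomes:
        raise ValueError("levels and outcomes must be non-empty and equal in length")

    grand_mean = sum(outcomes) / len(outcomes)

    # Accumulate per-level sums and counts in one pass.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for level, y in zip(levels, outcomes):
        sums[level] += y
        counts[level] += 1

    return {level: sums[level] / counts[level] - grand_mean for level in sums}


# Toy data (made up): one categorical variable and a numeric outcome.
levels = ["a", "a", "b", "b", "c"]
outcomes = [1.0, 3.0, 10.0, 12.0, 4.0]

coding = impact_code(levels, outcomes)

# Replace each level with its single numeric code. Levels not seen when
# building the coding get 0.0, i.e. "no deviation from the grand mean"
# (an assumption of this sketch).
encoded = [coding.get(level, 0.0) for level in levels]
```

Here the grand mean is 6.0, so level "a" codes to -4.0, "b" to 5.0 and "c" to -2.0: one numeric column instead of three indicator columns.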

Spark – The Definitive Guide – exclusive preview

Get an exclusive preview of “Spark: The Definitive Guide” from Databricks! Learn how Spark runs on a cluster, see examples in SQL, Python and Scala, and learn about Structured Streaming, Machine Learning and more.
Original Post: Spark – The Definitive Guide – exclusive preview

Top Stories, Sep 18-24: Essential Data Science & Machine Learning Cheat Sheets; How To Become a 10x Data Scientist

30 Essential Data Science, Machine Learning & Deep Learning Cheat Sheets; How To Become a 10x Data Scientist; 5 Machine Learning Projects You Can No Longer Overlook – Episode VI; Ensemble Learning to Improve Machine Learning Results; Putting Machine Learning in Production
Original Post: Top Stories, Sep 18-24: Essential Data Science & Machine Learning Cheat Sheets; How To Become a 10x Data Scientist

Time Series Analysis in R Part 2: Time Series Transformations

In Part 1 of this series, we got started by looking at the ts object in R and how it represents time series data. In Part 2, I’ll discuss some of the many time series transformation functions that are available in R. This is by no means an exhaustive catalog. If you feel I left out anything important, please let me know. I compile these posts as a guide in RMarkdown which I plan to make available on the web soon. Often in time series analysis and modeling, we will want to transform data. There are a number of different functions that can be used to transform time series data, such as the difference, log, moving average, percent change, lag, or cumulative sum. These types of functions are useful for both visualizing time series data and for modeling time series.…
Original Post: Time Series Analysis in R Part 2: Time Series Transformations
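
As a rough illustration of the transformations listed in the excerpt, the sketch below implements each one on a plain Python list. The original post works in R; the function names here (lag, difference, pct_change, moving_average) are assumptions for this sketch and do not correspond to any particular R function. Positions with no defined value are filled with None.

```python
import math
from itertools import accumulate


def lag(series, k=1):
    """Shift values k steps later; the first k positions become None."""
    n = len(series)
    k = min(k, n)
    return [None] * k + list(series[: n - k])


def difference(series, k=1):
    """x[t] - x[t-k], with None where x[t-k] does not exist."""
    return [None if prev is None else x - prev
            for x, prev in zip(series, lag(series, k))]


def pct_change(series, k=1):
    """(x[t] - x[t-k]) / x[t-k], with None where x[t-k] does not exist."""
    return [None if prev is None else (x - prev) / prev
            for x, prev in zip(series, lag(series, k))]


def moving_average(series, window):
    """Trailing mean over `window` points; None until the window is full."""
    return [None if i + 1 < window
            else sum(series[i + 1 - window: i + 1]) / window
            for i in range(len(series))]


# Toy series (made up for illustration).
series = [2.0, 4.0, 8.0, 4.0]

lagged = lag(series)                      # [None, 2.0, 4.0, 8.0]
diffed = difference(series)               # [None, 2.0, 4.0, -4.0]
changes = pct_change(series)              # [None, 1.0, 1.0, -0.5]
smoothed = moving_average(series, 2)      # [None, 3.0, 6.0, 6.0]
running_total = list(accumulate(series))  # [2.0, 6.0, 14.0, 18.0]
logged = [math.log(x) for x in series]    # natural log; requires x > 0
```

The transformations compose: differencing the logged series, for example, gives log returns, which is why the log and difference often appear together in modeling.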

Python Data Preparation Case Files: Group-based Imputation

The second part in this series addresses group-based imputation for dealing with missing data values. See why imputing with group means can be more effective than imputing with the overall mean, and how to accomplish it in Python.
Original Post: Python Data Preparation Case Files: Group-based Imputation
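
A minimal sketch of the idea, in plain Python rather than whatever the linked post uses: missing values are replaced with the mean of the observed values in the same group. The function name impute_by_group, the field names and the toy records are assumptions for illustration, as is the fallback to the overall mean when a group has no observed values.

```python
from collections import defaultdict


def impute_by_group(rows, group_key, value_key):
    """Return copies of rows with None values replaced by their group mean.

    Hypothetical helper for illustration. Groups with no observed values
    fall back to the overall mean of all observed values (an assumption).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        value = row[value_key]
        if value is not None:
            sums[row[group_key]] += value
            counts[row[group_key]] += 1

    observed = sum(counts.values())
    overall_mean = sum(sums.values()) / observed if observed else None

    result = []
    for row in rows:
        filled = dict(row)  # copy, so the input rows are left untouched
        if filled[value_key] is None:
            group = filled[group_key]
            if counts[group]:
                filled[value_key] = sums[group] / counts[group]
            else:
                filled[value_key] = overall_mean
        result.append(filled)
    return result


# Toy records (made up): weights differ a lot between groups.
rows = [
    {"species": "cat", "weight": 4.0},
    {"species": "cat", "weight": None},
    {"species": "cat", "weight": 6.0},
    {"species": "dog", "weight": 20.0},
    {"species": "dog", "weight": None},
]

imputed = impute_by_group(rows, "species", "weight")
```

In this toy data the overall mean of the observed weights is 10.0, which would be a poor guess for either species. The group means, 5.0 for the missing cat and 20.0 for the missing dog, stay close to what the rest of each group looks like.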