How to use H2O with R on HDInsight

H2O.ai is an open-source AI platform that provides a number of machine-learning algorithms that run on the Spark distributed computing framework. Azure HDInsight is Microsoft’s fully managed Apache Hadoop platform in the cloud, which makes it easy to spin up and manage Azure clusters of any size. It’s also easy to run H2O on HDInsight: H2O AI Platform is available as an application on HDInsight, which pre-installs everything you need as the cluster is created. You can also drive H2O from R, but the R packages don’t come pre-installed on HDInsight. To make this easy, the Azure HDInsight team has provided a couple of scripts that install the necessary components on the cluster for you. These include RStudio (to provide an R IDE on the cluster) and the rsparkling package. With these components installed, from R you can: Query data…
Original Post: How to use H2O with R on HDInsight

Counterfactual estimation on nonstationary data, be careful!!!

By Gabriel Vasconcelos. In a recent paper, Carvalho, Masini and Medeiros show that estimating counterfactuals in a non-stationary framework (by non-stationary I mean integrated) is a tricky task. It is intuitive that the models will not work properly in the absence of cointegration (the spurious case), but what the authors show is that even with cointegration, the estimator of the average treatment effect (ATE) converges to a non-standard distribution. As a result, standard tests on the ATE will detect a treatment effect in several cases where there is no effect at all. For those who are not familiar with counterfactual models: these models normally have a treated unit (the unit whose treatment effect we want to measure) and several untreated units that we call peers. These units may be cities, countries, companies, etc. Assuming that only one…
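To make the over-rejection concrete, here is a small Monte Carlo sketch in R of the intuitive spurious case mentioned above (integrated series with no cointegration). It is not the estimator from Carvalho, Masini and Medeiros: it assumes a plain OLS regression of the treated unit on its peers over the pre-treatment period, uses the post-treatment prediction as the counterfactual, and applies a naive t-test to the gaps. The function name `naive_ate_rejection_rate` and all parameter values are hypothetical. No treatment effect is ever simulated, so every rejection is a false positive.

```r
# Hypothetical helper: share of replications in which a naive t-test on the
# post-treatment gaps "finds" a treatment effect, even though none exists.
naive_ate_rejection_rate <- function(n_rep = 200, T0 = 100, T1 = 50,
                                     n_peers = 5, integrated = TRUE,
                                     alpha = 0.05) {
  rejections <- 0
  for (r in seq_len(n_rep)) {
    n <- T0 + T1
    # Independent shocks for the treated unit (column 1) and its peers
    shocks <- matrix(rnorm(n * (n_peers + 1)), nrow = n)
    # Integrated case: cumulate shocks into independent random walks
    # (integrated but not cointegrated); otherwise keep them iid (stationary)
    series <- if (integrated) apply(shocks, 2, cumsum) else shocks
    y <- series[, 1]
    X <- series[, -1, drop = FALSE]
    pre <- seq_len(T0)
    post <- T0 + seq_len(T1)
    # Fit the treated unit on its peers using pre-treatment data only
    fit <- lm(y[pre] ~ X[pre, ])
    # Counterfactual = prediction from the peers in the post-treatment period
    y_hat <- cbind(1, X[post, ]) %*% coef(fit)
    gaps <- y[post] - as.vector(y_hat)
    # Naive test: treat the gaps as iid and test whether their mean is zero
    if (t.test(gaps)$p.value < alpha) rejections <- rejections + 1
  }
  rejections / n_rep
}

set.seed(1)
naive_ate_rejection_rate(integrated = FALSE)  # stationary series
naive_ate_rejection_rate(integrated = TRUE)   # integrated, no cointegration
```

With stationary series the rejection rate should stay close to the nominal 5%, while with independent random walks it should be far higher; the exact numbers depend on the seed and the sample sizes.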
Original Post: Counterfactual estimation on nonstationary data, be careful!!!

Letter to the Editor of Perspectives on Psychological Science

tl;dr: Himmicane in a teacup. Back in the day, the New Yorker magazine did not have a Letters to the Editor column, and so the great Spy magazine (the Gawker of its time) ran its own feature, Letters to the Editor of the New Yorker, where they posted the letters you otherwise would never see. Here on this blog we can start a new feature, Letters to the Editor of Perspectives on Psychological Science, which will feature corrections that this journal refuses to print. Here’s our first entry: “In the article, ‘Going in Many Right Directions, All at Once,’ published in this journal, the author wrote, ‘some critics go beyond scientific argument and counterargument to imply that the entire field is inept and misguided (e.g., Gelman, 2014; Shimmack [sic], 2014).’ However, this article provided no evidence that…
Original Post: Letter to the Editor of Perspectives on Psychological Science

15 Jobs for R users (2017-07-31) – from all over the world

To post your R job on the next post, just visit this link and post a new R job to the R community. You can post a job for free (and there are also “featured job” options available for extra exposure).

Current R jobs

Job seekers: please follow the links below to learn more and apply for your R job of interest:

Featured Jobs
Freelance: Data Scientists – PhD Paradise – Germany, Data Science Talent – Posted by datasciencetalent, Frankfurt am Main, Hessen, Germany, 22 Jul 2017
Full-Time: Technical Project Manager, Virginia Tech Applied Research Corporation – Posted by miller703, Arlington, Virginia, United States, 15 Jul 2017
Full-Time: Software Developer, Virginia Tech Applied Research Corporation – Posted by miller703, Arlington, Virginia, United States, 15 Jul 2017
Full-Time: Marketing Manager, RStudio – Posted by [email protected], Anywhere, 14 Jul 2017
Full-Time: Solutions Engineer, RStudio – Posted by nwstephens, Anywhere, 14 Jul 2017
Full-Time: Lead Data Scientist, Golden Rat Studios – Posted by goldenrat, Los Angeles, California, United States, 13 Jul 2017
Full-Time: Customer Success Rep, RStudio – Posted by jclemens1…
Original Post: 15 Jobs for R users (2017-07-31) – from all over the world

Machine Learning Explained: Dimensionality Reduction

Dealing with a lot of dimensions can be painful for machine learning algorithms. High dimensionality increases the computational complexity, increases the risk of overfitting (as your algorithm has more degrees of freedom), and makes the data sparser. Dimensionality reduction therefore projects the data into a lower-dimensional space to limit these effects. In this post, we will first try to get an intuition of what dimensionality reduction is, then we will focus on the most widely used techniques.

Spaces and variables with many dimensions

Let’s say we own a shop and want to collect some data on our clients. We can collect their ages, how frequently they come to our shop, how much they spend on average, and when they last came to our shop. Hence each of our clients can be…
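As a minimal sketch of one widely used technique, principal component analysis (PCA), the R code below projects the four client variables from the shop example onto two dimensions with base R's `prcomp()`. The data are simulated and the variable names are hypothetical; the post itself may cover other techniques or a different setup.

```r
set.seed(42)
n <- 200
# Hypothetical client data with the four variables from the shop example
age <- round(runif(n, 18, 70))
visits_per_month <- rpois(n, 4)
avg_spend <- 20 + 2 * visits_per_month + rnorm(n, sd = 5)  # correlated with visits
days_since_last_visit <- round(30 / (visits_per_month + 1) + rexp(n, 0.5))
clients <- data.frame(age, visits_per_month, avg_spend, days_since_last_visit)

# PCA on standardized variables (each column centered and scaled to unit variance)
pca <- prcomp(clients, center = TRUE, scale. = TRUE)

# Share of total variance captured by each principal component (decreasing)
var_share <- pca$sdev^2 / sum(pca$sdev^2)
print(round(cumsum(var_share), 3))

# Project each client from 4 dimensions down to the first 2 components
clients_2d <- pca$x[, 1:2]
dim(clients_2d)  # 200 rows, 2 columns
```

A common way to choose the reduced dimension is to keep the first components whose cumulative variance share is high enough for the task at hand.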
Original Post: Machine Learning Explained: Dimensionality Reduction

Google Vision API in R – RoogleVision

Using the Google Vision API in R: Utilizing RoogleVision. After doing my post last month on OpenCV and face detection, I started looking into other algorithms used for pattern detection in images. As it turns out, Google has done a phenomenal job with their Vision API. It’s absolutely incredible how much information it can return when you simply send it a picture. Also, it’s free, up to (I believe) 1,000 images per month. Amazing! In this post I’m going to walk you through the absolute basics of accessing the power of the Google Vision API using the RoogleVision package in R. As always, we’ll start by loading some libraries. Comments in the code note where you can install them.

# Normal Libraries
library(tidyverse)

# devtools::install_github("flovv/RoogleVision")
library(RoogleVision)
library(jsonlite) # to import…
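Since the code above loads jsonlite, here is a hedged sketch of what a request to the Vision API looks like at the REST level, assuming the public `images:annotate` request format (an image sent as base64 content plus a list of features such as `LABEL_DETECTION`). The helper `build_vision_request()` is hypothetical and is not part of RoogleVision, which handles this step for you; authentication and the HTTP call itself are not shown.

```r
library(jsonlite)

# Hypothetical helper: build the JSON body for a Vision API images:annotate
# request from a local image file. This follows the public REST request
# format; it is not RoogleVision's internal code.
build_vision_request <- function(image_path, feature = "LABEL_DETECTION",
                                 max_results = 5) {
  # Read the raw bytes of the image and base64-encode them
  bytes <- readBin(image_path, what = "raw", n = file.info(image_path)$size)
  body <- list(
    requests = list(list(
      image = list(content = base64_enc(bytes)),
      features = list(list(type = feature, maxResults = max_results))
    ))
  )
  # auto_unbox = TRUE writes length-1 vectors as JSON scalars, not arrays
  toJSON(body, auto_unbox = TRUE)
}
```

The resulting body would be POSTed to https://vision.googleapis.com/v1/images:annotate with valid credentials; the response holds one entry per request, with annotations such as `labelAnnotations` for label detection.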
Original Post: Google Vision API in R – RoogleVision

Digital Transformation through Data Democratization

Digital innovators will succeed because enterprise data doesn’t belong in silos: data has immense value, but only if it is available as a “whole”, giving a full picture of the enterprise rather than short-term trends or baseline BI reports.
Original Post: Digital Transformation through Data Democratization

Upcoming Talk at the Bay Area R Users Group (BARUG)

Next Tuesday (August 8) I will be giving a talk at the Bay Area R Users Group (BARUG). The talk is titled Beyond Popularity: Monetizing R Packages. Here is an abstract of the talk:
This talk will cover my initial foray into monetizing an open source R package. In 2015, a year after publishing the initial version of choroplethr on CRAN, I made a concerted effort to try to monetize the package. In this workshop you will learn:
1. Why I decided to monetize choroplethr.
2. The monetization strategy I used.
3. Three tactics that can help you monetize your own R package.
I first gave this talk at the San Francisco EARL conference in June. The talk was well received there, and I am looking forward to giving it again! The talk is free to attend, but requires registration. Learn…
Original Post: Upcoming Talk at the Bay Area R Users Group (BARUG)