Model Training Productionization with H2O REST API and Requests in Python

Building a production model scoring application with H2O benefits greatly from H2O's ability to export models as MOJOs, which allows scoring large datasets in Spark without a native H2O dependency. However, the model training process is still hard to productionize, given how data scientists work versus the engineering requirements to build a…

Five Takeaways from ODSC East 2018

I spent the last four days in Boston attending talks and meeting great people. I was exposed to a variety of interesting topics, including data science and deep learning applications in healthcare and other fields, as well as technical discussions and training sessions at different levels. The bottom line is that ODSC definitely exceeded my expectations. Here I…

Generative Adversarial Networks – An Experiment with Training Improvement Techniques (in TensorFlow)

First, if you’re interested in deep learning but don’t want to read this post at all, I recommend taking a look at the Deep Learning Nanodegree offered by Udacity. It looks like a well-designed series of courses that covers all the major aspects of deep learning. Introduction: Generative Adversarial Networks (GANs)…

Scoring H2O MOJO Models with Spark DataFrame and Dataset

by Jiankun Liu. Introduction: H2O allows you to export models as POJOs or MOJOs (Model Object, Optimized), which can later be deployed in production, presumably for scoring large datasets or building real-time applications. In theory this should work in a Spark application, but the official documentation does not go into detail beyond saying you can “create…

Classify SHARE documents with Natural Language Processing

*Edit: SHARE has published this post on their blog: http://www.share-research.org/2016/05/classifying-research-activity-in-share-with-natural-language-processing/* Jiankun Liu, 03/22/2016. Developers at the Center for Open Science working on the SHARE project are constantly looking for ways to improve SHARE’s metadata quality. One challenging task is adding subject areas so that users have more options and control when searching…