Five Takeaways from ODSC East 2018

I spent the last four days in Boston attending talks and meeting great people. I was exposed to a variety of interesting topics, including data science and deep learning applications in healthcare and other fields, as well as technical discussions and training sessions at different levels. The bottom line is that ODSC definitely exceeded my expectations. Here I…

Generative Adversarial Networks – An Experiment with Training Improvement Techniques (in TensorFlow)

First, if you’re interested in deep learning but can’t be bothered to read this post at all, I recommend taking a look at the Deep Learning Nanodegree offered by Udacity. It looks like a well-designed series of courses that covers all major aspects of deep learning. Introduction Generative Adversarial Networks (GANs)…

Scoring H2O MOJO Models with Spark DataFrame and Dataset

by Jiankun Liu Introduction H2O allows you to export models as POJOs or MOJOs (Model Object, Optimized) and deploy them in production later, presumably for scoring large datasets or building real-time applications. In theory this should work in a Spark application, but the official documentation did not go into detail beyond saying you can “create…

Build a multi-label documentation classification model for SHARE using One vs the rest classifiers

Jiankun Liu A recent post talked about how we can label documents on SHARE with Natural Language Processing models. In this post I’m going to include more detail on how it was done. If you are interested in reading further, I recommend you read the previous post (link) first, which introduced the problem, the data…

Classify SHARE documents with Natural Language Processing

*Edit: SHARE has published the post on their blog: http://www.share-research.org/2016/05/classifying-research-activity-in-share-with-natural-language-processing/ * Jiankun Liu, 03/22/2016 Developers at the Center for Open Science working on the SHARE project are constantly looking for ways to improve SHARE’s metadata quality. One challenging task is to add subject areas so that users can have more options and control when searching…