New open data sets from Microsoft Research

Microsoft has released a number of data sets produced by Microsoft Research and made them available for download at Microsoft Research Open Data. The data sets in Microsoft Research Open Data are categorized by their primary research area, such as Physics, Social Science, Environmental Science, and Information Science. Many of the data sets have not previously been available to the public, and many are large and useful for AI and Machine Learning research. Many of them also include links to associated papers from Microsoft Research. For example, the 10Gb DESM Word Embeddings data set provides the IN and the OUT word2vec embeddings for 2.7M words trained on a Bing query corpus of 600M+ queries. Other data sets of note include a collection of 38M tweets related to the 2012 US election; 3-D capture data from individuals performing a variety of hand gestures…
Original Post: New open data sets from Microsoft Research
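A minimal sketch of why both IN and OUT vectors are useful, assuming the IN-OUT scoring described in the DESM paper: the mean cosine similarity between each query word's IN vector and the centroid of the document words' unit-normalized OUT vectors. The vectors below are random stand-ins, and the function names are hypothetical; loading the released files is not shown because their format is not described here.

```python
import numpy as np

def unit(v):
    # Normalize a vector (or each row of a matrix) to unit length.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def desm_in_out(query_in, doc_out):
    """Hypothetical helper: mean cosine similarity between each query word's
    IN vector and the centroid of the document's normalized OUT vectors."""
    centroid = unit(doc_out).mean(axis=0)
    return float(np.mean(unit(query_in) @ unit(centroid)))

rng = np.random.default_rng(0)
dim = 8  # toy size; the released embeddings are higher-dimensional
query_in = rng.normal(size=(2, dim))  # IN vectors for a 2-word query (random stand-ins)
doc_out = rng.normal(size=(5, dim))   # OUT vectors for a 5-word document (random stand-ins)
score = desm_in_out(query_in, doc_out)
print(score)
```

Because the score is an average of cosines, it always lies in [-1, 1], which makes scores comparable across documents of different lengths.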

KDnuggets™ News 18:n26, July 11: 5 Favorite Free Visualization Tools; SQL Cheat Sheet; Top 20 Python Libraries for Data Science

Also Introduction to Apache Spark; fast.ai Machine Learning Course Notes; Cartoon: How is Data Science Different From Religion?
Original Post: KDnuggets™ News 18:n26, July 11: 5 Favorite Free Visualization Tools; SQL Cheat Sheet; Top 20 Python Libraries for Data Science

Analyze a Soccer (Football) Game Using Tensorflow Object Detection and OpenCV

For the data scientist within you, let's use this opportunity to analyze some soccer clips. Using deep learning and OpenCV, we can extract interesting insights from video clips.
Original Post: Analyze a Soccer (Football) Game Using Tensorflow Object Detection and OpenCV
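One common step after detecting people in a frame is deciding which team each detected player belongs to. A minimal sketch, assuming each detection has already been cropped to an RGB NumPy array: it scores the crop by the fraction of pixels that fall inside a per-team color range. The team names, RGB bounds, and `min_fraction` threshold are hypothetical; a real pipeline would more likely threshold in HSV space with OpenCV's `cv2.inRange`.

```python
import numpy as np

# Hypothetical per-team RGB bounds (inclusive); tune these for real footage.
TEAM_COLORS = {
    "red": (np.array([150, 0, 0]), np.array([255, 80, 80])),
    "blue": (np.array([0, 0, 150]), np.array([80, 80, 255])),
}

def color_fraction(crop, lower, upper):
    # Boolean mask of pixels whose R, G and B values all lie within [lower, upper].
    mask = np.all((crop >= lower) & (crop <= upper), axis=-1)
    return mask.mean()

def classify_player(crop, min_fraction=0.2):
    # Pick the team whose color covers the largest share of the crop,
    # or None if no team color covers at least min_fraction of the pixels.
    scores = {team: color_fraction(crop, lo, hi) for team, (lo, hi) in TEAM_COLORS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_fraction else None

# Synthetic 10x6 crop: top half red "shirt", bottom half dark "shorts".
crop = np.zeros((10, 6, 3), dtype=np.uint8)
crop[:5] = [200, 30, 30]
print(classify_player(crop))  # red
```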

Manage your Machine Learning Lifecycle with MLflow  –  Part 1

Reproducibility, good management, and experiment tracking are necessary to make it easy to test others' work and analysis. In this first part, we start with simple examples of how to record and query experiments and how to package Machine Learning models with MLflow so they are reproducible and can run on any platform.
Original Post: Manage your Machine Learning Lifecycle with MLflow  –  Part 1

Text Classification & Embeddings Visualization Using LSTMs, CNNs, and Pre-trained Word Vectors

In this tutorial, I classify Yelp round-10 review datasets. After processing the review comments, I trained three models in three different ways and obtained three word embeddings.
Original Post: Text Classification & Embeddings Visualization Using LSTMs, CNNs, and Pre-trained Word Vectors

Deep Quantile Regression

Most Deep Learning frameworks currently focus on giving a best estimate as defined by a loss function. Occasionally, something beyond a point estimate is required to make a decision, and this is where a distribution would be useful. This article focuses purely on inferring quantiles.
Original Post: Deep Quantile Regression
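The usual way to make a model predict a quantile instead of a mean is to train it with the quantile (pinball) loss. A minimal sketch in NumPy: the loss penalizes under-predictions by `q` and over-predictions by `1 - q` per unit of error, so the constant that minimizes it is the empirical `q`-quantile. The grid search over constants below stands in for gradient-based training and is only an illustration; in a deep network the same expression would serve as the training loss, typically with one output per quantile.

```python
import numpy as np

def quantile_loss(q, y_true, y_pred):
    # Pinball loss: error > 0 (under-prediction) costs q per unit,
    # error < 0 (over-prediction) costs (1 - q) per unit.
    err = y_true - y_pred
    return np.mean(np.maximum(q * err, (q - 1) * err))

rng = np.random.default_rng(0)
y = rng.normal(size=10_000)           # synthetic targets
q = 0.9
candidates = np.linspace(-3, 3, 601)  # constant predictions to try, step 0.01
losses = [quantile_loss(q, y, c) for c in candidates]
best = candidates[int(np.argmin(losses))]
print(best, np.quantile(y, q))        # the two values should nearly coincide
```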

Inside the Mind of a Neural Network with Interactive Code in Tensorflow

Understand the inner workings of neural network models: this post covers three related topics, namely histograms of weights, visualizing neuron activations, and interior/integrated gradients.
Original Post: Inside the Mind of a Neural Network with Interactive Code in Tensorflow
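Integrated gradients attribute a prediction to input features by averaging the model's gradient along a straight path from a baseline input to the actual input, then scaling by the input difference. A minimal NumPy sketch, assuming a toy logistic model with hypothetical weights and an analytic gradient (a real network would obtain gradients from TensorFlow's automatic differentiation instead), using a midpoint Riemann sum:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy differentiable "model" with hypothetical weights: f(x) = sigmoid(w . x + b).
w = np.array([2.0, -1.0, 0.5])
b = -0.3

def model(x):
    return sigmoid(x @ w + b)

def model_grad(x):
    # Analytic gradient of sigmoid(w . x + b) with respect to x.
    s = model(x)
    return s * (1.0 - s) * w

def integrated_gradients(x, baseline, steps=200):
    # Midpoint Riemann approximation of the path integral from baseline to x.
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([model_grad(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

x = np.array([1.0, 2.0, -1.0])
baseline = np.zeros_like(x)
attr = integrated_gradients(x, baseline)
print(attr, attr.sum(), model(x) - model(baseline))
```

A useful sanity check is completeness: the attributions should sum (up to the approximation error of the Riemann sum) to the difference between the model's output at the input and at the baseline.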

Building a Basic Keras Neural Network Sequential Model

The approach closely follows Chollet's four-step Keras workflow, which he outlines in his book "Deep Learning with Python." It uses the MNIST dataset, and the model built is a Sequential network of Dense layers. The post is intended as a building block for additional posts.
Original Post: Building a Basic Keras Neural Network Sequential Model

Top 20 Python Libraries for Data Science in 2018

Our selection actually contains more than 20 libraries, as some of them are alternatives to each other and solve the same problem. We have grouped these alternatives together, as it's difficult to single out one particular leader at the moment.
Original Post: Top 20 Python Libraries for Data Science in 2018