Our quest for robust time series forecasting at scale

by ERIC TASSONE, FARZAN ROHANI. We were part of a team of data scientists in Search Infrastructure at Google that took on the task of developing robust and automatic large-scale time series forecasting for our organization. In this post, we recount how we approached the task, describing initial stakeholder needs, the business and engineering contexts in which the challenge arose, and theoretical and pragmatic choices we made to implement our solution. Introduction: Time series forecasting enjoys a rich and luminous history, and today is an essential element of most any business operation. So it should come as no surprise that Google has compiled and forecast time series for a long time. For instance, the image below from the Google Visitors Center in Mountain View, California, shows hand-drawn time series of “Results Pages” (essentially search query volume) dating back nearly to the founding…
Original Post: Our quest for robust time series forecasting at scale

Attributing a deep network’s prediction to its input features

Editor’s note: Causal inference is central to answering questions in science, engineering and business and hence the topic has received particular attention on this blog. Typically, causal inference in data science is framed in probabilistic terms, where there is statistical uncertainty in the outcomes as well as model uncertainty about the true causal mechanism connecting inputs and outputs. And yet even when the relationship between inputs and outputs is fully known and entirely deterministic, causal inference is far from obvious for a complex system. In this post, we explore causal inference in this setting via the problem of attribution in deep networks. This investigation has practical as well as philosophical implications for causal inference. On the other hand, if you just care about understanding what a deep network is doing, this post is for you too. Deep networks have had…
Original Post: Attributing a deep network’s prediction to its input features
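The excerpt above frames attribution as a deterministic causal question: how much of a prediction is due to each input feature? One widely used way to make that concrete is integrated gradients, which averages the model's gradients along a path from a baseline input to the actual input. Below is a minimal Python sketch of that idea, using a toy hand-built network and numerical gradients purely for illustration; it shows the general technique, not necessarily the exact method the post develops.

```python
import numpy as np

# Toy "deep network": a fixed two-layer ReLU net mapping 3 features to a score.
# (Hypothetical weights chosen for illustration only.)
W1 = np.array([[1.0, -2.0, 0.5], [0.0, 1.5, -1.0]])
b1 = np.array([0.1, -0.2])
w2 = np.array([2.0, 1.0])

def f(x):
    h = np.maximum(W1 @ x + b1, 0.0)  # ReLU hidden layer
    return float(w2 @ h)              # scalar prediction

def grad_f(x, eps=1e-5):
    # Numerical gradient; a real implementation would use autodiff.
    g = np.zeros_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

def integrated_gradients(x, baseline, steps=50):
    # Average the gradient along the straight line from baseline to x,
    # then scale elementwise by (x - baseline).
    alphas = np.linspace(0.0, 1.0, steps)
    avg_grad = np.mean([grad_f(baseline + a * (x - baseline)) for a in alphas], axis=0)
    return (x - baseline) * avg_grad

x = np.array([1.0, 2.0, -0.5])
baseline = np.zeros(3)
attr = integrated_gradients(x, baseline)
print("attributions:", attr)
print("sum of attributions:", attr.sum(), "vs f(x) - f(baseline):", f(x) - f(baseline))
```

The attributions sum (approximately) to the difference in the network's output between the input and the baseline, which is the property that makes this style of attribution attractive.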

Causality in machine learning

By OMKAR MURALIDHARAN, NIALL CARDIN, TODD PHILLIPS, AMIR NAJMI. Given recent advances and interest in machine learning, those of us with traditional statistical training have had occasion to ponder the similarities and differences between the fields. Many of the distinctions are due to culture and tooling, but there are also differences in thinking which run deeper. Take, for instance, how each field views the provenance of the training data when building predictive models. For most of ML, the training data is a given, often presumed to be representative of the data against which the prediction model will be deployed, but not much else. With a few notable exceptions, ML abstracts away from the data generating mechanism, and hence sees the data as raw material from which predictions are to be extracted. Indeed, machine learning generally lacks the vocabulary to capture the…
Original Post: Causality in machine learning

Reinforcement Learning as a Service

I’ve been integrating reinforcement learning into an actual product for the last 6 months, and therefore I’m developing an appreciation for what are likely to be common problems. In particular, I’m now sold on the idea of reinforcement learning as a service, of which the decision service from MSR-NY is an early example (limited to contextual bandits at the moment, but incorporating key system insights). Service, not algorithm: Supervised learning is essentially observational: some data has been collected and subsequently algorithms are run on it. (Online supervised learning doesn’t necessarily work this way, but mostly online techniques have been used for computational reasons after data collection.) In contrast, counterfactual learning is very difficult to do observationally. Diverse fields such as economics, political science, and epidemiology all attempt to make counterfactual conclusions using observational data, essentially because this is the only data…
Original Post: Reinforcement Learning as a Service
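A concrete illustration of why counterfactual questions need more than passively logged data: if the logging system recorded the probability with which it took each action, a new policy can be evaluated offline with inverse propensity scoring. The sketch below uses a simulated contextual-bandit log and hypothetical policies; it is a standard off-policy estimator, not the MSR-NY decision service API.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 3

def logging_policy(context):
    # Probabilities with which the deployed system chose each action.
    return np.ones(n_actions) / n_actions        # uniform exploration for simplicity

def new_policy(context):
    # The policy whose value we want to know without deploying it:
    # pick action 0 if the first feature is positive, else action 1.
    a = 0 if context[0] > 0 else 1
    p = np.zeros(n_actions)
    p[a] = 1.0
    return p

def true_reward(context, action):
    # Hidden reward function used only to simulate the log.
    return 1.0 if action == (0 if context[0] > 0 else 1) else 0.0

# Simulate a log of (context, action taken, observed reward, propensity of that action).
log = []
for _ in range(10_000):
    x = rng.normal(size=2)
    probs = logging_policy(x)
    a = rng.choice(n_actions, p=probs)
    log.append((x, a, true_reward(x, a), probs[a]))

# Inverse propensity scoring: reweight each logged reward by how much more
# (or less) often the new policy would have taken the logged action.
ips_estimate = np.mean([new_policy(x)[a] / p * r for (x, a, r, p) in log])
print("estimated value of new policy:", ips_estimate)   # close to its true value of 1.0
```

Without the logged propensities, the reweighting step is impossible, which hints at why counterfactual learning benefits from being designed into the serving system as a service rather than bolted on after the data have been collected.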

Generating Text via Adversarial Training

There was a really cute paper at the GAN workshop this year, Generating Text via Adversarial Training by Zhang, Gan, and Carin. In particular, they make a couple of unusual choices that appear important. (Warning: if you are not familiar with GANs, this post will not make a lot of sense.) They use a convolutional neural network (CNN) as a discriminator, rather than an RNN. In retrospect this seems like a good choice, e.g. Tong Zhang has been crushing it in text classification with CNNs. CNNs are a bit easier to train than RNNs, so the net result is a powerful discriminator with a relatively easy optimization problem associated with it. They use a smooth approximation to the LSTM output in their generator, but actually this kind of trick appears everywhere so isn’t so remarkable in isolation. They use a pure…
Original Post: Generating Text via Adversarial Training
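The “smooth approximation to the LSTM output” deserves a quick illustration: sampling a discrete token at each step blocks gradients from the discriminator back into the generator, whereas emitting a softmax-weighted mixture of word embeddings keeps the whole pipeline differentiable. Here is a minimal numpy sketch of that trick with made-up logits and embeddings; it shows the general idea rather than the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, embed_dim = 5, 4
embeddings = rng.normal(size=(vocab_size, embed_dim))   # word embedding table

def softmax(z, temperature=1.0):
    z = z / temperature
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

# Logits an RNN/LSTM generator might produce for the next token.
logits = rng.normal(size=vocab_size)

# Non-differentiable choice: commit to a discrete token.
hard_token = int(np.argmax(logits))
hard_embedding = embeddings[hard_token]

# Smooth approximation: feed the discriminator a softmax-weighted mixture
# of embeddings instead; gradients can now flow back through the weights.
weights = softmax(logits, temperature=0.5)
soft_embedding = weights @ embeddings

print("hard:", hard_embedding)
print("soft:", soft_embedding)
# As the temperature shrinks, the mixture approaches the argmax embedding.
print("soft (tau=0.01):", softmax(logits, temperature=0.01) @ embeddings)
```

Lowering the temperature pushes the mixture toward the embedding of the argmax token, so the smooth version can be annealed toward discrete behavior while training remains end-to-end differentiable.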

How big compute is powering the deep learning rocket ship

[Image: The D.E. Shaw supercomputer, Anton (source: Matt Simmons on Flickr).] Specialists describe deep learning as akin to a rocket ship that needs a really big engine (a model) and a lot of fuel (the data) in order to go anywhere interesting. To get a better understanding of the issues involved in building compute systems for deep learning, I spoke with one of the foremost experts on this subject: Greg Diamos, senior researcher at Baidu. Diamos has long worked to combine advances in software and hardware to make computers run faster. In recent…
Original Post: How big compute is powering the deep learning rocket ship

The Prior: Fully comprehended last, put first, checked the least?

Priors are important in Bayesian inference. Some would even say: “In Bayesian inference you can—OK, you must—assign a prior distribution representing the set of values the coefficient [i.e., any unknown parameter] can be.” Although priors are put first in most expositions, my sense is that in most applications they are seldom considered first, are checked the least, and are actually fully comprehended last (or perhaps not fully at all). It reminds one of the comical response of someone asked for difficult directions: “If I wanted to go there, I wouldn’t start out from here.” Perhaps this is less comical: “If I am going to be doing a Bayesian analysis, I do not want to be responsible for getting and checking the prior. Maybe the domain expert should do that, or just accept the default priors I find in…
Original Post: The Prior: Fully comprehended last, put first, checked the least?
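One way to make “checking the prior” less abstract is a prior predictive simulation: draw parameters from the prior, push them through the model, and ask whether the implied data look remotely plausible. The sketch below does this for a single logistic-regression coefficient with a few hypothetical prior scales; the specific numbers are illustrative and not taken from the post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Prior predictive check for a logistic regression with one standardized
# predictor: what outcome probabilities does the prior on the coefficient imply?
def prior_predictive_probs(prior_sd, n_draws=10_000):
    beta = rng.normal(0.0, prior_sd, size=n_draws)   # coefficient drawn from the prior
    x = rng.normal(size=n_draws)                     # standardized predictor values
    return 1.0 / (1.0 + np.exp(-beta * x))           # implied success probabilities

for prior_sd in (1.0, 10.0, 100.0):
    p = prior_predictive_probs(prior_sd)
    extreme = np.mean((p < 0.01) | (p > 0.99))
    print(f"prior sd {prior_sd:6.1f}: {100 * extreme:5.1f}% of prior draws give "
          f"probabilities below 1% or above 99%")
# A very wide "default" prior implies that most units are near-certain successes
# or failures before seeing any data, which is rarely what the analyst intends.
```

Seen this way, a default wide prior is itself a strong claim about the world, exactly the kind of thing worth comprehending before, not after, the analysis.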

Custom images for Shiny dashboard valueBox icons

The shinydashboard package provides functions like valueBox that conveniently display basic information like summary statistics. In addition to presenting a value and subtitle on a colored background, an icon may be included as well. However, the icon must come from either the Font Awesome or Glyphicon icon libraries and cannot be an image file. I’ve provided a gist that shows how to use custom icons backed by local image files stored in an app’s www/ directory. It involves overriding a couple of functions in shiny and shinydashboard and adding a small bit of custom CSS. Ideally, such functionality could be included in future versions of these two packages in a more robust and complete fashion. But for now, here is a way to do it yourself for value boxes. The gist above includes the app.R file to…
Original Post: Custom images for Shiny dashboard valueBox icons

The 2017 machine learning outlook

Machine learning has been a mainstream commercial field for some time now, but it’s going through an important acceleration. In this podcast episode, I talk about that acceleration with two executives from MemSQL, a company that specializes in in-memory databases: Gary Orenstein, MemSQL chief marketing officer, and Drew Paroski, MemSQL vice president of engineering. Orenstein and Paroski identify a few crucial inflections in the machine learning landscape: machine learning models have become easier to write; computing capacity on the cloud has increased dramatically; and new sources of data—everything from drones to smart-home devices and industrial controllers—have added new richness to machine learning models. Computing capacity and software progress have made it possible to…
Original Post: The 2017 machine learning outlook

Third Actuarial Pricing Game

With the support of the ACTINFO Chair and the (French) Institute of Actuaries, our Third Actuarial Pricing Game starts today! There is a toolbox file available online with a description of the game: the rules, the dates, and a description of the three datasets: an underwriting and a claims database for year 0 (training data), and an underwriting dataset used to enter the game. Anyone can play. Students from various programs around the world, as well as practitioners, are welcome to play. It can be by teams, and there is no limit on team size. And there is no registration: to start playing, teams have to submit a dataset before the deadline (end of February) to [email protected].
Original Post: Third Actuarial Pricing Game