Reinforcement Learning and Language Support

What is the right way to specify a program that learns from experience? Existing general-purpose programming languages are designed to facilitate the specification of any piece of software. So we can just use these programming languages for reinforcement learning, right? Sort of.

Abstractions matter

An analogy with high-performance serving might be helpful. An early influential page on high-performance serving (the C10K problem by Dan Kegel) outlines several I/O strategies. I've tried many of them. One strategy is event-driven programming, where a core event loop monitors file descriptors for events and then dispatches handlers. This style yields high-performance servers, but it is difficult to program and sensitive to programmer error. In addition to fault-isolation issues (if all event handlers are running in the same address space), this style degrades whenever any event handler takes too long to execute…
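To make the event-loop pattern concrete, here is a minimal sketch in the style the C10K page describes, using Python's standard selectors module. The echo-server details (port, handler names) are my own illustration, not from the post:

```python
import selectors
import socket

# A minimal event loop: one selector watches every socket, and a single
# thread dispatches the registered handler when a descriptor is ready.
sel = selectors.DefaultSelector()

def accept(server_sock):
    conn, _ = server_sock.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, echo)

def echo(conn):
    data = conn.recv(4096)
    if data:
        conn.sendall(data)  # if this handler blocks, the whole loop stalls
    else:
        sel.unregister(conn)
        conn.close()

server = socket.socket()
server.bind(("localhost", 9000))  # hypothetical port for illustration
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

while True:
    for key, _ in sel.select():
        key.data(key.fileobj)  # dispatch: key.data holds the handler callback
```

The comment in echo is the failure mode the excerpt names: one slow handler stalls every connection sharing the loop.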
Original Post: Reinforcement Learning and Language Support

Reinforcement Learning as a Service

I’ve been integrating reinforcement learning into an actual product for the last 6 months, and I’m therefore developing an appreciation for what are likely to be common problems. In particular, I’m now sold on the idea of reinforcement learning as a service, of which the Decision Service from MSR-NY is an early example (limited to contextual bandits at the moment, but incorporating key system insights).

Service, not algorithm

Supervised learning is essentially observational: some data has been collected, and algorithms are subsequently run on it. (Online supervised learning doesn’t necessarily work this way, but online techniques have mostly been used for computational reasons after data collection.) In contrast, counterfactual learning is very difficult to do observationally. Diverse fields such as economics, political science, and epidemiology all attempt to draw counterfactual conclusions from observational data, essentially because this is the only data…
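One reason logging at the service layer matters for counterfactual learning: if the system randomizes actions and records the probability of each chosen action, logged data supports offline evaluation of other policies. Here is a minimal sketch of the standard inverse-propensity-scoring (IPS) estimator; the function names and data layout are my own illustration, not the Decision Service's API:

```python
from typing import Callable, Sequence

def ips_estimate(
    logs: Sequence[tuple[dict, int, float, float]],
    target_policy: Callable[[dict], int],
) -> float:
    """Estimate the average reward of target_policy from bandit logs.

    Each log entry is (context, action, reward, propensity), where
    propensity is the probability the *logging* policy chose that action.
    Because actions were explicitly randomized and their probabilities
    recorded, observed rewards can be reweighted counterfactually.
    """
    total = 0.0
    for context, action, reward, propensity in logs:
        if target_policy(context) == action:
            total += reward / propensity  # importance weight 1/p
    return total / len(logs)

# Hypothetical logged data: (context, chosen action, reward, propensity)
logs = [({"hour": 9}, 0, 1.0, 0.5), ({"hour": 21}, 1, 0.0, 0.5)]
print(ips_estimate(logs, lambda ctx: 0 if ctx["hour"] < 12 else 1))
```

Purely observational data lacks the recorded propensities, which is exactly what makes counterfactual conclusions so hard in the fields the excerpt lists.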
Original Post: Reinforcement Learning as a Service

Generating Text via Adversarial Training

There was a really cute paper at the GAN workshop this year, Generating Text via Adversarial Training by Zhang, Gan, and Carin. In particular, they make a couple of unusual choices that appear important. (Warning: if you are not familiar with GANs, this post will not make a lot of sense.)

They use a convolutional neural network (CNN) as the discriminator, rather than an RNN. In retrospect this seems like a good choice: e.g., Tong Zhang has been crushing it in text classification with CNNs. CNNs are a bit easier to train than RNNs, so the net result is a powerful discriminator with a relatively easy optimization problem associated with it.

They use a smooth approximation to the LSTM output in their generator, but this kind of trick appears everywhere, so it isn't so remarkable in isolation.

They use a pure…
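For readers unfamiliar with the trick, here is a minimal sketch of what a smooth approximation to discrete token output can look like: instead of feeding the embedding of a sampled token back into the LSTM (which blocks gradients), feed back the softmax-weighted mixture of embeddings. The tensor names and the temperature parameter are my own illustration, not necessarily the paper's exact formulation:

```python
import numpy as np

def soft_token_feedback(logits: np.ndarray, embedding: np.ndarray,
                        temperature: float = 1.0) -> np.ndarray:
    """Differentiable stand-in for feeding a sampled token back to the LSTM.

    logits:    (vocab,) unnormalized scores from the LSTM output layer
    embedding: (vocab, dim) token embedding matrix
    Returns a (dim,) vector: the softmax-weighted average of embeddings.
    As temperature -> 0 this approaches the hard argmax embedding, but
    gradients can flow through it, which a discrete sample would block.
    """
    z = logits / temperature
    z -= z.max()  # subtract max for numerical stability
    p = np.exp(z)
    p /= p.sum()
    return p @ embedding
```

The point of the smoothing is that the generator stays end-to-end differentiable, so the discriminator's gradient can reach it.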
Original Post: Generating Text via Adversarial Training

On the Sustainability of Open Industrial Research

I’m glad OpenAI exists: the more science, the better! Having said that, there was a strange happenstance at NIPS this year. OpenAI released OpenAI Universe, which is their second big release of a platform for measuring and training counterfactual learning algorithms. This is the kind of behaviour you would expect from an organization which promotes the general advancement of AI without consideration of financial gain. At the same time, Google, Facebook, and Microsoft all announced analogous platforms. Nobody batted an eyelash at the fact that three for-profit organizations were tripping over themselves to give away basic research technologies.

A naive train of thought says that basic research is a public good, subject to the free-rider problem, and therefore will be underfunded by for-profit organizations. If you think this is a strawman position, you haven’t heard of the Cisco model for…
Original Post: On the Sustainability of Open Industrial Research

Dialogue Workshop Recap

Most of the speakers have sent me their slides, which can be found on the schedule page. Overall the workshop was fun and enlightening. Here are some major themes that I picked up on.

Evaluation

There is no magic bullet, but check out Helen’s slides for a nicely organized discussion of metrics. Many different strategies were on display in the workshop:

Milica Gasic utilized crowdsourcing for some of her experiments. She also indicated that the incentives of crowdsourcing can lead to unnatural participant behaviours.
Nina Dethlefs used a combination of objective (BLEU) and subjective (“naturalness”) evaluation.
Vlad Serban has been a proponent of next utterance classification as a useful intrinsic metric (sketched below).
Antoine Bordes (and the other FAIR folks) are heavily leveraging simulation and engineered tasks.
Jason Williams used imitation metrics (from hand-labeled dialogs) as well as simulation.

As Helen points out, computing…
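Next utterance classification is simple enough to sketch: score a candidate set containing the true next utterance plus sampled distractors, and measure how often the truth ranks in the top k (Recall@k). The scoring interface and data layout below are my own illustration, not any speaker's code:

```python
import random
from typing import Callable, Sequence

def recall_at_k(
    dialogs: Sequence[tuple[str, str]],   # (context, true next utterance)
    distractor_pool: Sequence[str],
    score: Callable[[str, str], float],   # model's score for (context, utterance)
    k: int = 1,
    num_distractors: int = 9,
) -> float:
    """Fraction of cases where the true next utterance ranks in the top k
    among itself plus randomly sampled distractor utterances."""
    hits = 0
    for context, truth in dialogs:
        candidates = [truth] + random.sample(distractor_pool, num_distractors)
        ranked = sorted(candidates, key=lambda u: score(context, u), reverse=True)
        hits += truth in ranked[:k]
    return hits / len(dialogs)
```

The appeal as an intrinsic metric is that it needs no human judges: only held-out dialogs and a pool of distractor utterances.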
Original Post: Dialogue Workshop Recap

NIPS 2016 Reflections

It was a great conference. The organizers had to break with tradition to accommodate the rapid growth in submissions and attendance, but despite my nostalgia, I feel the changes were beneficial. In particular, leveraging parallel tracks and eliminating poster spotlights allowed for more presentations while ending the day before midnight, and the generous space allocation per poster really improved the poster session. The workshop organizers apparently thought of everything in advance: I didn’t experience any hiccups (although we only had one microphone, so I got a fair bit of exercise during discussion periods).

Here are some high-level themes I picked up on.

Openness

Two years ago Amazon started opening up their research, and they are now a major presence at the conference. This year at NIPS, Apple announced they would be opening up their research practices. Clearly, companies are finding it in…
Original Post: NIPS 2016 Reflections

Learning Methods for Dialog Workshop at NIPS This Saturday

The schedule for the workshop has been finalized, and I’m pretty excited. We managed to convince some seasoned researchers in dialog, who don’t normally attend NIPS, to give invited talks. We’re also devoting some time to “Building Complete Systems”, because it’s easy to focus on the trees instead of the forest, especially when the tree is something really interesting like a neural network trained on a bunch of GPUs. But don’t worry, there’s plenty of “NIPS red meat” in the schedule as well.

See you on Saturday!
Original Post: Learning Methods for Dialog Workshop at NIPS This Saturday

NIPS dialogue workshop

I’m co-organizing a workshop on dialogue at NIPS 2016. NIPS is not a traditional forum for dialogue research, but there is an increasing number of people (like myself!) in machine learning who are becoming interested in dialogue, so the time seemed right. From a personal perspective, dialogue is interesting because 1) it smells like AI, 2) recent advances in (deep learning)…
Original Post: NIPS dialogue workshop

Update on dialogue progress

In a recent blog post I discussed two ideas for moving dialogue forward; both ideas are related to the need to democratize access to the data required to evaluate a dialog system. It turns out both ideas have already been advanced to some degree:

Having computers “talk” to each other instead of with people: Marco Baroni is on it.
Creating an…
Original Post: Update on dialogue progress

ICML 2016 Thoughts

ICML is too big for me to “review” it per se, but I can provide a myopic perspective.

The heavy-hitting topics were Deep Learning, Reinforcement Learning, and Optimization, but there was a heavy tail of topics receiving attention. It felt like deep learning was less dominant this year, but the success of deep learning has led to multiple application-specific…
Original Post: ICML 2016 Thoughts