A Real World Reinforcement Learning Research Program

We are hiring for reinforcement learning related research at all levels and all MSR labs. If you are interested, apply, talk to me at COLT or ICML, or email me. More generally, though, I wanted to lay out a philosophy of research which differs from (and plausibly improves on) the current prevailing mode. DeepMind and OpenAI have popularized an empirical approach where researchers modify algorithms and test them against simulated environments, including in self-play. They’ve achieved significant success in these simulated environments, greatly expanding the repertoire of ‘games solved by reinforcement learning’, which consisted of the singleton backgammon when I was a graduate student. Given the ambitious goals of these organizations, the more general plan seems to be “first solve games, then solve real problems”. There are some weaknesses to this approach, which…
Original Post: A Real World Reinforcement Learning Research Program

Pervasive Simulator Misuse with Reinforcement Learning

The surge of interest in reinforcement learning is great fun, but I often see confused choices in applying RL algorithms to solve problems. There are two purposes for which you might use a world simulator in reinforcement learning:

- Reinforcement learning research: you might be interested in creating reinforcement learning algorithms for the real world and use the simulator as a cheap alternative to actual real-world application.
- Problem solving: you want to find a good policy solving a problem for which you have a good simulator.

In the first instance I have no problem, but in the second instance, I’m seeing many head-scratcher choices. A reinforcement learning algorithm engaging in policy improvement from a continuous stream of experience needs to solve an opportunity-cost problem. (The RL lingo for opportunity cost is “advantage”.) Thinking about this in…
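A minimal sketch of the opportunity-cost idea, assuming the standard definition A(s, a) = Q(s, a) − V(s): the advantage measures how much an action gains or loses relative to the policy’s baseline value in a state. The function names and toy numbers below are illustrative, not from the post.

```python
# Minimal sketch of the "advantage" (opportunity-cost) idea, assuming
# the standard definition A(s, a) = Q(s, a) - V(s). The estimates and
# names below are toy illustrations, not from the post.

def advantage(q_values, state_value, action):
    """How much better (or worse) `action` is than the policy's
    baseline behavior in this state."""
    return q_values[action] - state_value

# Toy estimates for one state with three actions.
q = {0: 0.5, 1: 1.0, 2: 2.0}  # estimated action values Q(s, a)
v = 1.0                       # estimated state value V(s)

for a in sorted(q):
    print(f"action {a}: advantage = {advantage(q, v, a):+.2f}")
# A policy-improvement step should shift probability mass toward
# action 2, the only action with positive advantage.
```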
Original Post: Pervasive Simulator Misuse with Reinforcement Learning

EWRL and NIPS 2016

I went to the European Workshop on Reinforcement Learning and NIPS last month and saw several interesting things. At EWRL, I particularly liked the talks from:

- Remi Munos on off-policy evaluation
- Mohammad Ghavamzadeh on learning safe policies
- Emma Brunskill on optimizing biased-but-safe estimators (sense a theme?)
- Sergey Levine on low sample complexity applications of RL in robotics

My talk is here. Overall, this was a well organized workshop with diverse and interesting subjects, the only caveat being that they had to limit registration. At NIPS itself, I found the poster sessions fairly interesting. Allen-Zhu and Hazan had a new notion of a reduction (video). Zhao, Poupart, and Gordon had a new way to learn Sum-Product Networks. Ho, Littman, MacGlashan, Cushman, and Austerweil had a paper on how “Showing” is different from “Doing”. Toulis and…
Original Post: EWRL and NIPS 2016

The Multiworld Testing Decision Service

We made a tool that you can use. It is the first general purpose reinforcement-based learning system. Reinforcement learning is much discussed these days, with successes like AlphaGo. Wouldn’t it be great if reinforcement learning algorithms could easily be used to solve all reinforcement learning problems? But there is a well-known problem: it’s very easy…
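The excerpt cuts off before describing the system, but as a hedged sketch: a contextual-bandit style decision service revolves around a loop of exploring, logging the probability of the chosen action, and learning from the logged reward. The epsilon-greedy scheme and all names below are illustrative assumptions, not the Decision Service’s actual API.

```python
import random

# Hedged sketch of an explore/log/learn loop in the contextual-bandit
# style. The epsilon-greedy scheme and every name here are illustrative
# assumptions, not the Decision Service's actual API.

EPSILON = 0.1  # exploration rate

def choose_action(context, score, n_actions):
    """Epsilon-greedy: mostly exploit the current model, sometimes
    explore. Returns the chosen action and the probability with which
    it was chosen; logging that probability is what makes unbiased
    off-policy evaluation possible later."""
    best = max(range(n_actions), key=lambda a: score(context, a))
    if random.random() < EPSILON:
        action = random.randrange(n_actions)
    else:
        action = best
    if action == best:
        prob = (1 - EPSILON) + EPSILON / n_actions
    else:
        prob = EPSILON / n_actions
    return action, prob

# Usage: score with a stand-in model, act, observe a reward, log it all.
score = lambda ctx, a: (ctx * (a + 1)) % 3  # stand-in scoring model
log = []
for context in range(5):
    action, prob = choose_action(context, score, n_actions=3)
    reward = 1.0 if action == context % 3 else 0.0  # simulated feedback
    log.append((context, action, reward, prob))     # data for learning
print(log)
```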
Original Post: The Multiworld Testing Decision Service

AlphaGo is not the solution to AI

Congratulations are in order for the folks at Google DeepMind who have mastered Go. However, some of the discussion around this seems like giddy overstatement. Wired says “machines have conquered the last games” and Slashdot says “we know now that we don’t need any big new breakthroughs to get to true AI”. The…
Original Post: AlphaGo is not the solution to AI