Pervasive Simulator Misuse with Reinforcement Learning

The surge of interest in reinforcement learning is great fun, but I often see confused choices in applying RL algorithms to solve problems. There are two purposes for which you might use a world simulator in reinforcement learning:

1. Reinforcement Learning Research: You might be interested in creating reinforcement learning algorithms for the real world and use the simulator as a cheap alternative to actual real-world application.
2. Problem Solving: You want to find a good policy solving a problem for which you have a good simulator.

In the first instance I have no problem, but in the second instance, I’m seeing many head-scratcher choices.

A reinforcement learning algorithm engaging in policy improvement from a continuous stream of experience needs to solve an opportunity-cost problem. (The RL lingo for opportunity-cost is “advantage”.) Thinking about this in…
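To make the opportunity-cost reading of "advantage" concrete, here is a minimal sketch using the standard textbook definition A(s, a) = Q(s, a) - V(s), with V(s) taken as the policy-weighted average of Q(s, a). The post does not spell this formula out, so treat the definition, the function name `advantages`, and all numbers below as illustrative assumptions rather than the author's method.

```python
def advantages(q_values, policy_probs):
    """Return A(s, a) = Q(s, a) - V(s) for a single state.

    Assumes V(s) = sum_a pi(a|s) * Q(s, a), the usual textbook definition.
    """
    v = sum(p * q for p, q in zip(policy_probs, q_values))
    return [q - v for q in q_values]


# Hypothetical values for one state with three actions.
q = [1.0, 2.0, 4.0]      # assumed action values Q(s, a)
pi = [0.5, 0.25, 0.25]   # assumed current policy pi(a|s)

adv = advantages(q, pi)
print(adv)
```

A positive entry means the action does better than what the current policy gets on average from that state, and a negative entry is the opportunity cost of choosing it instead. By construction, the policy-weighted sum of the advantages is zero.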
Original Post: Pervasive Simulator Misuse with Reinforcement Learning