Note: an updated version of this blog post can be found on the Berkeley Artificial Intelligence Research (BAIR) Blog.
This is one of those papers that changes your views on things. I knew hyperparameters were important to deep RL, and I knew that model-based RL is a more complicated system to build and deploy, but the magnitude of the effect of tuning parameters in MBRL is boggling. By breaking the simulator with a really simple MBRL algorithm, you can learn a lot about the state of the field.
Full disclaimer: I was not the lead researcher on this project. I helped run experiments and interpret what they mean for the broader field.
Automatic Machine Learning (AutoML) is a field dedicated to using machine learning algorithms to tune our machine learning tools. Humans are really bad at internalizing high-dimensional relationships, so let's let a computer do it for us. A harder problem is dynamic parameter tuning (where parameters can change within a run), but more on that later.
From another post of mine.
Model-based reinforcement learning (MBRL) follows the framework of an agent interacting in an environment, learning a model of said environment, and then leveraging the model for control. Specifically, the agent acts in a Markov Decision Process (MDP) governed by a transition function $x_{t+1} = f(x_t, u_t)$ and returns a reward at each step $r(x_t, u_t)$. With a collected dataset $\mathcal{D} := \{x_i, u_i, x_{i+1}, r_i\}$, the agent learns a model $x_{t+1} = f_\theta(x_t, u_t)$ to minimize the negative log-likelihood of the transitions. We employ sample-based model-predictive control (MPC) using the learned dynamics model, which optimizes the expected reward over a finite, recursively predicted horizon $\tau$, from a set of actions sampled from a uniform distribution $U(a)$ (see paper or paper or paper).
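To make that loop concrete, here is a minimal sketch of sample-based MPC with a random-shooting optimizer. The `dynamics_model` and `reward_fn` arguments are hypothetical, vectorized stand-ins for the learned model $f_\theta$ and the task reward; the actual PETS algorithm uses a probabilistic ensemble and a CEM optimizer rather than this bare-bones version.

```python
import numpy as np

def random_shooting_mpc(x0, dynamics_model, reward_fn, horizon=30,
                        n_samples=1000, action_dim=6, action_bounds=(-1.0, 1.0)):
    """Return the first action of the best randomly sampled action sequence.

    dynamics_model(states, actions) and reward_fn(states, actions) are
    hypothetical stand-ins for the learned model f_theta and the reward.
    """
    low, high = action_bounds
    # Sample candidate action sequences from a uniform distribution U(a).
    actions = np.random.uniform(low, high, size=(n_samples, horizon, action_dim))
    returns = np.zeros(n_samples)
    states = np.tile(x0, (n_samples, 1))
    # Recursively roll out the learned model over the horizon tau.
    for t in range(horizon):
        returns += reward_fn(states, actions[:, t])
        states = dynamics_model(states, actions[:, t])
    best = np.argmax(returns)
    return actions[best, 0]  # execute only the first action, then re-plan
```

Only the first action is executed before re-planning, which is what makes MPC so sensitive to both the model quality and the controller's own hyperparameters (horizon, number of samples, action bounds).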
Why might AutoML have an outsized impact on MBRL? It has a ton of moving parts. First off, more machine learning pieces means a harder tuning problem, but there is a bigger reason that compounds to make parameter tuning far more impactful (intuitively) in MBRL. Normally, a graduate student tuning parameters will fine-tune one problem at a time, but in MBRL there are two strangely coupled systems whose objectives are mismatched, so no human will find the best parameters except by luck. A toy sketch of what joint tuning looks like follows below.
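Here is a toy sketch of that point: a random search over a joint space covering both the model and the controller. Everything here is illustrative; `evaluate_mbrl` is a hypothetical function that would train the dynamics model and run MPC end to end with a given configuration, and the ranges are made up.

```python
import random

# Hypothetical joint search space spanning both halves of the MBRL loop.
SEARCH_SPACE = {
    "model_lr":      lambda: 10 ** random.uniform(-5, -2),
    "model_hidden":  lambda: random.choice([128, 256, 512]),
    "model_epochs":  lambda: random.choice([5, 25, 100]),
    "mpc_horizon":   lambda: random.choice([10, 30, 50]),
    "mpc_n_samples": lambda: random.choice([500, 1000, 2500]),
}

def random_search(evaluate_mbrl, n_trials=50):
    """Sample model *and* controller parameters together, since tuning one
    half in isolation misses the coupling between them."""
    best_cfg, best_return = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: sample() for name, sample in SEARCH_SPACE.items()}
        ret = evaluate_mbrl(cfg)  # hypothetical: trains model + runs MPC
        if ret > best_return:
            best_cfg, best_return = cfg, ret
    return best_cfg, best_return
```

The paper uses far more sophisticated tuners than random search, but the key design choice is the same: the search space has to span both the model and the controller, because the best model hyperparameters depend on what the controller does with the model.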
MuJoCo became a favorite of deep RL because it was available when a massive growth phase came through. MuJoCo is expensive, restricted, and not an accurate portrayal of the real world. It is a decent simulator, relatively lightweight, and easy enough to use. That makes it relatively good for individual researchers trying to prove themselves, but not necessarily great for the long-term health of the field. Now the race for state-of-the-art (SOTA) results has reached a new level, and real intellectual breakthroughs in ML are stagnating. I suspect that over the next 5 years the simulators used by deep RL researchers will change substantially, but the jury is still out on whether that change in sim will also translate to an improvement in research practices. (Note: back when I was on Medium, I wrote a post about baselines in RL research, and it is even more true now.)
In short, the results of this paper are astounding. With sufficient hyperparameter tuning, the MBRL algorithm (PETS) literally breaks MuJoCo. The famous “halfcheetah” task degenerates into a glorious spiral of data-driven method heaven.
Normally, it is supposed to run. The paper has a much, much wider range of results on multiple environments, but I leave that to the reader. The paper also explores interesting tradeoffs between optimizing the model (learning the dynamics) and the controller (solving the reward-maximization problem). Additionally, it shows how dynamically changing parameters throughout a trial can be useful, such as increasing the model horizon as the algorithm collects data and the model becomes more accurate.
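For intuition on that last point, here is a hand-written schedule that lengthens the planning horizon as the dataset grows. This is purely illustrative with made-up numbers; the paper studies dynamic tuning of parameters during training rather than a fixed, hand-coded schedule like this.

```python
def planning_horizon(num_transitions, min_h=5, max_h=40, ramp=20000):
    """Toy schedule: lengthen the MPC horizon as the dataset grows and the
    learned model (presumably) becomes more accurate. Hypothetical numbers."""
    frac = min(num_transitions / ramp, 1.0)
    return int(min_h + frac * (max_h - min_h))

# e.g. 5 steps with no data, ~22 steps at 10k transitions, 40 steps beyond 20k
```

The idea is that a long horizon is useless (or harmful) while the model is still inaccurate, so the right value of this hyperparameter changes over the course of a run.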
I am really confident about the future of MBRL these days. First, this paper shows how much more capable our current algorithms are in terms of optimality. Second, I feel like all the pieces of this “MBRL system” the field now uses regularly have a lot of future directions to exploit.
Thanks for reading, and thanks again to the authors who did the majority of the work on this paper. I will be adding a video attachment of the poster presentation when we get it done.
@article{zhang2021importance,
title={On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning},
author={Zhang, Baohe and Rajan, Raghu and Pineda, Luis and Lambert, Nathan and Biedenkapp, Andr{\'e} and Chua, Kurtland and Hutter, Frank and Calandra, Roberto},
journal={AISTATS},
year={2021}
}