Reinforcement Learning


The most opaque and intriguing system

Reinforcement learning is a framework open to investigation at many depths. There are theoretical proofs of convergence, new algorithms, relations to biology, and my personal favorite: applications. RL is becoming feasible to use in real-world systems, and this has potentially huge implications (see this write-up by a colleague) because its interactions and problem definition are not well-posed. Regulating these systems so that safety and usefulness are preserved is an active area of my work.

I spend most of my time thinking about model-based reinforcement learning, a variant that involves very similar optimizations but has a more structured and modular learning setup. Learning a dynamics model lends itself to interpretability and generalization (see model-learning).

Model-based reinforcement learning.
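
To make that loop concrete, here is a minimal sketch of a generic MBRL training loop. The `env`, `dynamics_model`, and `planner` objects are hypothetical placeholders (a gym-style interface is assumed), not any particular library's API:

```python
# Minimal model-based RL loop (illustrative placeholders, not a real API).

def mbrl_loop(env, dynamics_model, planner, iterations=10, horizon=200):
    """Alternate between collecting data, fitting a dynamics model
    s' ~ f(s, a) by supervised learning, and planning through it."""
    dataset = []  # transitions (s, a, s')
    for _ in range(iterations):
        s = env.reset()
        for _ in range(horizon):
            # Act by planning with the learned model, not the real env.
            a = planner.act(s, dynamics_model)
            s_next, reward, done, info = env.step(a)
            dataset.append((s, a, s_next))
            s = s_next
            if done:
                break
        # Refit the model on all experience gathered so far.
        dynamics_model.fit(dataset)
    return dynamics_model
```

The modularity is the point: the dynamics model is a standalone supervised-learning artifact that can be inspected, validated, or reused independently of the controller.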

Open areas of study:

  • interpretable RL algorithms: what can we learn about how an agent comes to a decision?
  • non sample-based optimization with a learned model (see the sketch after this list).
  • multi-agent learning (100s or 1000s of agents).
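
As promised above, here is a sketch of what non sample-based (gradient-based) planning through a learned model could look like. It assumes a differentiable PyTorch `model(s, a) -> s_next` and a known `reward_fn`; both names are hypothetical:

```python
# Gradient-based planning through a differentiable learned model,
# instead of sampling action candidates (as random shooting or CEM do).
import torch

def plan_by_gradient(model, reward_fn, s0, act_dim, horizon=15, steps=50, lr=0.1):
    """Optimize an action sequence with autograd through the learned model,
    rather than scoring sampled candidate sequences."""
    actions = torch.zeros(horizon, act_dim, requires_grad=True)
    opt = torch.optim.Adam([actions], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        s, ret = s0, torch.zeros(())
        for t in range(horizon):
            s = model(s, actions[t])          # predicted next state
            ret = ret + reward_fn(s, actions[t])
        (-ret).backward()                     # gradient ascent on predicted return
        opt.step()
    return actions.detach()
```

Whether this beats sampling depends on how smooth the learned model is; gradients propagated over long horizons are a known failure mode.
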
Selected publications and projects:

  • Advancements in instruction tuning and RLHF: empirical studies.
  • How the optimization setup of RLHF is limiting the steerability of LLMs.
  • The report for a small and powerful chat model trained with DPO!
  • The complicated history underpinning reinforcement learning from human feedback!
  • Where is model-based RL heading four years after the seminal paper of my Ph.D.?
  • My thesis on model-based RL. Let's make models work with tasks!
  • We propose a new type of documentation for dynamic machine learning (and reinforcement learning) systems!
  • We detail why reinforcement learning systems pose a different type of (dynamic) risk to society. This paper outlines the different types of feedback present in RL systems, the risks they pose, and a path forward for policymakers.
  • We flip the script on offline RL research and ask "what is the best dataset to collect?" rather than "what is the best algorithm?"
  • An open-source PyTorch repository designed from the bottom up for model-based reinforcement learning research.
  • We showed that when advancements in AutoML are paired with common deep RL tasks, MBRL algorithms perform so well they break the simulator.
  • We explored how MBRL can learn multi-step, nonlinear controllers!
  • Trying to reframe the MBRL framework with long-term predictions instead of one-step predictions!
  • Studying the numerical effects of the dual optimization problem in model-based reinforcement learning: control and dynamics. Optimizing model accuracy carries no guarantee of improving task performance (see the note after this list)!
  • Learning how to walk with a real-world hexapod using a hierarchy of model-free RL for basic motion primitives and model-based RL for higher-level planning.
  • We used deep model-based reinforcement learning to have a quadrotor learn to hover from less than 5 minutes of purely experimental training data.
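
On the dual-optimization item above, one way to write down the two decoupled objectives (the notation here is mine, not taken from the paper): the dynamics model is fit for one-step accuracy, while the controller is judged on task return, and nothing in the first objective guarantees progress on the second.

```latex
\theta^{\star} = \arg\min_{\theta}\;
  \mathbb{E}_{(s,a,s') \sim \mathcal{D}}
  \left[ \left\| f_{\theta}(s,a) - s' \right\|^{2} \right]
\qquad \text{vs.} \qquad
\pi^{\star} = \arg\max_{\pi}\;
  \mathbb{E}\left[ \sum_{t} r(s_{t}, a_{t}) \right]
```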