Note: Page not maintained, please see Google or Semantic Scholar.

Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2

Nov 2023

H. Ivison and Y. Wang et al.

Advancements in instruction tuning and RLHF! Empirical studies.

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback

Oct 2023

Nathan Lambert, Roberto Calandra

How the optimization setup of RLHF is limiting the steerability of LLMs.

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

Zephyr: Direct Distillation of LM Alignment

Oct 2023

HuggingFace H4 Team

The report for a small and powerful chat model trained with DPO!

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

The History and Risks of Reinforcement Learning and Human Feedback

Oct 2023

Nathan Lambert, Thomas Krendl Gilbert, Tom Zick

The complicated history underpinning reinforcement learning from human feedback!

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

A Unified View on Solving Objective Mismatch in Model-Based Reinforcement Learning

Oct 2023

Ran Wei, Nathan Lambert, Anthony McDonald, Alfredo Garcia, Roberto Calandra

Where is model-based RL heading, 4 years after the seminal paper of my Ph.D.?

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

Measuring Data

Dec 2022

Margaret Mitchell, Alexandra Sasha Luccioni, Nathan Lambert, Marissa Gerchick, Angelina McMillan-Major, Ezinwanne Ozoani, Nazneen Rajani, Tristan Thrush, Yacine Jernite, Douwe Kiela

When you "measure data", you quantify its characteristics to support dataset comparison & curation. You also begin to know what systems will learn. Many ML systems don't reason about this; we posit you should.

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

Synergy of Prediction and Control in Model-based Reinforcement Learning

May 2022

Nathan Lambert

My thesis on model-based RL. Let's make models work with tasks!

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

Reward Reports for Reinforcement Learning

Apr 2022

Thomas Krendl Gilbert, Sarah Dean, Nathan Lambert, Tom Zick, Aaron Snoswell

We propose a new type of documentation for dynamic machine learning (and reinforcement learning) systems!

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

Choices, Risks, and Reward Reports: Charting Public Policy for Reinforcement Learning Systems

Feb 2022

Thomas Krendl Gilbert, Sarah Dean, Tom Zick, Nathan Lambert

We detail why reinforcement learning systems pose a different type of (dynamic) risk to society. This paper outlines the different types of feedback present in RL systems, the risks they pose, and a path forward for policymakers.

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

The Challenges of Exploration for Offline Reinforcement Learning

Feb 2022

Nathan Lambert, Markus Wulfmeier, William Whitney, Arunkumar Byravan, Michael Bloesch, Vibhavari Dasagi, Tim Hertweck, Martin Riedmiller

We flip the script on Offline RL research and ask the question of "what is the best dataset to collect?" rather than "what is the best algorithm?"

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

Investigating Compounding Prediction Errors in Learned Dynamics Models

Dec 2021

Nathan Lambert, Kristofer Pister, Roberto Calandra

In this paper we set out to understand the causes of compounding prediction errors in one-step learned models. With this, we hope a next generation of models can be used to improve model-based reinforcement learning.

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

BotNet: A Simulator for Studying the Effects of Accurate Communication Models on Multi-agent and Swarm Control

Sep 2021

Mark Selden, Jason Zhou, Felipe Campos, Nathan Lambert, Daniel Drew, Kristofer S. J. Pister

A simulator for studying high-agent-count networked systems!

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

Best Student Paper Finalist!

Axes for Sociotechnical Inquiry in AI Research

May 2021

Sarah Dean, Thomas Krendl Gilbert, Nathan Lambert, Tom Zick

We present a concise set of axes for understanding the societal risks of new directions in AI research.

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

MBRL-Lib: A Modular Library for Model-based Reinforcement Learning

Apr 2021

Luis Pineda, Brandon Amos, Amy Zhang, Nathan O Lambert, Roberto Calandra

An open-source PyTorch repository designed from the bottom up for model-based reinforcement learning research.

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning

Feb 2021

Baohe Zhang, Raghu Rajan, Luis Pineda, Nathan Lambert, André Biedenkapp, Kurtland Chua, Frank Hutter, Roberto Calandra

We showed that when advancements in AutoML are paired with common deep RL tasks, MBRL algorithms perform so well they break the simulator.

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

AI Development for the Public Interest: From Abstraction Traps to Sociotechnical Risks

Feb 2021

McKane Andrus, Sarah Dean, Thomas Krendl Gilbert, Nathan Lambert, Tom Zick

We study three developing subfields of AI research and their growing relationship with the sociotechnical: AI Safety, Fair Machine Learning, and Human-in-the-loop Autonomy.

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

Nonholonomic Yaw Control of an Underactuated Flying Robot with Model-based Reinforcement Learning

Dec 2020

Nathan Lambert, Craig Schindler, Daniel S Drew, Kristofer SJ Pister

We explored how MBRL can learn multi-step, nonlinear controllers!

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning

Dec 2020

Nathan O Lambert, Albert Wilcox, Howard Zhang, Kristofer SJ Pister, Roberto Calandra

Trying to reframe the MBRL framework with long-term predictions instead of one-step predictions!

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

Learning for Microrobot Exploration: Model-based Locomotion, Robust Navigation, and Low-Power Deep Classification

Jul 2020

Nathan Lambert, Fahran Toddywala, Brian Liao, Eric Zhu, Lydia Lee, Kristofer S.J. Pister

A collection of steps towards a data-driven autonomous microrobot.

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

Objective Mismatch in Model-based Reinforcement Learning

Feb 2020

Nathan Lambert, Brandon Amos, Omry Yadan, Roberto Calandra

Studying the numerical effects of a dual-optimization problem in model-based reinforcement learning: control and dynamics. When optimizing model accuracy, there is no guarantee of improving task performance!

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

Learning Generalizable Locomotion Skills with Hierarchical Reinforcement Learning

Sep 2019

Tianyu Li, Nathan Lambert, Roberto Calandra, Franziska Meier, Akshara Rai

Learning how to walk with a real-world hexapod using a hierarchy that combines model-free RL for basic motion primitives with model-based RL for higher-level planning.

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

Low Level Control of a Quadrotor with Deep Model-Based Reinforcement Learning

Jan 2019

Nathan Lambert, Daniel Drew, Joseph Yaconelli, Roberto Calandra, Sergey Levine, Kristofer Pister

We used deep model-based reinforcement learning to have a quadrotor learn to hover from less than 5 minutes of entirely experimental training data.

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

Toward Controlled Flight of the Ionocraft: A Flying Microrobot Using Electrohydrodynamic Thrust With Onboard Sensing and No Moving Parts

Jul 2018

Daniel Drew, Nathan Lambert, Craig Schindler, Kristofer SJ Pister

A collection of steps towards controlled flight of the Ionocraft, a completely silent microrobot with ion thrust!

[pdf][arxiv][code][video (<5 min)][talk (>15min)]

April 10, 2020

Semiautonomous Seminar (UC Berkeley)

A mixed talk discussing the research challenges of controlling microrobots and how model-learning can be used to synthesize highly specific controllers.

[Watch Me]

[Slides]

Talks

15min History of Reinforcement Learning and Human Feedback

DPO: Is RL needed for RLHF?

Bridging RLHF from LLMs back to control

Objective Mismatch in Reinforcement Learning from Human Feedback

Reinforcement Learning from Human Feedback: A Tutorial

Reinforcement Learning from Human Feedback: Open and Academic Perspectives

Intro to Reinforcement Learning from Human Feedback

Planning through Exploration and Exploitation in Model-based Reinforcement Learning

(Dissertation Talk) Synergy of Prediction and Control in Model-based Reinforcement Learning

Machine Learning for Microsystem Control

Improving Model Predictive Control Used in Model-based Reinforcement Learning

Bringing Model-based RL to Novel Robots

Model Learning for Low-level Control in Robotics