I am a PhD student in the Computer Science Department at Stanford University, advised by Emma Brunskill. My recent interests center on foundation models for decision-making and reasoning, reinforcement learning, and alignment.
During my PhD, I have been supported by an NSF Graduate Research Fellowship. I previously graduated with a B.S. in Electrical Engineering & Computer Science
from UC Berkeley, where I worked on robot learning advised by Ken Goldberg.
I am thankful to have spent two wonderful summers at Google, working with
George Tucker, Ofir Nachum, and Bo Dai on the Brain Team (now Google DeepMind), and
Christoph Dann, Alekh Agarwal, and Tong Zhang on the Learning Theory Team.
Selected Projects
Eliciting Math Reasoning from Language Models via Step-by-Step Self-Training.
Jonathan Lee et al.
In preparation.
Reasoning Foundation Models for Decision-Making
Formal Reasoning for Large Language Models via Information-Directed Search.
Yash Chandak, Jonathan Lee, Emma Brunskill
In preparation.
Reasoning Foundation Models for Decision-Making
Supervised Pretraining Can Learn In-Context Reinforcement Learning.
Jonathan Lee*, Annie Xie*, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill
Neural Information Processing Systems (NeurIPS), 2023. (Spotlight)
Foundation Models for Decision-Making
Dueling RL: Reinforcement Learning with Trajectory Preferences.
Aldo Pacchiano, Aadirupa Saha, Jonathan Lee
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023.
Alignment / RLHF
All Publications
2023
Supervised Pretraining Can Learn In-Context Reinforcement Learning.
Jonathan Lee*, Annie Xie*, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill
Neural Information Processing Systems (NeurIPS), 2023.
(Spotlight)
Experiment Planning with Function Approximation.
Aldo Pacchiano, Jonathan Lee, Emma Brunskill
Neural Information Processing Systems (NeurIPS), 2023.
Learning in POMDPs is Sample-Efficient with Hindsight Observability.
Jonathan Lee, Alekh Agarwal, Christoph Dann, Tong Zhang
International Conference on Machine Learning (ICML), 2023.
Dueling RL: Reinforcement Learning with Trajectory Preferences.
Aldo Pacchiano, Aadirupa Saha, Jonathan Lee
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023.
Estimating Optimal Policy Value in General Linear Contextual Bandits.
Jonathan Lee, Weihao Kong, Aldo Pacchiano, Vidya Muthukumar, Emma Brunskill
arXiv, 2023.
2022
Oracle Inequalities for Model Selection in Offline Reinforcement Learning.
Jonathan Lee, George Tucker, Ofir Nachum, Bo Dai, Emma Brunskill
Neural Information Processing Systems (NeurIPS), 2022.
Model Selection in Batch Policy Optimization.
Jonathan Lee, George Tucker, Ofir Nachum, Bo Dai
International Conference on Machine Learning (ICML), 2022.
2021
Design of Experiments for Stochastic Contextual Linear Bandits.
Andrea Zanette*, Kefan Dong*, Jonathan Lee*, Emma Brunskill
Neural Information Processing Systems (NeurIPS), 2021.
Near Optimal Policy Optimization via REPS.
Aldo Pacchiano, Jonathan Lee, Peter Bartlett, Ofir Nachum
Neural Information Processing Systems (NeurIPS), 2021.
Online Model Selection for Reinforcement Learning with Function Approximation.
Jonathan Lee, Aldo Pacchiano, Vidya Muthukumar, Weihao Kong, Emma Brunskill
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021.
Dynamic Regret Convergence Analysis and an Adaptive Regularization Algorithm for On-Policy Robot Imitation Learning.
Jonathan Lee, Michael Laskey, Ajay Kumar Tanwani, Anil Aswani, Ken Goldberg.
International Journal of Robotics Research (IJRR), 2021.
(Invited paper)
2020
Accelerated Message Passing for Entropy-Regularized MAP Inference.
Jonathan Lee, Aldo Pacchiano, Peter Bartlett, Michael I. Jordan.
International Conference on Machine Learning (ICML), 2020.
Convergence Rates of Smooth Message Passing with Rounding in Entropy-Regularized MAP Inference.
Jonathan Lee*, Aldo Pacchiano*, Michael I. Jordan.
International Conference on Artificial Intelligence and Statistics (AISTATS), 2020.
Online Learning with Continuous Variations: Dynamic Regret and Reductions.
Ching-An Cheng*, Jonathan Lee*, Ken Goldberg, Byron Boots.
International Conference on Artificial Intelligence and Statistics (AISTATS), 2020.
2019
On-Policy Robot Imitation Learning from a Converging Supervisor.
Ashwin Balakrishna*, Brijen Thananjeyan*, Jonathan Lee, Arsh Zahed, Felix Li, Joseph E. Gonzalez, Ken Goldberg.
Conference on Robot Learning (CoRL), 2019.
(Oral)
A Dynamic Regret Analysis and Adaptive Regularization Algorithm for On-Policy Robot Imitation Learning.
Jonathan Lee, Michael Laskey, Ajay Kumar Tanwani, Anil Aswani, Ken Goldberg.
Springer Proceedings in Advanced Robotics: Algorithmic Foundations of Robotics, 2019.
International Workshop on the Algorithmic Foundations of Robotics (WAFR), 2018.
(Invited to IJRR)
Generalizing Robot Imitation Learning with Invariant Hidden Semi-Markov Models.
Ajay Kumar Tanwani, Jonathan Lee, Brijen Thananjeyan, Michael Laskey, Sanjay Krishnan, Roy Fox, Ken Goldberg, Sylvain Calinon
Springer Proceedings in Advanced Robotics: Algorithmic Foundations of Robotics, 2019.
International Workshop on the Algorithmic Foundations of Robotics (WAFR), 2018.
(Invited to IJRR)
2018
Constraint Estimation and Derivative-Free Recovery for Robot Learning from Demonstrations.
Jonathan Lee, Michael Laskey, Roy Fox, Ken Goldberg.
IEEE International Conference on Automation Science and Engineering (CASE), 2018.
2017
DART: Noise Injection for Robust Imitation Learning.
Michael Laskey, Jonathan Lee, Roy Fox, Anca Dragan, Ken Goldberg.
Conference on Robot Learning (CoRL), 2017.
[BAIR Blog]
Comparing Human-Centric and Robot-Centric Sample Efficiency for Robot Deep Learning from Demonstrations.
Michael Laskey, Caleb Chuck, Jonathan Lee, Jeffrey Mahler, Sanjay Krishnan, Kevin Jamieson, Anca Dragan, Ken Goldberg.
IEEE International Conference on Robotics and Automation (ICRA), 2017.
2016
Robot Grasping in Clutter: Using a Hierarchy of Supervisors for Learning from Demonstrations.
Michael Laskey, Jonathan Lee, Caleb Chuck, David Gealy, Wesley Hsieh, Florian T. Pokorny, Anca D. Dragan, and Ken Goldberg.
IEEE International Conference on Automation Science and Engineering (CASE), 2016.
Short papers, workshop papers, etc.
Improved Estimator Selection for Off-Policy Evaluation.
George Tucker, Jonathan Lee.
ICML Workshop on Reinforcement Learning Theory, 2021.
Continuous Online Learning and New Insights into Online Imitation Learning.
Jonathan Lee*, Ching-An Cheng*, Ken Goldberg, Byron Boots.
NeurIPS Optimization Foundations for Reinforcement Learning Workshop, 2019.
(Best Paper Award)
Stability Analysis of On-Policy Imitation Learning Algorithms Using Dynamic Regret.
Jonathan Lee, Michael Laskey, Ajay Kumar Tanwani, Ken Goldberg.
RSS Workshop on Imitation and Causality, 2018.
(Spotlight)
Iterative Noise Injection for Scalable Imitation Learning.
Michael Laskey, Jonathan Lee, Wesley Hsieh, Richard Liaw, Jeffrey Mahler, Roy Fox, Ken Goldberg.
arXiv, 2017.