I am a PhD student in the Computer Science Department at Stanford University, advised by Emma Brunskill. My recent interests center on foundation models for decision-making and reasoning, reinforcement learning, and alignment.
During my PhD, I have been supported by an NSF Graduate Research Fellowship. I previously graduated with a B.S. in Electrical Engineering & Computer Science
from UC Berkeley, where I worked on robot learning advised by Ken Goldberg.
I am thankful to have spent two wonderful summers at Google, working with
George Tucker, Ofir Nachum, and Bo Dai on the Brain Team (now Google DeepMind), and
Christoph Dann, Alekh Agarwal, and Tong Zhang on the Learning Theory Team.
Selected Projects
Eliciting Math Reasoning from Language Models via Step-by-Step Self-Training.
Jonathan Lee et al.
In preparation.
Reasoning Foundation Models for Decision-Making
Formal Reasoning for Large Language Models via Information-Directed Search.
Yash Chandak, Jonathan Lee, Emma Brunskill
In preparation.
Reasoning Foundation Models for Decision-Making
Supervised Pretraining Can Learn In-Context Reinforcement Learning.
Jonathan Lee*, Annie Xie*, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill
Neural Information Processing Systems (NeurIPS), 2023. (Spotlight)
Foundation Models for Decision-Making
Dueling RL: Reinforcement Learning with Trajectory Preferences.
Aldo Pacchiano, Aadirupa Saha, Jonathan Lee
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023.
Alignment / RLHF
All Publications
2023
Supervised Pretraining Can Learn In-Context Reinforcement Learning.
Jonathan Lee*, Annie Xie*, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill
Neural Information Processing Systems (NeurIPS), 2023.
(Spotlight)
Experiment Planning with Function Approximation.
Aldo Pacchiano, Jonathan Lee, Emma Brunskill
Neural Information Processing Systems (NeurIPS), 2023.
Learning in POMDPs is Sample-Efficient with Hindsight Observability.
Jonathan Lee, Alekh Agarwal, Christoph Dann, Tong Zhang
International Conference on Machine Learning (ICML), 2023.
Dueling RL: Reinforcement Learning with Trajectory Preferences.
Aldo Pacchiano, Aadirupa Saha, Jonathan Lee
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023.
Estimating Optimal Policy Value in General Linear Contextual Bandits.
Jonathan Lee, Weihao Kong, Aldo Pacchiano, Vidya Muthukumar, Emma Brunskill
arXiv, 2023.
2022
Oracle Inequalities for Model Selection in Offline Reinforcement Learning.
Jonathan Lee, George Tucker, Ofir Nachum, Bo Dai, Emma Brunskill
Neural Information Processing Systems (NeurIPS), 2022.
Model Selection in Batch Policy Optimization.
Jonathan Lee, George Tucker, Ofir Nachum, Bo Dai
International Conference on Machine Learning (ICML), 2022.
2021
Design of Experiments for Stochastic Contextual Linear Bandits.
Andrea Zanette*, Kefan Dong*, Jonathan Lee*, Emma Brunskill
Neural Information Processing Systems (NeurIPS), 2021.
Near Optimal Policy Optimization via REPS.
Aldo Pacchiano, Jonathan Lee, Peter Bartlett, Ofir Nachum
Neural Information Processing Systems (NeurIPS), 2021.
Online Model Selection for Reinforcement Learning with Function Approximation.
Jonathan Lee, Aldo Pacchiano, Vidya Muthukumar, Weihao Kong, Emma Brunskill
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021.
Dynamic Regret Convergence Analysis and an Adaptive Regularization Algorithm for On-Policy Robot Imitation Learning.
Jonathan Lee, Michael Laskey, Ajay Kumar Tanwani, Anil Aswani, Ken Goldberg.
International Journal of Robotics Research (IJRR), 2021.
(Invited paper)
2020
Accelerated Message Passing for Entropy-Regularized MAP Inference.
Jonathan Lee, Aldo Pacchiano, Peter Bartlett, Michael I. Jordan.
International Conference on Machine Learning (ICML), 2020.
Convergence Rates of Smooth Message Passing with Rounding in Entropy-Regularized MAP Inference.
Jonathan Lee*, Aldo Pacchiano*, Michael I. Jordan.
International Conference on Artificial Intelligence and Statistics (AISTATS), 2020.
Online Learning with Continuous Variations: Dynamic Regret and Reductions.
Ching-An Cheng*, Jonathan Lee*, Ken Goldberg, Byron Boots.
International Conference on Artificial Intelligence and Statistics (AISTATS), 2020.
2019
On-Policy Robot Imitation Learning from a Converging Supervisor.
Ashwin Balakrishna*, Brijen Thananjeyan*, Jonathan Lee, Arsh Zahed, Felix Li, Joseph E. Gonzalez, Ken Goldberg.
Conference on Robot Learning (CoRL), 2019.
(Oral)
A Dynamic Regret Analysis and Adaptive Regularization Algorithm for On-Policy Robot Imitation Learning.
Jonathan Lee, Michael Laskey, Ajay Kumar Tanwani, Anil Aswani, Ken Goldberg.
Springer Proceedings in Advanced Robotics: Algorithmic Foundations of Robotics, 2019.
International Workshop on the Algorithmic Foundations of Robotics (WAFR), 2018.
(Invited to IJRR)
Generalizing Robot Imitation Learning with Invariant Hidden Semi-Markov Models.
Ajay Kumar Tanwani, Jonathan Lee, Brijen Thananjeyan, Michael Laskey, Sanjay Krishnan, Roy Fox, Ken Goldberg, Sylvain Calinon
Springer Proceedings in Advanced Robotics: Algorithmic Foundations of Robotics, 2019.
International Workshop on the Algorithmic Foundations of Robotics (WAFR), 2018.
(Invited to IJRR)
2018
Constraint Estimation and Derivative-Free Recovery for Robot Learning from Demonstrations.
Jonathan Lee, Michael Laskey, Roy Fox, Ken Goldberg.
IEEE International Conference on Automation Science and Engineering (CASE), 2018.
2017
DART: Noise Injection for Robust Imitation Learning.
Michael Laskey, Jonathan Lee, Roy Fox, Anca Dragan, Ken Goldberg.
Conference on Robot Learning (CoRL), 2017.
[BAIR Blog]
Comparing Human-Centric and Robot-Centric Sample Efficiency for Robot Deep Learning from Demonstrations.
Michael Laskey, Caleb Chuck, Jonathan Lee, Jeffrey Mahler, Sanjay Krishnan, Kevin Jamieson, Anca Dragan, Ken Goldberg.
IEEE International Conference on Robotics and Automation (ICRA), 2017.
2016
Robot Grasping in Clutter: Using a Hierarchy of Supervisors for Learning from Demonstrations.
Michael Laskey, Jonathan Lee, Caleb Chuck, David Gealy, Wesley Hsieh, Florian T. Pokorny, Anca D. Dragan, and Ken Goldberg.
IEEE International Conference on Automation Science and Engineering (CASE), 2016.
Short papers, workshop papers, etc.
Improved Estimator Selection for Off-Policy Evaluation.
George Tucker, Jonathan Lee.
ICML Workshop on Reinforcement Learning Theory, 2021.
Continuous Online Learning and New Insights into Online Imitation Learning.
Jonathan Lee*, Ching-An Cheng*, Ken Goldberg, Byron Boots.
NeurIPS Optimization Foundations for Reinforcement Learning Workshop, 2019.
(Best Paper Award)
Stability Analysis of On-Policy Imitation Learning Algorithms Using Dynamic Regret.
Jonathan Lee, Michael Laskey, Ajay Kumar Tanwani, Ken Goldberg.
RSS Workshop on Imitation and Causality, 2018.
(Spotlight)
Iterative Noise Injection for Scalable Imitation Learning.
Michael Laskey, Jonathan Lee, Wesley Hsieh, Richard Liaw, Jeffrey Mahler, Roy Fox, Ken Goldberg.
arXiv, 2017.