About Me | Yaodong Yang

About Me

Dr. Yaodong Yang is a Boya Assistant Professor at the Peking University Institute for Artificial Intelligence and Programme Director of the LLM Safety Centre at the Beijing Academy of Artificial Intelligence (BAAI). His research focuses on safe human-AI interaction and value alignment, covering reinforcement learning, AI alignment, multi-agent learning, and embodied AI. He has published more than 100 papers at top conferences and journals (including Nature Machine Intelligence, Artificial Intelligence, JMLR, IEEE T-PAMI, and National Science Review), with more than 6,000 Google Scholar citations. His awards include a place on the ICCV'23 Best Paper Award initial list, the CoRL'20 Best System Paper Award, the AAMAS'21 Best Blue-Sky Paper Award, and the Rising Star Awards of ACM SIGAI China and the World AI Conference 2022. He has led the alignment and open-source efforts for the Baichuan2, Pengcheng Naohai 33B, and Hong Kong HKGAI LLMs. His team also won the NeurIPS'22 MyoChallenge on dexterous manipulation. Dr. Yang serves as an Area Chair for ICLR, NeurIPS, AAAI, IJCAI, and AAMAS, and as an Associate Editor of Neural Networks. Previously, he was an Assistant Professor at King's College London, a Principal Researcher at Huawei UK, and a Senior Manager at American International Group (AIG). He received his bachelor's degree from the University of Science and Technology of China, and his master's degree and Ph.D. from Imperial College London and University College London, respectively (nominated for the ACM SIGAI Doctoral Dissertation Award).

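Dr. Yaodong Yang (杨耀东) holds several roles:

- Researcher (Boya Scholar) at the Peking University Institute for Artificial Intelligence.
- Executive Director of the Center for AI Safety and Governance.
- Head of the LLM safety project at BAAI.

He has been recognised as an Overseas High-Level Talent by the Ministry of Human Resources and Social Security. He has also been selected for a national high-level young talent programme and for the China Association for Science and Technology's Young Elite Scientists Sponsorship Program.

His research focuses on safe interaction and value alignment of intelligent agents, covering reinforcement learning, AI alignment, multi-agent learning, and embodied AI. He has published more than one hundred papers at top AI conferences and journals, with more than six thousand Google Scholar citations.

His honours include:

- the ICCV'23 Best Paper Award initial list;
- the CoRL'20 Best System Paper Award;
- the AAMAS'21 Best Blue-Sky Idea Paper Award;
- the WAIC'22 Yunfan Award (Bright Star);
- the ACM SIGAI China Rising Star Award.

He led a team in China whose multi-agent reinforcement learning algorithm was the first from such a team to be published in Nature Machine Intelligence. He also led the value-alignment work for the Baichuan2, Pengcheng Naohai 33B, and Hong Kong HKGAI LLMs, and his team won the NeurIPS'22 robotic dexterous manipulation competition. His work has been covered by CCTV-1's Focus Report (焦点访谈), CCTV-4's 深度国际 (In-Depth International), the Financial Times, and MIT Technology Review.

He currently serves as an Area Chair for ICLR/NeurIPS/AAAI/IJCAI/AAMAS and as an Associate Editor of Neural Networks. He has led more than thirty research projects funded by the National Natural Science Foundation of China, the Ministry of Science and Technology, the Beijing Municipal Science and Technology Commission, and university-industry joint laboratories.

He was previously an Assistant Professor at King's College London, a Principal Researcher at Huawei's UK research institute, and a Senior Manager in the science division of American International Group (AIG). He received his bachelor's degree from the University of Science and Technology of China, and his master's and Ph.D. degrees from Imperial College London and University College London (the sole nominee for the ACM SIGAI Doctoral Dissertation Award).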

Research directions of the PKU Alignment & Interaction Lab (PAIR-Lab) include:

We recruit reinforcement learning interns and visiting scholars year-round (paid).

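AI alignment (reinforcement learning from human feedback, game theory, control theory)

Dexterous bimanual manipulation with reinforcement learning (reinforcement learning, robotics, embodied AI)

Multi-agent game-theoretic interaction (reinforcement learning, multi-agent systems, game theory)

Open-source reinforcement learning projects (Show me the code, not the story~)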

Recent News

10/2024

Check out my recent talk, "Can LLM be Aligned?", at CNCC 2024.

09/2024

Five papers get accepted at NeurIPS 2024

Achieving Efficient Alignment through Learned Correction (Oral, top 0.5%)
ProgressGym: Alignment with a Millennium of Moral Progress (Spotlight)
Panacea: Pareto Alignment via Preference Adaptation for LLMs
Scalable Constrained Policy Optimization for Safe Multi-agent Reinforcement Learning
SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset

09/2024

Check out our Nature Machine Intelligence paper on Large Scale Multi-agent RL.

Efficient and scalable reinforcement learning for large-scale network control

08/2024

Two papers accepted at CoRL 2024

Neural Attention Field: Emerging Point Relevance in 3D Scenes for One-Shot Dexterous Grasping
Object-Centric Dexterous Manipulation from Human Motion Data

05/2024

VALSE 2024 Annual Progress Report: From Preference Alignment to Value Alignment and Superalignment

Video (in Chinese)

05/2024

Three papers get accepted at ICML 2024

INSIGHT: End-to-End Neuro-Symbolic Visual Reinforcement Learning with Language Explanations
Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation
Planning with Theory of Mind for Few-Shot Adaptation in Mixed-motive Environments

03/2024

Together with Yoshua Bengio, Stuart Russell, Geoffrey Hinton, and Chinese policymakers, we signed the Beijing Declaration on AI Safety.

Chinese media coverage

01/2024

Five papers get accepted at ICLR 2024, and one paper in TPAMI.

Spotlight (5%) CivRealm: A Learning and Reasoning Odyssey for Decision-Making Agents
Spotlight (5%) Maximum Entropy Heterogeneous-Agent Reinforcement Learning
Spotlight (5%) Safe RLHF: Safe Reinforcement Learning from Human Feedback
SafeDreamer: Safe Reinforcement Learning with World Models
Byzantine Robust Cooperative Multi-Agent Reinforcement Learning as a Bayesian Game
PAMI ASP: Learn a Universal Neural Solver

12/2023

Three papers get accepted at AAAI 2024.

STAS: Spatial-Temporal Return Decomposition for Multi-agent Reinforcement Learning
Oral (7%) ProAgent: Building Proactive Cooperative AI with Large Language Models
A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning

12/2023

Two papers get accepted at top journals!

11/2023

We release our AI Alignment Survey and the Alignment Resource Website.

10/2023

Our paper was named to the Best Paper Award initial list (17 out of 8,260) at ICCV 2023!

UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning

09/2023

Six papers get accepted at NeurIPS 2023.

Multi-Agent First Order Constrained Optimization in Policy Space
Hierarchical Multi-Agent Skill Discovery
Policy Space Diversity for Non-Transitive Games
Team-PSRO for Learning Approximate TMECor in Large Team Games via Cooperative Reinforcement Learning
BeaverTails: A Human-Preference Dataset for LLM Harmlessness Alignment
Safety Gymnasium: A Unified Safe Reinforcement Learning Benchmark

09/2023

Two papers get accepted at JMLR and TMLR.

06/2023

TorchOpt is now officially part of the PyTorch Ecosystem!

Announcement!

05/2023

Four papers get accepted at ICML 2023.

03/2023

Invited talk given:

Slides: Aligning safe decision in open-ended world.

(Video)

03/2023

One paper gets accepted at the Artificial Intelligence Journal

Safe Multi-Agent Reinforcement Learning for Multi-Robot Control

We propose the first safe cooperative MARL method.

02/2023

Two ICRA papers and one ICLR paper got accepted.

ICRA'23: End-to-End Affordance Learning for Robotic Manipulation

We take advantage of visual affordance by using the contact information generated during the RL training process to predict contact maps of interest.

ICRA'23: GenDexGrasp: Generalizable Dexterous Grasping

A versatile dexterous grasping method that can generalize to unseen hands.

ICLR'23: Quality-Similar Diversity via Population Based Reinforcement Learning

We propose a new policy diversity measure suited to game AI settings.

01/2023

One paper gets accepted at Autonomous Agents and Multi-Agent Systems (Springer)

Online Markov Decision Processes with Non-oblivious Strategic Adversary

We study online MDPs in which the adversary is strategic: it can adapt its policy to the learning agent's behaviour.

01/2023

One paper gets accepted at AAMAS 2023

Is Nash Equilibrium Approximator Learnable?

We prove that the Nash equilibrium approximator is agnostic-PAC learnable.

12/2022

We won 1st place at the NeurIPS 2022 MyoChallenge!

This competition is about learning contact-rich manipulation with a musculoskeletal hand, such as die rotation.

11/2022

Our paper gets accepted at National Science Review (impact factor: 23)

On the complexity of computing Markov perfect equilibrium in general-sum stochastic games

We prove that computing a Markov perfect equilibrium in general-sum stochastic games is PPAD-complete.

11/2022

Three multi-agent RL papers get accepted at AAAI 2023.

Multi-agent RL:

Learning to Shape Rewards using a Game of Two Partners
Subspace-Aware Exploration for Sparse-Reward Multi-Agent Tasks
Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency

10/2022

Talk given at AIRS in the AIR.

Game Theoretical Multi-Agent Reinforcement Learning.

09/2022

Talk given at Techbeat.com 2022.

A General Solution Framework to Cooperative MARL.

09/2022

Seven papers got accepted at NeurIPS 2022.

Preference-based RL:

Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-based Reinforcement Learning

Meta-RL:

A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning

Safe RL:

Constrained Update Projection Approach to Safe Policy Optimization

Cooperative Games:

Multi-Agent Reinforcement Learning is a Sequence Modeling Problem

Zero-sum Games:

A Unified Diversity Measure for Multiagent Reinforcement Learning

New RL environments:

Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning

MATE: Benchmarking Multi-Agent Reinforcement Learning in Distributed Target Coverage Control

08/2022

Tutorial at the Conference on Games 2022

Solving two-player zero-sum games through reinforcement learning

Part I

Part II

08/2022

Two invited talks were given during the summer holidays

CSML China 08/22:

A continuum of solutions to cooperative MARL.

CCDM China 07/22:

Training a Population of Agents.

07/2022

One paper got accepted at IROS 2022.

Fully Decentralized Model-based Policy Optimization for Networked Systems

We show how to perform fully decentralized model-based MARL in networked systems.

05/2022

One paper got accepted at IJCAI 2022.

1. On the Convergence of Fictitious Play: A Decomposition Approach

We extend the convergence guarantee for the well-known fictitious play method.
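For readers new to fictitious play, here is a minimal sketch of the classic discrete-time version. It is shown on rock-paper-scissors as a two-player zero-sum matrix game.

In each round, each player best-responds to the empirical frequency of the opponent's past actions. Robinson (1951) showed that in two-player zero-sum games these empirical frequencies converge to the set of Nash equilibria.

This is a textbook illustration only, not the decomposition approach from the paper. The payoff matrix, the iteration count, and the helper names `fictitious_play` and `duality_gap` are illustrative assumptions.

```python
import numpy as np

# Illustrative payoff matrix for the row player in rock-paper-scissors.
# Zero-sum: the column player's payoff is -A.
A = np.array([[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]], dtype=float)


def fictitious_play(A, iterations=10000):
    """Classic simultaneous fictitious play for a two-player zero-sum matrix game.

    Each player best-responds to the opponent's empirical action frequencies.
    Returns the empirical mixed strategies (row, column).
    """
    n, m = A.shape
    row_counts = np.zeros(n)
    col_counts = np.zeros(m)
    # Arbitrary first actions for both players.
    row_counts[0] += 1
    col_counts[0] += 1
    for _ in range(iterations):
        row_belief = col_counts / col_counts.sum()  # row's model of the column player
        col_belief = row_counts / row_counts.sum()  # column's model of the row player
        row_action = np.argmax(A @ row_belief)      # row maximises its expected payoff
        col_action = np.argmin(col_belief @ A)      # column minimises the row's payoff
        row_counts[row_action] += 1
        col_counts[col_action] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()


def duality_gap(A, x, y):
    """Total best-response gain of both players against (x, y); zero exactly at a Nash equilibrium."""
    return float(np.max(A @ y) - np.min(x @ A))


x, y = fictitious_play(A)
```

Rock-paper-scissors has a unique equilibrium: uniform play. So as the number of iterations grows, both returned strategies should drift toward (1/3, 1/3, 1/3), and `duality_gap(A, x, y)` should shrink toward zero.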

04/2022

We open-source two reinforcement learning projects:

1. TorchOpt

We developed an optimisation library for PyTorch that makes meta-gradients easy to compute.

With TorchOpt, you can easily implement meta-RL algorithms; try our code!

2. BiDexHands

We developed an RL/MARL environment for bimanual dexterous hand manipulation.

BiDexHands is very fast: it can reach 40,000 FPS on a single GPU.
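The TorchOpt item above is built around meta-gradients. To make the term concrete, here is a framework-free sketch: it differentiates a validation loss through one inner SGD step with respect to the learning rate α.

The quadratic losses, the centres `c_train`/`c_val`, and the function names are hypothetical choices for illustration. The sketch does not use or reproduce TorchOpt's API. A library such as TorchOpt would obtain this derivative by automatic differentiation rather than by the hand-derived chain rule used here.

```python
import numpy as np

# Hypothetical toy losses: L(theta) = 0.5 * ||theta - c||^2 for train and validation.
c_train = np.array([1.0, -2.0])
c_val = np.array([0.5, -1.0])


def grad(theta, c):
    """Gradient of 0.5 * ||theta - c||^2 with respect to theta."""
    return theta - c


def val_loss_after_step(theta, alpha):
    """Inner loop: one SGD step on the training loss, then evaluate the validation loss."""
    theta_new = theta - alpha * grad(theta, c_train)
    return 0.5 * np.sum((theta_new - c_val) ** 2)


def meta_gradient_alpha(theta, alpha):
    """d/d(alpha) of L_val(theta - alpha * grad L_train(theta)), via the chain rule."""
    g_train = grad(theta, c_train)
    theta_new = theta - alpha * g_train
    # d(theta_new)/d(alpha) = -g_train, so the meta-gradient is grad L_val(theta_new) . (-g_train).
    return float(grad(theta_new, c_val) @ (-g_train))


theta0 = np.zeros(2)
alpha = 0.1
analytic = meta_gradient_alpha(theta0, alpha)

# Central finite difference as a sanity check on the hand-derived meta-gradient.
eps = 1e-6
numeric = (val_loss_after_step(theta0, alpha + eps)
           - val_loss_after_step(theta0, alpha - eps)) / (2 * eps)
```

A negative meta-gradient signals that a slightly larger inner learning rate would lower the validation loss. A meta-learner would use exactly this signal to adapt α.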

01/2022

Two papers got accepted at ICLR 2022.

1. Multi-Agent TRPO Methods

We show how to conduct trust-region updates in MARL settings.

This is the SOTA algorithm in the cooperative MARL space, try our code!

[English Blog] [Chinese Blog] [Code]

2. LIGS: Learnable Intrinsic-Reward Generation Selection for Multi-Agent Learning

The paper addresses coordination improvement in the MARL setting by learning intrinsic rewards that motivate the exploration and coordination.

01/2022

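The Multi-Agent Center at the PKU Institute for AI is recruiting winter research students worldwide; see the projects at the link below.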

Link

01/2022

Invited talk at DAI 2021 on the topic of Training A Population of Reinforcement Learning Agents.

Slides

09/2021

Three papers get accepted at NeurIPS 2021:

1. https://lnkd.in/dEGYvaKq

(see Blog here)

We analysed the variance of the gradient norm in multi-agent reinforcement learning and developed a minimal-variance policy gradient estimator.

2. https://lnkd.in/dqcvH4Ns

We developed a rigorous way to generate diverse policies in population-based training and demonstrated strong results on Google Research Football.

3. https://lnkd.in/db__AKEY

We show that AI can learn to learn how to solve zero-sum games without ever being told what a Nash equilibrium is.

08/2021

Invited tutorial talk on multi-agent learning at RLChina.

Slides Video (in Chinese)

07/2021

Invited talk hosted by Synced (机器之心) on my recent work on handling non-transitivity in two-player zero-sum games.

Slides Video (in Chinese)

06/2021

We open-source MALib: a bespoke high-performance framework for population-based multi-agent reinforcement learning.

Paper Github

05/2021

Two papers get accepted in ICML 2021.

Modelling Behavioural Diversity for Learning in Open-Ended Games. This paper studies how to measure and promote behavioural diversity when solving games, in a mathematically rigorous way. It was selected for a long talk (top 3%) at ICML 2021.

Learning in Nonzero-Sum Stochastic Games with Potentials. This paper studies a generalised class of fully cooperative games, called stochastic potential games, and proposes a MARL method for finding Nash equilibria in such games.

03/2021

Check out my recent talk on the topic of:

A general framework for solving two-player zero-sum games.

02/2021

Update: Our paper wins the Best Paper Award at the Blue Sky Idea track!!!

One paper gets accepted in AAMAS 2021.

Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems. I share some recent thoughts on why behavioural diversity in the policy space is key to applying MARL techniques to real-world problems beyond video games.

11/2020

Check out my latest work on:

An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective. I hope this work offers MARL researchers a useful summary of game-theory basics, beyond the deep RL hype :)

10/2020

Update: SMARTS won the Best System Paper Award at CoRL 2020!

We release SMARTS: a multi-agent reinforcement learning enabled autonomous driving platform.

Multi-agent interaction is a fundamental aspect of autonomous driving in the real world. Today we are excited to introduce SMARTS, a dedicated platform that supports Scalable Multi-Agent Reinforcement Learning Training for autonomous driving. With SMARTS, ML researchers can evaluate their new algorithms in self-driving scenarios as well as in traditional video games. In turn, SMARTS can enrich the behaviours of social vehicles and create increasingly realistic and diverse interactions, powered by RL techniques, for autonomous driving researchers. Check out our code on GitHub and our paper at the Conference on Robot Learning 2020.

10/2020

One paper gets accepted at NeurIPS 2020!

Replica-exchange Nosé-Hoover dynamics for Bayesian learning on large datasets. We introduce a new HMC sampler for large-scale Bayesian deep learning. It suits multi-modal sampling, and a special design of the Nosé-Hoover dynamics absorbs the noise from mini-batches.

09/2020

One paper gets accepted at CIKM 2020 !

Learning to infer user hidden states for online sequential advertising.

08/2020

A lecture was given at RL China Summer School.

Advances of Multi-agent Learning in Gaming AI.

06/2020

A talk was given at ISTBI, Fudan University.

Many-agent Reinforcement Learning.

06/2020

One paper gets accepted at ICML 2020

Multi-agent Determinantal Q-learning. We introduce a new function approximator, the Q-determinantal point process, for multi-agent reinforcement learning problems. It helps learn the Q-function factorisation without requiring a priori structural constraints such as those imposed by QMIX and VDN.

05/2020

One paper gets accepted at IJCAI 2020

Modelling Bounded Rationality in Multi-Agent Interactions by Generalized Recursive Reasoning. We use a probabilistic graphical model to describe the recursive reasoning process of "I believe you believe I believe..." in multi-agent systems.

02/2020

One paper gets accepted at AAMAS 2020

αα-Rank: Practically Scaling α-Rank through Stochastic Optimisation. α-Rank is an alternative to Nash equilibrium for general-sum N-player games; importantly, computing its solution is P-complete. In this paper, we improve its tractability by several orders of magnitude through a stochastic optimisation formulation.

Dr. Yaodong Yang (杨耀东)

PKU Alignment & Interaction Lab
