Reproducible code for the paper: Robust Offline Reinforcement Learning with Heavy-Tailed Rewards
This paper aims to improve the robustness of offline reinforcement learning (RL) in scenarios with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation (OPE) and offline policy optimization (OPO), respectively. Central to our frameworks is the median-of-means (MoM) method. Our key insight is that applying MoM to offline RL does more than tackle heavy-tailed rewards: it also provides valid uncertainty quantification, which addresses the insufficient-coverage issue in offline RL.
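For intuition, here is a minimal, illustrative sketch of the median-of-means estimator (not the repository's implementation; the function name, block count, and toy distribution are our own choices): split the sample into disjoint blocks, average within each block, and return the median of the block means.

```python
import numpy as np

def median_of_means(x, n_blocks=10, seed=None):
    """Median-of-means estimate of E[x] for a 1-D sample (illustrative).

    Randomly shuffles the sample, splits it into `n_blocks` disjoint
    blocks, averages within each block, and returns the median of the
    block means -- robust when the data are heavy-tailed.
    """
    rng = np.random.default_rng(seed)
    x = rng.permutation(np.asarray(x, dtype=float))
    blocks = np.array_split(x, n_blocks)
    return float(np.median([b.mean() for b in blocks]))

# Toy comparison on heavy-tailed rewards (Student-t with df=1.5:
# the mean exists but the variance is infinite).
rng = np.random.default_rng(0)
rewards = rng.standard_t(df=1.5, size=10_000)
print("sample mean:    ", rewards.mean())                 # unstable
print("median of means:", median_of_means(rewards, 20))   # stable
```

Unlike the sample mean, whose fluctuations are driven by the heaviest observations, each block mean is only mildly contaminated, and the median discards the extreme blocks.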
Below is the numerical performance of our proposals (ROOM-VM & P-ROOM-VM) on the D4RL benchmark datasets.

Repository contents:

- `requirement.txt`: prerequisite Python libraries
- `Cartpole` directory: code for reproducing the results in Figures 3, 4, and 6
  - `_density` directory: functions for estimating the density ratio in marginalized-importance-sampling-based methods
  - `_RL` directory: employs MoM in the TD update of fitted Q-iteration/evaluation based algorithms (Algorithms 4-5); see the first sketch after this file list
  - `_MM_OPE.py`: Algorithm 1 and its variant (ROAM-variant)
  - `_MM_OPO.py`: Algorithm 2 and its pessimistic variant (P-ROOM)
  - `_PB_OPO.py`: bootstrap-based variant for OPO; see the second sketch after this file list
  - `eval_cartpole.py`: reproduces Figures 3(a), 4, and 6
  - `optimize_cartpole.py`: reproduces Figure 3(b)
- `SQL` directory:
  - `src` directory: implementation of sparse Q-learning (SQL)
  - `main_SQL.py`: the main file for the numerical studies of SQL (reproduces Figure 5)
- `SAC-N` directory:
  - `SACN.py`: implementation of soft actor-critic (SAC) with an ensemble of $N$ Q-networks
  - `main_SACN.py`: the main file for the numerical studies of SAC-N (reproduces Figure A3)
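As referenced from the `_RL` entry in the file list above, the sketch below illustrates how MoM can enter the TD update of fitted-Q evaluation. This is a rough, hypothetical illustration rather than the repository's code: the tabular setting, all function names, and the specific scheme (fit one Q-table per disjoint data fold, then use the elementwise median of the fold-wise tables as the next TD target) are simplifying assumptions.

```python
import numpy as np

def mom_fitted_q_evaluation(transitions, policy, n_states, n_actions,
                            gamma=0.99, n_folds=5, n_iters=50, seed=0):
    """Tabular fitted-Q evaluation with a median-of-means TD target.

    `transitions` is a list of (s, a, r, s_next) tuples and `policy`
    maps a state to the action taken by the target policy.  Each
    iteration fits one Q-table per disjoint data fold; the next TD
    target uses the elementwise median of the fold-wise Q-tables.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(transitions)), n_folds)
    q_med = np.zeros((n_states, n_actions))   # median Q used in targets

    for _ in range(n_iters):
        q_folds = []
        for fold in folds:
            q = np.zeros((n_states, n_actions))
            cnt = np.zeros((n_states, n_actions))
            for i in fold:
                s, a, r, s_next = transitions[i]
                q[s, a] += r + gamma * q_med[s_next, policy(s_next)]
                cnt[s, a] += 1
            # Per-fold one-step regression: average TD target per (s, a).
            # (Pairs unseen in a fold default to 0 in this toy sketch.)
            q_folds.append(q / np.maximum(cnt, 1))
        q_med = np.median(np.stack(q_folds), axis=0)  # MoM across folds
    return q_med
```

The median across folds plays the role of the MoM estimate: a few folds contaminated by extreme rewards cannot move the target far, which is where the robustness comes from.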
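`_PB_OPO.py` replaces data splitting with bootstrapping. The snippet below is a generic, hypothetical sketch of that idea, not the repository's code, and it simplifies the policy's value estimate to a plain mean of observed rewards: resample the data with replacement, recompute the estimate on each resample, and act on a lower quantile of the bootstrap distribution. The lower quantile supplies the pessimism used to cope with insufficient coverage.

```python
import numpy as np

def pessimistic_bootstrap_value(rewards, n_boot=200, alpha=0.1, seed=0):
    """Bootstrap lower bound on a policy's mean reward (illustrative).

    Draws `n_boot` bootstrap resamples of the observed rewards,
    computes the mean of each resample, and returns the `alpha`
    quantile of the bootstrap means as a pessimistic value estimate.
    """
    rng = np.random.default_rng(seed)
    rewards = np.asarray(rewards, dtype=float)
    boot_means = np.array([
        rng.choice(rewards, size=rewards.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    return float(np.quantile(boot_means, alpha))
```

To cite this work: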
@InProceedings{zhu2024robust,
title = {Robust Offline Reinforcement Learning with Heavy-Tailed Rewards},
author = {Zhu, Jin and Wan, Runzhe and Qi, Zhengling and Luo, Shikai and Shi, Chengchun},
booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
pages = {541--549},
year = {2024},
editor = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
volume = {238},
series = {Proceedings of Machine Learning Research},
month = {02--04 May},
publisher = {PMLR}
}
References:

- Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization, ICLR (2023)
- Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble, NeurIPS (2021)