DOI: 10.1145/3459637.3482386

Revisiting State Augmentation Methods for Reinforcement Learning with Stochastic Delays

Published: 30 October 2021

Abstract

Several real-world scenarios, such as remote control and sensing, involve delays in both actions and observations. The presence of delays degrades the performance of reinforcement learning (RL) algorithms, often to the point that they fail to learn anything substantial. This paper formally describes the notion of Markov Decision Processes (MDPs) with stochastic delays and shows that delayed MDPs can be transformed into equivalent standard MDPs (without delays) with a significantly simplified cost structure. We employ this equivalence to derive a model-free Delay-Resolved RL framework and show that even a simple RL algorithm built upon this framework achieves near-optimal rewards in environments with stochastic delays in actions and observations. The resulting delay-resolved deep Q-network (DRDQN) algorithm is benchmarked on a variety of environments comprising multi-step and stochastic delays, and it outperforms currently established algorithms both in achieving near-optimal rewards and in minimizing the associated computational overhead.
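To make the state-augmentation construction concrete, here is a minimal illustrative sketch (not the authors' released code) of how a delayed control problem can be recast as a standard MDP: an OpenAI Gym wrapper that imposes a constant action delay and appends the queue of pending actions to each observation, so that the augmented state is Markovian again. The wrapper name DelayWrapper, the default action used to seed the queue, the assumption of a discrete (scalar) action space, and the use of a fixed rather than stochastic delay are all simplifications of this sketch; it targets the classic Gym step/reset API.

    # Illustrative sketch only: constant action delay plus state augmentation.
    from collections import deque

    import gym
    import numpy as np

    class DelayWrapper(gym.Wrapper):
        def __init__(self, env, delay=3):
            super().__init__(env)
            self.delay = delay
            self.pending = deque()  # actions chosen but not yet executed

        def reset(self, **kwargs):
            obs = self.env.reset(**kwargs)
            # Seed the queue with a default action (0) at episode start,
            # a convention of this sketch rather than of the paper.
            self.pending = deque([0] * self.delay)
            return self._augment(obs)

        def step(self, action):
            # The action chosen `delay` steps ago executes now; the newly
            # chosen action joins the back of the queue.
            self.pending.append(action)
            executed = self.pending.popleft()
            obs, reward, done, info = self.env.step(executed)
            return self._augment(obs), reward, done, info

        def _augment(self, obs):
            # Augmented state: last observed state concatenated with all
            # pending actions, which restores the Markov property.
            return np.concatenate([np.asarray(obs, np.float32).ravel(),
                                   np.asarray(self.pending, np.float32)])

Any standard agent, e.g. a DQN, can then be trained directly on the wrapped environment; the paper's delay-resolved framework generalizes this construction to stochastic delays in both actions and observations.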

Supplementary Material

MP4 File (CIKM21-rgfp0725.mp4)
Revisiting State Augmentation Methods for Reinforcement Learning with Stochastic Delays




Published In

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
October 2021
4966 pages
ISBN: 9781450384469
DOI: 10.1145/3459637


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 October 2021


Author Tags

  1. delays in learning
  2. MDPs
  3. reinforcement learning

Qualifiers

  • Research-article

Conference

CIKM '21

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%



Article Metrics

  • Downloads (Last 12 months)80
  • Downloads (Last 6 weeks)13
Reflects downloads up to 14 Oct 2024


Cited By

  • (2025) A delay-robust method for enhanced real-time reinforcement learning. Neural Networks 181, 106769. DOI: 10.1016/j.neunet.2024.106769. Online publication date: Jan-2025.
  • (2024) Multitimescale Control and Communications With Deep Reinforcement Learning—Part I: Communication-Aware Vehicle Control. IEEE Internet of Things Journal 11(9), 15386-15401. DOI: 10.1109/JIOT.2023.3348590. Online publication date: 1-May-2024.
  • (2024) Robust reinforcement learning control for quadrotor with input delay and uncertainties. Journal of the Franklin Institute 361(13), 107012. DOI: 10.1016/j.jfranklin.2024.107012. Online publication date: Sep-2024.
  • (2024) Solving time-delay issues in reinforcement learning via transformers. Applied Intelligence 54(23), 12156-12176. DOI: 10.1007/s10489-024-05830-2. Online publication date: 10-Sep-2024.
  • (2024) How the brain can be trained to achieve an intermittent control strategy for stabilizing quiet stance by means of reinforcement learning. Biological Cybernetics 118(3-4), 229-248. DOI: 10.1007/s00422-024-00993-0. Online publication date: 12-Jul-2024.
  • (2024) Dynamic Modeling for Reinforcement Learning with Random Delay. Artificial Neural Networks and Machine Learning – ICANN 2024, 381-396. DOI: 10.1007/978-3-031-72341-4_26. Online publication date: 17-Sep-2024.
  • (2023) Delay-Informed Intelligent Formation Control for UAV-Assisted IoT Application. Sensors 23(13), 6190. DOI: 10.3390/s23136190. Online publication date: 6-Jul-2023.
  • (2023) Overcoming Delayed Feedback via Overlook Decision Making. 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 31-37. DOI: 10.1109/SMC53992.2023.10394201. Online publication date: 1-Oct-2023.
  • (2023) Adaptive PD Control Using Deep Reinforcement Learning for Local-Remote Teleoperation with Stochastic Time Delays. 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 7046-7053. DOI: 10.1109/IROS55552.2023.10341953. Online publication date: 1-Oct-2023.
  • (2023) A pipelining task offloading strategy via delay-aware multi-agent reinforcement learning in Cybertwin-enabled 6G network. Digital Communications and Networks. DOI: 10.1016/j.dcan.2023.04.004. Online publication date: Apr-2023.
