DOI: 10.1145/3459637.3482386

Revisiting State Augmentation Methods for Reinforcement Learning with Stochastic Delays

Published: 30 October 2021

Abstract

Several real-world scenarios, such as remote control and sensing, involve delays in both actions and observations. The presence of delays degrades the performance of reinforcement learning (RL) algorithms, often to the point that they fail to learn anything substantial. This paper formally describes the notion of Markov Decision Processes (MDPs) with stochastic delays and shows that delayed MDPs can be transformed into equivalent standard MDPs (without delays) with a significantly simplified cost structure. We employ this equivalence to derive a model-free Delay-Resolved RL framework and show that even a simple RL algorithm built upon this framework achieves near-optimal rewards in environments with stochastic delays in actions and observations. The resulting delay-resolved deep Q-network (DRDQN) algorithm is benchmarked on a variety of environments comprising multi-step and stochastic delays, and it outperforms currently established algorithms both in achieving near-optimal rewards and in minimizing the associated computational overhead.
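To make the state-augmentation construction concrete, here is a minimal illustrative sketch (not the authors' released code) of how a delayed control problem can be recast as a standard MDP: an OpenAI Gym wrapper that imposes a constant action delay and appends the queue of pending actions to each observation, so that the augmented state is Markovian again. The wrapper name DelayWrapper, the default action used to seed the queue, the assumption of a discrete (scalar) action space, and the use of a fixed rather than stochastic delay are all simplifications of this sketch; it targets the classic Gym step/reset API.

    # Illustrative sketch only: constant action delay plus state augmentation.
    from collections import deque

    import gym
    import numpy as np

    class DelayWrapper(gym.Wrapper):
        def __init__(self, env, delay=3):
            super().__init__(env)
            self.delay = delay
            self.pending = deque()  # actions chosen but not yet executed

        def reset(self, **kwargs):
            obs = self.env.reset(**kwargs)
            # Seed the queue with a default action (0) at episode start,
            # a convention of this sketch rather than of the paper.
            self.pending = deque([0] * self.delay)
            return self._augment(obs)

        def step(self, action):
            # The action chosen `delay` steps ago executes now; the newly
            # chosen action joins the back of the queue.
            self.pending.append(action)
            executed = self.pending.popleft()
            obs, reward, done, info = self.env.step(executed)
            return self._augment(obs), reward, done, info

        def _augment(self, obs):
            # Augmented state: last observed state concatenated with all
            # pending actions, which restores the Markov property.
            return np.concatenate([np.asarray(obs, np.float32).ravel(),
                                   np.asarray(self.pending, np.float32)])

Any standard agent, e.g. a DQN, can then be trained directly on the wrapped environment; the paper's delay-resolved framework generalizes this construction to stochastic delays in both actions and observations.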

Supplementary Material

MP4 File (CIKM21-rgfp0725.mp4)
Revisiting State Augmentation Methods for Reinforcement Learning with Stochastic Delays




Published In

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
October 2021
4966 pages
ISBN: 9781450384469
DOI: 10.1145/3459637


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 October 2021


Author Tags

  1. delays in learning
  2. MDPs
  3. reinforcement learning

Qualifiers

  • Research-article

Conference

CIKM '21

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%



Article Metrics

  • Downloads (Last 12 months)80
  • Downloads (Last 6 weeks)13
Reflects downloads up to 14 Oct 2024


Cited By

  • (2025) A delay-robust method for enhanced real-time reinforcement learning. Neural Networks 181, 106769. DOI: 10.1016/j.neunet.2024.106769. Online publication date: Jan-2025.
  • (2024) Multitimescale Control and Communications With Deep Reinforcement Learning—Part I: Communication-Aware Vehicle Control. IEEE Internet of Things Journal 11(9), 15386-15401. DOI: 10.1109/JIOT.2023.3348590. Online publication date: 1-May-2024.
  • (2024) Robust reinforcement learning control for quadrotor with input delay and uncertainties. Journal of the Franklin Institute 361(13), 107012. DOI: 10.1016/j.jfranklin.2024.107012. Online publication date: Sep-2024.
  • (2024) Solving time-delay issues in reinforcement learning via transformers. Applied Intelligence 54(23), 12156-12176. DOI: 10.1007/s10489-024-05830-2. Online publication date: 10-Sep-2024.
  • (2024) How the brain can be trained to achieve an intermittent control strategy for stabilizing quiet stance by means of reinforcement learning. Biological Cybernetics 118(3-4), 229-248. DOI: 10.1007/s00422-024-00993-0. Online publication date: 12-Jul-2024.
  • (2024) Dynamic Modeling for Reinforcement Learning with Random Delay. Artificial Neural Networks and Machine Learning – ICANN 2024, 381-396. DOI: 10.1007/978-3-031-72341-4_26. Online publication date: 17-Sep-2024.
  • (2023) Delay-Informed Intelligent Formation Control for UAV-Assisted IoT Application. Sensors 23(13), 6190. DOI: 10.3390/s23136190. Online publication date: 6-Jul-2023.
  • (2023) Overcoming Delayed Feedback via Overlook Decision Making. 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 31-37. DOI: 10.1109/SMC53992.2023.10394201. Online publication date: 1-Oct-2023.
  • (2023) Adaptive PD Control Using Deep Reinforcement Learning for Local-Remote Teleoperation with Stochastic Time Delays. 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 7046-7053. DOI: 10.1109/IROS55552.2023.10341953. Online publication date: 1-Oct-2023.
  • (2023) A pipelining task offloading strategy via delay-aware multi-agent reinforcement learning in Cybertwin-enabled 6G network. Digital Communications and Networks. DOI: 10.1016/j.dcan.2023.04.004. Online publication date: Apr-2023.
