DOI: 10.1145/3626183.3659967
Open access

Efficient Parallel Reinforcement Learning Framework Using the Reactor Model

Published: 17 June 2024

Abstract

Parallel Reinforcement Learning (RL) frameworks are essential for mapping RL workloads to multiple computational resources, allowing for faster generation of samples, estimation of values, and policy improvement. These computational paradigms require a seamless integration of training, serving, and simulation workloads. Existing frameworks, such as Ray, do not manage this orchestration efficiently, especially in RL tasks that demand intensive input/output and synchronization between actors on a single node. In this study, we propose a solution that implements the reactor model, which enforces a fixed communication pattern among a set of actors. This allows the scheduler to eliminate the work needed for synchronization, such as acquiring and releasing locks for each actor or sending and processing coordination-related messages. Our framework, Lingua Franca (LF), a coordination language based on the reactor model, also supports true parallelism in Python and provides a unified interface through which users automatically generate dataflow graphs for RL tasks. Compared to Ray on a single-node multi-core compute platform, LF achieves 1.21x and 11.62x higher simulation throughput in OpenAI Gym and Atari environments, respectively, reduces the average training time of synchronized parallel Q-learning by 31.2%, and accelerates multi-agent RL inference by 5.12x.
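To make the core idea concrete, the sketch below shows, in Lingua Franca's publicly documented Python-target syntax, how a fixed actor topology is declared statically. The reactor names (Simulator, Learner), the port names (action, observation), and the toy counter environment are all hypothetical illustrations, not code from the paper; a real deployment would substitute an actual environment and policy.

    target Python {
        timeout: 100 msec  // stop after 100 ms of logical time
    }

    // Toy simulator reactor: consumes an action, advances a stand-in
    // environment (here just a counter), and emits the next observation.
    reactor Simulator {
        input action
        output observation
        state step_count = 0

        reaction(startup) -> observation {=
            observation.set(self.step_count)  # emit the initial observation
        =}

        reaction(action) -> observation {=
            self.step_count += 1  # stand-in for an environment step
            observation.set(self.step_count)
        =}
    }

    // Toy learner reactor: consumes observations and emits actions.
    reactor Learner {
        input observation
        output action

        reaction(observation) -> action {=
            action.set(0)  # a fixed action stands in for policy inference
        =}
    }

    main reactor {
        sim = new Simulator()
        learner = new Learner()
        sim.observation -> learner.observation
        // The logical "after" delay breaks the feedback cycle so the
        // program remains causal; the topology itself is fixed statically.
        learner.action -> sim.action after 10 msec
    }

Because the connections between reactors are declared once in the main reactor, the dataflow graph is known before execution begins, which is what allows the scheduler to dispatch reactions in parallel without per-actor locks or coordination messages, as the abstract describes.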



Published In

SPAA '24: Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures
June 2024, 510 pages
ISBN: 9798400704161
DOI: 10.1145/3626183
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. machine learning
  2. model of computation
  3. parallel computing
  4. programming languages
  5. reinforcement learning

Qualifiers

  • Research-article

Conference

SPAA '24

Acceptance Rates

Overall acceptance rate: 447 of 1,461 submissions (31%)


