Open access

FPGA acceleration of deep reinforcement learning using on-chip replay management

Published: 17 May 2022

Abstract

A major bottleneck in parallelizing deep reinforcement learning (DRL) is the high latency of the operations used to update the Prioritized Replay Buffer on the CPU. The low arithmetic intensity of these operations leads to severe under-utilization of the SIMT computation power of GPUs. In this work, we propose a high-throughput on-chip accelerator for the Prioritized Replay Buffer and the learner that efficiently allocates computation and memory resources to saturate the FPGA computation power. Our design features hardware pipelining on the FPGA such that the latency of replay operations is completely hidden. Our experimental results show that the performance of the key operations in managing the Prioritized Replay Buffer, including sampling and priority insertion, is improved by a factor of 21X to 40X compared with state-of-the-art implementations on CPU and GPU. In addition, our system design leads to up to 4.3X improvement in overall throughput compared with state-of-the-art CPU-GPU implementations.
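The abstract does not spell out the buffer's internal data structure, but prioritized experience replay is commonly implemented with a sum-tree, which makes both priority insertion and proportional sampling O(log N) tree traversals with little arithmetic per memory access. The Python sketch below is purely illustrative (the class and method names are our own, not the authors' implementation); it shows the CPU-side semantics of the two replay operations the accelerator targets.

```python
# Illustrative sketch (not the paper's implementation): a minimal sum-tree
# prioritized replay buffer showing the two operations the paper accelerates,
# priority insertion/update and proportional sampling.
import random

class SumTree:
    """Binary sum-tree over `capacity` leaf priorities (capacity is a power of two)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)  # node i stores the priority sum of its subtree

    def update(self, idx, priority):
        """Priority insertion: write leaf `idx`, then propagate partial sums up to the root."""
        node = idx + self.capacity
        self.tree[node] = priority
        node //= 2
        while node >= 1:
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]
            node //= 2

    def sample(self):
        """Sampling: pick a leaf with probability proportional to its priority."""
        value = random.uniform(0.0, self.tree[1])  # tree[1] holds the total priority mass
        node = 1
        while node < self.capacity:  # descend one level per iteration
            left = 2 * node
            if value <= self.tree[left]:
                node = left
            else:
                value -= self.tree[left]
                node = left + 1
        return node - self.capacity  # leaf index == replay-buffer slot

# Both operations are O(log capacity) traversals dominated by dependent memory
# accesses, i.e. the low-arithmetic-intensity pattern the abstract describes.
tree = SumTree(capacity=8)
for i in range(8):
    tree.update(i, priority=float(i + 1))
print(tree.sample())
```

On an FPGA, the per-level updates and descents of such a tree can be pipelined so that consecutive replay operations overlap, which is consistent with the abstract's claim that the latency of replay operations is completely hidden.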


Cited By

  • (2024) PEARL: Enabling Portable, Productive, and High-Performance Deep Reinforcement Learning using Heterogeneous Platforms. Proceedings of the 21st ACM International Conference on Computing Frontiers, 41-50. https://doi.org/10.1145/3649153.3649193. Online publication date: 7 May 2024.
  • (2024) FPGA-Accelerated Sim-to-Real Control Policy Learning for Robotic Arms. IEEE Transactions on Circuits and Systems II: Express Briefs 71(3), 1690-1694. https://doi.org/10.1109/TCSII.2024.3353690. Online publication date: March 2024.
  • (2023) A Framework for Mapping DRL Algorithms With Prioritized Replay Buffer Onto Heterogeneous Platforms. IEEE Transactions on Parallel and Distributed Systems 34(6), 1816-1829. https://doi.org/10.1109/TPDS.2023.3264823. Online publication date: 1 June 2023.
  • (2023) DQN Algorithm Design for Fast Efficient Shortest Path System. 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 254-260. https://doi.org/10.1109/APSIPAASC58517.2023.10317113. Online publication date: 31 October 2023.


    Published In

    CF '22: Proceedings of the 19th ACM International Conference on Computing Frontiers
    May 2022
    321 pages
    ISBN:9781450393386
    DOI:10.1145/3528416
    This work is licensed under a Creative Commons Attribution 4.0 International License.


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 May 2022


    Badges

    • Best Paper

    Author Tags

    1. FPGA
    2. deep reinforcement learning
    3. prioritized replay buffer

    Qualifiers

    • Research-article

    Conference

    CF '22

    Acceptance Rates

    Overall Acceptance Rate 273 of 785 submissions, 35%


    Article Metrics

    • Downloads (last 12 months): 291
    • Downloads (last 6 weeks): 43
    Reflects downloads up to 22 Sep 2024
