Poster
DOI: 10.1145/3373087.3375359

QTAccel: A Generic FPGA based Design for Q-Table based Reinforcement Learning Accelerators

Published: 24 February 2020

Abstract

Q-Table based Reinforcement Learning (QRL) is a widely used class of AI algorithms that work by successively improving estimates of Q values, the quality of state-action pairs, which are stored in a table. When the state space is tractable, these algorithms significantly outperform Neural Network based techniques. Fast learning for AI applications in several domains (e.g., robotics) with tractable 'mid-sized' Q-tables still requires performing a large number of updates rapidly. State-of-the-art FPGA implementations of QRL do not scale as the Q-Table state space grows, and are therefore not efficient for such applications. In this work, we develop a novel FPGA implementation of QRL that scales to large state spaces and supports a broad class of AI applications. Our pipelined architecture provides higher throughput while using significantly fewer on-chip resources, and thereby supports a variety of action selection policies covering Q-Learning and variants of bandit algorithms. Potential data dependencies between consecutive Q-value updates are handled, allowing the design to process one Q sample every clock cycle. Additionally, we provide the first known FPGA implementation of the SARSA (State-Action-Reward-State-Action) algorithm. We evaluate our architecture for the Q-Learning and SARSA algorithms and show that our designs achieve a throughput of up to 180 million Q samples per second.
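The abstract contrasts two tabular update rules (Q-Learning and SARSA) and says that dependencies between consecutive Q-value updates are handled so one sample can enter the pipeline per clock cycle. The Python sketch below is a software model under stated assumptions, not the QTAccel hardware: it assumes a fixed write-back `latency` in cycles and resolves read-after-write hazards by forwarding the newest in-flight write, a common pipeline technique that the abstract does not confirm is the one QTAccel uses. The function `run_qrl`, the parameters `alpha`, `gamma`, `latency`, `forward`, and the sample tuple layout are all hypothetical.

```python
from collections import deque


def run_qrl(samples, n_states, n_actions, alpha=0.5, gamma=0.9,
            sarsa=False, latency=1, forward=True):
    """Hypothetical software model of a pipelined tabular Q-value updater.

    Each sample is (state, action, reward, next_state, next_action);
    next_action is used only by SARSA. Sample i issues at cycle i and its
    write-back commits at cycle i + latency. latency=1 makes every write
    visible to the very next sample, i.e. plain sequential execution.
    """
    table = [[0.0] * n_actions for _ in range(n_states)]
    in_flight = deque()  # (commit_cycle, state, action, value), oldest first

    def read(s, a):
        if forward:
            # Forwarding: the newest in-flight write to (s, a) wins.
            for _, ws, wa, v in reversed(in_flight):
                if ws == s and wa == a:
                    return v
        return table[s][a]  # without forwarding this value may be stale

    for cycle, (s, a, r, s_next, a_next) in enumerate(samples):
        # Commit every write whose pipeline latency has elapsed.
        while in_flight and in_flight[0][0] <= cycle:
            _, ws, wa, v = in_flight.popleft()
            table[ws][wa] = v

        if sarsa:
            # SARSA (on-policy): bootstrap from the action actually taken next.
            target = r + gamma * read(s_next, a_next)
        else:
            # Q-Learning (off-policy): bootstrap from the greedy next action.
            target = r + gamma * max(read(s_next, b) for b in range(n_actions))

        q_sa = read(s, a)
        new_q = q_sa + alpha * (target - q_sa)
        in_flight.append((cycle + latency, s, a, new_q))

    # Drain the pipeline oldest first, so later writes overwrite earlier ones.
    for _, ws, wa, v in in_flight:
        table[ws][wa] = v
    return table


if __name__ == "__main__":
    # Three back-to-back updates to the same entry of a 1x1 table.
    burst = [(0, 0, 1.0, 0, 0)] * 3
    print(run_qrl(burst, 1, 1))                            # sequential reference
    print(run_qrl(burst, 1, 1, latency=5, forward=True))   # matches sequential
    print(run_qrl(burst, 1, 1, latency=5, forward=False))  # [[0.5]]: stale reads
```

With forwarding, a deep pipeline reproduces the sequential result exactly, because every read sees the latest write issued before it. Without forwarding, back-to-back updates to the same entry read stale values and overwrite one another, so the three-sample burst ends at 0.5 instead of about 1.426.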

Cited By

  • (2022) PPOAccel: A High-Throughput Acceleration Framework for Proximal Policy Optimization. IEEE Transactions on Parallel and Distributed Systems 33:9, pp. 2066-2078. DOI: 10.1109/TPDS.2021.3134709. Online publication date: 1-Sep-2022
  • (2022) GAE-LCT: A Run-Time GA-Based Classifier Evolution Method for Hardware LCT Controlled SoC Performance-Power Optimization. Architecture of Computing Systems, pp. 271-285. DOI: 10.1007/978-3-031-21867-5_18. Online publication date: 14-Dec-2022
  • (2021) An Action-Selection Policy Generator for Reinforcement Learning Hardware Accelerators. Applications in Electronics Pervading Industry, Environment and Society, pp. 267-272. DOI: 10.1007/978-3-030-66729-0_32. Online publication date: 26-Jan-2021

Information

Published In

FPGA '20: Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2020
346 pages
ISBN: 9781450370998
DOI: 10.1145/3373087
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. artificial intelligence
  2. fpga acceleration
  3. q learning
  4. reinforcement learning accelerator

Qualifiers

  • Poster

Conference

FPGA '20
Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 22 Sep 2024

Citations

  • (2020) QTAccel: A Generic FPGA based Design for Q-Table based Reinforcement Learning Accelerators. 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 107-114. DOI: 10.1109/IPDPSW50202.2020.00024. Online publication date: May-2020
  • (2020) Accelerating Proximal Policy Optimization on CPU-FPGA Heterogeneous Platforms. 2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), pp. 19-27. DOI: 10.1109/FCCM48280.2020.00012. Online publication date: May-2020