poster

QTAccel: A Generic FPGA based Design for Q-Table based Reinforcement Learning Accelerators

Authors:

Rachit Rajat,

Yuan Meng,

Sanmukh Kuppannagari,

Ajitesh Srivastava,

Viktor Prasanna,

Rajgopal Kannan

FPGA '20: Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Page 323

https://doi.org/10.1145/3373087.3375359

Published: 24 February 2020

Abstract

Q-Table based Reinforcement Learning (QRL) is a class of widely used algorithms in AI that work by successively improving estimates of Q values (the quality of state-action pairs), which are stored in a table. They significantly outperform Neural Network based techniques when the state space is tractable. Fast learning for AI applications in several domains (e.g., robotics) with tractable 'mid-sized' Q-tables still requires performing a substantial number of rapid updates. State-of-the-art FPGA implementations of QRL do not scale with increasing Q-Table state space and are therefore inefficient for such applications. In this work, we develop a novel FPGA implementation of QRL that scales to large state spaces and facilitates a large class of AI applications. Our pipelined architecture provides higher throughput while using significantly fewer on-chip resources, and thereby supports a variety of action selection policies that cover Q-Learning and variations of bandit algorithms. Possible dependencies caused by consecutive Q value updates are handled, allowing the design to process one Q-sample every clock cycle. Additionally, we provide the first known FPGA implementation of the SARSA (State-Action-Reward-State-Action) algorithm. We evaluate our architecture for Q-Learning and SARSA algorithms and show that our designs achieve a high throughput of up to 180 million Q samples per second.
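For readers unfamiliar with the two algorithms named in the abstract, the following is a minimal software sketch of the textbook tabular Q-Learning and SARSA update rules with an epsilon-greedy action selection policy. It is not the authors' hardware design. The function names, the list-of-lists table layout, and the parameters `alpha` (learning rate), `gamma` (discount factor), and `epsilon` (exploration probability) are assumptions chosen for illustration.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_learning_update(q, s, a, r, s_next, alpha, gamma):
    """Off-policy update: bootstrap from the best action value in s_next."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy update: bootstrap from the action actually chosen in s_next."""
    target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])


# Hypothetical 2-state, 2-action Q-table: q[state][action].
q = [[0.0, 0.0], [1.0, 2.0]]

# Q-Learning: target = 1.0 + 0.5 * max(1.0, 2.0) = 2.0
q_learning_update(q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.5)

# SARSA: target = 1.0 + 0.5 * q[1][0] = 1.5
sarsa_update(q, s=0, a=0, r=1.0, s_next=1, a_next=0, alpha=0.5, gamma=0.5)

# With epsilon = 0 the policy is purely greedy.
greedy_action = epsilon_greedy(q[1], epsilon=0.0, rng=random.Random(0))
```

Each update reads one or more table entries and writes one back, so two consecutive samples that touch the same state-action pair must observe each other's writes. This is the kind of dependency the abstract says the pipelined design handles; how it does so is not described here.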

Published In

FPGA '20: Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

February 2020

346 pages

ISBN: 9781450370998

DOI: 10.1145/3373087

General Chair: Stephen Neuendorffer, Xilinx, USA

Program Chair: Lesley Shannon, Simon Fraser University, Canada

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Poster

Funding Sources

National Science Foundation

Conference

FPGA '20

Sponsor:

SIGDA

FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

February 23 - 25, 2020

Seaside, CA, USA

Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%

Cited By

Meng Y, Kuppannagari S, Kannan R, Prasanna V (2022). PPOAccel: A High-Throughput Acceleration Framework for Proximal Policy Optimization. IEEE Transactions on Parallel and Distributed Systems 33:9, 2066-2078. Online publication date: 1-Sep-2022.
https://doi.org/10.1109/TPDS.2021.3134709

Surhonne A, Doan N, Maurer F, Wild T, Herkersdorf A (2022). GAE-LCT: A Run-Time GA-Based Classifier Evolution Method for Hardware LCT Controlled SoC Performance-Power Optimization. Architecture of Computing Systems, 271-285. Online publication date: 14-Dec-2022.
https://doi.org/10.1007/978-3-031-21867-5_18

Cardarilli G, Di Nunzio L, Fazzolari R, Giardino D, Matta M, Re M, Spanò S (2021). An Action-Selection Policy Generator for Reinforcement Learning Hardware Accelerators. Applications in Electronics Pervading Industry, Environment and Society, 267-272. Online publication date: 26-Jan-2021.
https://doi.org/10.1007/978-3-030-66729-0_32

Meng Y, Kuppannagari S, Rajat R, Srivastava A, Kannan R, Prasanna V (2020). QTAccel: A Generic FPGA based Design for Q-Table based Reinforcement Learning Accelerators. 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 107-114. Online publication date: May-2020.
https://doi.org/10.1109/IPDPSW50202.2020.00024

Meng Y, Kuppannagari S, Prasanna V (2020). Accelerating Proximal Policy Optimization on CPU-FPGA Heterogeneous Platforms. 2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 19-27. Online publication date: May-2020.
https://doi.org/10.1109/FCCM48280.2020.00012
