research-article

FleetRec: Large-Scale Recommendation Inference on Hybrid GPU-FPGA Clusters

Authors:

Jiansong Zhang,

Gustavo Alonso

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining

Pages 3097 - 3105

https://doi.org/10.1145/3447548.3467139

Published: 14 August 2021

Abstract

We present FleetRec, a high-performance and scalable recommendation inference system that operates within tight latency constraints. FleetRec takes advantage of heterogeneous hardware, including GPUs and the latest FPGAs equipped with high-bandwidth memory. By disaggregating computation and memory across different types of hardware and connecting them through a high-speed network, FleetRec gains the best of both worlds and can naturally scale out by adding nodes to the cluster. Experiments on three production models of up to 114 GB show that FleetRec outperforms an optimized CPU baseline by more than an order of magnitude in throughput while achieving significantly lower latency.
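The disaggregation idea in the abstract can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not FleetRec's implementation. It assumes, consistent with the abstract, that the memory-heavy embedding tables live on "memory nodes" (standing in for HBM-equipped FPGAs) and the dense layers run on a "compute node" (standing in for a GPU). The names `MemoryNode`, `ComputeNode`, `shard_tables` and `infer`, the round-robin table placement, and the single linear layer with a sigmoid are all hypothetical simplifications, and each network request is replaced by a local function call.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


class MemoryNode:
    """Hypothetical stand-in for an HBM-equipped FPGA that stores embedding tables."""

    def __init__(self) -> None:
        self.tables: Dict[int, List[Vector]] = {}  # table_id -> embedding rows

    def lookup(self, sparse_ids: Dict[int, int]) -> Dict[int, Vector]:
        # Return one embedding row from every table held on this node.
        return {tid: rows[sparse_ids[tid]] for tid, rows in self.tables.items()}


class ComputeNode:
    """Hypothetical stand-in for a GPU running the dense part of the model."""

    def __init__(self, weights: Vector, bias: float) -> None:
        self.weights = weights
        self.bias = bias

    def predict(self, embeddings: Dict[int, Vector]) -> float:
        # Concatenate in table-id order so the result does not depend on sharding.
        features = [x for tid in sorted(embeddings) for x in embeddings[tid]]
        z = sum(w * x for w, x in zip(self.weights, features)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))  # toy click probability


def shard_tables(tables: List[List[Vector]], num_nodes: int) -> List[MemoryNode]:
    # Assumed placement policy: table i goes to memory node i % num_nodes.
    nodes = [MemoryNode() for _ in range(num_nodes)]
    for tid, rows in enumerate(tables):
        nodes[tid % num_nodes].tables[tid] = rows
    return nodes


def infer(memory_nodes: List[MemoryNode], compute: ComputeNode,
          sparse_ids: Dict[int, int]) -> float:
    # In a real cluster each lookup would be a network request; here it is a local call.
    gathered: Dict[int, Vector] = {}
    for node in memory_nodes:
        gathered.update(node.lookup(sparse_ids))
    return compute.predict(gathered)


# Toy model: 4 tables, 100 rows each, 8-dimensional embeddings.
rng = random.Random(0)
NUM_TABLES, ROWS, DIM = 4, 100, 8
tables = [[[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(ROWS)]
          for _ in range(NUM_TABLES)]
compute = ComputeNode([rng.uniform(-0.1, 0.1) for _ in range(NUM_TABLES * DIM)], 0.0)
query = {tid: rng.randrange(ROWS) for tid in range(NUM_TABLES)}

# "Scaling out" here means spreading the same tables over more memory nodes.
p_one = infer(shard_tables(tables, 1), compute, query)
p_three = infer(shard_tables(tables, 3), compute, query)
print(p_one, p_three)
```

Because the compute node reassembles the gathered embeddings in table-id order, spreading the tables over more memory nodes leaves the prediction unchanged in this toy model; only the placement of memory changes, which is the property that makes adding nodes straightforward.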

Supplementary Material

Presentation video: KDD2021_FleetRec_talk.mp4 (MP4, 134.08 MB)

Recommendations

Performance and toolchain of a combined GPU/FPGA desktop (abstract only)
FPGA '13: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays

Low-power, high-performance computing nowadays relies on accelerator cards to speed up the calculations. Combining the power of GPUs with the flexibility of FPGAs enlarges the scope of problems that can be accelerated [2, 3]. We describe the performance ...
Optimization schemes and performance evaluation of Smith–Waterman algorithm on CPU, GPU and FPGA

With fierce competition between CPU and graphics processing unit (GPU) platforms, performance evaluation has become the focus of various sectors. In this paper, we take a well-known algorithm in the field of biosequence matching and database searching, ...
FPGA, GPU, and CPU implementations of Jacobi algorithm for eigenanalysis

Parallel implementations of Jacobi algorithm for eigenanalysis of a matrix on most commonly used high performance computing (HPC) devices such as central processing unit (CPU), graphics processing unit (GPU), and field-programmable gate array (FPGA) are ...

Published In

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining

August 2021

4259 pages

ISBN:9781450383325

DOI:10.1145/3447548

General Chairs:
Feida Zhu, Singapore Management University
Beng Chin Ooi, National University of Singapore
Chunyan Miao, Nanyang Technological University

Program Chairs:
Haixun Wang
Iryna Skrypnyk
Wynne Hsu
Sanjay Chawla

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 14-18, 2021

Virtual Event, Singapore

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics

Total citations: 16
Total downloads: 634
Downloads (last 12 months): 187
Downloads (last 6 weeks): 19

Download figures reflect downloads up to 23 Dec 2024.

Cited By

Li M, Reis D, Laguna A, Niemier M, Hu X (2024). Accelerating Recommendation Systems With In-Memory Embedding Operations. IEEE Transactions on Circuits and Systems for Artificial Intelligence 1:2, 244-256. Online publication date: Dec-2024.
https://doi.org/10.1109/TCASAI.2024.3487817

You X, Yang H, Wang S, Peng T, Ding C, Li X, Chen B, Luan Z, Liu T, Li Y, Qian D (2024). Exploiting Structured Feature and Runtime Isolation for High-Performant Recommendation Serving. IEEE Transactions on Computers 73:11, 2474-2487. Online publication date: Nov-2024.
https://doi.org/10.1109/TC.2024.3449749

Lu C, Xu H, Li Y, Chen W, Ye K, Xu C (2024). SMIless: Serving DAG-based Inference with Dynamic Invocations under Serverless Computing. SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, 1-17. Online publication date: 17-Nov-2024.
https://doi.org/10.1109/SC41406.2024.00044

Patel P, Choukse E, Zhang C, Shah A, Goiri Í, Maleki S, Bianchini R (2024). Splitwise: Efficient Generative LLM Inference Using Phase Splitting. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 118-132. Online publication date: 29-Jun-2024.
https://doi.org/10.1109/ISCA59077.2024.00019

Zhou Y, Zhang F, Lin T, Huang Y, Long S, Zhai J, Du X (2024). F-TADOC: FPGA-Based Text Analytics Directly on Compression with HLS. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 3739-3752. Online publication date: 13-May-2024.
https://doi.org/10.1109/ICDE60146.2024.00287

Krishnan A, Nambiar M, Singhal R (2023). Hetero-Rec++: Modelling-based Robust and Optimal Deployment of Embeddings Recommendations. Proceedings of the Third International Conference on AI-ML Systems, 1-9. Online publication date: 25-Oct-2023.
https://dl.acm.org/doi/10.1145/3639856.3639878

Ye Z, Gao W, Hu Q, Sun P, Wang X, Luo Y, Zhang T, Wen Y (2023). Deep Learning Workload Scheduling in GPU Datacenters: A Survey. ACM Computing Surveys. Online publication date: 27-Dec-2023.
https://doi.org/10.1145/3638757

Pan Z, Zheng Z, Zhang F, Wu R, Liang H, Wang D, Qiu X, Bai J, Lin W, Du X, Aamodt T, Swift M, Jerger N (2023). RECom: A Compiler Approach to Accelerating Recommendation Model Inference with Massive Embedding Columns. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, 268-286. Online publication date: 25-Mar-2023.
https://dl.acm.org/doi/10.1145/3623278.3624761

Hsia S, Gupta U, Acun B, Ardalani N, Zhong P, Wei G, Brooks D, Wu C, Aamodt T, Jerger N, Swift M (2023). MP-Rec: Hardware-Software Co-design to Enable Multi-path Recommendation. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 449-465. Online publication date: 25-Mar-2023.
https://dl.acm.org/doi/10.1145/3582016.3582068

Jiang W, Li S, Zhu Y, De Fine Licht J, He Z, Shi R, Renggli C, Zhang S, Rekatsinas T, Hoefler T, Alonso G, Mohror K, Arnold D, Badia R (2023). Co-design Hardware and Algorithm for Vector Search. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-15. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3581784.3607045