GraVF-M: Graph Processing System Generation for Multi-FPGA Platforms

Authors: Nina Engelhardt, Hayden K.-H. So

ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 12, Issue 4

Article No.: 21, Pages 1 - 28

https://doi.org/10.1145/3357596

Published: 21 November 2019

Abstract

Due to the irregular nature of connections in most graph datasets, partitioning graph analysis algorithms across multiple computational nodes that do not share a common memory inevitably leads to large amounts of interconnect traffic. Previous research has shown that FPGAs can outcompete software-based graph processing in shared memory contexts, but it remains an open question whether this advantage can be maintained in distributed systems.

In this work, we present GraVF-M, a framework designed to ease the implementation of FPGA-based graph processing accelerators for multi-FPGA platforms with distributed memory. Based on a lightweight description of the algorithm kernel, the framework automatically generates optimized RTL code for the whole multi-FPGA design. We exploit an aspect of the programming model to present a familiar message-passing paradigm to the user, while under the hood implementing a more efficient architecture that can reduce the necessary inter-FPGA network traffic by a factor equal to the average degree of the input graph. A performance model based on a theoretical analysis of the factors influencing performance serves to evaluate the efficiency of our implementation. With a throughput of up to 5.8 GTEPS (billions of traversed edges per second) on a 4-FPGA system, the designs generated by GraVF-M compare favorably to state-of-the-art frameworks from the literature and reach 94% of the projected performance limit of the system.
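
The traffic reduction described in the abstract can be illustrated with a small counting sketch. It assumes, purely for illustration, a superstep in which every vertex is active, a per-edge scheme that sends one message per directed edge, and a per-vertex scheme that sends one update per vertex and expands it along the edges at the receiver. The abstract does not spell out the actual mechanism used by GraVF-M, and the names `average_degree` and `message_counts` as well as the toy graph are hypothetical.

```python
def average_degree(num_vertices, edges):
    """Average out-degree: number of directed edges divided by vertices."""
    return len(edges) / num_vertices


def message_counts(num_vertices, edges):
    """Return (per_edge, per_vertex) message counts for one superstep.

    Assumption (not taken from the paper): every vertex is active.
    per_edge:   one message is sent for each directed edge.
    per_vertex: one update is sent for each vertex; the receiver
                expands it along the edges locally.
    """
    per_edge = len(edges)
    per_vertex = num_vertices
    return per_edge, per_vertex


# Toy directed graph with 4 vertices and 8 edges (made-up data).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (2, 0), (3, 0), (3, 1)]
per_edge, per_vertex = message_counts(4, edges)
reduction = per_edge / per_vertex
print(per_edge, per_vertex, reduction, average_degree(4, edges))
```

In this toy model the reduction factor equals the average degree by construction. In a real system the traffic would also depend on how the graph is partitioned and on how many vertices are active in each step.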

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Reconfigurable computing
2. Hardware
  1. Integrated circuits
    1. Reconfigurable logic and FPGAs
      1. Hardware accelerators
      2. Reconfigurable logic applications

Published In

ACM Transactions on Reconfigurable Technology and Systems Volume 12, Issue 4

December 2019

163 pages

ISSN:1936-7406

EISSN:1936-7414

DOI:10.1145/3361265

Editor:
Deming Chen
University of Illinois, Urbana-Champaign Urbana

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 November 2019

Accepted: 01 August 2019

Revised: 01 June 2019

Received: 01 February 2019

Qualifiers

Research-article
Research
Refereed

Funding Sources

Research Grants Council of Hong Kong
Croucher Foundation (Croucher Innovation Award 2013)
