research-article

GraVF-M: Graph Processing System Generation for Multi-FPGA Platforms

Published: 21 November 2019

Abstract

Due to the irregular structure of most graph datasets, partitioning graph analysis algorithms across multiple computational nodes that do not share a common memory inevitably generates large amounts of interconnect traffic. Previous research has shown that FPGAs can outperform software-based graph processing in shared-memory contexts, but it remains an open question whether this advantage can be maintained in distributed systems.
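The claim that partitioning leads to heavy interconnect traffic can be made concrete with a quick count. The sketch below is a simplified model, not part of GraVF-M. It places vertices on P nodes using a hypothetical modulo rule and measures how many edges end up with endpoints on different nodes. For uniformly random edges, this fraction approaches (P-1)/P, so most edges cross node boundaries once more than a couple of nodes are involved. A smarter partitioner can lower the fraction, but irregular graphs typically still leave many edges cut. The helper names and the uniform random graph are assumptions made for illustration.

```python
import random


def random_graph(num_vertices, num_edges, seed=0):
    """Uniformly random directed edges: a stand-in for an irregular graph (not R-MAT)."""
    rng = random.Random(seed)
    return [(rng.randrange(num_vertices), rng.randrange(num_vertices))
            for _ in range(num_edges)]


def hash_partition(vertex, num_parts):
    """Hypothetical placement rule: vertex id modulo the number of nodes."""
    return vertex % num_parts


def cut_edge_fraction(edges, num_parts, place=hash_partition):
    """Fraction of edges whose two endpoints are placed on different nodes.

    If messages travel along edges, every such edge becomes inter-node traffic.
    """
    cut = sum(1 for u, v in edges if place(u, num_parts) != place(v, num_parts))
    return cut / len(edges)


if __name__ == "__main__":
    edges = random_graph(num_vertices=10_000, num_edges=100_000)
    for p in (2, 4, 8):
        print(f"{p} nodes: {cut_edge_fraction(edges, p):.3f} of edges are cut "
              f"(random-placement estimate {(p - 1) / p:.3f})")
```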
In this work, we present GraVF-M, a framework designed to ease the implementation of FPGA-based graph processing accelerators for multi-FPGA platforms with distributed memory. From a lightweight description of the algorithm kernel, the framework automatically generates optimized RTL code for the entire multi-FPGA design. We exploit an aspect of the programming model to present a familiar message-passing paradigm to the user while, under the hood, implementing a more efficient architecture that can reduce the necessary inter-FPGA network traffic by a factor equal to the average degree of the input graph. A performance model based on a theoretical analysis of the factors that influence performance serves to evaluate the efficiency of our implementation. With a throughput of up to 5.8 GTEPS (billions of traversed edges per second) on a 4-FPGA system, the designs generated by GraVF-M compare favorably with state-of-the-art frameworks from the literature and reach 94% of the projected performance limit of the system.
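One way to see how traffic could shrink by a factor tied to the average degree is to count messages under two schemes. The sketch below is a simplified counting model, not GraVF-M's actual architecture. It compares one message per cut edge with one message per (vertex, remote FPGA) pair. In the second scheme, the receiving FPGA is assumed to fan the update out to its local neighbours. The function names, the dictionary-based partition map, and the two-FPGA example graph are all hypothetical.

```python
from collections import defaultdict


def per_edge_messages(edges, part_of):
    """Naive scheme: one network message per edge whose endpoints sit on different FPGAs."""
    return sum(1 for u, v in edges if part_of[u] != part_of[v])


def per_vertex_messages(edges, part_of):
    """Aggregated scheme: one message per (source vertex, remote FPGA) pair.

    The update is assumed to be sent once to each remote FPGA, which then
    expands it to all of the source's neighbours stored locally.
    """
    remote_fpgas = defaultdict(set)
    for u, v in edges:
        if part_of[u] != part_of[v]:
            remote_fpgas[u].add(part_of[v])
    return sum(len(fpgas) for fpgas in remote_fpgas.values())


def idealized_example():
    """Two FPGAs, 8 vertices, out-degree 3, every edge leads to the other FPGA."""
    part_of = {v: 0 if v < 4 else 1 for v in range(8)}
    edges = [(u, 4 + (u + k) % 4) for u in range(4) for k in range(3)]
    edges += [(u, (u + k) % 4) for u in range(4, 8) for k in range(3)]
    return edges, part_of


if __name__ == "__main__":
    edges, part_of = idealized_example()
    naive = per_edge_messages(edges, part_of)
    aggregated = per_vertex_messages(edges, part_of)
    avg_degree = len(edges) / len(part_of)
    print(naive, aggregated, naive / aggregated, avg_degree)  # 24 8 3.0 3.0
```

In this idealized case, the ratio of the two counts equals the average degree (3). When a vertex's neighbours are spread over several FPGAs, each vertex may need up to P-1 messages, where P is the number of FPGAs, so the saving shrinks accordingly.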

Cited By

  • (2024) Optimising Graph Representation for Hardware Implementation of Graph Convolutional Networks for Event-Based Vision. Design and Architectures for Signal and Image Processing, 110-122. DOI: 10.1007/978-3-031-62874-0_9. Online publication date: 17-Jan-2024.
  • (2023) Distributed large-scale graph processing on FPGAs. Journal of Big Data 10:1. DOI: 10.1186/s40537-023-00756-x. Online publication date: 4-Jun-2023.
  • (2023) Rethinking Design Paradigm of Graph Processing System with a CXL-like Memory Semantic Fabric. 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 25-35. DOI: 10.1109/CCGrid57682.2023.00013. Online publication date: May-2023.

Information & Contributors

Information

Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 12, Issue 4
December 2019
163 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3361265
Editor: Deming Chen
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 November 2019
Accepted: 01 August 2019
Revised: 01 June 2019
Received: 01 February 2019
Published in TRETS Volume 12, Issue 4

Author Tags

  1. FPGA
  2. GraVF-M
  3. Vertex centric
  4. graph processing
  5. multi-FPGA architecture
  6. performance modelling

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Research Grants Council of Hong Kong
  • Croucher Foundation (Croucher Innovation Award 2013)

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 44
  • Downloads (last 6 weeks): 4
Reflects downloads up to 18 Aug 2024

Citations

  • (2022) ThunderGP: Resource-Efficient Graph Processing Framework on FPGAs with HLS. ACM Transactions on Reconfigurable Technology and Systems 15:4, 1-31. DOI: 10.1145/3517141. Online publication date: 9-Dec-2022.
  • (2022) Graph_CC: Accelerator of Connected Component Search in Graph Computing. 2022 7th International Conference on Integrated Circuits and Microsystems (ICICM), 441-447. DOI: 10.1109/ICICM56102.2022.10011381. Online publication date: 28-Oct-2022.
  • (2022) GraFF: A Multi-FPGA System with Memory Semantic Fabric for Scalable Graph Processing. 2022 International Conference on Field-Programmable Technology (ICFPT), 1-2. DOI: 10.1109/ICFPT56656.2022.9974189. Online publication date: 5-Dec-2022.
  • (2021) FDGLib: A Communication Library for Efficient Large-Scale Graph Processing in FPGA-Accelerated Data Centers. Journal of Computer Science and Technology 36:5, 1051-1070. DOI: 10.1007/s11390-021-1242-y. Online publication date: 30-Sep-2021.
