DOI: 10.1145/3316781.3317802

Dr. BFS: Data Centric Breadth-First Search on FPGAs

Published: 02 June 2019

Abstract

The flexible architectures of Field Programmable Gate Arrays (FPGAs) lend themselves to a wide range of data analytics applications, among which Breadth-First Search (BFS) draws particular attention because of its importance. Recent attempts to offload BFS to FPGAs either simply imitate existing CPU- or Graphics Processing Unit (GPU)-based mechanisms or suffer from scalability issues. To address this, we introduce a novel data-centric design that exploits the potential of FPGAs for BFS with two techniques. First, we partition and compress the BFS algorithmic metadata so that it can be buffered in fast on-chip memory, avoiding expensive metadata accesses. Second, we propose a hierarchical coalescing method to improve the throughput of graph data access. Taken together, our evaluation shows that the proposed design achieves, on average, 1.6× and 2.2× speedups over the state-of-the-art FPGA designs TorusBFS and Umuroglu, respectively, across a collection of graph datasets.
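
To make the first technique more concrete, the following is a minimal software sketch of level-synchronous BFS whose "visited" metadata is compressed to one bit per vertex and split into fixed-size banks. It is an illustration only, not the paper's hardware design: the function name `bfs_levels`, the `bank_bits` parameter, and the dictionary-based adjacency list are all assumptions made for this example, and the hierarchical coalescing of graph data accesses is not modeled.

```python
from typing import Dict, List


def bfs_levels(adj: Dict[int, List[int]], num_vertices: int, source: int,
               bank_bits: int = 64) -> List[int]:
    """Return the BFS level of every vertex (-1 if unreachable).

    The visited metadata is stored as one bit per vertex, partitioned into
    banks of `bank_bits` bits each (hypothetical layout for illustration).
    """
    num_banks = (num_vertices + bank_bits - 1) // bank_bits
    visited = [0] * num_banks          # one integer per bank, 1 bit per vertex
    levels = [-1] * num_vertices

    def test_and_set(v: int) -> bool:
        """Mark v as visited; return True if it was already visited."""
        bank, bit = divmod(v, bank_bits)
        mask = 1 << bit
        seen = bool(visited[bank] & mask)
        visited[bank] |= mask
        return seen

    test_and_set(source)
    levels[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, []):
                if not test_and_set(v):
                    levels[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return levels


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
    print(bfs_levels(graph, num_vertices=5, source=0, bank_bits=2))
```

Packing the status into single bits shrinks the metadata by a large factor compared with one word per vertex, which is the kind of reduction that could let it fit in a small, fast memory; the bank split stands in for partitioning that metadata across separate on-chip buffers.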

Published In

DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019
June 2019
1378 pages
ISBN: 9781450367257
DOI: 10.1145/3316781

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

DAC '19

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Cited By

  • (2024) ScalaBFS2: A High-performance BFS Accelerator on an HBM-enhanced FPGA Chip. ACM Transactions on Reconfigurable Technology and Systems 17:2 (1-39). DOI: 10.1145/3650037. Online publication date: 29-Feb-2024
  • (2024) Optimisation and Evaluation of Breadth First Search with oneAPI/SYCL on Intel FPGAs: from Describing Algorithms to Describing Architectures. Proceedings of the 12th International Workshop on OpenCL and SYCL (1-11). DOI: 10.1145/3648115.3648134. Online publication date: 8-Apr-2024
  • (2024) GraFlex: Flexible Graph Processing on FPGAs through Customized Scalable Interconnection Network. Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (143-153). DOI: 10.1145/3626202.3637573. Online publication date: 1-Apr-2024
  • (2024) Parallelization of butterfly counting on hierarchical memory. The VLDB Journal 33:5 (1453-1484). DOI: 10.1007/s00778-024-00856-x. Online publication date: 7-Jun-2024
  • (2023) Distributed large-scale graph processing on FPGAs. Journal of Big Data 10:1. DOI: 10.1186/s40537-023-00756-x. Online publication date: 4-Jun-2023
  • (2023) I/O-Efficient Butterfly Counting at Scale. Proceedings of the ACM on Management of Data 1:1 (1-27). DOI: 10.1145/3588714. Online publication date: 30-May-2023
  • (2022) ThunderGP: Resource-Efficient Graph Processing Framework on FPGAs with HLS. ACM Transactions on Reconfigurable Technology and Systems 15:4 (1-31). DOI: 10.1145/3517141. Online publication date: 9-Dec-2022
  • (2022) Ultra-Fast FPGA Implementation of Graph Cut Algorithm With Ripple Push and Early Termination. IEEE Transactions on Circuits and Systems I: Regular Papers 69:4 (1532-1545). DOI: 10.1109/TCSI.2021.3137590. Online publication date: Apr-2022
  • (2021) ThunderGP. The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (69-80). DOI: 10.1145/3431920.3439290. Online publication date: 17-Feb-2021
  • (2019) SIMD-X. Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference (411-427). DOI: 10.5555/3358807.3358843. Online publication date: 10-Jul-2019