Dr. BFS: Data Centric Breadth-First Search on FPGAs

Authors:

Zachary Sherer,

Yan Luo

DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019

Article No.: 208, Pages 1 - 6

https://doi.org/10.1145/3316781.3317802

Published: 02 June 2019

Abstract

The flexible architectures of Field Programmable Gate Arrays (FPGAs) lend themselves to a range of data analytics applications, among which Breadth-First Search (BFS) draws particular attention because of its importance. Recent attempts to offload BFS to FPGAs either simply imitate existing CPU- or Graphics Processing Unit (GPU)-based mechanisms or suffer from scalability issues. To address this, we introduce a novel data centric design that exploits the potential of FPGAs for BFS with two techniques. First, we advocate partitioning and compressing the BFS algorithmic metadata so that it can be buffered in fast on-chip memory, which circumvents expensive metadata accesses. Second, we propose a hierarchical coalescing method to improve the throughput of graph data access. Taken together, these techniques allow the proposed design to achieve, in our evaluation, average speedups of 1.6× and 2.2× over the state-of-the-art FPGA designs TorusBFS and Umuroglu, respectively, across a collection of graph datasets.
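The abstract does not spell out how the metadata is partitioned or compressed, so the following is only a minimal software sketch of the general idea, not the design from the paper. It assumes the metadata in question is the per-vertex visited status, compresses it to one bit per vertex, and splits the bits into fixed-size partitions (one integer word each) that stand in for small on-chip buffers. The function name `bfs_levels`, the parameter `part_bits`, and the CSR arrays `row_ptr` and `col_idx` are all hypothetical names chosen for illustration.

```python
from typing import List


def bfs_levels(row_ptr: List[int], col_idx: List[int], source: int,
               part_bits: int = 64) -> List[int]:
    """Level-synchronous BFS over a CSR graph.

    Assumption (not from the paper): the visited status is kept as a
    bitmap, one bit per vertex, split into partitions of `part_bits`
    vertices. Each partition is a single integer word.
    """
    n = len(row_ptr) - 1
    num_parts = (n + part_bits - 1) // part_bits
    visited = [0] * num_parts      # compressed metadata: 1 bit per vertex
    level = [-1] * n               # kept only so the result can be checked

    def test_and_set(v: int) -> bool:
        # Locate the partition and the bit inside it, then mark the vertex.
        part, bit = divmod(v, part_bits)
        mask = 1 << bit
        seen = bool(visited[part] & mask)
        visited[part] |= mask
        return seen

    test_and_set(source)
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:
            # Neighbors of u occupy one contiguous slice of col_idx.
            for v in col_idx[row_ptr[u]:row_ptr[u + 1]]:
                if not test_and_set(v):
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level


# Small undirected example: edges 0-1, 0-2, 1-3, 2-3; vertex 4 is isolated.
row_ptr = [0, 2, 4, 6, 8, 8]
col_idx = [1, 2, 0, 3, 0, 3, 1, 2]
levels = bfs_levels(row_ptr, col_idx, source=0, part_bits=2)
```

With `part_bits=2` the five vertices fall into three partitions, so the example exercises lookups across partition boundaries. A hardware design would size partitions to its on-chip memory blocks; the value here is arbitrary.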

References

[1] Anil Gaihre, et al. Do bitcoin users really care about anonymity? an analysis of the bitcoin transaction graph. In 2018 IEEE International Conference on Big Data (Big Data), pages 1198--1207. IEEE, 2018.

[2] Brahim Betkaoui, et al. A framework for FPGA acceleration of large graph problems: Graphlet counting case study. In 2011 International Conference on Field-Programmable Technology, pages 1--8. IEEE.

[3] Nenad Trinajstic. Chemical graph theory. Routledge, 2018.

[4] Richard C Murphy, et al. Introducing the graph 500. Cray User's Group (CUG), 19:45--74, 2010.

[5] Duane Merrill, et al. Scalable gpu graph traversal. In ACM SIGPLAN Notices, volume 47, pages 117--128. ACM, 2012.

[6] H. Liu, et al. Enterprise: breadth-first graph traversal on GPUs. In SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1--12.

[7] Shijie Zhou, et al. Accelerating graph analytics on CPU-FPGA heterogeneous platform. In 2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pages 137--144. IEEE.

[8] Shijie Zhou, et al. An FPGA framework for edge-centric graph processing. In Proceedings of the 15th ACM International Conference on Computing Frontiers - CF '18, pages 69--77. ACM Press.

[9] Guoqing Lei, et al. TorusBFS: A novel message-passing parallel breadth-first search architecture on FPGAs. IRACST International Journal, 5(5):6.

[10] Y. Umuroglu, et al. Hybrid breadth-first search on a single-chip FPGA-CPU heterogeneous platform. In 2015 25th International Conference on Field Programmable Logic and Applications (FPL), pages 1--8.

[11] Michael Bauer, et al. Cudadma: optimizing gpu memory bandwidth via warp specialization. In SC, 2011.

[12] Yangzihao Wang, et al. Gunrock: Gpu graph analytics. ACM Transactions on Parallel Computing (TOPC), 4(1):3, 2017.

[13] Reynold S Xin, et al. Graphx: A resilient distributed graph system on spark. In First International Workshop on Graph Data Management Experiences and Systems, page 2. ACM, 2013.

[14] Alexander Frolov, et al. Performance evaluation of breadth-first search on intel xeon phi. page 12.

[15] Mireya Paredes, et al. Breadth first search vectorization on the intel xeon phi. In Proceedings of the ACM International Conference on Computing Frontiers, pages 1--10. ACM, 2016.

[16] Shijie Zhou, et al. High-throughput and energy-efficient graph processing on FPGA. In 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), pages 103--110. IEEE.

[17] Grzegorz Malewicz, et al. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pages 135--146. ACM, 2010.

[18] Amitabha Roy, et al. X-stream: Edge-centric graph processing using streaming partitions. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pages 472--488. ACM, 2013.

[19] Charles Eric LaForest, et al. Efficient multi-ported memories for FPGAs. In Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays - FPGA '10, page 41. ACM Press.

[20] O. G. Attia, et al. CyGraph: A reconfigurable architecture for parallel breadth-first search. In 2014 IEEE International Parallel Distributed Processing Symposium Workshops, pages 228--235.

[21] Jialiang Zhang, et al. Boosting the performance of FPGA-based graph processor using hybrid memory cube: A case for breadth first search.

[22] Maohua Zhu, et al. Performance evaluation and optimization of hbm-enabled gpu for data-intensive applications. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 26(5):831--840, 2018.

[23] Hang Liu, et al. Graphene: fine-grained io management for graph computing. In Proceedings of the 15th Usenix Conference on File and Storage Technologies, pages 285--299. USENIX Association, 2017.

[24] Hao Wei, et al. Speedup graph processing by graph ordering. In Proceedings of the 2016 International Conference on Management of Data, pages 1813--1828. ACM, 2016.

[25] NVIDIA TESLA P100 GPU, https://images.nvidia.com/content/tesla/pdf/nvidiatesla-p100-pcie-datasheet.pdf.

Information & Contributors

Information

Published In

DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019

June 2019

1378 pages

ISBN:9781450367257

DOI:10.1145/3316781

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGDA: ACM Special Interest Group on Design Automation
IEEE-CEDA

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Science Foundation

Conference

DAC '19

Sponsor:

SIGDA

DAC '19: The 56th Annual Design Automation Conference 2019

June 2 - 6, 2019

Las Vegas, NV, USA

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 10
Total Downloads: 449
Downloads (Last 12 months): 96
Downloads (Last 6 weeks): 9

Reflects downloads up to 18 Aug 2024

Citations

Cited By

Li K, Xu S, Shao Z, Zheng R, Liao X, Jin H (2024). ScalaBFS2: A High-performance BFS Accelerator on an HBM-enhanced FPGA Chip. ACM Transactions on Reconfigurable Technology and Systems 17:2 (1-39). Online publication date: 29-Feb-2024.
https://dl.acm.org/doi/10.1145/3650037
Olgu K, Kenter T, Nunez-Yanez J, Mcintosh-Smith S (2024). Optimisation and Evaluation of Breadth First Search with oneAPI/SYCL on Intel FPGAs: from Describing Algorithms to Describing Architectures. Proceedings of the 12th International Workshop on OpenCL and SYCL (1-11). Online publication date: 8-Apr-2024.
https://dl.acm.org/doi/10.1145/3648115.3648134
Su C, Du L, Liang T, Lin Z, Wang M, Sinha S, Zhang W, Zhang Z, Putnam A (2024). GraFlex: Flexible Graph Processing on FPGAs through Customized Scalable Interconnection Network. Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (143-153). Online publication date: 1-Apr-2024.
https://dl.acm.org/doi/10.1145/3626202.3637573
Wang Z, Lai L, Liu Y, Shui B, Tian C, Zhong S (2024). Parallelization of butterfly counting on hierarchical memory. The VLDB Journal 33:5 (1453-1484). Online publication date: 7-Jun-2024.
https://doi.org/10.1007/s00778-024-00856-x
Sahebi A, Barbone M, Procaccini M, Luk W, Gaydadjiev G, Giorgi R (2023). Distributed large-scale graph processing on FPGAs. Journal of Big Data 10:1. Online publication date: 4-Jun-2023.
https://doi.org/10.1186/s40537-023-00756-x
Wang Z, Lai L, Liu Y, Shui B, Tian C, Zhong S (2023). I/O-Efficient Butterfly Counting at Scale. Proceedings of the ACM on Management of Data 1:1 (1-27). Online publication date: 30-May-2023.
https://dl.acm.org/doi/10.1145/3588714
Chen X, Cheng F, Tan H, Chen Y, He B, Wong W, Chen D (2022). ThunderGP: Resource-Efficient Graph Processing Framework on FPGAs with HLS. ACM Transactions on Reconfigurable Technology and Systems 15:4 (1-31). Online publication date: 9-Dec-2022.
https://dl.acm.org/doi/10.1145/3517141
Yan G, Liu X, Chen F, Wang H, Ha Y (2022). Ultra-Fast FPGA Implementation of Graph Cut Algorithm With Ripple Push and Early Termination. IEEE Transactions on Circuits and Systems I: Regular Papers 69:4 (1532-1545). Online publication date: Apr-2022.
https://doi.org/10.1109/TCSI.2021.3137590
Chen X, Tan H, Chen Y, He B, Wong W, Chen D, Shannon L, Adler M (2021). ThunderGP. The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (69-80). Online publication date: 17-Feb-2021.
https://dl.acm.org/doi/10.1145/3431920.3439290
Liu H, Huang H, Dan T, Dahlia M (2019). SIMD-X. Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference (411-427). Online publication date: 10-Jul-2019.
https://dl.acm.org/doi/10.5555/3358807.3358843
