Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/1837274.1837289acmconferencesArticle/Chapter ViewAbstractPublication PagesdacConference Proceedingsconference-collections
research-article

An effective GPU implementation of breadth-first search

Published: 13 June 2010 Publication History
  • Get Citation Alerts
  • Abstract

    Breadth-first search (BFS) has wide applications in electronic design automation (EDA) as well as in other fields. Researchers have tried to accelerate BFS on the GPU, but the two published works are both asymptotically slower than the fastest CPU implementation. In this paper, we present a new GPU implementation of BFS that uses a hierarchical queue management technique and a three-layer kernel arrangement strategy. It guarantees the same computational complexity as the fastest sequential version and can achieve up to 10 times speedup.

    References

    [1]
    C. Rodrigues, D. Hardy, J. Stone, K. Schulten, and W. Hwu, "GPU acceleration of cutoff pair potentials for molecular modeling applications," in ACM International Conference on Computing Frontiers, 2008, pp. 273--282.
    [2]
    P. Harish and P. J. Narayanan, "Accelerating large graph algorithms on the GPU using CUDA," in IEEE High Performance Computing, 2007, pp 197--208.
    [3]
    Y. Deng, B. Wang, and S. Mu, "Taming irregular EDA applications on GPUs," in ICCAD, 2009, pp. 539--546.
    [4]
    C. Lauterbach, M. Garland, S. Sengupta, D. Luebke, and D. Manocha, "Fast BVH construction on GPUs," Computer Graphics Forum, vol. 28, no. 2., pp. 375--384, 2009.
    [5]
    A. Yoo, E. Chow, K. Henderson, W. Mcledon, B. Hendrickson, and Ü Catalyürek, "A scalable distributed parallel breadth-first search algorithm on Bluegene/L," in Proceedings of the ACM/IEEE Conference on Supercomputing, 2005, pp. 25--32.
    [6]
    D. A. Bader and K. Madduri, "Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2," in ICPP, 2006, pp. 523--530.
    [7]
    D. P. Scarpazza, O. Villa, and F. Petrini, "Efficient breadth-first search on the Cell/BE processor," IEEE Trans. on Parallel Distributed Systems, vol. 19, no. 10, pp. 1381--1395, 2008.
    [8]
    T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction to Algorithms, MIT press, 2001.
    [9]
    http://www.dis.uniromal1.it/~challenge9/
    [10]
    S. Xiao and W. Feng, "Inter-block GPU communication via fast barrier synchronization," Technical Report TR-09-19, Dept. of Computer Science, Virginia Tech.
    [11]
    U. Meyer, "External memory BFS on undirected graphs with bounded degree," in SODA, 2001, pp. 87--88.

    Cited By

    View all
    • (2024)H3DM: A High-bandwidth High-capacity Hybrid 3D Memory Design for GPUsProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/36390388:1(1-28)Online publication date: 21-Feb-2024
    • (2024)TTN: Topological Transformer Network for Automated Coronary Artery Branch Labeling in Cardiac CT AngiographyIEEE Journal of Translational Engineering in Health and Medicine10.1109/JTEHM.2023.332903112(129-139)Online publication date: 2024
    • (2024)Allok: a machine learning approach for efficient graph execution on CPU–GPU clustersThe Journal of Supercomputing10.1007/s11227-024-06079-9Online publication date: 23-May-2024
    • Show More Cited By

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    DAC '10: Proceedings of the 47th Design Automation Conference
    June 2010
    1036 pages
    ISBN:9781450300025
    DOI:10.1145/1837274
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 June 2010

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. BFS
    2. CUDA
    3. GPU computing

    Qualifiers

    • Research-article

    Funding Sources

    Conference

    DAC '10
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

    Upcoming Conference

    DAC '25
    62nd ACM/IEEE Design Automation Conference
    June 22 - 26, 2025
    San Francisco , CA , USA

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)79
    • Downloads (Last 6 weeks)3

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)H3DM: A High-bandwidth High-capacity Hybrid 3D Memory Design for GPUsProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/36390388:1(1-28)Online publication date: 21-Feb-2024
    • (2024)TTN: Topological Transformer Network for Automated Coronary Artery Branch Labeling in Cardiac CT AngiographyIEEE Journal of Translational Engineering in Health and Medicine10.1109/JTEHM.2023.332903112(129-139)Online publication date: 2024
    • (2024)Allok: a machine learning approach for efficient graph execution on CPU–GPU clustersThe Journal of Supercomputing10.1007/s11227-024-06079-9Online publication date: 23-May-2024
    • (2023)An Evaluation of Directive-Based Parallelization on the GPU Using a Parboil BenchmarkElectronics10.3390/electronics1222455512:22(4555)Online publication date: 7-Nov-2023
    • (2023)A Decentralized Frontier Queue for Improving Scalability of Breadth-First-Search on GPUs2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)10.23919/DATE56975.2023.10137054(1-6)Online publication date: Apr-2023
    • (2023)Optimization Techniques for GPU ProgrammingACM Computing Surveys10.1145/357063855:11(1-81)Online publication date: 16-Mar-2023
    • (2023)A Survey on Auto-Parallelism of Large-Scale Deep Learning TrainingIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2023.328193134:8(2377-2390)Online publication date: Aug-2023
    • (2023)Automatic CPU-GPU Allocation for Graph Execution2023 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)10.1109/PDP59025.2023.00013(27-34)Online publication date: Mar-2023
    • (2023)Cascading Failure Propagation and Mitigation Strategies in Power SystemsIEEE Systems Journal10.1109/JSYST.2023.324804417:2(3282-3293)Online publication date: Jun-2023
    • (2023)FAST-CON: a Multi-source Approach for Efficient S- T Connectivity on Sparse Graphs2023 IEEE High Performance Extreme Computing Conference (HPEC)10.1109/HPEC58863.2023.10363544(1-6)Online publication date: 25-Sep-2023
    • Show More Cited By

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media