An Efficient Implementation of the Bellman-Ford Algorithm for Kepler GPU Architectures

Authors: Federico Busato, Nicola Bombieri

IEEE Transactions on Parallel and Distributed Systems, Volume 27, Issue 8

Pages 2222 - 2233

https://doi.org/10.1109/TPDS.2015.2485994

Published: 01 August 2016

Abstract

Finding the shortest paths from a single source to all other vertices is a common problem in graph analysis. The Bellman-Ford algorithm solves this single-source shortest path (SSSP) problem and lends itself better than the alternatives to parallelization on many-core architectures. Nevertheless, this high degree of parallelism comes at the cost of low work efficiency: compared to similar algorithms in the literature (e.g., Dijkstra's), it performs much more redundant work and consequently wastes power. This article presents a parallel implementation of the Bellman-Ford algorithm that exploits the architectural characteristics of recent GPU architectures (i.e., NVIDIA Kepler, Maxwell) to improve both performance and work efficiency. The article presents several optimizations of the implementation, targeting both the algorithm and the architecture. The experimental results show that the proposed implementation provides an average speedup of $5\times$ over the most efficient existing parallel implementations for SSSP, that it works on graphs where those implementations cannot work or are inefficient (e.g., graphs with negative weight edges, sparse graphs), and that it appreciably reduces the redundant work caused by the parallelization process.
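
To make the notion of redundant work concrete, the following is a minimal sequential sketch in Python. It is not the GPU implementation described in the article. It compares textbook Bellman-Ford, which relaxes every edge in every round, with a generic frontier-based variant that relaxes only the edges leaving vertices whose distance changed in the previous round. The function names, the edge-list format, and the small example graph are assumptions made for illustration; the abstract does not say which work-reduction techniques the article actually uses.

```python
import math


def bellman_ford_all_edges(num_vertices, edges, source):
    """Textbook Bellman-Ford: each round relaxes every edge.

    edges is a list of (u, v, weight) tuples (hypothetical input format).
    Returns (dist, relaxations); raises ValueError on a reachable negative cycle.
    """
    dist = [math.inf] * num_vertices
    dist[source] = 0
    relaxations = 0
    for _ in range(num_vertices - 1):
        changed = False
        for u, v, w in edges:
            relaxations += 1  # counted even when the attempt changes nothing
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break  # distances are stable, stop early
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative cycle reachable from source")
    return dist, relaxations


def bellman_ford_frontier(num_vertices, edges, source):
    """Frontier variant: relax only edges leaving recently updated vertices."""
    adjacency = [[] for _ in range(num_vertices)]
    for u, v, w in edges:
        adjacency[u].append((v, w))
    dist = [math.inf] * num_vertices
    dist[source] = 0
    frontier = [source]
    relaxations = 0
    rounds = 0
    while frontier:
        # Without a negative cycle, no distance can change after round n - 1,
        # so a non-empty frontier after n rounds signals a negative cycle.
        if rounds == num_vertices:
            raise ValueError("negative cycle reachable from source")
        rounds += 1
        next_frontier = set()
        for u in frontier:
            for v, w in adjacency[u]:
                relaxations += 1
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    next_frontier.add(v)
        frontier = sorted(next_frontier)
    return dist, relaxations


# Small example graph with negative-weight edges and no negative cycle.
example_edges = [(0, 1, 4), (0, 2, 5), (1, 3, -2), (2, 1, -3), (3, 4, 3)]
dist_all, work_all = bellman_ford_all_edges(5, example_edges, 0)
dist_frontier, work_frontier = bellman_ford_frontier(5, example_edges, 0)
print(dist_all, work_all)
print(dist_frontier, work_frontier)
```

Both functions return the same distances on this graph, while the frontier variant attempts fewer relaxations. In a parallel setting the relaxations within a round would run concurrently, which is where the trade-off between parallelism and work efficiency mentioned in the abstract arises.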

Information

Published in: IEEE Transactions on Parallel and Distributed Systems, Volume 27, Issue 8 (Aug. 2016, 309 pages)

ISSN: 1045-9219

Copyright © 2015.

Publisher: IEEE Press

Qualifiers: Research-article

Cited By

Meng L, Shao Y, Yuan L, Lai L, Cheng P, Li X, Yu W, Zhang W, Lin X, Zhou J (2024). A Survey of Distributed Graph Algorithms on Massive Graphs. ACM Computing Surveys 57:2 (1-39). Online publication date: 10-Oct-2024
https://dl.acm.org/doi/10.1145/3694966
Feng Y, Wang H, Zhu Y, Liu X, Lu H, Liu Q (2024). DAWN: Matrix Operation-Optimized Algorithm for Shortest Paths Problem on Unweighted Graphs. Proceedings of the 38th ACM International Conference on Supercomputing (1-13). Online publication date: 30-May-2024
https://dl.acm.org/doi/10.1145/3650200.3656600
Lee W, Shahzaad B, Alkouz B, Bouguettaya A (2024). Reactive Composition of UAV Delivery Services in Urban Environments. IEEE Transactions on Intelligent Transportation Systems 25:10 (13453-13466). Online publication date: 1-Oct-2024
https://dl.acm.org/doi/10.1109/TITS.2024.3392914
Cheng Y, Huang C, Jiang H, Xu X, Wang F (2024). An efficient SSSP algorithm on time-evolving graphs with prediction of computation results. Journal of Parallel and Distributed Computing 186:C. Online publication date: 1-Apr-2024
https://dl.acm.org/doi/10.1016/j.jpdc.2023.104830
Khanda A, Shovan S, Das S (2023). A Parallel Algorithm for Updating a Multi-objective Shortest Path in Large Dynamic Networks. Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis (739-746). Online publication date: 12-Nov-2023
https://dl.acm.org/doi/10.1145/3624062.3625134
Liu Y, Azami N, Vanausdal A, Burtscher M, Mohror K, Arnold D, Badia R (2023). Choosing the Best Parallelization and Implementation Styles for Graph Analytics Codes: Lessons Learned from 1106 Programs. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-14). Online publication date: 12-Nov-2023
https://dl.acm.org/doi/10.1145/3581784.3607038
Hijma P, Heldens S, Sclocco A, van Werkhoven B, Bal H (2023). Optimization Techniques for GPU Programming. ACM Computing Surveys 55:11 (1-81). Online publication date: 16-Mar-2023
https://dl.acm.org/doi/10.1145/3570638
Khanda A, Corò F, Das S, Georgiou C, Schiller E, Ali-Eldin A, Iosup A (2022). Drone-Truck Cooperated Delivery Under Time Varying Dynamics. Proceedings of the 2022 Workshop on Advanced tools, programming languages, and PLatforms for Implementing and Evaluating algorithms for Distributed systems (24-29). Online publication date: 25-Jul-2022
https://dl.acm.org/doi/10.1145/3524053.3542743
Aktılav B, Öz I (2022). Performance and accuracy predictions of approximation methods for shortest-path algorithms on GPUs. Parallel Computing 112:C. Online publication date: 1-Sep-2022
https://dl.acm.org/doi/10.1016/j.parco.2022.102942
Huang W, Xu Z, Zhu L (2022). The minimum regret path problem on stochastic fuzzy time-varying networks. Neural Networks 153:C (450-460). Online publication date: 1-Sep-2022
https://dl.acm.org/doi/10.1016/j.neunet.2022.06.029
