research-article

An Efficient Implementation of the Bellman-Ford Algorithm for Kepler GPU Architectures

Published: 01 August 2016

Abstract

Finding the shortest paths from a single source to all other vertices is a common problem in graph analysis. The Bellman-Ford algorithm solves this single-source shortest path (SSSP) problem and lends itself well to parallelization on many-core architectures. Nevertheless, its high degree of parallelism comes at the cost of low work efficiency: compared to similar algorithms in the literature (e.g., Dijkstra's), it involves much more redundant work and consequently wastes power. This article presents a parallel implementation of the Bellman-Ford algorithm that exploits the architectural characteristics of recent GPU architectures (i.e., NVIDIA Kepler, Maxwell) to improve both performance and work efficiency. The article presents several optimizations of the implementation, targeting both the algorithm and the architecture. The experimental results show that the proposed implementation provides an average speedup of $5\times$ over the most efficient existing parallel implementations for SSSP, that it works on graphs where those implementations cannot work or are inefficient (e.g., graphs with negative weight edges, sparse graphs), and that it appreciably reduces the redundant work caused by the parallelization process.
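As a point of reference for the abstract, the following is a minimal sequential sketch of the textbook Bellman-Ford algorithm, not the GPU implementation the article proposes. It relaxes every edge in every round, which is the pattern that maps naturally onto data-parallel hardware but also causes the redundant work the abstract mentions. The function name, the `relax_attempts` counter, and the four-vertex example graph are illustrative assumptions.

```python
from math import inf


def bellman_ford(num_vertices, edges, source):
    """Textbook Bellman-Ford over a list of (u, v, weight) triples.

    Returns (dist, relax_attempts). Raises ValueError if a
    negative-weight cycle is reachable from the source.
    """
    dist = [inf] * num_vertices
    dist[source] = 0
    relax_attempts = 0  # illustrative counter of work performed

    # At most |V| - 1 rounds; each round tries to relax every edge,
    # including edges whose tail distance has not changed (redundant work).
    for _ in range(num_vertices - 1):
        changed = False
        for u, v, w in edges:
            relax_attempts += 1
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:  # early exit once distances are stable
            break

    # One extra pass: any further improvement implies a negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative-weight cycle reachable from source")

    return dist, relax_attempts


# Hypothetical example graph with one negative-weight edge (2 -> 1).
example_edges = [(0, 1, 4), (0, 2, 5), (2, 1, -3), (1, 3, 2)]
example_dist, example_attempts = bellman_ford(4, example_edges, 0)
print(example_dist, example_attempts)
```

In this example the second round changes nothing, yet all four edges are still examined. Avoiding such wasted relaxations is what improving work efficiency refers to.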


Published In

IEEE Transactions on Parallel and Distributed Systems, Volume 27, Issue 8, Aug. 2016, 309 pages

Publisher: IEEE Press
