R-Kleene: A High-Performance Divide-and-Conquer Algorithm for the All-Pair Shortest Path for Densely Connected Networks

Published: 01 February 2007

Abstract

We propose R-Kleene, a compact, in-place, recursive divide-and-conquer algorithm, inspired by Kleene's algorithm, for the all-pair shortest-path problem on dense directed graphs with no negative cycles. On randomly generated graphs represented by highly dense adjacency matrices, whose entries may take any integer value, R-Kleene performs better than previous algorithms. We show that R-Kleene, unchanged and without any machine-specific tuning, consistently achieves between 1/7 and 1/2 of peak performance on five very different uniprocessor systems.
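The abstract does not spell out the recursion itself. As an illustration only, the sketch below shows a generic recursive Kleene-style closure over the (min, +) semiring: the matrix is split into four blocks, the two diagonal blocks are closed recursively, and the remaining blocks are updated with min-plus products, all in place. The names `r_kleene` and `_min_plus_update`, the pure-Python lists, and the handling of odd sizes are assumptions made for this example. It is not the authors' tuned implementation and says nothing about its measured performance.

```python
import math

INF = math.inf  # marks "no edge" in the adjacency matrix


def _min_plus_update(d, c, a, b, m, p, q):
    """In place: C = min(C, A (x) B) over the (min, +) semiring.

    c, a, b are (row, col) offsets into d of blocks of size
    m x q, m x p and p x q. C may alias A or B as long as the
    other operand is already a closed (shortest-path) block.
    """
    (ci, cj), (ai, aj), (bi, bj) = c, a, b
    for i in range(m):
        for k in range(p):
            aik = d[ai + i][aj + k]
            if aik == INF:
                continue  # no path through k, nothing to relax
            for j in range(q):
                cand = aik + d[bi + k][bj + j]
                if cand < d[ci + i][cj + j]:
                    d[ci + i][cj + j] = cand


def r_kleene(d, lo=0, n=None):
    """Overwrite the n x n block of d starting at (lo, lo) with its
    all-pair shortest-path distances. Assumes no negative cycles."""
    if n is None:
        n = len(d)
    if n <= 1:
        if n == 1:
            d[lo][lo] = min(d[lo][lo], 0)  # empty path has length 0
        return
    h = n // 2       # size of the top-left block A
    r = n - h        # size of the bottom-right block D
    mid = lo + h     # first row/column of the second half
    A, B, C, D = (lo, lo), (lo, mid), (mid, lo), (mid, mid)

    r_kleene(d, lo, h)                    # A = A*
    _min_plus_update(d, B, A, B, h, h, r)  # B = min(B, A (x) B)
    _min_plus_update(d, C, C, A, r, h, h)  # C = min(C, C (x) A)
    _min_plus_update(d, D, C, B, r, h, r)  # D = min(D, C (x) B)
    r_kleene(d, mid, r)                   # D = D*
    _min_plus_update(d, B, B, D, h, r, r)  # B = min(B, B (x) D)
    _min_plus_update(d, C, D, C, r, r, h)  # C = min(C, D (x) C)
    _min_plus_update(d, A, B, C, h, r, h)  # A = min(A, B (x) C)


if __name__ == "__main__":
    # Small dense example with one negative edge and no negative cycle.
    graph = [
        [0, 3, INF, 7, INF],
        [8, 0, 2, INF, INF],
        [5, INF, 0, 1, INF],
        [2, INF, INF, 0, 4],
        [INF, -1, INF, INF, 0],
    ]
    r_kleene(graph)  # graph now holds pairwise distances
    for row in graph:
        print(row)
```

In this sketch all of the relaxation work happens inside the min-plus block products, which have the same loop structure as matrix multiplication. A real implementation would presumably stop the recursion at a larger base case and use a tuned kernel there.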

Published In

Algorithmica, Volume 47, Issue 2
February 2007
94 pages

Publisher

Springer-Verlag, Berlin, Heidelberg

Cited By

  • (2023) SAGE: A Storage-Based Approach for Scalable and Efficient Sparse Generalized Matrix-Matrix Multiplication. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pp. 923-933. DOI: 10.1145/3583780.3615044. Online publication date: 21-Oct-2023
  • (2023) An Efficient Gustavson-Based Sparse Matrix–Matrix Multiplication Accelerator on Embedded FPGAs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:12, pp. 4671-4680. DOI: 10.1109/TCAD.2023.3281719. Online publication date: 1-Dec-2023
  • (2019) Register-Aware Optimizations for Parallel Sparse Matrix–Matrix Multiplication. International Journal of Parallel Programming 47:3, pp. 403-417. DOI: 10.1007/s10766-018-0604-8. Online publication date: 1-Jun-2019
  • (2018) Cache-Oblivious Buffer Heap and Cache-Efficient Computation of Shortest Paths in Graphs. ACM Transactions on Algorithms 14:1, pp. 1-33. DOI: 10.1145/3147172. Online publication date: 3-Jan-2018
  • (2018) A survey of graph processing on graphics processing units. The Journal of Supercomputing 74:5, pp. 2086-2115. DOI: 10.1007/s11227-017-2225-1. Online publication date: 1-May-2018
  • (2017) Exploiting Locality in Sparse Matrix-Matrix Multiplication on Many-Core Architectures. IEEE Transactions on Parallel and Distributed Systems 28:8, pp. 2258-2271. DOI: 10.1109/TPDS.2017.2656893. Online publication date: 13-Jul-2017
  • (2007) The cache-oblivious Gaussian elimination paradigm. Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures, pp. 71-80. DOI: 10.1145/1248377.1248392. Online publication date: 9-Jun-2007
