
R-Kleene: A High-Performance Divide-and-Conquer Algorithm for the All-Pair Shortest Path for Densely Connected Networks

Authors:

Paolo D'Alberto, Alexandru Nicolau

Algorithmica, Volume 47, Issue 2

Pages 203 - 213

Published: 01 February 2007

Abstract

We propose R-Kleene, a novel divide-and-conquer algorithm for the all-pairs shortest-path problem on dense directed graphs with no negative cycles. R-Kleene is a compact, in-place recursive algorithm inspired by Kleene's algorithm. On randomly generated graphs represented by highly dense adjacency matrices, whose entries can take any integer value, R-Kleene outperforms previous algorithms. We show that R-Kleene, unchanged and without any machine-specific tuning, consistently achieves between 1/7 and 1/2 of peak performance on five very different uniprocessor systems.
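The abstract states only that R-Kleene is recursive, in place, and inspired by Kleene's algorithm. As an illustration, here is a minimal Python sketch of a Kleene-style divide-and-conquer closure in the (min, +) semiring. It assumes the quadrant ordering commonly used for recursive Floyd-Warshall/Kleene closure: close the upper-left quadrant A, propagate through B, C and D, close D, then propagate back into B, C and A. That ordering, the names `r_kleene`, `update`, `fw_block` and `apsp`, the base-case size `base`, and the use of `math.inf` for absent edges are assumptions made for this sketch, not details taken from the paper. It also makes no attempt at the low-level efficiency that the reported performance figures refer to.

```python
import math

# Sketch under stated assumptions: block order and all names are illustrative,
# not the authors' code. Absent edges are represented by infinity (assumption).
INF = math.inf


def fw_block(d, lo, hi):
    """Base case: Floyd-Warshall restricted to the diagonal block [lo, hi).

    Only vertices in [lo, hi) serve as intermediates, giving the block's closure.
    """
    for k in range(lo, hi):
        dk = d[k]
        for i in range(lo, hi):
            dik = d[i][k]
            if dik == INF:
                continue
            di = d[i]
            for j in range(lo, hi):
                if dik + dk[j] < di[j]:
                    di[j] = dik + dk[j]


def update(d, rows, inner, cols):
    """In-place (min, +) multiply-accumulate on blocks of one matrix:
    d[rows][cols] = min(d[rows][cols], d[rows][inner] (min,+) d[inner][cols]).
    Each argument is a half-open (lo, hi) index range.
    """
    for i in range(*rows):
        di = d[i]
        for t in range(*inner):
            dit = di[t]
            if dit == INF:
                continue
            dt = d[t]
            for j in range(*cols):
                if dit + dt[j] < di[j]:
                    di[j] = dit + dt[j]


def r_kleene(d, lo, hi, base=32):
    """Hypothetical recursive closure of the diagonal block [lo, hi) of d.

    With P = [lo, mid) and Q = [mid, hi): A = d[P][P], B = d[P][Q],
    C = d[Q][P], D = d[Q][Q].
    """
    n = hi - lo
    if n <= max(base, 1):  # guard: base < 1 would recurse forever on n == 1
        fw_block(d, lo, hi)
        return
    mid = lo + n // 2
    P, Q = (lo, mid), (mid, hi)
    r_kleene(d, lo, mid, base)  # A <- closure of A
    update(d, P, P, Q)          # B <- min(B, A B)
    update(d, Q, P, P)          # C <- min(C, C A)
    update(d, Q, P, Q)          # D <- min(D, C B)
    r_kleene(d, mid, hi, base)  # D <- closure of D
    update(d, P, Q, Q)          # B <- min(B, B D)
    update(d, Q, Q, P)          # C <- min(C, D C)
    update(d, P, Q, P)          # A <- min(A, B C)


def apsp(w, base=32):
    """All-pairs shortest paths for a dense weight matrix w with no negative cycles."""
    n = len(w)
    d = [row[:] for row in w]  # copy the input; r_kleene itself works in place
    for i in range(n):
        d[i][i] = min(d[i][i], 0)  # every vertex reaches itself at cost 0
    r_kleene(d, 0, n, base)
    return d


if __name__ == "__main__":
    w = [[0, 1, INF, 10],
         [INF, 0, 2, INF],
         [INF, INF, 0, 3],
         [INF, INF, INF, 0]]
    print(apsp(w, base=1)[0][3])  # 6, via 0 -> 1 -> 2 -> 3
```

All six block updates share the form d[rows][cols] = min(d[rows][cols], d[rows][inner] (min,+) d[inner][cols]), so one in-place kernel covers them. With `base=1` the recursion goes down to single vertices; a larger `base` hands small diagonal blocks to a plain Floyd-Warshall loop, a common way to limit recursion overhead.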




Published In

Algorithmica, Volume 47, Issue 2

February 2007

94 pages

ISSN: 0178-4617

Copyright © 2007 Springer.

Publisher

Springer-Verlag

Berlin, Heidelberg


Bibliometrics

Total citations: 7

Total downloads: 0 (last 12 months: 0; last 6 weeks: 0)

Reflects downloads up to 01 Sep 2024

Cited By

Jang, M., Ko, Y., Gwon, H., Jo, I., Park, Y., Kim, S.: SAGE: A Storage-Based Approach for Scalable and Efficient Sparse Generalized Matrix-Matrix Multiplication. In: Frommholz, I., Hopfgartner, F., Lee, M., Oakes, M., Lalmas, M., Zhang, M., Santos, R. (eds.) Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pp. 923-933, 2023. Online publication date: 21 Oct 2023.
https://dl.acm.org/doi/10.1145/3583780.3615044

Li, S., Huai, S., Liu, W.: An Efficient Gustavson-Based Sparse Matrix-Matrix Multiplication Accelerator on Embedded FPGAs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42(12) (2023), 4671-4680. Online publication date: 1 Dec 2023.
https://dl.acm.org/doi/10.1109/TCAD.2023.3281719

Liu, J., He, X., Liu, W., Tan, G.: Register-Aware Optimizations for Parallel Sparse Matrix-Matrix Multiplication. International Journal of Parallel Programming 47(3) (2019), 403-417. Online publication date: 1 Jun 2019.
https://dl.acm.org/doi/10.1007/s10766-018-0604-8

Chowdhury, R., Ramachandran, V.: Cache-Oblivious Buffer Heap and Cache-Efficient Computation of Shortest Paths in Graphs. ACM Transactions on Algorithms 14(1) (2018), 1-33. Online publication date: 3 Jan 2018.
https://dl.acm.org/doi/10.1145/3147172

Tran, H., Cambria, E.: A survey of graph processing on graphics processing units. The Journal of Supercomputing 74(5) (2018), 2086-2115. Online publication date: 1 May 2018.
https://dl.acm.org/doi/10.1007/s11227-017-2225-1

Akbudak, K., Aykanat, C.: Exploiting Locality in Sparse Matrix-Matrix Multiplication on Many-Core Architectures. IEEE Transactions on Parallel and Distributed Systems 28(8) (2017), 2258-2271. Online publication date: 13 Jul 2017.
https://dl.acm.org/doi/10.1109/TPDS.2017.2656893

Chowdhury, R., Ramachandran, V.: The cache-oblivious Gaussian elimination paradigm. In: Gibbons, P., Scheideler, C. (eds.) Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 71-80, 2007. Online publication date: 9 Jun 2007.
https://dl.acm.org/doi/10.1145/1248377.1248392
