research-article

Optimizing Graph Algorithms for Improved Cache Performance

Published: 01 September 2004

Abstract

In this paper, we develop algorithmic optimizations to improve the cache performance of four fundamental graph algorithms. First, we present a cache-oblivious implementation of the Floyd-Warshall algorithm for the all-pairs shortest paths problem, obtained by relaxing some dependencies in the iterative version. We show that this implementation achieves the lower bound on processor-memory traffic of \Omega(N^3/\sqrt{C}), where N and C are the problem size and cache size, respectively. Experimental results show that the cache-oblivious implementation improves real execution time by more than a factor of six over the iterative implementation with the usual row-major data layout, on three state-of-the-art architectures. Second, we address Dijkstra's algorithm for the single-source shortest paths problem and Prim's algorithm for the minimum spanning tree problem. For these algorithms, we demonstrate up to a twofold improvement in real execution time by using a simple cache-friendly graph representation, namely adjacency arrays. Finally, we address the matching algorithm for bipartite graphs. We show performance improvements of two to three times in real execution time by having the algorithm first work on subproblems to generate a suboptimal solution and then solving the whole problem using that suboptimal solution as a starting point. Experimental results are shown for the Pentium III, UltraSPARC III, Alpha 21264, and MIPS R12000 machines.
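
To make the cache-oblivious idea concrete, the following is a minimal Python sketch of one well-known recursive formulation of Floyd-Warshall: the distance matrix is split into quadrants and updated with eight recursive calls, four in a forward order and four in reverse. It is an illustration under stated assumptions, not the authors' implementation. The function names, the 1x1 base case, and the restriction to matrices whose size is a power of two are all choices made here for brevity; a practical version would stop recursing at a tile that fits in cache and would use a cache-friendly data layout.

```python
import math

INF = math.inf  # marks "no edge" in the distance matrix


def floyd_warshall_iterative(dist):
    """Reference triple-loop Floyd-Warshall on a copy of dist."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def _fwr(d, a, b, c, n):
    """Update block A with min(A, B + C) in the min-plus sense.

    a, b, c are (row, col) offsets of n-by-n blocks inside d.
    All three blocks live in the same matrix, so the update is in place.
    """
    if n == 1:
        # Assumed base case: a single relaxation step.
        cand = d[b[0]][b[1]] + d[c[0]][c[1]]
        if cand < d[a[0]][a[1]]:
            d[a[0]][a[1]] = cand
        return
    h = n // 2

    def quad(o, r, s):
        # Offset of quadrant (r, s) of the block starting at o.
        return (o[0] + r * h, o[1] + s * h)

    a11, a12, a21, a22 = quad(a, 0, 0), quad(a, 0, 1), quad(a, 1, 0), quad(a, 1, 1)
    b11, b12, b21, b22 = quad(b, 0, 0), quad(b, 0, 1), quad(b, 1, 0), quad(b, 1, 1)
    c11, c12, c21, c22 = quad(c, 0, 0), quad(c, 0, 1), quad(c, 1, 0), quad(c, 1, 1)

    # Forward pass: intermediate vertices from the first half.
    _fwr(d, a11, b11, c11, h)
    _fwr(d, a12, b11, c12, h)
    _fwr(d, a21, b21, c11, h)
    _fwr(d, a22, b21, c12, h)
    # Backward pass: intermediate vertices from the second half.
    _fwr(d, a22, b22, c22, h)
    _fwr(d, a21, b22, c21, h)
    _fwr(d, a12, b12, c22, h)
    _fwr(d, a11, b12, c21, h)


def floyd_warshall_recursive(dist):
    """Recursive Floyd-Warshall on a copy of dist (size must be a power of two)."""
    n = len(dist)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two matrix size")
    d = [row[:] for row in dist]
    _fwr(d, (0, 0), (0, 0), (0, 0), n)
    return d
```

Calling `floyd_warshall_recursive(g)` on a square list-of-lists `g` (zeros on the diagonal, `INF` for missing edges, non-negative weights) should return the same distances as `floyd_warshall_iterative(g)`. The recursion never refers to the cache size, which is what makes the approach cache-oblivious; in pure Python the sketch shows only the call structure, not the performance benefit.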


Published In

IEEE Transactions on Parallel and Distributed Systems, Volume 15, Issue 9
September 2004
96 pages

Publisher

IEEE Press

Author Tags

  1. 65
  2. Cache-friendly algorithms
  3. algorithm performance
  4. cache-oblivious algorithms
  5. data layout optimizations
  6. graph algorithms
  7. graph matching
  8. minimum spanning trees
  9. shortest path


Cited By

  • (2022) Accelerated butterfly counting with vertex priority on bipartite graphs. The VLDB Journal, 32:2 (257-281). DOI: 10.1007/s00778-022-00746-0. Online publication date: 16-May-2022
  • (2021) Communication Avoiding All-Pairs Shortest Paths Algorithm for Sparse Graphs. Proceedings of the 50th International Conference on Parallel Processing (1-10). DOI: 10.1145/3472456.3472524. Online publication date: 9-Aug-2021
  • (2020) Graptor. Proceedings of the 34th ACM International Conference on Supercomputing (1-13). DOI: 10.1145/3392717.3392753. Online publication date: 29-Jun-2020
  • (2020) Integrating Cache Oblivious Approach with Modern Processor Architecture. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region (123-130). DOI: 10.1145/3368474.3368477. Online publication date: 15-Jan-2020
  • (2020) Closing the Gap Between Cache-oblivious and Cache-adaptive Analysis. Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures (63-73). DOI: 10.1145/3350755.3400274. Online publication date: 6-Jul-2020
  • (2019) Vertex priority based butterfly counting for large-scale bipartite networks. Proceedings of the VLDB Endowment, 12:10 (1139-1152). DOI: 10.14778/3339490.3339497. Online publication date: 1-Jun-2019
  • (2019) GCache: Neighborhood-Guided Graph Caching in a Distributed Environment. IEEE Transactions on Parallel and Distributed Systems, 30:11 (2463-2477). DOI: 10.1109/TPDS.2019.2915300. Online publication date: 1-Nov-2019
  • (2018) Cache-Oblivious Buffer Heap and Cache-Efficient Computation of Shortest Paths in Graphs. ACM Transactions on Algorithms, 14:1 (1-33). DOI: 10.1145/3147172. Online publication date: 3-Jan-2018
  • (2016) Exposing the Locality of Heterogeneous Memory Architectures to HPC Applications. Proceedings of the Second International Symposium on Memory Systems (30-39). DOI: 10.1145/2989081.2989115. Online publication date: 3-Oct-2016
  • (2016) Speedup Graph Processing by Graph Ordering. Proceedings of the 2016 International Conference on Management of Data (1813-1828). DOI: 10.1145/2882903.2915220. Online publication date: 26-Jun-2016
