
Exploiting Locality for Irregular Scientific Codes

Published: 01 July 2006

Abstract

    Irregular scientific codes experience poor cache performance due to their irregular memory access patterns. In this paper, we present two new locality-improving techniques for irregular scientific codes. Our techniques exploit geometric structures hidden in data access patterns and computation structures. Our new data reordering technique (Gpart) finds the graph structure within data accesses and applies hierarchical clustering. Quality partitions are constructed quickly by clustering multiple neighboring nodes, giving priority to high-degree nodes, and repeating a few passes. Overhead is kept low by clustering multiple nodes in each pass and considering only edges between partitions. Our new computation reordering technique (Z-Sort) treats the values of index arrays as coordinates and reorders the corresponding computations in Z-curve order. Applied to dense inputs, Z-Sort achieves performance close to that of data reordering combined with other computation reorderings, but without the overhead of data reordering. Experiments on irregular scientific codes for a variety of meshes show that these locality optimization techniques are effective for both sequential and parallelized codes, improving performance by 60-87 percent. Gpart comes within 1-2 percent of the performance of more sophisticated partitioning algorithms, at one third of the overhead. Z-Sort yields a performance improvement of 64 percent for dense inputs, comparable to data reordering combined with computation reordering.
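    The Z-curve ordering at the heart of Z-Sort can be sketched as follows. This is an illustrative reconstruction from the abstract only, not the paper's implementation: the function names and the 2D integer-coordinate setup are assumptions, and real irregular codes would work on index arrays over 2D/3D mesh coordinates.

    ```python
    # Sketch of Z-curve (Morton-order) computation reordering, the idea behind
    # Z-Sort as described in the abstract. Illustrative only; names are hypothetical.

    def morton_key(x, y, bits=16):
        """Interleave the bits of integer coordinates x and y into a Z-curve key."""
        key = 0
        for i in range(bits):
            key |= ((x >> i) & 1) << (2 * i)       # x contributes even bit positions
            key |= ((y >> i) & 1) << (2 * i + 1)   # y contributes odd bit positions
        return key

    def zsort_interactions(edges, coords):
        """Reorder edge computations (pairs of node indices) in Z-curve order.

        edges  -- list of (i, j) node-index pairs (the index arrays)
        coords -- coords[n] = (x, y) integer grid coordinates of node n
        """
        # Key each interaction by the Z-curve position of its first node;
        # iterating in this order visits nearby nodes together, which is
        # what improves cache locality without relocating the data itself.
        return sorted(edges, key=lambda e: morton_key(*coords[e[0]]))
    ```

    Because only the iteration order changes, this avoids the cost of physically relocating node data, which matches the abstract's claim that Z-Sort sidesteps the overhead of data reordering.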




    Published In

    IEEE Transactions on Parallel and Distributed Systems  Volume 17, Issue 7
    July 2006
    143 pages

    Publisher

    IEEE Press


    Author Tags

    1. compiler optimization
    2. cache memories
    3. computation reordering
    4. data reordering
    5. inspector/executor

    Qualifiers

    • Research-article


    Cited By

    • (2022) Vectorizing SpMV by Exploiting Dynamic Regular Patterns. Proc. 51st Int'l Conf. Parallel Processing, pp. 1-12. doi:10.1145/3545008.3545042
    • (2022) Autoscheduling for Sparse Tensor Algebra with an Asymptotic Cost Model. Proc. 43rd ACM SIGPLAN Conf. Programming Language Design and Implementation, pp. 269-285. doi:10.1145/3519939.3523442
    • (2020) Optimizing the Linear Fascicle Evaluation Algorithm for Multi-core and Many-core Systems. ACM Trans. Parallel Computing, vol. 7, no. 4, pp. 1-45. doi:10.1145/3418075
    • (2020) A Conflict-free Scheduler for High-performance Graph Processing on Multi-pipeline FPGAs. ACM Trans. Architecture and Code Optimization, vol. 17, no. 2, pp. 1-26. doi:10.1145/3390523
    • (2019) Efficient Parameterized Algorithms for Data Packing. Proc. ACM on Programming Languages, vol. 3, no. POPL, pp. 1-28. doi:10.1145/3290366
    • (2019) Spatiotemporal Graph and Hypergraph Partitioning Models for Sparse Matrix-Vector Multiplication on Many-Core Architectures. IEEE Trans. Parallel and Distributed Systems, vol. 30, no. 2, pp. 445-458. doi:10.1109/TPDS.2018.2864729
    • (2018) Enhancing Computation-to-Core Assignment with Physical Location Information. ACM SIGPLAN Notices, vol. 53, no. 4, pp. 312-327. doi:10.1145/3296979.3192386
    • (2018) swSpTRSV. ACM SIGPLAN Notices, vol. 53, no. 1, pp. 338-353. doi:10.1145/3200691.3178513
    • (2018) Making Pull-Based Graph Processing Performant. ACM SIGPLAN Notices, vol. 53, no. 1, pp. 246-260. doi:10.1145/3200691.3178506
    • (2018) Enhancing Computation-to-Core Assignment with Physical Location Information. Proc. 39th ACM SIGPLAN Conf. Programming Language Design and Implementation, pp. 312-327. doi:10.1145/3192366.3192386
