DOI: 10.1109/CGO.2005.10

Combining Models and Guided Empirical Search to Optimize for Multiple Levels of the Memory Hierarchy

Published: 20 March 2005

Abstract

This paper describes an algorithm for simultaneously optimizing across multiple levels of the memory hierarchy for dense-matrix computations. Our approach combines compiler models and heuristics with guided empirical search to take advantage of their complementary strengths. The models and heuristics limit the search to a small number of candidate implementations, and the empirical results provide the most accurate information to the compiler to select among candidates and tune optimization parameter values. We have developed an initial implementation and applied this approach to two case studies, Matrix Multiply and Jacobi Relaxation. For Matrix Multiply, our results on two architectures, SGI R10000 and Sun UltraSparc IIe, outperform the native compiler, and either outperform or achieve performance comparable to the ATLAS self-tuning library and the hand-tuned vendor BLAS library. Jacobi results also substantially outperform the native compilers.
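
The general idea of pairing a model with empirical search can be illustrated with a minimal sketch. This is not the paper's algorithm: the capacity model (three square tiles must fit in an assumed cache size), the candidate tile sizes, and the function names `model_candidates` and `empirical_search` are all hypothetical, and Python is used only for readability. A real tuner would time compiled code variants and cover several memory-hierarchy levels.

```python
import time

def tiled_matmul(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists) using square tiles."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + tile, n)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += aik * row_b[j]
    return c

def model_candidates(cache_bytes, elem_bytes=8, choices=(4, 8, 16, 32, 64, 128)):
    """Model step (assumed): keep tile sizes whose three tiles fit in the cache."""
    return [t for t in choices if 3 * t * t * elem_bytes <= cache_bytes]

def empirical_search(n, candidates, repeats=2):
    """Search step: time each surviving candidate and return the fastest one."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for tile in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            tiled_matmul(a, b, n, tile)
            best = min(best, time.perf_counter() - start)
        timings[tile] = best
    return min(timings, key=timings.get), timings

if __name__ == "__main__":
    # 32 KiB is an assumed cache size, not a figure from the paper.
    candidates = model_candidates(cache_bytes=32 * 1024)
    best_tile, timings = empirical_search(n=48, candidates=candidates)
    print("candidates kept by the model:", candidates)
    print("fastest tile size measured:", best_tile)
```

The model discards tile sizes that cannot fit, so only a handful of variants are ever timed. The measurements then decide among the survivors, which mirrors the division of labor the abstract describes.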

Published In

CGO '05: Proceedings of the International Symposium on Code Generation and Optimization
March 2005
313 pages
ISBN: 076952298X

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

CGO05

Acceptance Rates

CGO '05 Paper Acceptance Rate: 26 of 75 submissions, 35%
Overall Acceptance Rate: 312 of 1,061 submissions, 29%

Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 0
Reflects downloads up to 12 Nov 2024.

Cited By

  • (2024) Register Blocking: An Analytical Modelling Approach for Affine Loop Kernels. Proceedings of the 21st ACM International Conference on Computing Frontiers, pp. 71-79. DOI: 10.1145/3649153.3649194. Online publication date: 7-May-2024
  • (2022) QRANE: lifting QASM programs to an affine IR. Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction, pp. 15-28. DOI: 10.1145/3497776.3517775. Online publication date: 19-Mar-2022
  • (2020) Automatic Generation of Multi-Objective Polyhedral Compiler Transformations. Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, pp. 83-96. DOI: 10.1145/3410463.3414635. Online publication date: 30-Sep-2020
  • (2019) Efficient hierarchical online-autotuning. Proceedings of the ACM International Conference on Supercomputing, pp. 354-366. DOI: 10.1145/3330345.3330377. Online publication date: 26-Jun-2019
  • (2019) Model-driven transformations for multi- and many-core CPUs. Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 469-484. DOI: 10.1145/3314221.3314653. Online publication date: 8-Jun-2019
  • (2019) An Autotuning Protocol to Rapidly Build Autotuners. ACM Transactions on Parallel Computing 5:2, pp. 1-25. DOI: 10.1145/3291527. Online publication date: 4-Jan-2019
  • (2018) Revisiting Loop Tiling for Datacenters. Proceedings of the 2018 International Conference on Supercomputing, pp. 328-340. DOI: 10.1145/3205289.3205306. Online publication date: 12-Jun-2018
  • (2018) A Survey on Compiler Autotuning using Machine Learning. ACM Computing Surveys 51:5, pp. 1-42. DOI: 10.1145/3197978. Online publication date: 18-Sep-2018
  • (2017) RT-CUDA. International Journal of Parallel Programming 45:3, pp. 551-594. DOI: 10.1007/s10766-016-0433-6. Online publication date: 1-Jun-2017
  • (2016) Continuous shape shifting. The 49th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1-12. DOI: 10.5555/3195638.3195666. Online publication date: 15-Oct-2016
