Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/258915.258946acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
Article
Free access

Data-centric multi-level blocking

Published: 01 May 1997 Publication History

Abstract

We present a simple and novel framework for generating blocked codes for high-performance machines with a memory hierarchy. Unlike traditional compiler techniques like tiling, which are based on reasoning about the control flow of programs, our techniques are based on reasoning directly about the flow of data through the memory hierarchy. Our data-centric transformations permit a more direct solution to the problem of enhancing data locality than current control-centric techniques do, and generalize easily to multiple levels of memory hierarchy. We buttress these claims with performance numbers for standard benchmarks from the problem domain of dense numerical linear algebra. The simplicity and intuitive appeal of our approach should make it attractive to compiler writers as well as to library writers.

References

[1]
Ramesh C. Agarwal and Fred G. Gustavson. Algorithm and Architecture Aspects of Producing ESSL BLAS on POWERS.
[2]
E. Anderson, Z. Bat, C. Bischof, J. Demreel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen, editors. LAPA CK Users' Guide. Sec. ond Edition. SIAM, Philadelphia, 1995.
[3]
Jennifer Anderson, Saman Amarsinghe, and Monica Lain. Data and computation transformations for multiprocessors. In A CM Symposium on Principles and Practice of Parallel Programming, Jun 1995.
[4]
U. Banerjee. Unimodular transformations of double loops. In Proceedings of the Workshop on Ad. vances in Languages and Compilers for Parallel Processing, pages 192-219, August 1990.
[5]
David Bau, Induprakas Kodukula, Vladimir Kotlyar, Keshav Pingali, and Paul Stodghil. Solving alignment using elementary linear algebra. In Proceedings of the 7th LCPC Workshop, August 1994. Also available as Cornell Computer Science Dept. tech report TR95-1478.
[6]
Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (Pen)-ultimate tiling? In INTE- GRATION, the VLSI Journal, volume 17, pages 33-51. 1994.
[7]
Steve Carr and K. Kennedy. Compiler blockability of numerical algorithms. In Supercompating, 1992.
[8]
Steve Cart and R. B. Lehoucq. Compiler blockability of dense matrix factorizations. Technical report, Argonne National Laboratory, Oct 1996.
[9]
Steven Cart and R. B. Lehoucq. A compilerblockable algorithm for QR decomposition, 1994.
[10]
L. Carter, J. Ferrante, and S. Flynn Hummel. Hierarchical tiling for improved superscalar performance. In International Parallel Processing Sym. posiam, April 1995.
[11]
Michael Cierniak and Wet Li. Unifying data and control transformations for distributed shared memory machines. In $IGPLAN 1995 conference on Programming Languages Design and Implementation, Jun 1995.
[12]
Stephanie Coleman and Kathryn S. McKinley. Tile size selection using cache organization and data layout, in David W. Wail, editor, A CM SIGPLAN '95 Con/erence on Programming Language Design and Implementation (PLDI), volume 30(6) of A CM $IGPLAN Notices, pages 279-290, New York, NY, USA, June 1995. ACM Press.
[13]
Jim Demmel. Personal communication, Sep 1996.
[14]
Jack Dongarra and Robert Schreiber. Automatic blocking of nested loops. Technical Report UT-CS- 90-108, Department of Computer Science, University of Tennessee, May 1990.
[15]
Gene Golub and Charles Van Loan. Matrix Com. putations. The Johns Hopkins University Press, 1996.
[16]
Monica S. Lain, Edward E. Rothberg, and Michael E. Wolf. The cache performance and optimizations of blocked algorithms. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 63-74, Santa Clara, California, April 8-11, 1991. ACM SIGARCH, SIG- PLAN, SIGOPS, and the IEEE Computer Society.
[17]
W. Li and K. Pingali. Access Normalization: Loop restructuring for NUMA compilers. A CM Transactions on Computer Systems, 1993.
[18]
Kathryn S. McKinley, Steve Carr, and Chau-Wen Tseng. Improving data locality with loop transformations. In A CM Transactions on Programming Languages and Systems, volume 18, pages 424-453. july 1996.
[19]
W. Pugh. A practical algorithm for exact array dependency analysis. Comm. of the A CM, 35(8):102, August 1992.
[20]
J. Ramanujam and P. Sadayappan. Tiling multidimensional iteration spaces for multicomputers. Journal of Parallel and Distributed Computing, 16(2):108-120, October 1992.
[21]
A. Rogers and K. Pingali. Process decomposition through locality of reference. In SIGPLAN89 conference on Programming Languages, Design and Implementation, Jun 1989.
[22]
Vivek Sarkar. Automatic selection of high order transformations in the IBM ASTI optimizer. Technical Report ADTI-96-004, Application Development Technology Institute, IBM Software Solutions Division, July 1996. Submitted to special issue of IBM Journal of Research and Development.
[23]
M.E. Wolf and M.S. Lam. A data locality optimizing algorithm. In SIGPLAN 1991 conference on Programming Languages Design and Implementation, Jun 1991.
[24]
M. Wolfe. Iteration space tiling for memory hierarchies. In Third SIAM Conference on Parallel Pro. cessing for Scientific Computing, December 1987.
[25]
M. Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley Publishing Company, 1995.

Cited By

View all
  • (2023)Rank-Polymorphism for Shape-Guided BlockingProceedings of the 11th ACM SIGPLAN International Workshop on Functional High-Performance and Numerical Computing10.1145/3609024.3609410(1-14)Online publication date: 30-Aug-2023
  • (2022)Mix and Match: Reorganizing Tasks for Enhancing Data LocalityACM SIGMETRICS Performance Evaluation Review10.1145/3543516.346010349:1(47-48)Online publication date: 7-Jun-2022
  • (2021)Mix and Match: Reorganizing Tasks for Enhancing Data LocalityProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/34600875:2(1-24)Online publication date: 4-Jun-2021
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
PLDI '97: Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
May 1997
365 pages
ISBN:0897919076
DOI:10.1145/258915
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 1997

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

PLDI97
Sponsor:
PLDI97: Conference on Programming Language
June 16 - 18, 1997
Nevada, Las Vegas, USA

Acceptance Rates

PLDI '97 Paper Acceptance Rate 31 of 158 submissions, 20%;
Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)143
  • Downloads (Last 6 weeks)11
Reflects downloads up to 28 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Rank-Polymorphism for Shape-Guided BlockingProceedings of the 11th ACM SIGPLAN International Workshop on Functional High-Performance and Numerical Computing10.1145/3609024.3609410(1-14)Online publication date: 30-Aug-2023
  • (2022)Mix and Match: Reorganizing Tasks for Enhancing Data LocalityACM SIGMETRICS Performance Evaluation Review10.1145/3543516.346010349:1(47-48)Online publication date: 7-Jun-2022
  • (2021)Mix and Match: Reorganizing Tasks for Enhancing Data LocalityProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/34600875:2(1-24)Online publication date: 4-Jun-2021
  • (2020)A Vector-Length Agnostic Compiler for the Connex-S Accelerator with Scratchpad MemoryACM Transactions on Embedded Computing Systems10.1145/340653619:6(1-30)Online publication date: 3-Oct-2020
  • (2019)Understanding Reuse, Performance, and Hardware Cost of DNN DataflowProceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3352460.3358252(754-768)Online publication date: 12-Oct-2019
  • (2019)Co-optimizing memory-level parallelism and cache-level parallelismProceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3314221.3314599(935-949)Online publication date: 8-Jun-2019
  • (2018)Enhancing computation-to-core assignment with physical location informationACM SIGPLAN Notices10.1145/3296979.319238653:4(312-327)Online publication date: 11-Jun-2018
  • (2018)Computing with Near DataProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/32873212:3(1-30)Online publication date: 21-Dec-2018
  • (2018)Enhancing computation-to-core assignment with physical location informationProceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3192366.3192386(312-327)Online publication date: 11-Jun-2018
  • (2018)Investigating Data Layout Transformations in Chapel2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW.2018.00145(915-924)Online publication date: May-2018
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media