Data-centric multi-level blocking

Authors: Induprakas Kodukula, Keshav Pingali

PLDI '97: Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation

Pages 346 - 357

https://doi.org/10.1145/258915.258946

Published: 01 May 1997

Abstract

We present a simple and novel framework for generating blocked codes for high-performance machines with a memory hierarchy. Unlike traditional compiler techniques like tiling, which are based on reasoning about the control flow of programs, our techniques are based on reasoning directly about the flow of data through the memory hierarchy. Our data-centric transformations permit a more direct solution to the problem of enhancing data locality than current control-centric techniques do, and generalize easily to multiple levels of memory hierarchy. We buttress these claims with performance numbers for standard benchmarks from the problem domain of dense numerical linear algebra. The simplicity and intuitive appeal of our approach should make it attractive to compiler writers as well as to library writers.
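For readers unfamiliar with the term "blocked code", the following is a minimal sketch of conventional loop tiling applied to matrix multiplication. This is the control-centric technique the abstract contrasts itself with, not the data-centric framework the paper proposes. The function names `matmul_naive` and `matmul_tiled` and the `tile` parameter are hypothetical, chosen for illustration only.

```python
def matmul_naive(a, b):
    """Reference triple loop: c[i][j] += a[i][k] * b[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same computation with each loop split into tile-sized chunks.

    The three outer loops walk over blocks; the three inner loops stay
    inside one block, so the same small pieces of a, b and c are reused
    before the computation moves on. `tile` is an illustrative parameter,
    not a value taken from the paper.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, p, tile):
            for kk in range(0, m, tile):
                # min(...) handles dimensions that are not multiples of tile.
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, p)):
                        for k in range(kk, min(kk + tile, m)):
                            c[i][j] += a[i][k] * b[k][j]
    return c
```

Both functions perform the same multiply-add operations and differ only in the order in which iterations run. On a real machine the tile size would presumably be chosen to fit a given cache level, and handling several levels this way means nesting further blocks of loops, which is the kind of complexity the abstract says a data-centric formulation avoids.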




Published In

PLDI '97: Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
May 1997, 365 pages
ISBN: 0897919076
DOI: 10.1145/258915
Chairmen: Marina Chen (Boston Univ., Boston, MA), Ron K. Cytron (Washington Univ., St. Louis, MO)
Editor: A. Michael Berman (Rowan Univ., Glassboro, NJ)

ACM SIGPLAN Notices, Volume 32, Issue 5
May 1997, 365 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/258916

Copyright © 1997 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers: Article

Conference

PLDI97: Conference on Programming Language
Sponsor: SIGPLAN
June 16 - 18, 1997
Las Vegas, Nevada, USA

Acceptance Rates

PLDI '97 Paper Acceptance Rate 31 of 158 submissions, 20%;

Overall Acceptance Rate 406 of 2,067 submissions, 20%


