research-article

Open access

Automatic Storage Optimization for Arrays

Authors:

Somashekaracharya G. Bhaskaracharya,

Uday Bondhugula,

Albert Cohen

ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 38, Issue 3

Article No.: 11, Pages 1 - 23

https://doi.org/10.1145/2845078

Published: 08 April 2016

Abstract

Efficient memory allocation is crucial for data-intensive applications, as a smaller memory footprint ensures better cache performance and allows one to run a larger problem size given a fixed amount of main memory. In this article, we describe a new automatic storage optimization technique to minimize the dimensionality and storage requirements of arrays used in sequences of loop nests with a predetermined schedule. We formulate the problem of intra-array storage optimization as one of finding the right storage partitioning hyperplanes: each storage partition corresponds to a single storage location. Our heuristic is driven by a dual-objective function that minimizes both the dimensionality of the mapping and the extents along those dimensions. The technique is dimension optimal for most codes encountered in practice. The storage requirements of the mappings obtained also are asymptotically better than those obtained by any existing schedule-dependent technique. Storage reduction factors and other results that we report from an implementation of our technique demonstrate its effectiveness on several real-world examples drawn from the domains of image processing, stencil computations, high-performance computing, and the class of tiled codes in general.
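As a rough illustration of what an intra-array storage mapping looks like, consider a 1D three-point stencil iterated over time under a fixed sequential schedule. This is a generic modulo-contraction sketch, not the partitioning-hyperplane algorithm the article describes. Under that schedule only the previous time step is live while the current one is computed, so the logical element (t, i) can be stored at (t mod 2, i). The function names, the stencil, and the boundary handling are all assumptions chosen for the example.

```python
def _average(row, i):
    """Three-point average with clamped boundaries (illustrative stencil)."""
    n = len(row)
    left = row[i - 1] if i > 0 else row[i]
    right = row[i + 1] if i < n - 1 else row[i]
    return (left + row[i] + right) / 3.0


def stencil_full(init, steps):
    """Reference version: one storage location per logical element (t, i)."""
    n = len(init)
    a = [[0.0] * n for _ in range(steps + 1)]
    a[0] = list(init)
    for t in range(1, steps + 1):
        for i in range(n):
            a[t][i] = _average(a[t - 1], i)
    return a[steps]


def stencil_contracted(init, steps):
    """Same schedule, but element (t, i) is stored at (t mod 2, i)."""
    n = len(init)
    a = [list(init), [0.0] * n]
    for t in range(1, steps + 1):
        cur, prev = t % 2, (t - 1) % 2
        for i in range(n):
            # Values from step t-2 held in a[cur] are dead, so they can be overwritten.
            a[cur][i] = _average(a[prev], i)
    return a[steps % 2]


def storage_cells(n, steps):
    """Cells used by the full and the contracted layouts, respectively."""
    return (steps + 1) * n, 2 * n
```

With n = 6 and 5 time steps, the full layout needs 36 cells and the contracted one needs 12, and both functions return the same final row. The contracted mapping is only valid for this particular execution order, which is what makes such a mapping schedule-dependent.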

Cited By

H. Thievenaz, K. Kimura, and C. Alias. 2022. Lightweight Array Contraction by Trace-Based Polyhedral Analysis. In High Performance Computing. ISC High Performance 2022 International Workshops. 20--32. Online publication date: 29-May-2022. https://dl.acm.org/doi/10.1007/978-3-031-23220-6_2

J. Leben and G. Tzanetakis. 2019. Polyhedral Compilation for Multi-dimensional Stream Processing. ACM Transactions on Architecture and Code Optimization 16, 3, 1--26. Online publication date: 18-Jul-2019. https://dl.acm.org/doi/10.1145/3330999

M. Kruse and T. Grosser. 2018. DeLICM: scalar dependence removal at zero memory cost. In Proceedings of the 2018 International Symposium on Code Generation and Optimization (CGO 2018). 241--253. Online publication date: 2018. https://doi.org/10.1145/3179541.3168815

Published In

ACM Transactions on Programming Languages and Systems Volume 38, Issue 3

May 2016

209 pages

ISSN:0164-0925

EISSN:1558-4593

DOI:10.1145/2914585

Editor:
Jens Palsberg
University of California, Los Angeles, USA

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 April 2016

Accepted: 01 November 2015

Revised: 01 August 2015

Received: 01 February 2015

Published in TOPLAS Volume 38, Issue 3

Qualifiers

Research-article
Research
Refereed


