research-article

Array Size Computation under Uniform Overlapping and Irregular Accesses

Published: 28 January 2016

Abstract

The size required to store an array is crucial for an embedded system, as it affects the memory size, the energy per memory access, and the overall system cost. Existing techniques for finding the minimum number of resources required to store an array are less efficient for codes with large loops and irregularly occurring memory accesses. They either have to approximate the accessed parts of the array, which leads to overestimation of the required resources, or their exploration time grows with the number of distinct accessed parts of the array. We propose a methodology to compute the minimum resources required for storing an array. It keeps the exploration time low and provides a near-optimal result for both regularly and irregularly occurring memory accesses, as well as for overlapping writes and reads.
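The following sketch illustrates what "the minimum resources required for storing an array" measures. For an explicit access trace, it counts the peak number of array elements that are simultaneously live, meaning each element has been written and still awaits a read. This is a brute-force, trace-based illustration and not the authors' methodology, which is designed to avoid enumerating every access. Several choices here are hypothetical and were made only for illustration:

- the trace format `(time, op, index)`
- the helper names `lifetimes` and `max_live`
- the rule that reads precede writes within a time step
- the rule that a value occupies storage from its write through its last read, inclusive

Because no location is reused within the same step, the count is conservative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical trace format: (time step, "W" or "R", array index).
Event = Tuple[int, str, int]


def lifetimes(events: Iterable[Event]) -> List[Tuple[int, int]]:
    """Return one inclusive [write, last use] interval per value written to the array."""
    open_values: Dict[int, List[int]] = {}  # index -> [write time, last use time]
    intervals: List[Tuple[int, int]] = []
    # Sorting tuples orders by time; within one step "R" < "W", so reads come first.
    for t, op, idx in sorted(events):
        if op == "W":
            if idx in open_values:
                # Overwrite: the previous value dies at its last use.
                start, end = open_values[idx]
                intervals.append((start, end))
            open_values[idx] = [t, t]
        elif op == "R":
            if idx not in open_values:
                raise ValueError(f"read of A[{idx}] at t={t} before any write")
            open_values[idx][1] = t  # extend the current value's lifetime
        else:
            raise ValueError(f"unknown op {op!r}")
    intervals.extend((start, end) for start, end in open_values.values())
    return intervals


def max_live(intervals: Iterable[Tuple[int, int]]) -> int:
    """Sweep over time and return the peak number of overlapping lifetimes."""
    deltas: Dict[int, int] = defaultdict(int)
    for start, end in intervals:
        deltas[start] += 1
        deltas[end + 1] -= 1  # end is inclusive, so the slot frees one step later
    live = peak = 0
    for t in sorted(deltas):
        live += deltas[t]
        peak = max(peak, live)
    return peak


if __name__ == "__main__":
    # Sliding window: A[i] is written at step i and read once at step i + 2.
    trace = [(i, "W", i) for i in range(8)] + [(i, "R", i - 2) for i in range(2, 10)]
    written = {idx for _, op, idx in trace if op == "W"}
    print("distinct elements:", len(written))                # 8
    print("max live elements:", max_live(lifetimes(trace)))  # 3
```

In the sliding-window example, each of the 8 elements is read two steps after it is written. At most 3 elements are live at once, so 3 locations suffice instead of 8. Enumerating every event grows with the trace length, so for large loops this brute-force count becomes expensive. That cost is what a scalable, near-optimal representation of the accesses aims to avoid.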

Published In

ACM Transactions on Design Automation of Electronic Systems, Volume 21, Issue 2
January 2016
422 pages
ISSN: 1084-4309
EISSN: 1557-7309
DOI: 10.1145/2888405
  • Editor: Naehyuck Chang
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 January 2016
Accepted: 01 July 2015
Revised: 01 March 2015
Received: 01 November 2014
Published in TODAES Volume 21, Issue 2

Author Tags

  1. Liveness
  2. iteration space
  3. near-optimality
  4. resources optimization
  5. scalability

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Total Citations: 0
  • Total Downloads: 102
  • Downloads (Last 12 months): 1
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 09 Jan 2025
