Near-optimal and scalable intrasignal in-place optimization for non-overlapping and irregular access schemes

Published: 20 December 2013

Abstract

Storage-size management techniques aim to reduce the resources required to store elements while providing efficient addressing when those elements are accessed. Existing techniques are less suitable for large iteration spaces with many irregularly spread holes: they either approximate the accessed regions, which overestimates the final resources, or require prohibitive exploration time to find the storage size. In this work, we present a near-optimal and scalable methodology for intrasignal in-place storage-size optimization, that is, for computing the minimum amount of resources required to store the elements of a group (array), for irregular, complex access schemes in the target domain of non-overlapping store and load accesses.
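
To make the objective concrete, the following is a minimal brute-force sketch of what "minimum storage size" means for a single array. It is not the methodology of the paper. It assumes, as a simplification of ours, that each element is stored once and has a known last load time; the function name `max_live_elements` and the example access pattern are hypothetical.

```python
def max_live_elements(lifetimes):
    """Return the peak number of simultaneously live elements.

    lifetimes: iterable of (store_time, last_load_time) pairs, one per
    array element (integer iteration numbers). This is an exhaustive
    baseline for illustration, not a scalable method.
    """
    events = []
    for store, load in lifetimes:
        if load < store:
            raise ValueError("element loaded before it is stored")
        events.append((store, 1))      # element becomes live when stored
        events.append((load + 1, -1))  # its slot is free after the last load
    # Tuples sort by time first; at equal times -1 sorts before +1,
    # so a freed slot is reused before a new element is counted.
    events.sort()
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


# Hypothetical access scheme: element i is stored at iteration i and
# last loaded at iteration i + 3, but iterations 2, 5 and 6 are holes.
holes = {2, 5, 6}
irregular = [(i, i + 3) for i in range(10) if i not in holes]
# Approximating the accessed region by a dense range ignores the holes.
dense = [(i, i + 3) for i in range(10)]

exact_size = max_live_elements(irregular)
approx_size = max_live_elements(dense)
```

In this toy case the exact count is 3 locations, whereas the dense approximation reports 4, which illustrates the overestimation the abstract attributes to region approximation. The sweep itself visits every element, so it also illustrates why exhaustive exploration becomes costly for large iteration spaces.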

Published In

ACM Transactions on Design Automation of Electronic Systems, Volume 19, Issue 1
December 2013
210 pages
ISSN: 1084-4309
EISSN: 1557-7309
DOI: 10.1145/2558148
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 December 2013
Accepted: 01 July 2013
Revised: 01 April 2013
Received: 01 December 2012
Published in TODAES Volume 19, Issue 1

Author Tags

  1. Storage size
  2. iteration space
  3. near optimality
  4. scalability

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Hellenic and European Regional Development Fund (ERDF) under ESPA 2007-2013 (MICRO2-SE-G)
  • Public Welfare Foundation “Propondis” research funds
  • European Social Fund
  • Greek National Funds (Heracleitus II-NSRF)

Cited By

  • (2016) Array Size Computation under Uniform Overlapping and Irregular Accesses. ACM Transactions on Design Automation of Electronic Systems 21:2 (1-35). DOI: 10.1145/2818643. Online publication date: 28-Jan-2016
  • (2014) A scalable and near-optimal representation of access schemes for memory management. ACM Transactions on Architecture and Code Optimization 11:1 (1-25). DOI: 10.1145/2579677. Online publication date: 1-Feb-2014
  • (2014) Conclusions and Future Directions. Scalable and Near-Optimal Design Space Exploration for Embedded Systems (261-263). DOI: 10.1007/978-3-319-04942-7_10. Online publication date: 21-Feb-2014
  • (2013) Exploration of energy efficient memory organisations for dynamic multimedia applications using system scenarios. Design Automation for Embedded Systems 17:3-4 (669-692). DOI: 10.1007/s10617-014-9145-6. Online publication date: 1-Sep-2013
