research-article

Reducing memory requirements of resource-constrained applications

Authors:

P. Unnikrishnan,

I. Kolcu

ACM Transactions on Embedded Computing Systems (TECS), Volume 8, Issue 3

Article No.: 17, Pages 1 - 37

https://doi.org/10.1145/1509288.1509289

Published: 22 April 2009

Abstract

Embedded computing platforms are often resource constrained, requiring great design and implementation attention to memory-, power-, and heat-related parameters. An important task for a compiler on such platforms is to simplify the process of developing applications for limited-memory devices and resource-constrained clients. Focusing on array-intensive embedded applications to be executed on single-CPU architectures, this work explores how loop-based compiler optimizations can be used to increase memory location reuse. Our goal is to transform a given application so that the resulting code has fewer cases, compared with the original code, in which the lifetimes of array elements overlap. The reduction in lifetimes of array elements can then be exploited by reusing memory locations as much as possible. Our experimental results indicate that the proposed strategy reduces the data space requirements of 15 resource-constrained applications by more than 40% on average. We also demonstrate how this strategy can be combined with techniques that enhance data locality (cache behavior), so that a compiler can reduce data memory requirements and improve data locality at the same time.
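
The link between overlapping lifetimes and memory demand can be illustrated with a small Python sketch. It is not the authors' algorithm: the trace format, the function names (`peak_live_elements`, `original_trace`, `fused_trace`), and the choice of loop fusion as the example transformation are all assumptions made for illustration. The sketch also assumes each array element is written once, before any of its reads. It counts how many elements of a temporary array are live at the same time, first when a producer loop runs to completion before a consumer loop, and then when the two loops are fused.

```python
def peak_live_elements(trace):
    """Return the largest number of array elements that are live at once.

    `trace` is a list of ("w", index) / ("r", index) events in execution
    order (a hypothetical format). An element is treated as live from its
    write up to and including its last read.
    """
    # First pass: record the time step of the last read of each element.
    last_read = {}
    for t, (op, idx) in enumerate(trace):
        if op == "r":
            last_read[idx] = t

    # Second pass: track the set of live elements over time.
    live = set()
    peak = 0
    for t, (op, idx) in enumerate(trace):
        if op == "w" and idx in last_read:
            live.add(idx)
        peak = max(peak, len(live))
        if op == "r" and last_read[idx] == t:
            live.discard(idx)
    return peak


def original_trace(n):
    # Loop 1 produces every tmp[i]; loop 2 consumes them afterwards.
    return [("w", i) for i in range(n)] + [("r", i) for i in range(n)]


def fused_trace(n):
    # After fusing the loops, each tmp[i] is consumed right after it is produced.
    trace = []
    for i in range(n):
        trace.append(("w", i))
        trace.append(("r", i))
    return trace


if __name__ == "__main__":
    n = 8
    print(peak_live_elements(original_trace(n)))  # 8
    print(peak_live_elements(fused_trace(n)))     # 1
```

In the unfused order all `n` lifetimes overlap, so `n` locations are needed. In the fused order no two lifetimes overlap, so a single location could in principle be reused for every element.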

Cited By

Wang Y, Shao Z, Chan H, Liu D, Guan Y (2014). Memory-Aware Task Scheduling with Communication Overhead Minimization for Streaming Applications on Bus-Based Multiprocessor System-on-Chips. IEEE Transactions on Parallel and Distributed Systems 25:7 (1797-1807). DOI: 10.1109/TPDS.2013.172. Online publication date: 1-Jul-2014
https://dl.acm.org/doi/10.1109/TPDS.2013.172

Index Terms

Reducing memory requirements of resource-constrained applications
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Information & Contributors

Information

Published In

ACM Transactions on Embedded Computing Systems Volume 8, Issue 3

April 2009

239 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/1509288

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 22 April 2009

Accepted: 01 March 2008

Revised: 01 June 2005

Received: 01 May 2003

Published in TECS Volume 8, Issue 3

Qualifiers

Research-article
Research
Refereed
