
Reducing memory requirements of resource-constrained applications

Published: 22 April 2009

Abstract

Embedded computing platforms are often resource constrained, so their design and implementation demand close attention to memory-, power-, and heat-related parameters. An important task for a compiler on such platforms is to simplify the development of applications for limited-memory devices and resource-constrained clients. Focusing on array-intensive embedded applications that execute on single-CPU architectures, this work explores how loop-based compiler optimizations can be used to increase memory location reuse. Our goal is to transform a given application so that the resulting code has fewer cases than the original in which the lifetimes of array elements overlap. The shorter lifetimes can then be exploited by reusing memory locations as much as possible. Our experimental results indicate that the proposed strategy reduces the data space requirements of 15 resource-constrained applications by more than 40% on average. We also demonstrate how this strategy can be combined with techniques that enhance data locality (cache behavior), so that a compiler can take advantage of both, that is, reduce data memory requirements and improve data locality at the same time.
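
The idea of shrinking overlapping lifetimes can be illustrated with a small sketch. It is not taken from the article: loop fusion is used here only as one familiar example of a loop-based transformation, and the helper names (`peak_live`, `trace_original`, `trace_fused`) and the write/read trace format are assumptions made for illustration. An element is treated as live from its first write through its last read, and the peak number of simultaneously live elements stands in for the storage a temporary array needs.

```python
def peak_live(trace):
    """Return the largest number of array elements live at the same time.

    `trace` is a list of (kind, element) events, kind being "w" or "r".
    An element is live from its first write through its last read.
    (Hypothetical helper, not part of the article.)
    """
    first_write, last_read = {}, {}
    for t, (kind, elem) in enumerate(trace):
        if kind == "w":
            first_write.setdefault(elem, t)
        else:
            last_read[elem] = t
    peak = 0
    for t in range(len(trace)):
        live = sum(
            1
            for e, start in first_write.items()
            if e in last_read and start <= t <= last_read[e]
        )
        peak = max(peak, live)
    return peak


def trace_original(n):
    # Two separate loops: the first writes every B[i], the second reads them.
    return [("w", i) for i in range(n)] + [("r", i) for i in range(n)]


def trace_fused(n):
    # One fused loop: each B[i] is written and then read in the same iteration.
    trace = []
    for i in range(n):
        trace += [("w", i), ("r", i)]
    return trace


if __name__ == "__main__":
    n = 8
    print("original:", peak_live(trace_original(n)))
    print("fused:   ", peak_live(trace_fused(n)))
```

In the unfused trace every element of the temporary array is live when the first loop finishes, so `n` locations are needed. In the fused trace no two lifetimes overlap, so a single location could be reused for all elements.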

Cited By

  • (2014) Memory-Aware Task Scheduling with Communication Overhead Minimization for Streaming Applications on Bus-Based Multiprocessor System-on-Chips. IEEE Transactions on Parallel and Distributed Systems 25:7 (1797-1807). DOI: 10.1109/TPDS.2013.172. Online publication date: 1-Jul-2014

    Published In

    ACM Transactions on Embedded Computing Systems  Volume 8, Issue 3
    April 2009
    239 pages
    ISSN:1539-9087
    EISSN:1558-3465
    DOI:10.1145/1509288
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 April 2009
    Accepted: 01 March 2008
    Revised: 01 June 2005
    Received: 01 May 2003
    Published in TECS Volume 8, Issue 3

    Author Tags

    1. Memory
    2. compilers
    3. embedded system
    4. lifetime
    5. reuse

    Qualifiers

    • Research-article
    • Research
    • Refereed
