Search space definition and exploration for nonuniform data reuse opportunities in data-dominant applications

Published: 01 January 2003

Abstract

Efficient exploitation of temporal locality in the memory accesses on array signals can have a very large impact on the power consumption in embedded data-dominated applications. The effective use of an optimized custom memory hierarchy or a customized software-controlled mapping on a predefined hierarchy is crucial for this. Only recently have effective systematic techniques to deal with this specific design step begun to appear, and they are still limited in their exploration scope. In this paper we construct the design space by introducing three parameters which determine how and when copies are made between different levels in a hierarchy, and determine their impact on the total memory size, storage-related power consumption, and code complexity. Strategies are then established for an efficient exploration, such that cost-effective solutions for the memory size/power trade-off can be achieved. The effectiveness of the techniques is demonstrated for several real-life image processing algorithms.
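The memory size/power trade-off mentioned in the abstract can be illustrated with a small sketch. It is not the paper's method and does not model its three parameters; it only assumes a single FIFO copy buffer in front of a large background memory, a sliding-window access pattern, and made-up relative energy costs (`e_background`, `e_buffer`). All function names and values are hypothetical.

```python
from collections import deque


def sliding_window_trace(n, w):
    """Addresses read when a window of width w slides over n array elements."""
    return [start + k for start in range(n - w + 1) for k in range(w)]


def background_reads(trace, buffer_size):
    """Count reads that must go to the large background memory.

    A FIFO buffer of `buffer_size` words holds copies of recent elements;
    buffer_size == 0 models a design with no copy level at all.
    """
    fifo = deque()
    resident = set()
    reads = 0
    for addr in trace:
        if addr in resident:
            continue  # reuse: served from the small buffer
        reads += 1  # first use, or already evicted: fetch from background
        if buffer_size > 0:
            if len(fifo) == buffer_size:
                resident.discard(fifo.popleft())  # evict the oldest copy
            fifo.append(addr)
            resident.add(addr)
    return reads


def energy_estimate(trace, buffer_size, e_background=1.0, e_buffer=0.1):
    """Relative energy under assumed, purely illustrative per-access costs."""
    reads = background_reads(trace, buffer_size)
    if buffer_size == 0:
        return reads * e_background
    # Each miss costs one background read plus one buffer write;
    # every access by the datapath costs one buffer read.
    return reads * (e_background + e_buffer) + len(trace) * e_buffer


trace = sliding_window_trace(16, 4)
for size in (0, 4):
    print(size, background_reads(trace, size), energy_estimate(trace, size))
```

In this toy setting, a buffer as wide as the window lets each array element be fetched from background memory only once, at the cost of the extra buffer storage. Under these assumptions, that exchange of memory size for fewer expensive accesses is the kind of trade-off the exploration targets.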

Published In

ACM Transactions on Design Automation of Electronic Systems, Volume 8, Issue 1
January 2003
139 pages
ISSN: 1084-4309
EISSN: 1557-7309
DOI: 10.1145/606603

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Memory hierarchy
  2. data reuse
  3. power consumption
