Optimizing the memory bandwidth with loop fusion

Published: 08 September 2004 Publication History

Abstract

Memory bandwidth largely determines the performance and energy cost of embedded systems. At the compiler level, several techniques improve the memory bandwidth within the scope of a basic block, but they often fail to exploit all of the available bandwidth. We propose a technique that optimizes the memory bandwidth across basic-block boundaries. Our technique incrementally fuses loops to make better use of the available bandwidth. The resulting performance depends on how the data is assigned to the memories of the memory layer. At the same time, the assignment also strongly influences the energy cost. Therefore, our approach combines the fusion and assignment decisions. Designers can use our output to trade off energy cost against system performance.
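To make the interaction between fusion and assignment concrete, here is a minimal sketch under a deliberately simple model. It is not the authors' algorithm. It assumes that each memory is single-ported (accesses to the same memory serialize, accesses to different memories proceed in parallel), that the two loops have the same iteration count, and that no dependence prevents fusing them. The array names, memory names, and helper functions are all hypothetical.

```python
from collections import Counter

def loop_cycles(accesses, assignment):
    """Cycles per iteration of one loop body in the toy model:
    the busiest memory determines the iteration length."""
    per_memory = Counter(assignment[array] for array in accesses)
    return max(per_memory.values())

def total_cycles(loops, assignment, iterations):
    """Total cycles for a sequence of loops that each run `iterations` times."""
    return sum(loop_cycles(body, assignment) * iterations for body in loops)

def fuse(loop_a, loop_b):
    """Fuse two loop bodies (assumed to be dependence-free and equal in range)."""
    return loop_a + loop_b

N = 100
loop1 = ["a"]  # one access to array a per iteration
loop2 = ["b"]  # one access to array b per iteration

split = {"a": "mem0", "b": "mem1"}   # arrays assigned to different memories
shared = {"a": "mem0", "b": "mem0"}  # both arrays assigned to the same memory

unfused = total_cycles([loop1, loop2], split, N)
fused_split = total_cycles([fuse(loop1, loop2)], split, N)
fused_shared = total_cycles([fuse(loop1, loop2)], shared, N)

print(unfused, fused_split, fused_shared)
```

In this model, fusion halves the cycle count only when the two arrays sit in different memories. With both arrays in one memory, the fused loop is no faster than the original pair, which is why fusion and assignment are worth deciding together. Energy is not modeled here.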

Published In

CODES+ISSS '04: Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
September 2004
266 pages
ISBN:1581139373
DOI:10.1145/1016720

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. loop fusion
  2. low power
  3. memory bandwidth

Qualifiers

  • Article

Conference

CODES/ISSS04

Acceptance Rates

Overall Acceptance Rate 280 of 864 submissions, 32%
