
Cache Conscious Data Layout Organization for Conflict Miss Reduction in Embedded Multimedia Applications

Published: 01 January 2005

Abstract

Cache misses are a major bottleneck for real-time multimedia applications because they cause off-chip accesses to the main memory. These accesses incur both a large bandwidth overhead (with the related power consumption) and performance penalties. In this paper, we propose a new technique for organizing data in the main memory for data-dominated multimedia applications so as to eliminate the majority of the conflict cache misses. The focus of this paper is on the formal and heuristic algorithm we use to steer the data layout decisions and on the experimental results obtained using a prototype tool. Experiments on real-life demonstrators show that we can reduce conflict misses by up to 82 percent for applications that have already been aggressively transformed at the source level. At the same time, we reduce the off-chip data accesses by up to 78 percent. In addition, we eliminate up to 20 percent more conflict misses than existing techniques.
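To illustrate what a conflict miss is and why data layout in main memory affects it, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a hypothetical direct-mapped cache (1 KB, 32-byte lines) and two hypothetical arrays accessed alternately; the function name, base addresses, and parameters are illustrative assumptions only.

```python
def count_misses(base_a, base_b, n_elems, elem_size=4,
                 cache_size=1024, line_size=32):
    """Count misses in a simulated direct-mapped cache for the
    access pattern a[0], b[0], a[1], b[1], ... (assumed pattern)."""
    n_lines = cache_size // line_size
    tags = [None] * n_lines            # memory block currently held by each cache line
    misses = 0
    for i in range(n_elems):
        for base in (base_a, base_b):
            addr = base + i * elem_size
            block = addr // line_size  # memory block number
            index = block % n_lines    # direct-mapped: each block has exactly one slot
            if tags[index] != block:
                misses += 1
                tags[index] = block
    return misses


CACHE_SIZE = 1024   # bytes (assumed)
LINE_SIZE = 32      # bytes (assumed)
N = 256             # 256 elements * 4 bytes = one cache size per array

# Naive layout: b starts exactly one cache size after a, so a[i] and b[i]
# always map to the same cache line and keep evicting each other.
naive = count_misses(0, CACHE_SIZE, N)

# Padded layout: shifting b by one cache line makes a[i] and b[i]
# map to different cache lines.
padded = count_misses(0, CACHE_SIZE + LINE_SIZE, N)

print(naive, padded)
```

Under these assumed parameters, the naive layout misses on all 512 accesses, while the padded layout incurs only the 64 compulsory misses (one per 32-byte block of each array). Simple padding is used here purely to show the effect of base-address placement; the paper's technique steers layout decisions with its own formal and heuristic algorithm.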



Published In

IEEE Transactions on Computers, Volume 54, Issue 1
January 2005
97 pages

Publisher

IEEE Computer Society

United States


Author Tags

  1. 65
  2. RISC/CISC
  3. VLIW architectures
  4. VLSI systems

Qualifiers

  • Research-article

Cited By

  • (2018) Loop acceleration exploration for ASIP architecture. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 20:4 (684-696). DOI: 10.1109/TVLSI.2011.2107923. Online publication date: 29-Dec-2018
  • (2018) YAARC. The Journal of Supercomputing 44:1 (24-40). DOI: 10.1007/s11227-007-0147-z. Online publication date: 30-Dec-2018
  • (2016) Integrated Exploration Methodology for Data Interleaving and Data-to-Memory Mapping on SIMD Architectures. ACM Transactions on Embedded Computing Systems 15:3 (1-23). DOI: 10.1145/2894754. Online publication date: 23-May-2016
  • (2015) Array Interleaving—An Energy-Efficient Data Layout Transformation. ACM Transactions on Design Automation of Electronic Systems 20:3 (1-26). DOI: 10.1145/2747875. Online publication date: 24-Jun-2015
  • (2011) Optimizing data locality using array tiling. Proceedings of the International Conference on Computer-Aided Design (142-149). DOI: 10.5555/2132325.2132359. Online publication date: 7-Nov-2011
  • (2011) A data layout optimization framework for NUCA-based multicores. Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (489-500). DOI: 10.1145/2155620.2155677. Online publication date: 3-Dec-2011
  • (2008) Heterogeneously tagged caches for low-power embedded systems with virtual memory support. ACM Transactions on Design Automation of Electronic Systems (TODAES) 13:2 (1-24). DOI: 10.1145/1344418.1344428. Online publication date: 23-Apr-2008
