
Instruction buffering exploration for low energy VLIWs with instruction clusters

Published: 27 January 2004

Abstract

For multimedia applications, loop buffering is an efficient mechanism to reduce the power consumed in the instruction memory of embedded processors. In particular, software-controlled clustered loop buffers are energy efficient. However, current compilers for VLIW processors do not fully exploit the potential offered by such a clustered organization. This paper presents an algorithm that explores the optimal loop buffer configuration, and the optimal way to use that configuration, for an application or a set of applications. Results for the MediaBench application suite show an additional 18% reduction (on average) in energy in the instruction memory hierarchy compared to traditional non-clustered loop buffer approaches, without compromising performance.

Published In

ASP-DAC '04: Proceedings of the 2004 Asia and South Pacific Design Automation Conference
January 2004
957 pages
ISBN: 0780381750

Publisher

IEEE Press

Qualifiers

  • Article

Conference

ASPDAC04
Acceptance Rates

Overall Acceptance Rate 466 of 1,454 submissions, 32%

Cited By

  • (2013) DLIC. ACM Transactions on Embedded Computing Systems 13:1 (1-26). DOI: 10.1145/2512464. Online publication date: 5-Sep-2013
  • (2010) Enabling large decoded instruction loop caching for energy-aware embedded processors. Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems (247-256). DOI: 10.1145/1878921.1878957. Online publication date: 24-Oct-2010
  • (2009) Playing the trade-off game. ACM Transactions on Design Automation of Electronic Systems 14:3 (1-37). DOI: 10.1145/1529255.1529258. Online publication date: 4-Jun-2009
  • (2008) COFFEE. Proceedings of the 3rd international conference on High performance embedded architectures and compilers (193-208). DOI: 10.5555/1786054.1786074. Online publication date: 27-Jan-2008
  • (2007) DRIM. Proceedings of the conference on Design, automation and test in Europe (1343-1348). DOI: 10.5555/1266366.1266659. Online publication date: 16-Apr-2007
  • (2007) Methodology for operation shuffling and L0 cluster generation for low energy heterogeneous VLIW processors. ACM Transactions on Design Automation of Electronic Systems 12:4 (41-es). DOI: 10.1145/1278349.1278354. Online publication date: 1-Sep-2007
  • (2005) Low power engineering. Embedded Systems Design (450-478). DOI: 10.5555/2137690.2137724. Online publication date: 1-Jan-2005
  • (2005) Compiler Managed Dynamic Instruction Placement in a Low-Power Code Cache. Proceedings of the international symposium on Code generation and optimization (179-190). DOI: 10.1109/CGO.2005.13. Online publication date: 20-Mar-2005
