Instruction buffering exploration for low energy VLIWs with instruction clusters

Authors: Tom Vander Aa, Murali Jayapala, Francisco Barat, Geert Deconinck, Rudy Lauwereins, Francky Catthoor, Henk Corporaal

ASP-DAC '04: Proceedings of the 2004 Asia and South Pacific Design Automation Conference

Pages 824 - 829

Published: 27 January 2004

Abstract

For multimedia applications, loop buffering is an efficient mechanism to reduce the power consumed in the instruction memory of embedded processors. Software-controlled clustered loop buffers are particularly energy efficient. However, current compilers for VLIW processors do not fully exploit the potential offered by such a clustered organization. This paper presents an algorithm that explores the optimal loop buffer configuration, and the optimal way to use that configuration, for an application or a set of applications. Results for the MediaBench application suite show an additional 18% reduction (on average) in the energy of the instruction memory hierarchy compared to traditional non-clustered loop buffers, without compromising performance.
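The abstract does not describe the exploration algorithm itself, so the following is only a minimal sketch of what a search over clustered loop buffer configurations could look like. Everything in it is an assumption made for illustration: the loop profile in `LOOPS`, the toy energy model (`CONTROL_COST`, `MEM_COST_PER_FU`, access energy proportional to buffer width times depth), and the exhaustive enumeration of functional-unit partitions. None of it is taken from the paper.

```python
# Hypothetical loop profile: (cycles executed, schedule length, active FU indices).
LOOPS = [
    (10_000, 8, {0, 1}),
    (5_000, 32, {2, 3}),
    (2_000, 16, {0, 1, 2, 3}),
]
NUM_FUS = 4                # assumed number of functional units (issue slots)
DEPTHS = [8, 16, 32, 64]   # assumed candidate buffer depths, in instructions
CONTROL_COST = 10          # toy per-access overhead of one cluster's local controller
MEM_COST_PER_FU = 500      # toy energy per issue slot fetched from instruction memory


def set_partitions(items):
    """Yield every partition of items into non-empty clusters."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part                      # first in its own cluster
        for i in range(len(part)):                  # or joined to an existing one
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def cluster_energy(cluster, depth, loops):
    """Toy energy of one cluster buffer over all profiled loops."""
    width = len(cluster)
    buffer_cost = width * depth + CONTROL_COST      # grows with buffer size
    total = 0
    for cycles, length, active in loops:
        if not active & set(cluster):
            continue                                # cluster idle: buffer not accessed
        if length <= depth:
            total += cycles * buffer_cost           # loop fits in the buffer
        else:
            total += cycles * MEM_COST_PER_FU * width  # fall back to instruction memory
    return total


def explore(num_fus, depths, loops):
    """Exhaustively search FU partitions; pick the best depth per cluster."""
    best = None
    for part in set_partitions(list(range(num_fus))):
        energy, config = 0, []
        for cluster in part:
            e, d = min((cluster_energy(cluster, d, loops), d) for d in depths)
            energy += e
            config.append((tuple(sorted(cluster)), d))
        if best is None or energy < best[0]:
            best = (energy, sorted(config))
    return best


print(explore(NUM_FUS, DEPTHS, LOOPS))
# prints (1022000, [((0, 1), 16), ((2, 3), 32)])
```

Under these made-up numbers, the search groups the units that are active together and gives each group the smallest depth that holds its loops, which costs less in the toy model than a single shared buffer of depth 32 (2,346,000 units). Exhaustive enumeration is only practical for a handful of functional units; a realistic exploration would need a calibrated energy model and a pruned or heuristic search.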

Published In

ASP-DAC '04: Proceedings of the 2004 Asia and South Pacific Design Automation Conference

January 2004

957 pages

ISBN:0780381750

General Chair:
Masaharu Imai
Osaka University

Sponsors

IPSJ: Information Processing Society of Japan
IEEE Circuits and Systems Society
SIGDA: ACM Special Interest Group on Design Automation
IEICE: Institute of Electronics, Information and Communication Engineers

Publisher

IEEE Press

Qualifiers

Article

Conference

ASPDAC04

Sponsor:

IPSJ
SIGDA
IEICE

ASPDAC04: Asia and South Pacific Design Automation Conference 2004

January 27 - 30, 2004

Yokohama, Japan

Acceptance Rates

Overall Acceptance Rate 466 of 1,454 submissions, 32%

Cited By

Gu J, Guo H, Ishihara T (2013). DLIC. ACM Transactions on Embedded Computing Systems 13:1 (1-26). DOI: 10.1145/2512464. Online publication date: 5-Sep-2013.
https://dl.acm.org/doi/10.1145/2512464
Gu J, Guo H, Kathail V, Tatge R, Barua R (2010). Enabling large decoded instruction loop caching for energy-aware embedded processors. Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems (247-256). DOI: 10.1145/1878921.1878957. Online publication date: 24-Oct-2010.
https://dl.acm.org/doi/10.1145/1878921.1878957
Raghavan P, Jayapala M, Lambrechts A, Absar J, Catthoor F (2009). Playing the trade-off game. ACM Transactions on Design Automation of Electronic Systems 14:3 (1-37). DOI: 10.1145/1529255.1529258. Online publication date: 4-Jun-2009.
https://dl.acm.org/doi/10.1145/1529255.1529258
Raghavan P, Lambrechts A, Absar J, Jayapala M, Catthoor F, Verkest D (2008). COFFEE. Proceedings of the 3rd international conference on High performance embedded architectures and compilers (193-208). DOI: 10.5555/1786054.1786074. Online publication date: 27-Jan-2008.
https://dl.acm.org/doi/10.5555/1786054.1786074
Ge Z, Wong W, Lim H, Lauwereins R, Madsen J (2007). DRIM. Proceedings of the conference on Design, automation and test in Europe (1343-1348). DOI: 10.5555/1266366.1266659. Online publication date: 16-Apr-2007.
https://dl.acm.org/doi/10.5555/1266366.1266659
Kobayashi Y, Jayapala M, Raghavan P, Catthoor F, Imai M (2007). Methodology for operation shuffling and L0 cluster generation for low energy heterogeneous VLIW processors. ACM Transactions on Design Automation of Electronic Systems 12:4 (41-es). DOI: 10.1145/1278349.1278354. Online publication date: 1-Sep-2007.
https://dl.acm.org/doi/10.1145/1278349.1278354
Bouyssounouse B, Sifakis J (2005). Low power engineering. Embedded Systems Design (450-478). DOI: 10.5555/2137690.2137724. Online publication date: 1-Jan-2005.
https://dl.acm.org/doi/10.5555/2137690.2137724
Ravindran R, Nagarkar P, Dasika G, Marsman E, Senger R, Mahlke S, Brown R (2005). Compiler Managed Dynamic Instruction Placement in a Low-Power Code Cache. Proceedings of the international symposium on Code generation and optimization (179-190). DOI: 10.1109/CGO.2005.13. Online publication date: 20-Mar-2005.
https://dl.acm.org/doi/10.1109/CGO.2005.13
