Article

Compiler-assisted leakage energy optimization for clustered VLIW architectures

Authors:

Y. N. Srikant

EMSOFT '06: Proceedings of the 6th ACM & IEEE International conference on Embedded software

Pages 233 - 241

https://doi.org/10.1145/1176887.1176921

Published: 22 October 2006

Abstract

Miniaturization of devices and the ensuing decrease in threshold voltage have led to a substantial increase in the leakage component of total processor energy consumption. Because VLIW and clustered VLIW architectures have relatively simple issue logic and a large number of functional units, a large fraction of this leakage energy is consumed in the functional units. However, functional units are not fully utilized in VLIW architectures because of the inherent variations in the ILP of programs. This underutilization is even more pronounced in clustered VLIW architectures, where contention for the limited number of slow intercluster communication channels leads to many short idle cycles.

In the past, some architectural schemes have been proposed to obtain leakage energy benefits by aggressively exploiting the idleness of functional units. However, the presence of many short idle cycles causes frequent transitions between the active mode and the sleep mode, which adversely affects the energy benefits of a purely hardware-based scheme. In this paper, we propose and evaluate a compiler instruction scheduling algorithm that assists such a hardware-based scheme in the context of VLIW and clustered VLIW architectures. The proposed scheme exploits the scheduling slack of instructions to orchestrate the functional unit mapping, with the objective of reducing the number of transitions in functional units and thereby keeping them off for longer durations. Compared with a hardware-only scheme, the proposed compiler-assisted scheme obtains a further 12% reduction in the energy consumption of functional units, with negligible performance degradation, for a VLIW architecture. The benefits are 15% and 17% for a 2-clustered and a 4-clustered VLIW architecture, respectively. Our test bed uses the Trimaran compiler infrastructure.
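The idea of using scheduling slack to reduce sleep/active transitions can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes a single functional unit, single-cycle operations, fixed slack windows with no dependence tracking, and a unit that sleeps in every idle cycle. The function names (`count_wakeups`, `place`) and the example windows are hypothetical.

```python
def count_wakeups(busy_cycles):
    """Count sleep-to-active transitions, assuming the unit sleeps whenever idle.

    Each maximal run of consecutive busy cycles costs exactly one wake-up.
    """
    busy = set(busy_cycles)
    return sum(1 for c in busy if (c - 1) not in busy)


def place(windows, slack_aware):
    """Assign each single-cycle op a cycle inside its (earliest, latest) window.

    slack_aware=False: place every op as early as possible (baseline).
    slack_aware=True:  place tightly constrained ops first, then prefer a cycle
                       adjacent to an already busy one, so busy cycles cluster
                       together and idle periods stay long.
    """
    if slack_aware:
        order = sorted(windows, key=lambda w: (w[1] - w[0], w[0]))
    else:
        order = sorted(windows)
    busy = set()
    for earliest, latest in order:
        free = [c for c in range(earliest, latest + 1) if c not in busy]
        if not free:
            raise ValueError("no free cycle inside the slack window")
        chosen = free[0]
        if slack_aware:
            adjacent = [c for c in free if (c - 1) in busy or (c + 1) in busy]
            if adjacent:
                chosen = adjacent[0]
        busy.add(chosen)
    return sorted(busy)


# Hypothetical slack windows for four ops mapped to the same unit.
windows = [(0, 3), (4, 4), (6, 9), (10, 10)]
baseline = place(windows, slack_aware=False)
clustered = place(windows, slack_aware=True)
print(baseline, count_wakeups(baseline))
print(clustered, count_wakeups(clustered))
```

In this example the slack-aware placement groups the busy cycles into two runs instead of four, so the unit wakes up half as often while every operation still executes within its window.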

Index Terms

Compiler-assisted leakage energy optimization for clustered VLIW architectures
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation

Published In

EMSOFT '06: Proceedings of the 6th ACM & IEEE International conference on Embedded software

October 2006

346 pages

ISBN:1595935428

DOI:10.1145/1176887

General Chairs: Sang Lyul Min (Seoul National University), Wang Yi (Uppsala University)

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ESWEEK06

Sponsor:

ESWEEK06: Second Embedded Systems Week 2006

October 22 - 25, 2006

Seoul, Korea

Acceptance Rates

Overall Acceptance Rate 60 of 203 submissions, 30%

Bibliometrics

Total Citations: 11
Total Downloads: 343
Downloads (last 12 months): 2
Downloads (last 6 weeks): 0

Reflects downloads up to 10 Aug 2024

Cited By

Wang M, Wang Y, Liu D, Qin Z, Shao Z (2019). Compiler-assisted leakage-aware loop scheduling for embedded VLIW DSP processors. Journal of Systems and Software 83:5 (772-785). Online publication date: 3-Jan-2019
https://dl.acm.org/doi/10.1016/j.jss.2009.11.727
Kondo M, Kobyashi H, Sakamoto R, Wada M, Tsukamoto J, Namiki M, Wang W, Amano H, Matsunaga K, Kudo M, Usami K, Komoda T, Nakamura H, Fettweis G, Nebel W (2014). Design and evaluation of fine-grained power-gating for embedded microprocessors. Proceedings of the conference on Design, Automation & Test in Europe (1-6). Online publication date: 24-Mar-2014
https://dl.acm.org/doi/10.5555/2616606.2616785
Yang X (2014). Path-Dividing Based Scheduling Algorithm for Reducing Energy Consumption of Clustered VLIW Architectures. IEEE Transactions on Computers 63:10 (2526-2539). Online publication date: Oct-2014
https://doi.org/10.1109/TC.2013.138
Shan Cao, Zhaolin Li, Zhixiang Chen, Guoyue Jiang, Shaojun Wei (2013). Compiler-assisted leakage energy optimization of media applications on stream architectures. International Symposium on Quality Electronic Design (ISQED) (120-127). Online publication date: Mar-2013
https://doi.org/10.1109/ISQED.2013.6523599
Taniguchi I, Uchida M, Tomiyama H, Fukui M, Raghavan P, Catthoor F (2011). An Energy Aware Design Space Exploration for VLIW AGU Model with Fine Grained Power Gating. Proceedings of the 2011 14th Euromicro Conference on Digital System Design (693-700). Online publication date: 31-Aug-2011
https://dl.acm.org/doi/10.1109/DSD.2011.93
Guan Y, Xue J (2011). Leakage-Aware Modulo Scheduling for Embedded VLIW Processors. Journal of Computer Science and Technology 26:3 (405-417). Online publication date: 12-May-2011
https://doi.org/10.1007/s11390-011-1143-6
Bahuleyan J, Nagpal R, Srikant Y (2010). Integrated energy-aware cyclic and acyclic scheduling for clustered VLIW processors. 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW) (1-8). Online publication date: Apr-2010
https://doi.org/10.1109/IPDPSW.2010.5470906
Hoang T, Jalmbrant U, der Hagopian E, Subramaniyan K, Sjalander M, Larsson-Edefors P (2010). Design space exploration for an embedded processor with flexible datapath interconnect. ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors (55-62). Online publication date: Jul-2010
https://doi.org/10.1109/ASAP.2010.5540812
Srikant Y, Ananda Vardhan K (2009). Energy-Aware Compiler Optimizations. The Compiler Design Handbook (7-1-7-36). Online publication date: 7-Dec-2009
https://doi.org/10.1201/9781420043839.ch7
Wang M, Shao Z, Liu H, Xue C (2008). Minimizing Leakage Energy with Modulo Scheduling for VLIW DSP Processors. Distributed Embedded Systems: Design, Middleware and Resources (111-120). Online publication date: 2008
https://doi.org/10.1007/978-0-387-09661-2_11