DOI: 10.1145/1176887.1176921

Compiler-assisted leakage energy optimization for clustered VLIW architectures

Published: 22 October 2006
  • Abstract

    Miniaturization of devices and the ensuing decrease in threshold voltage have led to a substantial increase in the leakage component of total processor energy consumption. Because VLIW and clustered VLIW architectures have relatively simple issue logic and a large number of functional units, a large fraction of this leakage energy is consumed in the functional units. However, functional units are not fully utilized in VLIW architectures because of inherent variations in the ILP of programs. This underutilization is even more pronounced in clustered VLIW architectures, where contention for the limited number of slow intercluster communication channels leads to many short idle cycles.

    In the past, architectural schemes have been proposed to obtain leakage energy benefits by aggressively exploiting the idleness of functional units. However, the presence of many short idle cycles causes frequent transitions from active mode to sleep mode and vice versa, which adversely affects the energy benefits of a purely hardware-based scheme. In this paper, we propose and evaluate a compiler instruction scheduling algorithm that assists such a hardware-based scheme in the context of VLIW and clustered VLIW architectures. The proposed scheme exploits the scheduling slack of instructions to orchestrate the functional unit mapping, with the objective of reducing the number of transitions in functional units and thereby keeping them off for longer durations. The proposed compiler-assisted scheme obtains a further 12% reduction in the energy consumption of functional units, with negligible performance degradation, over a hardware-only scheme for a VLIW architecture. The benefits are 15% and 17% for 2-clustered and 4-clustered VLIW architectures, respectively. Our test bed uses the Trimaran compiler infrastructure.
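    The idea of using scheduling slack to reduce functional-unit transitions can be illustrated with a small sketch. This is not the algorithm from the paper, whose details the abstract does not give. It is a simplified greedy binder under stated assumptions: each operation arrives with a precomputed slack window (earliest and latest cycle), operations are treated as independent, all units are interchangeable, and a unit sleeps whenever it is idle. The names `Op`, `count_wakeups`, and `bind_ops` are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical input format: (name, earliest cycle, latest cycle).
# The slack window is assumed to be precomputed by an earlier scheduling pass.
Op = Tuple[str, int, int]


def count_wakeups(busy_cycles: Iterable[int]) -> int:
    """Count sleep-to-active transitions for one unit.

    Assumes the unit sleeps in every idle cycle, so each maximal run of
    consecutive busy cycles costs exactly one wakeup.
    """
    cycles = sorted(set(busy_cycles))
    return sum(1 for i, c in enumerate(cycles) if i == 0 or c != cycles[i - 1] + 1)


def bind_ops(ops: List[Op], num_units: int) -> Dict[str, Tuple[int, int]]:
    """Greedily map each op to a (unit, cycle) slot inside its slack window.

    Ops with the least slack are placed first. For each op, the slot that adds
    the fewest wakeups is chosen; ties go to the earliest cycle, then the
    lowest-numbered unit. Dependences between ops are ignored (assumption).
    """
    busy: List[Set[int]] = [set() for _ in range(num_units)]
    placement: Dict[str, Tuple[int, int]] = {}
    for name, earliest, latest in sorted(ops, key=lambda o: (o[2] - o[1], o[1])):
        best = None
        for unit in range(num_units):
            before = count_wakeups(busy[unit])
            for cycle in range(earliest, latest + 1):
                if cycle in busy[unit]:
                    continue  # slot already taken on this unit
                added = count_wakeups(busy[unit] | {cycle}) - before
                key = (added, cycle, unit)
                if best is None or key < best:
                    best = key
        if best is None:
            raise ValueError(f"no free slot for {name}")
        _, cycle, unit = best
        busy[unit].add(cycle)
        placement[name] = (unit, cycle)
    return placement


ops = [("a", 0, 0), ("b", 1, 3), ("c", 2, 2), ("d", 0, 3)]
placement = bind_ops(ops, num_units=2)
```

    With two units and these four operations, the sketch packs every operation onto unit 0 in consecutive cycles 0 to 3, so unit 0 wakes once and unit 1 can stay asleep throughout. A binder that ignored slack and scheduled each operation at its earliest cycle would need both units in cycle 0.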

      Published In

      EMSOFT '06: Proceedings of the 6th ACM & IEEE International conference on Embedded software
      October 2006
      346 pages
      ISBN:1595935428
      DOI:10.1145/1176887

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. clustered VLIW processors
      2. energy-aware scheduling
      3. leakage energy
      4. scheduling

      Qualifiers

      • Article

      Conference

      ESWEEK06
      ESWEEK06: Second Embedded Systems Week 2006
      October 22 - 25, 2006
      Seoul, Korea

      Acceptance Rates

      Overall Acceptance Rate 60 of 203 submissions, 30%

      Cited By

      • (2019) Compiler-assisted leakage-aware loop scheduling for embedded VLIW DSP processors. Journal of Systems and Software 83:5 (772-785). DOI: 10.1016/j.jss.2009.11.727. Online publication date: 3-Jan-2019
      • (2014) Design and evaluation of fine-grained power-gating for embedded microprocessors. Proceedings of the conference on Design, Automation & Test in Europe (1-6). DOI: 10.5555/2616606.2616785. Online publication date: 24-Mar-2014
      • (2014) Path-Dividing Based Scheduling Algorithm for Reducing Energy Consumption of Clustered VLIW Architectures. IEEE Transactions on Computers 63:10 (2526-2539). DOI: 10.1109/TC.2013.138. Online publication date: Oct-2014
      • (2013) Compiler-assisted leakage energy optimization of media applications on stream architectures. International Symposium on Quality Electronic Design (ISQED) (120-127). DOI: 10.1109/ISQED.2013.6523599. Online publication date: Mar-2013
      • (2011) An Energy Aware Design Space Exploration for VLIW AGU Model with Fine Grained Power Gating. Proceedings of the 2011 14th Euromicro Conference on Digital System Design (693-700). DOI: 10.1109/DSD.2011.93. Online publication date: 31-Aug-2011
      • (2011) Leakage-Aware Modulo Scheduling for Embedded VLIW Processors. Journal of Computer Science and Technology 26:3 (405-417). DOI: 10.1007/s11390-011-1143-6. Online publication date: 12-May-2011
      • (2010) Integrated energy-aware cyclic and acyclic scheduling for clustered VLIW processors. 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW) (1-8). DOI: 10.1109/IPDPSW.2010.5470906. Online publication date: Apr-2010
      • (2010) Design space exploration for an embedded processor with flexible datapath interconnect. ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors (55-62). DOI: 10.1109/ASAP.2010.5540812. Online publication date: Jul-2010
      • (2009) Energy-Aware Compiler Optimizations. The Compiler Design Handbook (7-1-7-36). DOI: 10.1201/9781420043839.ch7. Online publication date: 7-Dec-2009
      • (2008) Minimizing Leakage Energy with Modulo Scheduling for VLIW DSP Processors. Distributed Embedded Systems: Design, Middleware and Resources (111-120). DOI: 10.1007/978-0-387-09661-2_11. Online publication date: 2008