Bit-sliced datapath for energy-efficient high performance microprocessors

Published: 05 December 2004

Abstract

In recent years, both power and performance have become important in microprocessor design. In this paper, we investigate exploiting small data values to build energy-efficient, high-performance microprocessors. To this end, we bit-slice the execution core (the functional units, register files, data caches, and forwarding logic) so that different bit-slices operate on small portions of the data. The bit-slices that operate on the higher-order bits are activated only when required, which saves significant energy. We also investigate further advantages that bit-slicing enables, such as the energy saved by reducing the number of ports in the higher-order bit-slices and by “shutting off” bit-slices to reduce leakage energy. We found that significant energy savings can be obtained in the register file (about 20%) and the Level-1 data cache (about 30%), at the cost of only about a 2% loss in instruction throughput. Our studies also showed almost 20% savings in register file leakage energy when the unneeded higher-order bit-slices are “turned off”. Bit-slicing also reduces the latency of the different hardware structures, which can facilitate clock speed improvements.
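
The idea of activating higher-order bit-slices only when required can be illustrated with a small software model. The sketch below is not the hardware design from the paper: the 64-bit word, the 16-bit slice width, and the names `slices_needed`, `sign_extend`, and `sliced_add` are all assumptions chosen for illustration. It counts how many low-order slices a two's-complement value occupies, then performs an addition on only those slices, waking one more slice if the narrow result overflows.

```python
# Illustrative model only; widths and function names are assumptions,
# not taken from the paper.
WORD_BITS = 64                      # assumed machine word width
SLICE_BITS = 16                     # assumed width of one bit-slice
NUM_SLICES = WORD_BITS // SLICE_BITS
WORD_MASK = (1 << WORD_BITS) - 1
SLICE_MASK = (1 << SLICE_BITS) - 1


def slices_needed(value):
    """Count the low-order slices that hold significant bits of `value`.

    A higher-order slice is redundant when it contains only copies of the
    sign bit and the slice below it already ends in that same sign bit.
    """
    u = value & WORD_MASK
    sign = (u >> (WORD_BITS - 1)) & 1
    fill = SLICE_MASK if sign else 0
    n = NUM_SLICES
    while n > 1:
        top = (u >> ((n - 1) * SLICE_BITS)) & SLICE_MASK
        below_msb = (u >> ((n - 1) * SLICE_BITS - 1)) & 1
        if top == fill and below_msb == sign:
            n -= 1
        else:
            break
    return n


def sign_extend(u, bits):
    """Interpret the low `bits` bits of `u` as a two's-complement number."""
    u &= (1 << bits) - 1
    return u - (1 << bits) if u >> (bits - 1) else u


def sliced_add(a, b):
    """Add two signed words, returning (sum, number of active slices).

    Only the slices the operands occupy are used at first. If the narrow
    result overflows, one more slice is activated and the add is redone.
    """
    k = max(slices_needed(a), slices_needed(b))
    result = sign_extend(a + b, k * SLICE_BITS)
    # Signed overflow: both operands share a sign that the result lacks.
    overflow = (a < 0) == (b < 0) and (result < 0) != (a < 0)
    if overflow and k < NUM_SLICES:
        k += 1
        result = sign_extend(a + b, k * SLICE_BITS)
    return result, k


if __name__ == "__main__":
    for a, b in [(3, 4), (0x7FFF, 1), (-1, 1), (1 << 40, 1)]:
        total, active = sliced_add(a, b)
        print(f"{a} + {b} = {total} using {active} of {NUM_SLICES} slices")
```

In this model, small operands such as 3 and 4 use a single slice, while `0x7FFF + 1` overflows the 16-bit signed range and activates a second slice. A real datapath would detect the overflow from carry signals rather than by comparing signs in software.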

Cited By

  • (2021) Opening pandora's box. Proceedings of the 48th Annual International Symposium on Computer Architecture, pp. 347-360. DOI: 10.1109/ISCA52012.2021.00035. Online publication date: 14-Jun-2021.
  • (2014) Towards scalable arithmetic units with graceful degradation. ACM Transactions on Embedded Computing Systems, 13:4, pp. 1-26. DOI: 10.1145/2499367. Online publication date: 10-Mar-2014.
  • (2005) Restrictive Compression Techniques to Increase Level 1 Cache Capacity. Proceedings of the 2005 International Conference on Computer Design, pp. 327-333. DOI: 10.1109/ICCD.2005.94. Online publication date: 2-Oct-2005.

Published In

PACS'04: Proceedings of the 4th international conference on Power-Aware Computer Systems
December 2004
181 pages
ISBN: 3540297901
Editors: Babak Falsafi, T. N. VijayKumar

Publisher

Springer-Verlag, Berlin, Heidelberg
