research-article

Clustered Loop Buffer Organization for Low Energy VLIW Embedded Processors

Published: 01 June 2005

Abstract

Current loop buffer organizations for very large instruction word (VLIW) processors are essentially centralized. As a consequence, they are energy inefficient and their scalability is limited. To alleviate this problem, we propose a clustered loop buffer organization, in which the loop buffers are partitioned and functional units are logically grouped into clusters, together with two buffer control schemes that regulate the activity in each cluster. Furthermore, we propose a design-time scheme that generates clusters by analyzing an application profile and grouping closely related functional units. Simulation results indicate that the energy consumed in the clustered loop buffers is, on average, 63 percent lower than in an uncompressed centralized loop buffer scheme, 35 percent lower than in a centralized compressed loop buffer scheme, and 22 percent lower than in a randomly clustered loop buffer scheme.
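The idea of grouping "closely related" functional units from a profile, and why that saves loop buffer energy, can be illustrated with a small sketch. Everything in it is an assumption made for illustration, not the paper's method: the profile format (a list of sets of functional-unit indices active in each loop-buffer cycle), the greedy co-activity merge heuristic, and the energy model (each cycle reads one buffer slot per functional unit in every cluster that has at least one active unit, at a hypothetical fixed cost `e_slot`, ignoring control overhead and leakage). A centralized buffer, by contrast, is modeled as reading the full VLIW word every cycle.

```python
from itertools import combinations


def coactivity(profile, n_fus):
    """Count, for every pair of FUs (a < b), the cycles in which both are active."""
    counts = {pair: 0 for pair in combinations(range(n_fus), 2)}
    for active in profile:
        for pair in combinations(sorted(active), 2):
            counts[pair] += 1
    return counts


def greedy_clusters(profile, n_fus, n_clusters):
    """Hypothetical heuristic: start with one FU per cluster and repeatedly
    merge the two clusters with the highest total pairwise co-activity
    until only n_clusters remain."""
    counts = coactivity(profile, n_fus)
    clusters = [{fu} for fu in range(n_fus)]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            score = sum(counts[tuple(sorted((a, b)))]
                        for a in clusters[i] for b in clusters[j])
            if best is None or score > best[0]:
                best = (score, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]  # i < j, so deleting j leaves i valid
        del clusters[j]
    return clusters


def centralized_energy(profile, n_fus, e_slot=1.0):
    """Assumed model: every cycle reads one slot for every FU."""
    return len(profile) * n_fus * e_slot


def clustered_energy(profile, clusters, e_slot=1.0):
    """Assumed model: a cluster's buffer is read only in cycles where
    at least one of its FUs is active; idle clusters stay gated."""
    total = 0.0
    for active in profile:
        for c in clusters:
            if c & active:
                total += len(c) * e_slot
    return total


if __name__ == "__main__":
    # Hypothetical 6-cycle loop profile for a 4-FU machine.
    profile = [{0, 1}, {0, 1}, {2, 3}, {2, 3}, {0, 1}, {2, 3}]
    clusters = greedy_clusters(profile, n_fus=4, n_clusters=2)
    print(clusters)                                      # [{0, 1}, {2, 3}]
    print(centralized_energy(profile, n_fus=4))          # 24.0
    print(clustered_energy(profile, clusters))           # 12.0
    print(clustered_energy(profile, [{0, 2}, {1, 3}]))   # 24.0
```

The toy profile is chosen so the effect is easy to see and says nothing about the magnitudes reported above. Replacing the profile-driven partition with a poorly chosen one such as {0, 2}/{1, 3} removes the saving entirely, which qualitatively illustrates why grouping co-active units can beat an arbitrary clustering.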

Published In

IEEE Transactions on Computers, Volume 54, Issue 6
June 2005
143 pages

Publisher

IEEE Computer Society

United States

Author Tags

  1. RISC/CISC
  2. VLIW architectures
  3. low-power design
  4. memory design
  5. memory management
  6. real-time and embedded systems

Cited By

  • (2018) “EXA2PRO programming environment,” Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, pp. 202-209. DOI: 10.1145/3229631.3239369. Online publication date: 15-Jul-2018.
  • (2016) “Hardware Architectural Support for Caching Partitioned Reconfigurations in Reconfigurable Systems,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 24, no. 2, pp. 530-543. DOI: 10.1109/TVLSI.2015.2417595. Online publication date: 19-Jan-2016.
  • (2014) “Parallel distributed scalable runtime address generation scheme for a coarse grain reconfigurable computation and storage fabric,” Microprocessors & Microsystems, vol. 38, no. 8, pp. 788-802. DOI: 10.1016/j.micpro.2014.05.009. Online publication date: 1-Nov-2014.
  • (2013) “Design Space Exploration of Distributed Loop Buffer Architectures with Incompatible Loop-Nest Organisations in Embedded Systems,” Journal of Signal Processing Systems, vol. 72, no. 1, pp. 69-85. DOI: 10.1007/s11265-013-0749-z. Online publication date: 1-Jul-2013.
  • (2013) “Design exploration of a NVM based hybrid instruction memory organization for embedded platforms,” Design Automation for Embedded Systems, vol. 17, no. 3-4, pp. 459-483. DOI: 10.1007/s10617-014-9151-8. Online publication date: 1-Sep-2013.
  • (2010) “Fine-grain dynamic instruction placement for L0 scratch-pad memory,” Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems, pp. 137-146. DOI: 10.1145/1878921.1878943. Online publication date: 24-Oct-2010.
  • (2009) “Playing the trade-off game,” ACM Transactions on Design Automation of Electronic Systems, vol. 14, no. 3, pp. 1-37. DOI: 10.1145/1529255.1529258. Online publication date: 4-Jun-2009.
  • (2008) “COFFEE,” Proceedings of the 3rd international conference on High performance embedded architectures and compilers, pp. 193-208. DOI: 10.5555/1786054.1786074. Online publication date: 27-Jan-2008.
  • (2008) “Efficient Method to Generate an Energy Efficient Schedule Using Operation Shuffling,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E91-A, no. 2, pp. 604-612. DOI: 10.1093/ietfec/e91-a.2.604. Online publication date: 1-Feb-2008.
  • (2008) “Joint hardware-software leakage minimization approach for the register file of VLIW embedded architectures,” Integration, the VLSI Journal, vol. 41, no. 1, pp. 38-48. DOI: 10.1016/j.vlsi.2007.04.004. Online publication date: 1-Jan-2008.
