Playing the trade-off game: Architecture exploration using Coffee

Published: 04 June 2009

Abstract

Modern mobile devices need to be extremely energy efficient. Due to the growing complexity of these devices, energy-aware design exploration has become increasingly important. Current exploration tools often do not support energy estimation, or require the design to be very detailed before estimation is possible. It is important to get early feedback on both performance and energy consumption during all phases of the design and at higher abstraction levels. This article presents a unified optimization and exploration framework that spans the design space from source-level transformations to processor architecture. The proposed retargetable compiler and simulator framework can map applications to a range of processor and memory configurations, simulate them, and report detailed performance and energy estimates. An accurate and consistent energy modeling approach is introduced that estimates the energy consumption of the processor and memories at the component level, which helps guide the design process. Fast energy-aware architecture exploration is illustrated by modeling both state-of-the-art processors and other architectures. Various design trade-offs are also illustrated on academic and industrial benchmarks from the wireless communication and multimedia domains. We also illustrate a design space exploration on different applications and show that there is a large trade-off space between application performance, energy consumption, and area. We show that the proposed framework is consistent and accurate, and that it covers a large design space, including various novel low-power extensions.

Published In

ACM Transactions on Design Automation of Electronic Systems  Volume 14, Issue 3
May 2009
376 pages
ISSN:1084-4309
EISSN:1557-7309
DOI:10.1145/1529255
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 June 2009
Accepted: 01 January 2009
Revised: 01 November 2008
Received: 01 September 2007
Published in TODAES Volume 14, Issue 3

Author Tags

  1. Energy
  2. VLIW
  3. architecture exploration
  4. area
  5. compiler-architecture interaction
  6. design
  7. embedded systems
  8. loop transformations
  9. power estimation
  10. power-performance trade-off
  11. processors

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Flexware project IWT

Cited By

  • (2014) Fast and Accurate Architecture Exploration for High Performance and Low Energy VLIW Data-Path. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E97.A:2, 606-615. DOI: 10.1587/transfun.E97.A.606. Online publication date: 2014
  • (2013) A new metric for basic-block level rough energy estimation for power-gated VLIW data-path model. 2013 13th International Symposium on Communications and Information Technologies (ISCIT), 320-324. DOI: 10.1109/ISCIT.2013.6645873. Online publication date: Sep-2013
  • (2013) GA-based architecture exploration method for low energy VLIW data-path model. 2013 13th International Symposium on Communications and Information Technologies (ISCIT), 307-310. DOI: 10.1109/ISCIT.2013.6645870. Online publication date: Sep-2013
  • (2011) Application-Specific Energy Optimization of General-Purpose Datapath Interconnect. Proceedings of the 2011 IEEE Computer Society Annual Symposium on VLSI, 301-306. DOI: 10.1109/ISVLSI.2011.71. Online publication date: 4-Jul-2011
  • (2010) Time-space energy consumption modeling of dynamic reconfigurable coarse-grain array processor datapath for wireless applications. 2010 IEEE Workshop On Signal Processing Systems, 134-139. DOI: 10.1109/SIPS.2010.5624778. Online publication date: Oct-2010
