Playing the trade-off game: Architecture exploration using Coffeee

Authors:

Praveen Raghavan,

Murali Jayapala,

Andy Lambrechts,

Francky Catthoor

ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 14, Issue 3

Article No.: 36, Pages 1 - 37

https://doi.org/10.1145/1529255.1529258

Published: 04 June 2009

Abstract

Modern mobile devices need to be extremely energy efficient. Due to the growing complexity of these devices, energy-aware design exploration has become increasingly important. Current exploration tools often do not support energy estimation, or require the design to be very detailed before estimation is possible. It is important to get early feedback on both performance and energy consumption during all phases of the design and at higher abstraction levels. This article presents a unified optimization and exploration framework that spans the design space from source-level transformations to processor architecture. The proposed retargetable compiler and simulator framework can map applications to a range of processor and memory configurations, simulate them, and report detailed performance and energy estimates. An accurate and consistent energy modeling approach is introduced that estimates the energy consumption of the processor and memories at the component level, which helps guide the design process. Fast energy-aware architecture exploration is illustrated by modeling both state-of-the-art processors and other architectures. Various design trade-offs are also illustrated on academic as well as industrial benchmarks from both the wireless communication and multimedia domains. We also illustrate design space exploration on different applications and show that there is a large trade-off space between application performance, energy consumption, and area. We show that the proposed framework is consistent, accurate, and covers a large design space, including various novel low-power extensions.
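As a rough illustration of the two ideas the abstract combines, component-level energy estimation and a performance/energy/area trade-off space, the following minimal sketch may help. It is not the article's model or tool: it assumes energy can be approximated as activation counts multiplied by a per-access energy for each component, and every name and number in it (`component_energy`, `pareto_front`, the per-access energies, the three design points) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPoint:
    """One candidate architecture with its estimated costs (all values hypothetical)."""
    name: str
    cycles: int
    energy_nj: float
    area_mm2: float


def component_energy(activations, energy_per_access_nj):
    """Sum activation count times per-access energy over all components."""
    return sum(count * energy_per_access_nj[comp] for comp, count in activations.items())


def dominates(a, b):
    """True if a is no worse than b on every metric and strictly better on at least one."""
    no_worse = (a.cycles <= b.cycles and a.energy_nj <= b.energy_nj
                and a.area_mm2 <= b.area_mm2)
    better = (a.cycles < b.cycles or a.energy_nj < b.energy_nj
              or a.area_mm2 < b.area_mm2)
    return no_worse and better


def pareto_front(points):
    """Keep only the design points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical per-access energies (nJ) and activation counts from a simulated run.
ENERGY_PER_ACCESS_NJ = {"alu": 0.5, "register_file": 0.25, "data_memory": 2.0}
activations = {"alu": 1000, "register_file": 4000, "data_memory": 500}
total_nj = component_energy(activations, ENERGY_PER_ACCESS_NJ)

# Hypothetical design points; A reuses the estimate computed above.
points = [
    DesignPoint("A", cycles=1000, energy_nj=total_nj, area_mm2=1.0),
    DesignPoint("B", cycles=800, energy_nj=3000.0, area_mm2=1.5),
    DesignPoint("C", cycles=1200, energy_nj=2600.0, area_mm2=1.2),
]
front = pareto_front(points)
print(total_nj, [p.name for p in front])
```

In this toy data, point C is worse than A on all three metrics and is filtered out, while A and B remain because each is better on at least one metric. That remaining set is the kind of trade-off space a designer would then choose from.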

Published In

ACM Transactions on Design Automation of Electronic Systems Volume 14, Issue 3

May 2009

376 pages

ISSN:1084-4309

EISSN:1557-7309

DOI:10.1145/1529255

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 04 June 2009

Accepted: 01 January 2009

Revised: 01 November 2008

Received: 01 September 2007

Published in TODAES Volume 14, Issue 3

Qualifiers

Research-article
Research
Refereed

Funding Sources

Flexware project IWT

Cited By

Taniguchi, I., Aoki, K., Tomiyama, H., Raghavan, P., Catthoor, F., and Fukui, M. 2014. Fast and Accurate Architecture Exploration for High Performance and Low Energy VLIW Data-Path. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E97.A, 2, 606--615. https://doi.org/10.1587/transfun.E97.A.606

Nakamura, S., Aoki, K., Uchida, M., Taniguchi, I., Tomiyama, H., and Fukui, M. 2013. A new metric for basic-block level rough energy estimation for power-gated VLIW data-path model. In 2013 13th International Symposium on Communications and Information Technologies (ISCIT), 320--324. https://doi.org/10.1109/ISCIT.2013.6645873

Aoki, K., Taniguchi, I., Tomiyama, H., and Fukui, M. 2013. GA-based architecture exploration method for low energy VLIW data-path model. In 2013 13th International Symposium on Communications and Information Technologies (ISCIT), 307--310. https://doi.org/10.1109/ISCIT.2013.6645870

Hidaji, B., Alipour, S., Subramaniyan, K., and Larsson-Edefors, P. 2011. Application-Specific Energy Optimization of General-Purpose Datapath Interconnect. In Proceedings of the 2011 IEEE Computer Society Annual Symposium on VLSI, 301--306. https://dl.acm.org/doi/10.1109/ISVLSI.2011.71

Palkovic, M., Hartmann, M., Allam, O., Raghavan, P., and Catthoor, F. 2010. Time-space energy consumption modeling of dynamic reconfigurable coarse-grain array processor datapath for wireless applications. In 2010 IEEE Workshop On Signal Processing Systems, 134--139. https://doi.org/10.1109/SIPS.2010.5624778
