Speedups in embedded systems with a high-performance coprocessor datapath

Published: 22 May 2008

Abstract

This article presents the speedups achieved in a generic single-chip microprocessor system by employing a high-performance datapath. The datapath acts as a coprocessor that accelerates computationally intensive kernel sections, thereby increasing overall performance. We have previously introduced this datapath, which is composed of Flexible Computational Components (FCCs). These components can realize any two-level template of primitive operations. We present an automated method for synthesizing the coprocessor from a high-level software description, together with its integration into a design flow for executing applications on the system. To evaluate the effectiveness of our coprocessor approach, we perform an analytical study with respect to the type of custom datapath and the microprocessor architecture. The overall speedups of several real-life applications, relative to software execution on the microprocessor, are estimated using the design flow. These speedups range from 1.75 to 5.84, with an average of 3.04, while the circuit-area overhead is small. The design flow accelerated the applications close to the theoretical speedup bounds. A comparison with another high-performance datapath showed that the proposed coprocessor achieves area-time products that are smaller by 23% on average for the generated datapaths. Additionally, the FCC coprocessor accelerates kernels more effectively than software-programmable DSP cores.
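The relationship between kernel acceleration and overall application speedup can be illustrated with a minimal sketch based on Amdahl's law. This is an assumption made for illustration: the abstract does not say which model the authors use for their theoretical speedup bounds. The function names and the numbers in the example (`kernel_fraction`, `kernel_speedup`) are hypothetical and are not taken from the article.

```python
def overall_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup under Amdahl's law (an assumed model, not the article's).

    kernel_fraction: share of software execution time spent in kernels (0..1).
    kernel_speedup:  factor by which the coprocessor accelerates those kernels.
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # Non-kernel code still runs on the microprocessor at its original speed.
    remaining = 1.0 - kernel_fraction
    return 1.0 / (remaining + kernel_fraction / kernel_speedup)


def speedup_bound(kernel_fraction: float) -> float:
    """Upper bound on overall speedup when kernels take zero time."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - kernel_fraction)


if __name__ == "__main__":
    # Hypothetical workload: 80% of time in kernels, accelerated 10x.
    achieved = overall_speedup(0.8, 10.0)
    bound = speedup_bound(0.8)
    print(f"achieved {achieved:.2f}x of a possible {bound:.2f}x")
```

Under this model, the overall speedup is capped by the fraction of time spent outside the kernels, no matter how fast the coprocessor is. An application with a modest kernel fraction can therefore sit near its bound while still showing a small absolute speedup.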

Cited By

  • (2018) Dataflow Modeling for Reconfigurable Signal Processing Systems. Handbook of Signal Processing Systems, 787-824. DOI: 10.1007/978-3-319-91734-4_22. Online publication date: 14-Oct-2018
  • (2010) Dynamic Context Compression for Low-Power CGRA. Design of Low-Power Coarse-Grained Reconfigurable Architectures, 119-140. DOI: 10.1201/b10471-13. Online publication date: 9-Dec-2010
  • (2009) Hierarchical reconfigurable computing arrays for efficient CGRA-based embedded systems. Proceedings of the 46th Annual Design Automation Conference, 826-831. DOI: 10.1145/1629911.1630123. Online publication date: 26-Jul-2009

Published In

ACM Transactions on Design Automation of Electronic Systems, Volume 12, Issue 3
August 2007
427 pages
ISSN:1084-4309
EISSN:1557-7309
DOI:10.1145/1255456
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 May 2008
Accepted: 01 December 2006
Revised: 01 October 2006
Received: 01 April 2006
Published in TODAES Volume 12, Issue 3

Author Tags

  1. Performance improvements
  2. chaining
  3. coprocessor datapath
  4. design flow
  5. kernels
  6. synthesis

Qualifiers

  • Research-article
  • Research
  • Refereed
