research-article

A design flow for application specific heterogeneous pipelined multiprocessor systems

Authors:

Sri Parameswaran

DAC '09: Proceedings of the 46th Annual Design Automation Conference

Pages 250 - 253

https://doi.org/10.1145/1629911.1629979

Published: 26 July 2009

Abstract

This paper describes a rapid design methodology for creating a pipeline of processors to execute streaming applications. The methodology has two phases. The first phase uses a heuristic to rapidly search a large number of processor configurations (configurations differ in the base processor, the additional instructions, and the cache sizes) to find the near-Pareto front. The second phase uses either the same heuristic or an ILP (Integer Linear Programming) formulation to search a smaller design space for an appropriate final implementation. Running the fast heuristic with differing runtime constraints in the first phase rapidly yields the near-Pareto front; the second phase provides either an optimal or a near-optimal solution. Both the ILP formulation and the heuristic find a system with the smallest area within a designer-specified runtime constraint. The system has efficiently explored design spaces with over 10¹² design points.

We integrated this design methodology into a commercial design flow and evaluated our approach with three benchmarks (JPEG Encoder, JPEG Decoder, and MP3 Encoder). For each benchmark, the heuristic found the near-Pareto front in a few hours, whereas the ILP took several days. The results show that the average area error of the heuristic is within 2.5% of the optimal design points (obtained using the ILP) for all benchmarks.
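
The selection problem described in the abstract (smallest area within a runtime constraint, repeated for several constraints to trace a near-Pareto front) can be illustrated with a small sketch. It is not the paper's heuristic or its ILP formulation: it enumerates every combination exhaustively, which is only feasible for toy inputs. The stage options, the (area, runtime) numbers, the function names, and the assumption that total runtime is the sum of per-stage runtimes are all hypothetical.

```python
from itertools import product

# Hypothetical data: one list per pipeline stage, each entry is an
# (area, runtime) pair for one candidate processor configuration.
# The numbers are invented for illustration only.
STAGES = [
    [(10, 9), (14, 6), (20, 4)],
    [(8, 7), (12, 5)],
    [(6, 8), (9, 5), (15, 3)],
]


def min_area_design(stages, runtime_limit):
    """Return (area, runtime, choice) with the smallest total area whose
    total runtime is within runtime_limit, or None if nothing fits.

    Assumption: total runtime is the sum of the per-stage runtimes.
    """
    best = None
    # Exhaustive search: the space grows as the product of the
    # per-stage option counts, so this only works for tiny examples.
    for choice in product(*stages):
        area = sum(a for a, _ in choice)
        runtime = sum(t for _, t in choice)
        if runtime <= runtime_limit and (best is None or area < best[0]):
            best = (area, runtime, choice)
    return best


def pareto_front(stages, runtime_limits):
    """Sweep several runtime limits and keep the non-dominated
    (area, runtime) points, sorted by increasing area."""
    points = set()
    for limit in runtime_limits:
        design = min_area_design(stages, limit)
        if design is not None:
            points.add((design[0], design[1]))
    # A point is dominated if another point is no worse in both
    # area and runtime.
    front = [
        p for p in points
        if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in points)
    ]
    return sorted(front)


if __name__ == "__main__":
    for area, runtime in pareto_front(STAGES, range(12, 25)):
        print(f"area={area} runtime={runtime}")
```

Even this toy input has 3 × 2 × 3 = 18 design points; with many stages and many configurations per stage the product grows quickly, which is why exhaustive enumeration does not scale to spaces like the one the abstract mentions.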

Published In

DAC '09: Proceedings of the 46th Annual Design Automation Conference

July 2009

994 pages

ISBN:9781605584973

DOI:10.1145/1629911

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

EDAC: Electronic Design Automation Consortium
SIGDA: ACM Special Interest Group on Design Automation
IEEE-CAS: Circuits & Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Australian Research Council

Conference

DAC '09: The 46th Annual Design Automation Conference 2009

July 26 - 31, 2009

San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%
