Synthesis algorithm for application-specific homogeneous processor networks

Authors:

Karthik Gururaj,

Wei Jiang

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Volume 17, Issue 9

Pages 1318 - 1329

https://doi.org/10.1109/TVLSI.2008.2004874

Published: 01 September 2009

Abstract

The application-specific multiprocessor system-on-a-chip is a promising design alternative because of its high degree of flexibility, short development time, and potentially high performance attributed to application-specific optimizations. However, designing an optimal application-specific multiprocessor system is still challenging because a number of important metrics, such as throughput, latency, and resource usage, need to be explored and optimized. This paper addresses the problem of synthesizing an application-specific multiprocessor system for stream-oriented embedded applications so as to minimize system latency under a throughput constraint. We employ a novel framework for this problem, similar to that of technology mapping in the logic synthesis domain, and develop a set of efficient algorithms, including labeling and clustering, that generate a multiprocessor architecture with application-specific optimized latency. Specifically, the result of our algorithm is latency-optimal for directed acyclic task graphs. Application of our approach to the Motion JPEG example on Xilinx's Virtex II Pro platform FPGA shows interesting design tradeoffs.
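To make the two metrics in the problem statement concrete, the following is a minimal sketch of how a candidate task-to-processor mapping might be evaluated. It is not the labeling or clustering algorithm of the paper. The cost model is an assumption made for illustration: each task has a fixed execution time, the throughput constraint is read as "the total load on every processor fits within one period", and latency is estimated as the critical path of the task graph with a fixed delay charged on every edge that crosses processors. Contention between tasks that share a processor is ignored in the latency estimate. All function and parameter names (`processor_loads`, `meets_throughput`, `critical_path_latency`, `comm_delay`, `period`) are hypothetical.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def processor_loads(exec_time, assignment):
    """Sum the execution times of the tasks mapped to each processor."""
    loads = defaultdict(float)
    for task, proc in assignment.items():
        loads[proc] += exec_time[task]
    return dict(loads)


def meets_throughput(exec_time, assignment, period):
    """Assumed constraint: every processor's load fits within one period."""
    loads = processor_loads(exec_time, assignment)
    return all(load <= period for load in loads.values())


def critical_path_latency(exec_time, edges, assignment, comm_delay):
    """Longest path through the DAG, charging comm_delay on each
    edge whose endpoints sit on different processors (assumed model)."""
    preds = {task: set() for task in exec_time}
    for src, dst in edges:
        preds[dst].add(src)

    finish = {}
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(preds).static_order():
        start = 0.0
        for p in preds[task]:
            hop = comm_delay if assignment[p] != assignment[task] else 0.0
            start = max(start, finish[p] + hop)
        finish[task] = start + exec_time[task]
    return max(finish.values())


# Hypothetical four-task diamond graph mapped onto two processors.
exec_time = {"a": 2, "b": 3, "c": 1, "d": 2}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
assignment = {"a": "P0", "c": "P0", "b": "P1", "d": "P1"}

print(meets_throughput(exec_time, assignment, period=5))
print(critical_path_latency(exec_time, edges, assignment, comm_delay=1))
```

Under this model, grouping tasks onto one processor removes communication delay from the path but raises that processor's load, which is the tension between latency and throughput that the abstract describes.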


Published In

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Volume 17, Issue 9

September 2009

198 pages

ISSN:1063-8210

Copyright © 2009.

Publisher

IEEE Educational Activities Department

United States

Publication History

Published: 01 September 2009

Revised: 15 March 2008

Received: 26 October 2007

Cited By

Nowatzki T, Ardalani N, Sankaralingam K, Weng J, Evripidou S, Stenström P, O'Boyle M (2018). Hybrid optimization/heuristic instruction scheduling for programmable accelerator codesign. Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, pp. 1-15. Online publication date: 1-Nov-2018.
https://dl.acm.org/doi/10.1145/3243176.3243212
Liu D, Spasic J, Zhai J, Stefanov T, Chen G, Fettweis G, Nebel W (2014). Resource optimization for CSDF-modeled streaming applications with latency constraints. Proceedings of the conference on Design, Automation & Test in Europe, pp. 1-6. Online publication date: 24-Mar-2014.
https://dl.acm.org/doi/10.5555/2616606.2616837
Nowatzki T, Sartin-Tarm M, De Carli L, Sankaralingam K, Estan C, Robatmili B (2014). A Scheduling Framework for Spatial Architectures Across Multiple Constraint-Solving Theories. ACM Transactions on Programming Languages and Systems, 37:1, pp. 1-30. Online publication date: 17-Nov-2014.
https://dl.acm.org/doi/10.1145/2658993
Nowatzki T, Sartin-Tarm M, De Carli L, Sankaralingam K, Estan C, Robatmili B (2013). A general constraint-centric scheduling framework for spatial architectures. ACM SIGPLAN Notices, 48:6, pp. 495-506. Online publication date: 16-Jun-2013.
https://dl.acm.org/doi/10.1145/2499370.2462163
Nowatzki T, Sartin-Tarm M, De Carli L, Sankaralingam K, Estan C, Robatmili B, Boehm H, Flanagan C (2013). A general constraint-centric scheduling framework for spatial architectures. Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 495-506. Online publication date: 16-Jun-2013.
https://dl.acm.org/doi/10.1145/2491956.2462163
Zhai J, Nikolov H, Stefanov T (2013). Mapping of streaming applications considering alternative application specifications. ACM Transactions on Embedded Computing Systems, 12:1s, pp. 1-21. Online publication date: 21-Mar-2013.
https://dl.acm.org/doi/10.1145/2435227.2435230
