research-article

Synthesis algorithm for application-specific homogeneous processor networks

Published: 01 September 2009

Abstract

The application-specific multiprocessor system-on-a-chip is a promising design alternative because of its high degree of flexibility, short development time, and potentially high performance attributable to application-specific optimizations. However, designing an optimal application-specific multiprocessor system is still challenging because a number of important metrics, such as throughput, latency, and resource usage, need to be explored and optimized. This paper addresses the problem of synthesizing an application-specific multiprocessor system for stream-oriented embedded applications to minimize system latency under a throughput constraint. We employ a novel framework for this problem, similar to that of technology mapping in the logic synthesis domain, and develop a set of efficient algorithms, including labeling and clustering, for efficient generation of a multiprocessor architecture with application-specific optimized latency. Specifically, the result of our algorithm is latency-optimal for directed acyclic task graphs. Application of our approach to the Motion JPEG example on Xilinx's Virtex II Pro platform FPGA shows interesting design tradeoffs.
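
The abstract does not give the labeling and clustering algorithms themselves, so the following is only an illustrative sketch of the quantity being minimized: the latency of a directed acyclic task graph once its tasks have been grouped into clusters (processors). The delay model is an assumption made for illustration, not the paper's: each task has a fixed execution time, every edge that crosses clusters pays a constant `comm_delay`, and latency is the longest path through the graph. The function name, parameters, and example graph are hypothetical. The sketch deliberately ignores the throughput constraint and the serialization of tasks that share a processor.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def clustered_latency(exec_time, edges, cluster_of, comm_delay):
    """Longest-path latency of a task DAG under a given clustering.

    exec_time:  dict task -> execution time (hypothetical units)
    edges:      list of (producer, consumer) pairs; must form a DAG
    cluster_of: dict task -> cluster (processor) id
    comm_delay: delay added to every edge that crosses clusters
    """
    # Map each task to the list of tasks it depends on.
    preds = {t: [] for t in exec_time}
    for u, v in edges:
        preds[v].append(u)

    finish = {}
    # static_order() yields each task after all of its predecessors.
    for t in TopologicalSorter(preds).static_order():
        start = 0
        for p in preds[t]:
            hop = 0 if cluster_of[p] == cluster_of[t] else comm_delay
            start = max(start, finish[p] + hop)
        finish[t] = start + exec_time[t]
    return max(finish.values())


# Hypothetical diamond-shaped task graph: a feeds b and c, which feed d.
exec_time = {"a": 2, "b": 3, "c": 1, "d": 2}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]

one_task_per_cluster = {"a": 0, "b": 1, "c": 2, "d": 3}
two_clusters = {"a": 0, "b": 0, "c": 1, "d": 1}

print(clustered_latency(exec_time, edges, one_task_per_cluster, comm_delay=4))
print(clustered_latency(exec_time, edges, two_clusters, comm_delay=4))
```

Under this simplified model, grouping communicating tasks shortens the critical path because fewer edges pay the inter-cluster delay. A synthesis algorithm of the kind the abstract describes would search over such clusterings while also honoring the throughput constraint.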

Published In

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Volume 17, Issue 9
September 2009
198 pages

Publisher

IEEE Educational Activities Department

United States

Publication History

Published: 01 September 2009
Revised: 15 March 2008
Received: 26 October 2007

Author Tags

  1. clustering
  2. design space
  3. labeling
  4. multiprocessor
  5. task-level pipeline

Qualifiers

  • Research-article

Cited By

  • (2018) Hybrid optimization/heuristic instruction scheduling for programmable accelerator codesign. Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, pp. 1-15. DOI: 10.1145/3243176.3243212. Online publication date: 1-Nov-2018
  • (2014) Resource optimization for CSDF-modeled streaming applications with latency constraints. Proceedings of the conference on Design, Automation & Test in Europe, pp. 1-6. DOI: 10.5555/2616606.2616837. Online publication date: 24-Mar-2014
  • (2014) A Scheduling Framework for Spatial Architectures Across Multiple Constraint-Solving Theories. ACM Transactions on Programming Languages and Systems, 37:1, pp. 1-30. DOI: 10.1145/2658993. Online publication date: 17-Nov-2014
  • (2013) A general constraint-centric scheduling framework for spatial architectures. ACM SIGPLAN Notices, 48:6, pp. 495-506. DOI: 10.1145/2499370.2462163. Online publication date: 16-Jun-2013
  • (2013) A general constraint-centric scheduling framework for spatial architectures. Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 495-506. DOI: 10.1145/2491956.2462163. Online publication date: 16-Jun-2013
  • (2013) Mapping of streaming applications considering alternative application specifications. ACM Transactions on Embedded Computing Systems, 12:1s, pp. 1-21. DOI: 10.1145/2435227.2435230. Online publication date: 21-Mar-2013
