Near-Optimal Microprocessor and Accelerators Codesign with Latency and Throughput Constraints

Authors:

Angeliki Kritikakou,

Francky Catthoor,

George S. Athanasiou,

Vasilios Kelefouras,

Costas Goutis

ACM Transactions on Architecture and Code Optimization (TACO), Volume 10, Issue 2

Article No.: 6, Pages 1 - 25

https://doi.org/10.1145/2459316.2459317

Published: 01 May 2013

Abstract

A systematic methodology is proposed for near-optimal software/hardware codesign mapping onto an FPGA platform with a microprocessor and hardware accelerators. The mapping steps deal with the inter-organization, the foreground memory management, and the datapath mapping. Each step is described by parameters and equations combined in a scalable template. Mapping decisions are propagated as design constraints to prune suboptimal options in subsequent steps. Several performance-area Pareto points are produced by instantiating the parameters. To evaluate our methodology, we map a real-time bio-imaging application and loop-dominated benchmarks.
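
The flow described in the abstract (instantiate template parameters, prune with propagated constraints, keep the Pareto points) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the two parameters (`accelerators`, `unroll`), the toy latency and area equations, and the latency bound are hypothetical and are not taken from the paper, whose actual templates and equations are not reproduced on this page.

```python
from itertools import product
from typing import NamedTuple


class DesignPoint(NamedTuple):
    accelerators: int  # hypothetical parameter: number of HW accelerators
    unroll: int        # hypothetical parameter: datapath unroll factor
    latency: float     # toy estimate, arbitrary time units
    area: float        # toy estimate, arbitrary area units


def instantiate(accelerators: int, unroll: int) -> DesignPoint:
    """Toy cost equations (NOT the paper's): more parallel hardware
    lowers latency and raises area."""
    latency = 1000.0 / (accelerators * unroll) + 10.0 * accelerators
    area = 50.0 + 40.0 * accelerators * unroll
    return DesignPoint(accelerators, unroll, latency, area)


def prune(points, max_latency):
    """Drop points that violate a latency constraint propagated from an earlier step."""
    return [p for p in points if p.latency <= max_latency]


def pareto_front(points):
    """Keep points not dominated in (latency, area); lower is better for both."""
    front = []
    for p in points:
        dominated = any(
            q.latency <= p.latency and q.area <= p.area
            and (q.latency < p.latency or q.area < p.area)
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front, key=lambda p: p.area)


# Instantiate the template over a small, assumed parameter grid.
candidates = [instantiate(a, u) for a, u in product([1, 2, 4], [1, 2, 4])]
feasible = prune(candidates, max_latency=300.0)
front = pareto_front(feasible)

for p in front:
    print(p)
```

With these toy equations, six of the nine candidates satisfy the latency bound and three of them remain on the performance-area front. Pruning before the Pareto filter mirrors the idea of propagating decisions as constraints so that later steps examine fewer options.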

Cited By

Kritikakou, A., Catthoor, F., and Goutis, C. 2014. Conclusions and Future Directions. In Scalable and Near-Optimal Design Space Exploration for Embedded Systems, 261--263. Online publication date: 21-Feb-2014. https://doi.org/10.1007/978-3-319-04942-7_10

Index Terms

Computer systems organization

Published In

ACM Transactions on Architecture and Code Optimization Volume 10, Issue 2

May 2013

101 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/2459316

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 2013

Accepted: 01 December 2012

Revised: 01 October 2012

Received: 01 June 2012

Published in TACO Volume 10, Issue 2

Qualifiers

Research-article
Research
Refereed

Funding Sources

Hellenic and European Regional Development Fund (ERDF)
Greek national funds (Heracleitus II-NSRF)
Public Welfare Foundation “Propondis” research funds
European Social Fund
