research-article
Open access

Near-Optimal Microprocessor and Accelerators Codesign with Latency and Throughput Constraints

Published: 01 May 2013

Abstract

A systematic methodology for near-optimal software/hardware codesign mapping onto an FPGA platform with a microprocessor and HW accelerators is proposed. The mapping steps deal with the inter-organization, the foreground memory management, and the datapath mapping. Each step is described by parameters and equations combined in a scalable template. Mapping decisions are propagated as design constraints to prune suboptimal options in the next steps. Several performance-area Pareto points are produced by instantiating the parameters. To evaluate our methodology, we map a real-time bio-imaging application and loop-dominated benchmarks.
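
The idea of keeping only performance-area Pareto points, after pruning options that violate a propagated constraint, can be illustrated with a minimal sketch. This is not the paper's template or equations: the `DesignPoint` fields, the `max_cycles` latency bound, and the candidate values are all hypothetical, chosen only to show what "Pareto point" means when both cycle count and area are to be minimized.

```python
from typing import List, NamedTuple, Optional


class DesignPoint(NamedTuple):
    """Hypothetical design option; both metrics are 'lower is better'."""
    name: str
    cycles: int  # assumed performance metric (execution cycles)
    area: int    # assumed area metric (e.g. FPGA slices)


def pareto_front(points: List[DesignPoint],
                 max_cycles: Optional[int] = None) -> List[DesignPoint]:
    """Return the non-dominated points, sorted by increasing cycles.

    If max_cycles is given, points exceeding it are pruned first,
    mimicking a latency constraint propagated from an earlier step.
    """
    if max_cycles is not None:
        points = [p for p in points if p.cycles <= max_cycles]

    front: List[DesignPoint] = []
    best_area: Optional[int] = None
    # After sorting by (cycles, area), a point is non-dominated only if
    # its area is strictly smaller than every area kept so far.
    for p in sorted(points, key=lambda q: (q.cycles, q.area)):
        if best_area is None or p.area < best_area:
            front.append(p)
            best_area = p.area
    return front


# Made-up candidates, for illustration only.
candidates = [
    DesignPoint("A", cycles=100, area=50),
    DesignPoint("B", cycles=80, area=70),
    DesignPoint("C", cycles=120, area=40),
    DesignPoint("D", cycles=110, area=60),  # dominated by A
    DesignPoint("E", cycles=80, area=90),   # dominated by B
]

front_names = [p.name for p in pareto_front(candidates)]
constrained_names = [p.name for p in pareto_front(candidates, max_cycles=100)]
```

With these made-up values, the unconstrained front is B, A, C; adding the assumed bound `max_cycles=100` removes C before the front is computed, leaving B and A.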

Cited By

  • (2014) Conclusions and Future Directions. Scalable and Near-Optimal Design Space Exploration for Embedded Systems, 261-263. DOI: 10.1007/978-3-319-04942-7_10. Online publication date: 21-Feb-2014

        Published In

        ACM Transactions on Architecture and Code Optimization  Volume 10, Issue 2
        May 2013
        101 pages
        ISSN:1544-3566
        EISSN:1544-3973
        DOI:10.1145/2459316
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 01 May 2013
        Accepted: 01 December 2012
        Revised: 01 October 2012
        Received: 01 June 2012
        Published in TACO Volume 10, Issue 2

        Author Tags

        1. FPGA
        2. area reduction
        3. near optimal
        4. real-time behavior

        Qualifiers

        • Research-article
        • Research
        • Refereed

        Funding Sources

        • Hellenic and European Regional Development Fund (ERDF)
        • Greek national funds (Heracleitus II-NSRF)
        • Public Welfare Foundation “Propondis” research funds
        • European Social Fund
