
SCF: A Framework for Task-Level Coordination in Reconfigurable, Heterogeneous Systems

Published: 01 June 2012

Abstract

Heterogeneous computing systems composed of accelerators such as FPGAs, GPUs, and manycore processors coupled with standard microprocessors are becoming an increasingly popular solution for future computing systems due to their higher performance and energy efficiency. Although programming languages and tools are evolving to simplify device-level design, programming such systems is still difficult and time-consuming, largely due to system-wide challenges involving communication between heterogeneous devices, which currently require ad hoc solutions. Most communication frameworks and APIs that have dominated parallel application development for decades were developed for homogeneous systems and hence cannot be directly employed for hybrid systems. To solve this problem, this article presents the System Coordination Framework (SCF), which employs message passing to transparently enable communication between tasks described using different programming tools (and languages) and running on heterogeneous processing devices, in systems from domains ranging from embedded systems to High-Performance Computing (HPC). By hiding low-level architectural details of the underlying communication from an application designer, SCF can improve application development productivity, provide higher levels of application portability, and offer rapid design-space exploration of different task/device mappings. In addition, SCF enables custom communication synthesis that exploits mechanisms specific to different devices and platforms, which can provide performance improvements over generic solutions employed previously. Our results indicate performance improvements of 28× and 682× from employing FPGA devices for the two applications presented in this article, while SCF simultaneously improves developer productivity by approximately 2.5 to 5 times.

Cited By

  • (2015) Low-Overhead FPGA Middleware for Application Portability and Productivity. ACM Transactions on Reconfigurable Technology and Systems 8:4 (1-22). DOI: 10.1145/2746404. Online publication date: 11-Sep-2015
  • (2014) Automatic Synthesis over Multiple APIs from UML/MARTE Models for Easy Platform Mapping and Reuse. Proceedings of the 2014 17th Euromicro Conference on Digital System Design (443-450). DOI: 10.1109/DSD.2014.48. Online publication date: 27-Aug-2014
  • (2014) Automatic deployment of component-based embedded systems from UML/MARTE models using MCAPI. Design of Circuits and Integrated Systems (1-6). DOI: 10.1109/DCIS.2014.7035575. Online publication date: Nov-2014

Published In

ACM Transactions on Reconfigurable Technology and Systems  Volume 5, Issue 2
June 2012
100 pages
ISSN:1936-7406
EISSN:1936-7414
DOI:10.1145/2209285
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 2012
Accepted: 01 September 2011
Revised: 01 June 2011
Received: 01 February 2011
Published in TRETS Volume 5, Issue 2

Author Tags

  1. FPGA
  2. Reconfigurable computing
  3. accelerators
  4. communication
  5. coordination
  6. heterogeneous computing
  7. portability
  8. productivity

Qualifiers

  • Research-article
  • Research
  • Refereed
