research-article

A performance and energy comparison of FPGAs, GPUs, and multicores for sliding-window applications

Authors:

Greg Stitt

FPGA '12: Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays

Pages 47 - 56

https://doi.org/10.1145/2145694.2145704

Published: 22 February 2012

Abstract

With the emergence of accelerator devices such as multicores, graphics-processing units (GPUs), and field-programmable gate arrays (FPGAs), application designers are confronted with the problem of searching a huge design space that has been shown to have widely varying performance and energy metrics for different accelerators, different application domains, and different use cases. To address this problem, numerous studies have evaluated specific applications across different accelerators. In this paper, we analyze an important domain of applications, referred to as sliding-window applications, when executing on FPGAs, GPUs, and multicores. For each device, we present optimization strategies and analyze use cases where each device is most effective. The results show that FPGAs can achieve speedups of up to 11x and 57x compared to GPUs and multicores, respectively, while also using orders of magnitude less energy.
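
The abstract does not define "sliding-window applications". As a rough illustration, the sketch below assumes the common meaning: a small kernel is moved across a larger 2D input and a value is computed at every position where the kernel fully overlaps the input. Sum of absolute differences (SAD) is used here only as a familiar example of such a computation. The function name `sliding_window_sad` and the sample data are hypothetical, and this is not the implementation evaluated in the paper.

```python
def sliding_window_sad(image, kernel):
    """Return the SAD between `kernel` and every fully overlapping window of `image`.

    Both arguments are lists of equal-length rows of numbers (an assumption
    made for this sketch). The result has one entry per window position.
    """
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out = []
    # Slide the window over every position where it fits entirely inside the image.
    for r in range(img_h - k_h + 1):
        row = []
        for c in range(img_w - k_w + 1):
            sad = 0
            # Compare the window anchored at (r, c) with the kernel, element by element.
            for i in range(k_h):
                for j in range(k_w):
                    sad += abs(image[r + i][c + j] - kernel[i][j])
            row.append(sad)
        out.append(row)
    return out


# Hypothetical sample data: the kernel matches the window anchored at row 1, column 1.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
kernel = [
    [6, 7],
    [10, 11],
]
result = sliding_window_sad(image, kernel)
print(result)  # [[20, 16, 12], [4, 0, 4]]
```

Each output value depends only on its own window, so the window positions can in principle be computed independently of one another. Neighboring windows also overlap, so much of the input is read repeatedly.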


Index Terms

A performance and energy comparison of FPGAs, GPUs, and multicores for sliding-window applications
1. Computer systems organization
  1. Embedded and cyber-physical systems
  2. Real-time systems
2. General and reference
  1. Cross-computing tools and techniques
    1. Design


Published In


FPGA '12: Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays

February 2012

352 pages

ISBN:9781450311557

DOI:10.1145/2145694

General Chair: Katherine Compton, University of Wisconsin-Madison

Program Chair: Brad Hutchings, Brigham Young University

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 February 2012


Qualifiers

Research-article

Conference

FPGA '12

Sponsor:

SIGDA

FPGA '12: ACM/SIGDA International Symposium on Field Programmable Gate Arrays

February 22 - 24, 2012

Monterey, California, USA

Acceptance Rates

FPGA '12 Paper Acceptance Rate 20 of 87 submissions, 23%;

Overall Acceptance Rate 125 of 627 submissions, 20%

Bibliometrics & Citations

Article Metrics

Total Citations: 210
Total Downloads: 2,262
Downloads (Last 12 months): 54
Downloads (Last 6 weeks): 7

Reflects downloads up to 10 Aug 2024

Citations

Cited By

Stitt G, Piard W, Crary C, Zhang Z, Putnam A (2024). Low-Latency, Line-Rate Variable-Length Field Parsing for 100+ Gb/s Ethernet. Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (12-21). DOI: 10.1145/3626202.3637559. Online publication date: 1-Apr-2024
https://dl.acm.org/doi/10.1145/3626202.3637559
Hoshino Y, Shimasaki M, Rathnayake N, Dang T (2024). Performance verification and latency time evaluation of hardware image processing module for appearance inspection systems using FPGA. Journal of Real-Time Image Processing 21:1. DOI: 10.1007/s11554-023-01392-7. Online publication date: 10-Jan-2024
https://doi.org/10.1007/s11554-023-01392-7
Fadhel M, Alzubaidi L, Gu Y, Santamaría J, Duan Y (2024). Real-time diabetic foot ulcer classification based on deep learning & parallel hardware computational tools. Multimedia Tools and Applications 83:27 (70369-70394). DOI: 10.1007/s11042-024-18304-x. Online publication date: 3-Feb-2024
https://doi.org/10.1007/s11042-024-18304-x
Maldonado Y, Salas R, Quevedo J, Valdez R, Trujillo L (2024). GSGP-hardware: instantaneous symbolic regression with an FPGA implementation of geometric semantic genetic programming. Genetic Programming and Evolvable Machines 25:2. DOI: 10.1007/s10710-024-09491-5. Online publication date: 25-Jun-2024
https://doi.org/10.1007/s10710-024-09491-5
Siavvas M, Tsoukalas D, Marantos C, Papadopoulos L, Lamprakos C, Matei O, Strydis C, Siddiqi M, Chrobocinski P, Filus K, Domańska J, Avgeriou P, Ampatzoglou A, Soudris D, Chatzigeorgiou A, Gelenbe E, Kehagias D, Tzovaras D (2024). SDK4ED: a platform for building energy efficient, dependable, and maintainable embedded software. Automated Software Engineering 31:2. DOI: 10.1007/s10515-024-00450-z. Online publication date: 11-Jun-2024
https://doi.org/10.1007/s10515-024-00450-z
Nuño-Maganda M, Dávila-Rodríguez I, Hernández-Mier Y, Barrón-Zambrano J, Elizondo-Leal J, Díaz-Manriquez A, Polanco-Martagón S (2023). Real-Time Embedded Vision System for Online Monitoring and Sorting of Citrus Fruits. Electronics 12:18 (3891). DOI: 10.3390/electronics12183891. Online publication date: 15-Sep-2023
https://doi.org/10.3390/electronics12183891
Kljucaric L, George A (2023). Deep Learning Inferencing with High-performance Hardware Accelerators. ACM Transactions on Intelligent Systems and Technology 14:4 (1-25). DOI: 10.1145/3594221. Online publication date: 15-Jun-2023
https://dl.acm.org/doi/10.1145/3594221
Abdelhamid R, Yamaguchi Y, Boku T (2023). A Scalable Many-core Overlay Architecture on an HBM2-enabled Multi-Die FPGA. ACM Transactions on Reconfigurable Technology and Systems 16:1 (1-33). DOI: 10.1145/3547657. Online publication date: 18-Jan-2023
https://dl.acm.org/doi/10.1145/3547657
Marantos C, Papadopoulos L, Lamprakos C, Salapas K, Soudris D (2023). Bringing Energy Efficiency Closer to Application Developers: An Extensible Software Analysis Framework. IEEE Transactions on Sustainable Computing 8:2 (180-193). DOI: 10.1109/TSUSC.2022.3222409. Online publication date: 1-Apr-2023
https://doi.org/10.1109/TSUSC.2022.3222409
Bavikadi S, Sutradhar P, Ganguly A, Dinakarrao S (2023). Heterogeneous Multi-Functional Look-Up-Table-based Processing-in-Memory Architecture for Deep Learning Acceleration. 2023 24th International Symposium on Quality Electronic Design (ISQED) (1-8). DOI: 10.1109/ISQED57927.2023.10129338. Online publication date: 5-Apr-2023
https://doi.org/10.1109/ISQED57927.2023.10129338
