DOI: 10.1145/2145694.2145704
research-article

A performance and energy comparison of FPGAs, GPUs, and multicores for sliding-window applications

Published: 22 February 2012

    Abstract

    With the emergence of accelerator devices such as multicores, graphics-processing units (GPUs), and field-programmable gate arrays (FPGAs), application designers are confronted with the problem of searching a huge design space that has been shown to have widely varying performance and energy metrics for different accelerators, different application domains, and different use cases. To address this problem, numerous studies have evaluated specific applications across different accelerators. In this paper, we analyze an important domain of applications, referred to as sliding-window applications, when executing on FPGAs, GPUs, and multicores. For each device, we present optimization strategies and analyze use cases where each device is most effective. The results show that FPGAs can achieve speedups of up to 11x and 57x compared to GPUs and multicores, respectively, while also using orders of magnitude less energy.
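
To make the term "sliding-window application" concrete, the following is a minimal sequential sketch, not the paper's implementation. It assumes a sum-of-absolute-differences (SAD) kernel as the per-window function; the abstract does not name specific kernels, and the function names `sad`, `sliding_window_sad`, and `best_match` are hypothetical. Each window's score is computed independently of the others, which is the property that lets such workloads be spread across parallel hardware.

```python
from typing import List, Tuple

Image = List[List[int]]


def sad(image: Image, template: Image, row: int, col: int) -> int:
    """Sum of absolute differences between the template and the
    image window whose top-left corner is at (row, col)."""
    return sum(
        abs(image[row + i][col + j] - template[i][j])
        for i in range(len(template))
        for j in range(len(template[0]))
    )


def sliding_window_sad(image: Image, template: Image) -> List[List[int]]:
    """Slide the template over every fully contained window position
    and return the grid of SAD scores (illustrative, unoptimized)."""
    out_rows = len(image) - len(template) + 1
    out_cols = len(image[0]) - len(template[0]) + 1
    return [
        [sad(image, template, r, c) for c in range(out_cols)]
        for r in range(out_rows)
    ]


def best_match(scores: List[List[int]]) -> Tuple[int, int]:
    """Return the (row, col) of the window with the lowest SAD score."""
    positions = [
        (r, c) for r in range(len(scores)) for c in range(len(scores[0]))
    ]
    return min(positions, key=lambda rc: scores[rc[0]][rc[1]])


if __name__ == "__main__":
    image = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
    ]
    template = [
        [6, 7],
        [10, 11],
    ]
    scores = sliding_window_sad(image, template)
    print(scores)
    print(best_match(scores))
```

For an image of R x C pixels and a template of r x c pixels, this sketch evaluates (R - r + 1) * (C - c + 1) windows, each costing r * c operations.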


          Published In

          FPGA '12: Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
          February 2012
          352 pages
          ISBN:9781450311557
          DOI:10.1145/2145694
          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Author Tags

          1. FPGA
          2. GPU
          3. multicore
          4. parallelism
          5. sliding window
          6. speedup

          Qualifiers

          • Research-article

          Conference

          FPGA '12
          Acceptance Rates

          FPGA '12 Paper Acceptance Rate 20 of 87 submissions, 23%;
          Overall Acceptance Rate 125 of 627 submissions, 20%
