Convolution engine: balancing efficiency & flexibility in specialized computing

Research article · DOI: 10.1145/2485922.2485925

Published: 23 June 2013

Abstract

This paper focuses on the trade-off between flexibility and efficiency in specialized computing. We observe that specialized units achieve most of their efficiency gains by tuning data storage and compute structures and their connectivity to the data-flow and data-locality patterns in the kernels. Hence, by identifying key data-flow patterns used in a domain, we can create efficient engines that can be programmed and reused across a wide range of applications.
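The paragraph above argues that an engine tuned to a data-flow pattern, rather than to a single kernel, can be reprogrammed across applications. The Python sketch below is a hedged illustration of that idea only. It assumes the convolution-like pattern can be modeled as a sliding window with a pluggable per-element "map" step and a pluggable "reduce" step. The function name `stencil` and its parameters are hypothetical and do not describe CE's actual interface or microarchitecture.

```python
import operator
from typing import Callable, List, Sequence


def stencil(
    signal: Sequence[int],
    coeffs: Sequence[int],
    map_fn: Callable[[int, int], int],
    reduce_fn: Callable[[int, int], int],
) -> List[int]:
    """Slide a len(coeffs)-wide window over `signal` (valid region only).

    At each output position, map_fn combines every input sample in the
    window with its coefficient, and reduce_fn folds the mapped values
    left to right into one result.
    """
    k = len(coeffs)
    if k == 0:
        raise ValueError("coeffs must be non-empty")
    out: List[int] = []
    for i in range(len(signal) - k + 1):
        acc = map_fn(signal[i], coeffs[0])
        for j in range(1, k):
            acc = reduce_fn(acc, map_fn(signal[i + j], coeffs[j]))
        out.append(acc)
    return out


# The same loop structure, reprogrammed for three different kernels.
pixels = [1, 2, 3, 4, 5]
filtered = stencil(pixels, [1, 2, 1], operator.mul, operator.add)        # weighted sum (correlation)
sad = stencil(pixels, [2, 2, 2], lambda a, b: abs(a - b), operator.add)  # sum of absolute differences
peak = stencil(pixels, [0, 0, 0], operator.add, max)                     # max-plus filter (dilation-like)
print(filtered, sad, peak)  # prints: [8, 12, 16] [2, 3, 6] [3, 4, 5]
```

Only the two plugged-in functions change between kernels: multiply/add gives a 1D correlation (a convolution with the kernel flipped), absolute difference/add gives the sum of absolute differences used in block-matching motion estimation, and add/max gives a max-plus filter. The window movement and data access order stay fixed, which is the property a pattern-specialized engine can build its storage around.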
We present an example, the Convolution Engine (CE), specialized for the convolution-like data-flow that is common in computational photography, image processing, and video processing applications. CE achieves energy efficiency by capturing data reuse patterns, eliminating data transfer overheads, and enabling a large number of operations per memory access. We quantify the trade-offs in efficiency and flexibility and demonstrate that CE is within 2-3x of the energy and area efficiency of custom units optimized for a single kernel. CE improves energy and area efficiency by 8-15x over a SIMD engine for most applications.
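The abstract credits much of CE's efficiency to capturing data reuse and performing many operations per memory access. A minimal Python sketch of that effect follows, assuming a 1D filter and using a `deque` as a stand-in for an on-chip shift register. The function `conv_shift_register` and the 3-tap coefficients are hypothetical; this is not CE's datapath, only a count of memory reads against multiply-accumulates.

```python
from collections import deque
from typing import List, Sequence, Tuple


def conv_shift_register(
    signal: Sequence[int], coeffs: Sequence[int]
) -> Tuple[List[int], int, int]:
    """1D correlation (valid region) that fetches each input sample once.

    A deque with maxlen=K stands in for a K-entry shift register: appending a
    new sample evicts the oldest, so every fetched sample is reused by up to
    K consecutive outputs. Returns (outputs, memory_reads, multiply_accumulates).
    """
    k = len(coeffs)
    if k == 0:
        raise ValueError("coeffs must be non-empty")
    window: deque = deque(maxlen=k)
    out: List[int] = []
    reads = macs = 0
    for x in signal:
        window.append(x)  # one memory read per input sample
        reads += 1
        if len(window) == k:  # window full: emit one output
            out.append(sum(w * c for w, c in zip(window, coeffs)))
            macs += k
    return out, reads, macs


signal = list(range(1, 1001))  # 1,000 input samples
coeffs = [1, 2, 1]             # hypothetical 3-tap filter
out, reads, macs = conv_shift_register(signal, coeffs)
naive_reads = len(coeffs) * len(out)  # cost of re-fetching the whole window per output
print(reads, naive_reads, macs, round(macs / reads, 3))  # prints: 1000 2994 2994 2.994
```

With a 3-tap filter, the shift register needs 1,000 reads for 2,994 multiply-accumulates, roughly three operations per fetch, while re-fetching the window for every output costs 2,994 reads. The ratio approaches K for a K-tap filter; a 2D K×K window backed by line buffers can raise it toward K² operations per fetched pixel, which is the kind of reuse a convolution-oriented engine is designed to exploit.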

Cited By

  • (2024) SLIDEX: A Novel Architecture for Sliding Window Processing. Proceedings of the 38th ACM International Conference on Supercomputing, pp. 312-323. DOI: 10.1145/3650200.3656613. Online publication date: 30-May-2024.
  • (2024) Rubick: A Unified Infrastructure for Analyzing, Exploring, and Implementing Spatial Architectures via Dataflow Decomposition. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 43(4), pp. 1177-1190. DOI: 10.1109/TCAD.2023.3337208. Online publication date: Apr-2024.
  • (2024) Gem5-MARVEL: Microarchitecture-Level Resilience Analysis of Heterogeneous SoC Architectures. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 543-559. DOI: 10.1109/HPCA57654.2024.00047. Online publication date: 2-Mar-2024.

Published In

ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
June 2013, 686 pages
ISBN: 9781450320795
DOI: 10.1145/2485922

  • ACM SIGARCH Computer Architecture News, Volume 41, Issue 3 (ISCA '13)
    June 2013, 666 pages
    ISSN: 0163-5964
    DOI: 10.1145/2508148
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. H.264
  2. computational photography
  3. convolution
  4. demosaic
  5. energy efficiency
  6. specialized computing
  7. tensilica

Conference

ISCA '13

Acceptance Rates

ISCA '13 paper acceptance rate: 56 of 288 submissions, 19%
Overall acceptance rate: 543 of 3,203 submissions, 17%

Bibliometrics

  • Downloads (last 12 months): 72
  • Downloads (last 6 weeks): 8

Reflects downloads up to 15 Oct 2024.

Citations

  • (2024) SIMD-Constrained Lookup Table for Accelerating Variable-Weighted Convolution on x86/64 CPUs. IEEE Access, 12, pp. 15800-15819. DOI: 10.1109/ACCESS.2024.3354720. Online publication date: 2024.
  • (2023) CAMJ: Enabling System-Level Energy Modeling and Architectural Exploration for In-Sensor Visual Computing. Proceedings of the 50th Annual International Symposium on Computer Architecture, pp. 1-14. DOI: 10.1145/3579371.3589064. Online publication date: 17-Jun-2023.
  • (2023) Generating Unified Platforms Using Multigranularity Domain DSE (MG-DmDSE) Exploiting Application Similarities. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 42(1), pp. 280-293. DOI: 10.1109/TCAD.2022.3172373. Online publication date: Jan-2023.
  • (2023) Studying the challenges of developing hardware description language programs. Information and Software Technology, 159(C). DOI: 10.1016/j.infsof.2023.107196. Online publication date: 1-Jul-2023.
  • (2023) Accelerating Image Processing Using Reduced Precision Calculation Convolution Engines. Journal of Signal Processing Systems, 95(9), pp. 1115-1126. DOI: 10.1007/s11265-023-01869-5. Online publication date: 9-May-2023.
  • (2022) Morphling: A Reconfigurable Architecture for Tensor Computation. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 41(11), pp. 4733-4746. DOI: 10.1109/TCAD.2021.3135322. Online publication date: Nov-2022.
  • (2022) Enabling Efficient Training of Convolutional Neural Networks for Histopathology Images. Image Analysis and Processing. ICIAP 2022 Workshops, pp. 533-544. DOI: 10.1007/978-3-031-13321-3_47. Online publication date: 7-Aug-2022.
