Convolution engine: balancing efficiency & flexibility in specialized computing

Authors:

Wajahat Qadeer,

Preethi Venkatesan,

Christos Kozyrakis,

Mark A. Horowitz

ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture

Pages 24 - 35

https://doi.org/10.1145/2485922.2485925

Published: 23 June 2013

Abstract

This paper focuses on the trade-off between flexibility and efficiency in specialized computing. We observe that specialized units achieve most of their efficiency gains by tuning data storage and compute structures and their connectivity to the data-flow and data-locality patterns in the kernels. Hence, by identifying key data-flow patterns used in a domain, we can create efficient engines that can be programmed and reused across a wide range of applications.

We present an example, the Convolution Engine (CE), specialized for the convolution-like data-flow that is common in computational photography, image processing, and video processing applications. CE achieves energy efficiency by capturing data reuse patterns, eliminating data transfer overheads, and enabling a large number of operations per memory access. We quantify the trade-offs in efficiency and flexibility and demonstrate that CE is within a factor of 2-3x of the energy and area efficiency of custom units optimized for a single kernel. CE improves energy and area efficiency by 8-15x over a SIMD engine for most applications.
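
As a rough illustration of what "a large number of operations per memory access" can mean for a convolution-like data-flow, the Python sketch below compares two ways of computing a 1D sliding-window multiply-accumulate. It is not the CE design from the paper: the function names, the load counter, and the use of a small software window standing in for a hardware shift register are all assumptions made for this example. The taps are applied without flipping, so strictly speaking this is a correlation.

```python
from collections import deque


def convolve_naive(samples, taps):
    """Re-read every window element from 'memory' for each output."""
    k = len(taps)
    loads = 0  # hypothetical counter of memory reads
    outputs = []
    for i in range(len(samples) - k + 1):
        acc = 0
        for j in range(k):
            acc += samples[i + j] * taps[j]
            loads += 1  # each operand is fetched again
        outputs.append(acc)
    return outputs, loads


def convolve_with_reuse(samples, taps):
    """Keep the current window in a small local buffer; read each sample once."""
    k = len(taps)
    if len(samples) < k:
        return [], 0
    # The deque stands in for a shift register (an assumption of this sketch).
    window = deque(samples[:k], maxlen=k)
    loads = k
    outputs = [sum(w * t for w, t in zip(window, taps))]
    for sample in samples[k:]:
        window.append(sample)  # shifts the oldest sample out
        loads += 1             # only the new sample is fetched
        outputs.append(sum(w * t for w, t in zip(window, taps)))
    return outputs, loads


samples = list(range(16))
taps = [1, 2, 3, 4]
naive_out, naive_loads = convolve_naive(samples, taps)
reuse_out, reuse_loads = convolve_with_reuse(samples, taps)
print(naive_loads, reuse_loads)
```

For 16 samples and 4 taps, both versions perform the same 52 multiply-accumulates, but the naive version issues 52 reads while the reuse version issues 16. Wider windows, and 2D windows in particular, increase that ratio, which is the kind of data reuse the abstract refers to.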

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Heterogeneous (hybrid) systems
2. Hardware
  1. Very large scale integration design
    1. VLSI system specification and constraints

Published In

ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture

June 2013

686 pages

ISBN:9781450320795

DOI:10.1145/2485922

General Chair:
Avi Mendelson
Technion

ACM SIGARCH Computer Architecture News Volume 41, Issue 3
ISCA '13
June 2013
666 pages
ISSN:0163-5964
DOI:10.1145/2508148

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

IEEE CS

In-Cooperation

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Defense Advanced Research Projects Agency

Conference

ISCA'13

Sponsor:

ISCA'13: The 40th Annual International Symposium on Computer Architecture

June 23 - 27, 2013

Tel-Aviv, Israel

Acceptance Rates

ISCA '13 Paper Acceptance Rate 56 of 288 submissions, 19%;

Overall Acceptance Rate 543 of 3,203 submissions, 17%
