DOI: 10.1145/3370748.3406575

Approximate inference systems (AxIS): end-to-end approximations for energy-efficient inference at the edge

Published: 10 August 2020

Abstract

The rapid proliferation of the Internet-of-Things (IoT) and the dramatic resurgence of artificial intelligence (AI) based application workloads have led to immense interest in performing inference on energy-constrained edge devices. Approximate computing (a design paradigm that yields large energy savings at the cost of a small degradation in application quality) is a promising technique to enable energy-efficient inference at the edge. This paper introduces the concept of an approximate inference system (AxIS) and proposes a systematic methodology to perform joint approximations across different subsystems in a deep neural network-based inference system, leading to significant energy benefits compared to approximating individual subsystems in isolation. We use a smart camera system that executes various convolutional neural network (CNN) based image recognition applications to illustrate how the sensor, memory, compute, and communication subsystems can all be approximated synergistically. We demonstrate our proposed methodology using two variants of a smart camera system: (a) Camedge, where the CNN executes locally on the edge device, and (b) Camcloud, where the edge device sends the captured image to a remote cloud server that executes the CNN. We have prototyped such an approximate inference system using an Altera Stratix IV GX-based Terasic TR4-230 FPGA development board. Experimental results obtained using six CNNs demonstrate significant energy savings (around 1.7× for Camedge and 3.5× for Camcloud) for minimal (< 1%) loss in application quality. Compared to approximating a single subsystem in isolation, AxIS achieves additional energy benefits of 1.6×--1.7× (Camedge) and 1.4×--3.4× (Camcloud) on average for minimal application-level quality loss.
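To make the abstract's central idea concrete, here is a minimal, hypothetical Python sketch of a joint cross-subsystem approximation search in the spirit of AxIS: each subsystem (sensor, memory, compute, communication) exposes an approximation knob, and the search selects the lowest-energy joint configuration whose accuracy loss stays within a quality budget. The knob names, ranges, and the toy energy/quality models below are illustrative assumptions, not the paper's actual methodology or measured numbers.

```python
# Hypothetical sketch of joint cross-subsystem approximation tuning.
# Knob names, ranges, and the analytic energy/quality models are
# illustrative assumptions, not the AxIS paper's actual algorithm.
from itertools import product

# Each subsystem exposes one knob; later values are more aggressive.
KNOBS = {
    "sensor_subsample":   [1, 2, 4],     # sensor: image subsampling factor
    "dram_refresh_scale": [1, 4, 16],    # memory: DRAM refresh-rate reduction
    "weight_bits":        [8, 6, 4],     # compute: CNN weight quantization
    "jpeg_quality":       [90, 70, 50],  # communication: link compression
}

def energy(cfg):
    """Toy energy model: each knob scales a fixed per-subsystem share."""
    return (0.3 / cfg["sensor_subsample"]
            + 0.2 / cfg["dram_refresh_scale"]
            + 0.3 * cfg["weight_bits"] / 8
            + 0.2 * cfg["jpeg_quality"] / 90)

def accuracy_loss(cfg):
    """Toy quality model: per-knob losses assumed additive. A real system
    would measure loss on a validation set, where knobs interact."""
    return (0.002 * (cfg["sensor_subsample"] - 1)
            + 0.0001 * (cfg["dram_refresh_scale"] - 1)
            + 0.002 * (8 - cfg["weight_bits"])
            + 0.0001 * (90 - cfg["jpeg_quality"]))

def joint_search(quality_budget=0.01):
    """Exhaustively search the joint knob space and return the
    lowest-energy configuration within the quality budget."""
    best_cfg, best_e = None, float("inf")
    for values in product(*KNOBS.values()):
        cfg = dict(zip(KNOBS, values))
        if accuracy_loss(cfg) <= quality_budget and energy(cfg) < best_e:
            best_cfg, best_e = cfg, energy(cfg)
    return best_cfg, best_e

if __name__ == "__main__":
    cfg, e = joint_search()
    print(f"best joint config: {cfg}")
    print(f"relative energy vs. exact baseline: {e:.2f}")
```

The point of searching the knobs jointly rather than tuning one subsystem in isolation is that the quality budget can be spent where it buys the most energy: backing off slightly on one knob may free enough budget for a much cheaper setting elsewhere.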


Published In

ISLPED '20: Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design
August 2020
263 pages
ISBN:9781450370530
DOI:10.1145/3370748

In-Cooperation

  • IEEE CAS

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. approximate computing
  2. deep learning
  3. quality-aware pruning

Qualifiers

  • Research-article

Conference

ISLPED '20

Acceptance Rates

Overall Acceptance Rate: 398 of 1,159 submissions (34%)

Article Metrics

  • Downloads (last 12 months): 89
  • Downloads (last 6 weeks): 4
Reflects downloads up to 11 Jan 2025

Cited By

View all
  • (2024) PArtNNer: Platform-Agnostic Adaptive Edge-Cloud DNN Partitioning for Minimizing End-to-End Latency. ACM Transactions on Embedded Computing Systems 23, 1 (Jan 2024), 1-38. DOI: 10.1145/3630266
  • (2023) X-NVDLA: Runtime Accuracy Configurable NVDLA Based on Applying Voltage Overscaling to Computing and Memory Units. IEEE Transactions on Circuits and Systems I: Regular Papers 70, 5 (May 2023), 1989-2002. DOI: 10.1109/TCSI.2023.3247743
  • (2023) A Novel Low-Power Compression Scheme for Systolic Array-Based Deep Learning Accelerators. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42, 4 (Apr 2023), 1085-1098. DOI: 10.1109/TCAD.2022.3198036
  • (2023) Cross-Layer Approximations for System-Level Optimizations: Challenges and Opportunities. In Proc. 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W) (Jun 2023), 163-166. DOI: 10.1109/DSN-W58399.2023.00046
  • (2023) Cross-Layer Optimizations for Efficient Deep Learning Inference at the Edge. In Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing (Oct 2023), 225-248. DOI: 10.1007/978-3-031-39932-9_9
  • (2023) Efficient Hardware Acceleration of Emerging Neural Networks for Embedded Machine Learning: An Industry Perspective. In Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing (Oct 2023), 121-172. DOI: 10.1007/978-3-031-19568-6_5
  • (2022) Approximate Down-Sampling Strategy for Power-Constrained Intelligent Systems. IEEE Access 10 (2022), 7073-7081. DOI: 10.1109/ACCESS.2022.3142292
  • (2021) Design Considerations for Edge Neural Network Accelerators: An Industry Perspective. In Proc. 34th International Conference on VLSI Design and 20th International Conference on Embedded Systems (VLSID) (Feb 2021), 328-333. DOI: 10.1109/VLSID51830.2021.00061
  • (2021) Special Session: Approximate TinyML Systems: Full System Approximations for Extreme Energy-Efficiency in Intelligent Edge Devices. In Proc. IEEE 39th International Conference on Computer Design (ICCD) (Oct 2021), 13-16. DOI: 10.1109/ICCD53106.2021.00015
  • (2021) ZEM: Zero-Cycle Bit-Masking Module for Deep Learning Refresh-Less DRAM. IEEE Access 9 (2021), 93723-93733. DOI: 10.1109/ACCESS.2021.3088893
