research-article

Quantized NNs as the definitive solution for inference on low-power ARM MCUs?: work-in-progress

Authors: Manuele Rusci, Alessandro Capotondi, Francesco Conti, Luca Benini

CODES '18: Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis

Article No.: 12, Pages 1 - 2

Published: 30 September 2018


Abstract

High energy efficiency and low memory footprint are the key requirements for the deployment of deep learning based analytics on low-power microcontrollers. Here we present work-in-progress results with Q-bit Quantized Neural Networks (QNNs) deployed on a commercial Cortex-M7 class microcontroller by means of an extension to the ARM CMSIS-NN library. We show that i) for Q = 4 and Q = 2 low memory footprint QNNs can be deployed with an energy overhead of 30% and 36% respectively against the 8-bit CMSIS-NN due to the lack of quantization support in the ISA; ii) for Q = 1 native instructions can be used, yielding an energy and latency reduction of ~3.8× with respect to CMSIS-NN. Our initial results suggest that a small set of QNN-related specialized instructions could improve performance by as much as 7.5× for Q = 4, 13.6× for Q = 2 and 6.5× for binary NNs.
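To illustrate why sub-byte precision can cost extra cycles on a core without dedicated instructions, the following is a minimal sketch in C, not the authors' CMSIS-NN extension. It assumes 4-bit signed weights packed two per byte (low nibble first) and, for the binary case, a ±1 encoding packed 32 values per word (bit 1 means +1, bit 0 means -1). All function names (`sext4`, `dot_q4`, `popcount32`, `dot_binary`) are hypothetical, and the abstract does not state which instructions or data layouts the actual kernels use.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: sign-extend the low 4 bits of v to a signed value
 * in [-8, 7]. This mask/xor/subtract sequence is the kind of unpacking
 * work that a Q = 4 kernel must do in software when the ISA has no
 * sub-byte support. */
static int8_t sext4(uint8_t v)
{
    return (int8_t)((v & 0x0F) ^ 0x08) - 0x08;
}

/* Dot product of n 8-bit activations with n 4-bit weights.
 * Assumption: weights are packed two per byte, low nibble first,
 * and n is even. */
int32_t dot_q4(const int8_t *act, const uint8_t *w_packed, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i += 2) {
        uint8_t b = w_packed[i / 2];
        acc += (int32_t)act[i]     * sext4(b);                  /* low nibble  */
        acc += (int32_t)act[i + 1] * sext4((uint8_t)(b >> 4));  /* high nibble */
    }
    return acc;
}

/* Portable population count (number of set bits); written in plain C so
 * the sketch does not depend on any compiler builtin. */
static unsigned popcount32(uint32_t x)
{
    unsigned c = 0;
    while (x) {
        x &= x - 1;  /* clear the lowest set bit */
        c++;
    }
    return c;
}

/* Dot product of two binary (+1/-1) vectors packed 32 values per word.
 * Each position where the bits differ contributes -1 and each matching
 * position contributes +1, so dot = total_bits - 2 * mismatches. */
int32_t dot_binary(const uint32_t *a, const uint32_t *b, size_t n_words)
{
    int32_t mismatches = 0;
    for (size_t i = 0; i < n_words; i++) {
        mismatches += (int32_t)popcount32(a[i] ^ b[i]);
    }
    return (int32_t)(32 * n_words) - 2 * mismatches;
}
```

In this sketch the Q = 4 path spends extra operations on unpacking before every multiply-accumulate, whereas the binary path reduces a 32-element dot product to one bitwise operation and one bit count per word. This is consistent in spirit with the overhead and speed-up trends reported in the abstract, although the measured figures depend on the authors' actual kernels and hardware.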

References

[1] L. Lai et al. CMSIS-NN: Efficient neural network kernels for Arm Cortex-M CPUs. arXiv:1801.06601, 2018.

[2] I. Hubara et al. Quantized neural networks: Training neural networks with low precision weights and activations. arXiv:1609.07061, 2016.

[3] B. Moons et al. Minimum energy quantized neural networks. In 2017 51st Asilomar Conference on Signals, Systems, and Computers, pages 1921–1925, Oct 2017.

[4] T. B. Preußer et al. Inference of quantized neural networks on heterogeneous all-programmable devices. In 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 833–838, March 2018.

[5] M. Gautschi et al. Near-Threshold RISC-V Core With DSP Extensions for Scalable IoT Endpoint Devices. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25(10):2700–2713, October 2017.

[6] M. Rusci et al. Design automation for binarized neural networks: A quantum leap opportunity? In Circuits and Systems (ISCAS), 2018 IEEE International Symposium on. IEEE, 2018.


Published In

CODES '18: Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis

September 2018

64 pages

ISBN:9781538655627

Program Chairs: Aviral Shrivastava (Arizona State University), Sudeep Pasricha (Colorado State University)

In-Cooperation

CEDA
IEEE CAS

Publisher

IEEE Press


Conference

ESWEEK '18: Fourteenth Embedded Systems Week

September 30 - October 5, 2018

Turin, Italy

Acceptance Rates

Overall Acceptance Rate 280 of 864 submissions, 32%
