DOI: 10.1145/3469116.3470012

Adaptive Inference through Early-Exit Networks: Design, Challenges and Directions

Published: 24 June 2021

Abstract

DNNs are becoming less and less over-parametrised thanks to recent advances in efficient model design, whether through carefully hand-crafted architectures or NAS-based methods. Building on the observation that not all inputs require the same amount of computation to yield a confident prediction, adaptive inference is gaining attention as a prominent approach for pushing the limits of efficient deployment. In particular, early-exit networks are an emerging direction for tailoring the computation depth of each input sample at runtime, offering performance gains complementary to other efficiency optimisations. In this paper, we decompose the design methodology of early-exit networks into its key components and survey the recent advances in each of them. We also position early-exiting against other efficient inference solutions and offer our insights on the current challenges and the most promising directions for future research in the field.
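To make the early-exit mechanism concrete, the sketch below shows one common instantiation: a backbone split into stages, each followed by a lightweight classifier head, with a sample exiting at the first head whose softmax confidence clears a threshold. This is a minimal illustrative PyTorch sketch under assumed names (EarlyExitNet, the 0.9 threshold, the toy linear stages are all hypothetical), not the implementation of any specific paper surveyed here.

# Illustrative early-exit inference sketch (assumed names; batch size 1).
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitNet(nn.Module):
    def __init__(self, stages, exit_heads, threshold=0.9):
        super().__init__()
        self.stages = nn.ModuleList(stages)      # backbone segments
        self.exits = nn.ModuleList(exit_heads)   # one cheap classifier per segment
        self.threshold = threshold               # softmax-confidence exit criterion

    @torch.no_grad()
    def forward(self, x):
        for i, (stage, head) in enumerate(zip(self.stages, self.exits)):
            x = stage(x)        # run the next backbone segment
            logits = head(x)    # intermediate prediction at this depth
            conf = F.softmax(logits, dim=-1).max().item()
            # "Easy" inputs return at the first confident exit; "hard" ones
            # fall through and use the full network depth.
            if conf >= self.threshold or i == len(self.stages) - 1:
                return logits, i

# Toy usage: three linear stages, each with a matching 10-class exit head.
stages = [nn.Sequential(nn.Linear(16, 16), nn.ReLU()) for _ in range(3)]
heads = [nn.Linear(16, 10) for _ in range(3)]
model = EarlyExitNet(stages, heads, threshold=0.9)
logits, exit_idx = model(torch.randn(1, 16))
print(f"sample exited at stage {exit_idx}")

In a batched setting one would typically mask and gather the samples that clear the threshold at each exit rather than call .item(); the single-sample loop above simply keeps the control flow explicit.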



Published In

EMDL'21: Proceedings of the 5th International Workshop on Embedded and Mobile Deep Learning
June 2021
44 pages
ISBN:9781450385978
DOI:10.1145/3469116

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 June 2021

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

MobiSys '21
Article Metrics

  • Downloads (Last 12 months)450
  • Downloads (Last 6 weeks)42
Reflects downloads up to 01 Sep 2024


Cited By

  • (2024) To Exit or Not to Exit: Cost-Effective Early-Exit Architecture Based on Markov Decision Process. Mathematics 12(14):2263. https://doi.org/10.3390/math12142263
  • (2024) Adapting Neural Networks at Runtime: Current Trends in At-Runtime Optimizations for Deep Learning. ACM Computing Surveys 56(10):1-40. https://doi.org/10.1145/3657283
  • (2024) Mobile Foundation Model as Firmware. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 279-295. https://doi.org/10.1145/3636534.3649361
  • (2024) Fixing Overconfidence in Dynamic Neural Networks. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2668-2678. https://doi.org/10.1109/WACV57701.2024.00266
  • (2024) Adaptive Deep Neural Network Inference Optimization with EENet. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 1362-1371. https://doi.org/10.1109/WACV57701.2024.00140
  • (2024) A Study on the Energy Sustainability of Early Exit Networks for Human Activity Recognition. IEEE Transactions on Sustainable Computing 9(1):61-74. https://doi.org/10.1109/TSUSC.2023.3303270
  • (2024) Edge Intelligence for Internet of Vehicles: A Survey. IEEE Transactions on Consumer Electronics 70(2):4858-4877. https://doi.org/10.1109/TCE.2024.3378509
  • (2024) AdaDet: An Adaptive Object Detection System Based on Early-Exit Neural Networks. IEEE Transactions on Cognitive and Developmental Systems 16(1):332-345. https://doi.org/10.1109/TCDS.2023.3274214
  • (2024) RCIF: Toward Robust Distributed DNN Collaborative Inference Under Highly Lossy IoT Networks. IEEE Internet of Things Journal 11(15):25939-25949. https://doi.org/10.1109/JIOT.2024.3390131
  • (2024) Resource-aware Deployment of Dynamic DNNs over Multi-tiered Interconnected Systems. IEEE INFOCOM 2024 - IEEE Conference on Computer Communications, 1621-1630. https://doi.org/10.1109/INFOCOM52122.2024.10621218
