DOI: 10.1145/3469116.3470012

Adaptive Inference through Early-Exit Networks: Design, Challenges and Directions

Published: 24 June 2021

Abstract

DNNs are becoming less and less over-parametrised thanks to recent advances in efficient model design, whether through carefully hand-crafted architectures or NAS-based methods. Building on the observation that not all inputs require the same amount of computation to yield a confident prediction, adaptive inference is gaining attention as a prominent approach for pushing the limits of efficient deployment. In particular, early-exit networks are an emerging direction for tailoring the computation depth of each input sample at runtime, offering performance gains complementary to other efficiency optimisations. In this paper, we decompose the design methodology of early-exit networks into its key components and survey the recent advances in each of them. We also position early-exiting against other efficient inference solutions and offer our insights on the current challenges and the most promising directions for future research in the field.
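To make the early-exit mechanism concrete, the sketch below shows one common instantiation: a backbone split into stages, each followed by a lightweight classifier head, with a sample exiting at the first head whose softmax confidence clears a threshold. This is a minimal illustrative PyTorch sketch under assumed names (EarlyExitNet, the 0.9 threshold, the toy linear stages are all hypothetical), not the implementation of any specific paper surveyed here.

# Illustrative early-exit inference sketch (assumed names; batch size 1).
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitNet(nn.Module):
    def __init__(self, stages, exit_heads, threshold=0.9):
        super().__init__()
        self.stages = nn.ModuleList(stages)      # backbone segments
        self.exits = nn.ModuleList(exit_heads)   # one cheap classifier per segment
        self.threshold = threshold               # softmax-confidence exit criterion

    @torch.no_grad()
    def forward(self, x):
        for i, (stage, head) in enumerate(zip(self.stages, self.exits)):
            x = stage(x)        # run the next backbone segment
            logits = head(x)    # intermediate prediction at this depth
            conf = F.softmax(logits, dim=-1).max().item()
            # "Easy" inputs return at the first confident exit; "hard" ones
            # fall through and use the full network depth.
            if conf >= self.threshold or i == len(self.stages) - 1:
                return logits, i

# Toy usage: three linear stages, each with a matching 10-class exit head.
stages = [nn.Sequential(nn.Linear(16, 16), nn.ReLU()) for _ in range(3)]
heads = [nn.Linear(16, 10) for _ in range(3)]
model = EarlyExitNet(stages, heads, threshold=0.9)
logits, exit_idx = model(torch.randn(1, 16))
print(f"sample exited at stage {exit_idx}")

In a batched setting one would typically mask and gather the samples that clear the threshold at each exit rather than call .item(); the single-sample loop above simply keeps the control flow explicit.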



Published In

EMDL'21: Proceedings of the 5th International Workshop on Embedded and Mobile Deep Learning
June 2021
44 pages
ISBN:9781450385978
DOI:10.1145/3469116

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 June 2021

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

MobiSys '21
Article Metrics

  • Downloads (Last 12 months)450
  • Downloads (Last 6 weeks)42
Reflects downloads up to 01 Sep 2024


Cited By

  • (2024) To Exit or Not to Exit: Cost-Effective Early-Exit Architecture Based on Markov Decision Process. Mathematics 12(14):2263. https://doi.org/10.3390/math12142263
  • (2024) Adapting Neural Networks at Runtime: Current Trends in At-Runtime Optimizations for Deep Learning. ACM Computing Surveys 56(10):1-40. https://doi.org/10.1145/3657283
  • (2024) Mobile Foundation Model as Firmware. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 279-295. https://doi.org/10.1145/3636534.3649361
  • (2024) Fixing Overconfidence in Dynamic Neural Networks. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2668-2678. https://doi.org/10.1109/WACV57701.2024.00266
  • (2024) Adaptive Deep Neural Network Inference Optimization with EENet. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 1362-1371. https://doi.org/10.1109/WACV57701.2024.00140
  • (2024) A Study on the Energy Sustainability of Early Exit Networks for Human Activity Recognition. IEEE Transactions on Sustainable Computing 9(1):61-74. https://doi.org/10.1109/TSUSC.2023.3303270
  • (2024) Edge Intelligence for Internet of Vehicles: A Survey. IEEE Transactions on Consumer Electronics 70(2):4858-4877. https://doi.org/10.1109/TCE.2024.3378509
  • (2024) AdaDet: An Adaptive Object Detection System Based on Early-Exit Neural Networks. IEEE Transactions on Cognitive and Developmental Systems 16(1):332-345. https://doi.org/10.1109/TCDS.2023.3274214
  • (2024) RCIF: Toward Robust Distributed DNN Collaborative Inference Under Highly Lossy IoT Networks. IEEE Internet of Things Journal 11(15):25939-25949. https://doi.org/10.1109/JIOT.2024.3390131
  • (2024) Resource-aware Deployment of Dynamic DNNs over Multi-tiered Interconnected Systems. IEEE INFOCOM 2024 - IEEE Conference on Computer Communications, 1621-1630. https://doi.org/10.1109/INFOCOM52122.2024.10621218
