DOI: 10.1145/3639856.3639873
Research Article | Open Access

SplitEE: Early Exit in Deep Neural Networks with Split Computing

Published: 17 May 2024

Abstract

Deep Neural Networks (DNNs) have drawn attention because of their outstanding performance on various tasks. However, deploying full-fledged DNNs on resource-constrained devices (edge, mobile, IoT) is difficult due to their large size. To overcome this issue, various approaches have been considered, such as offloading part of the computation to the cloud for final inference (split computing) or performing the inference at an intermediate layer without passing through all layers (early exits). In this work, we propose combining both approaches by using early exits in split computing. Our approach decides up to what depth of the DNN the computation is performed on the device (the splitting layer) and whether a sample can exit from this layer or needs to be offloaded. The decisions are based on a weighted combination of accuracy, computational, and communication costs. We develop an algorithm named SplitEE to learn an optimal policy. Since pre-trained DNNs are often deployed in new domains where ground truths may be unavailable and samples arrive in a streaming fashion, SplitEE works in an online and unsupervised setup. We perform extensive experiments on five different datasets. SplitEE achieves a significant cost reduction (50%) with only a slight drop in accuracy (< 2%) compared to the case where all samples are inferred at the final layer. The anonymized source code is available at https://github.com/Div290/SplitEE.
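The policy the abstract sketches, choosing a splitting layer online and then exiting or offloading based on confidence, with a reward that weighs accuracy against computational and communication costs, can be illustrated with a small simulation. The sketch below is a minimal, hypothetical rendering using a UCB1 bandit over candidate splitting layers (the author tags name multi-armed bandits); the cost values, the threshold ALPHA, the reward shape, and the simulated confidences are all assumptions for illustration and are not taken from the paper or its repository.

# Minimal, hypothetical sketch of a SplitEE-style policy: a UCB1 bandit picks
# the splitting layer online; the confidence of that layer's exit decides
# whether to exit on-device or offload. All names, costs, and the reward
# shape below are illustrative assumptions, not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

L = 6                                   # candidate splitting layers (arms)
ALPHA = 0.5                             # hypothetical exit-confidence threshold
MU = 1.0                                # weight trading accuracy against cost
proc_cost = np.linspace(0.1, 0.6, L)    # on-device compute cost up to layer i
off_cost = np.linspace(0.5, 0.2, L)     # communication cost when offloading at i

counts = np.zeros(L)                    # pulls per arm
means = np.zeros(L)                     # running mean reward per arm

def reward(conf: float, i: int) -> float:
    """Confidence proxies accuracy (no labels in the streaming setting),
    minus the weighted cost that the chosen action actually incurs."""
    if conf >= ALPHA:                   # confident enough: exit on-device
        return conf - MU * proc_cost[i]
    return conf - MU * (proc_cost[i] + off_cost[i])  # else offload to cloud

for t in range(1, 5001):
    if t <= L:                          # play every arm once to initialize
        i = t - 1
    else:                               # UCB1: optimism under uncertainty
        ucb = means + np.sqrt(2.0 * np.log(t) / counts)
        i = int(np.argmax(ucb))
    # Simulated exit confidence; deeper exits tend to be more confident.
    conf = float(np.clip(rng.normal(0.4 + 0.08 * i, 0.15), 0.0, 1.0))
    r = reward(conf, i)
    counts[i] += 1
    means[i] += (r - means[i]) / counts[i]

print("estimated best splitting layer:", int(np.argmax(means)))

Because no labels are available in the streaming setting, the reward uses the exit's confidence as an unsupervised proxy for accuracy, mirroring the online, unsupervised setup the abstract describes.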


Cited By

  • (2024) To Exit or Not to Exit: Cost-Effective Early-Exit Architecture Based on Markov Decision Process. Mathematics 12(14), 2263. https://doi.org/10.3390/math12142263. Online publication date: 19-Jul-2024.
  • (2024) Beyond Federated Learning for IoT: Efficient Split Learning With Caching and Model Customization. IEEE Internet of Things Journal 11(20), 32617–32630. https://doi.org/10.1109/JIOT.2024.3424660. Online publication date: 15-Oct-2024.
  • (2024) EdgeBoost: Confidence Boosting for Resource Constrained Inference via Selective Offloading. 2024 20th International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT), 11–18. https://doi.org/10.1109/DCOSS-IoT61029.2024.00013. Online publication date: 29-Apr-2024.


Published In

AIMLSystems '23: Proceedings of the Third International Conference on AI-ML Systems
October 2023
381 pages
ISBN: 9798400716492
DOI: 10.1145/3639856
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Early-exit in DNNs
  2. Multi-Exit DNNs
  3. Multi-armed Bandits
  4. Unsupervised online learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

AIMLSystems 2023


Article Metrics

  • Downloads (last 12 months): 225
  • Downloads (last 6 weeks): 46

Reflects downloads up to 25 Jan 2025.
