DOI: 10.1145/3639856.3639873
Research Article | Open Access

SplitEE: Early Exit in Deep Neural Networks with Split Computing

Published: 17 May 2024

Abstract

Deep Neural Networks (DNNs) have drawn attention because of their outstanding performance on various tasks. However, deploying full-fledged DNNs on resource-constrained devices (edge, mobile, IoT) is difficult due to their large size. To overcome this issue, various approaches have been considered, such as offloading part of the computation to the cloud for final inference (split computing) or performing the inference at an intermediate layer without passing through all layers (early exits). In this work, we propose combining both approaches by using early exits in split computing. Our approach decides up to what depth of the DNN the computation is performed on the device (the splitting layer) and whether a sample can exit from this layer or needs to be offloaded. The decisions are based on a weighted combination of accuracy, computational, and communication costs. We develop an algorithm named SplitEE to learn an optimal policy. Since pre-trained DNNs are often deployed in new domains where ground truths may be unavailable and samples arrive in a streaming fashion, SplitEE works in an online and unsupervised setup. We perform extensive experiments on five different datasets. SplitEE achieves a significant cost reduction (50%) with only a slight drop in accuracy (< 2%) compared to the case where all samples are inferred at the final layer. The anonymized source code is available at https://github.com/Div290/SplitEE.
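The policy the abstract sketches, choosing a splitting layer online and then exiting or offloading based on confidence, with a reward that weighs accuracy against computational and communication costs, can be illustrated with a small simulation. The sketch below is a minimal, hypothetical rendering using a UCB1 bandit over candidate splitting layers (the author tags name multi-armed bandits); the cost values, the threshold ALPHA, the reward shape, and the simulated confidences are all assumptions for illustration and are not taken from the paper or its repository.

# Minimal, hypothetical sketch of a SplitEE-style policy: a UCB1 bandit picks
# the splitting layer online; the confidence of that layer's exit decides
# whether to exit on-device or offload. All names, costs, and the reward
# shape below are illustrative assumptions, not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

L = 6                                   # candidate splitting layers (arms)
ALPHA = 0.5                             # hypothetical exit-confidence threshold
MU = 1.0                                # weight trading accuracy against cost
proc_cost = np.linspace(0.1, 0.6, L)    # on-device compute cost up to layer i
off_cost = np.linspace(0.5, 0.2, L)     # communication cost when offloading at i

counts = np.zeros(L)                    # pulls per arm
means = np.zeros(L)                     # running mean reward per arm

def reward(conf: float, i: int) -> float:
    """Confidence proxies accuracy (no labels in the streaming setting),
    minus the weighted cost that the chosen action actually incurs."""
    if conf >= ALPHA:                   # confident enough: exit on-device
        return conf - MU * proc_cost[i]
    return conf - MU * (proc_cost[i] + off_cost[i])  # else offload to cloud

for t in range(1, 5001):
    if t <= L:                          # play every arm once to initialize
        i = t - 1
    else:                               # UCB1: optimism under uncertainty
        ucb = means + np.sqrt(2.0 * np.log(t) / counts)
        i = int(np.argmax(ucb))
    # Simulated exit confidence; deeper exits tend to be more confident.
    conf = float(np.clip(rng.normal(0.4 + 0.08 * i, 0.15), 0.0, 1.0))
    r = reward(conf, i)
    counts[i] += 1
    means[i] += (r - means[i]) / counts[i]

print("estimated best splitting layer:", int(np.argmax(means)))

Because no labels are available in the streaming setting, the reward uses the exit's confidence as an unsupervised proxy for accuracy, mirroring the online, unsupervised setup the abstract describes.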


Cited By

  • (2024) To Exit or Not to Exit: Cost-Effective Early-Exit Architecture Based on Markov Decision Process. Mathematics 12(14), 2263. https://doi.org/10.3390/math12142263. Online publication date: 19-Jul-2024.
  • (2024) Beyond Federated Learning for IoT: Efficient Split Learning With Caching and Model Customization. IEEE Internet of Things Journal 11(20), 32617–32630. https://doi.org/10.1109/JIOT.2024.3424660. Online publication date: 15-Oct-2024.
  • (2024) EdgeBoost: Confidence Boosting for Resource Constrained Inference via Selective Offloading. 2024 20th International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT), 11–18. https://doi.org/10.1109/DCOSS-IoT61029.2024.00013. Online publication date: 29-Apr-2024.


Published In

AIMLSystems '23: Proceedings of the Third International Conference on AI-ML Systems
October 2023
381 pages
ISBN: 9798400716492
DOI: 10.1145/3639856
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Early-exit in DNNs
  2. Multi-Exit DNNs
  3. Multi-armed Bandits
  4. Unsupervised online learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

AIMLSystems 2023


Article Metrics

  • Downloads (last 12 months): 225
  • Downloads (last 6 weeks): 46

Reflects downloads up to 25 Jan 2025.
