Adapting Neural Networks at Runtime: Current Trends in At-Runtime Optimizations for Deep Learning

Published: 14 May 2024
Abstract

    Adaptive optimization methods for deep learning adjust the inference task to the current circumstances at runtime, improving the resource footprint while maintaining the model's performance. These methods are essential for the widespread adoption of deep learning, because they reduce the cost of inference while also exploiting information about the current environment that only becomes available at runtime. This survey covers state-of-the-art at-runtime optimization methods, provides guidance for choosing the best method for a specific use case, and highlights open research gaps in the field.
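    A common instance of such at-runtime adaptation is early exiting: a cheap auxiliary classifier runs first, and the full network is evaluated only when the cheap prediction is not confident enough. The sketch below illustrates only the control flow; the two stand-in models, the feature layout, and the 0.9 confidence threshold are illustrative assumptions, not taken from any specific system covered by the survey.

    ```python
    # Minimal sketch of early-exit ("adaptive") inference: prefer a cheap
    # prediction when it is confident, fall back to the full model otherwise.
    # Both "models" here are toy stand-ins over a 4-element feature vector.
    import math

    def softmax(logits):
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        s = sum(exps)
        return [e / s for e in exps]

    def cheap_exit(x):
        # Stand-in for an early classifier head on shallow features.
        return softmax([x[0], x[1]])

    def full_model(x):
        # Stand-in for the full (expensive) network using all features.
        return softmax([x[0] + x[2], x[1] + x[3]])

    def adaptive_infer(x, threshold=0.9):
        """Return (class probabilities, exited_early).

        The runtime decision: keep the cheap prediction if its top
        probability clears the threshold, otherwise pay for the full model.
        """
        probs = cheap_exit(x)
        if max(probs) >= threshold:
            return probs, True
        return full_model(x), False
    ```

    In a real system the threshold itself can be tuned at runtime, trading accuracy against latency or energy as conditions change, which is exactly the kind of knob the surveyed methods expose.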

    References

    [1]
    Vahideh Akhlaghi, Amir Yazdanbakhsh, Kambiz Samadi, Rajesh K. Gupta, and Hadi Esmaeilzadeh. 2018. SnaPEA: Predictive early activation for reducing computation in deep convolutional neural networks. In 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). 662–673. DOI:
    [2]
    Amjad Almahairi, Nicolas Ballas, Tim Cooijmans, Yin Zheng, Hugo Larochelle, and Aaron Courville. 2016. Dynamic capacity networks. In Proceedings of The 33rd International Conference on Machine Learning. PMLR, 2549–2558.
    [3]
    Humam Alwassel, Fabian Caba Heilbron, and Bernard Ghanem. 2018. Action search: Spotting actions in videos and its application to temporal action localization. In Proceedings of the European Conference on Computer Vision (ECCV). 251–266.
    [4]
    Udari De Alwis and Massimo Alioto. 2021. TempDiff: Temporal difference-based feature map-level sparsity induction in CNNs with \(\lt\) 4% memory overhead. In 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS). 1–4. DOI:
    [5]
    Manuel Amthor, Erik Rodner, and Joachim Denzler. 2016. Impatient DNNs - Deep Neural Networks with Dynamic Time Budgets. DOI:arxiv:1610.02850 [cs].
    [6]
    Kittipat Apicharttrisorn, Xukan Ran, Jiasi Chen, Srikanth V. Krishnamurthy, and Amit K. Roy-Chowdhury. 2019. Frugal following: Power thrifty object detection and tracking for mobile augmented reality. In Proceedings of the 17th Conference on Embedded Networked Sensor Systems (SenSys’19). Association for Computing Machinery, New York, NY, USA, 96–109. DOI:
    [7]
    Babak Ehteshami Bejnordi, Tijmen Blankevoort, and Max Welling. 2020. Batch-Shaping for Learning Conditional Channel Gated Networks. DOI:arxiv:1907.06627 [cs, stat].
    [8]
    Emmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, and Doina Precup. 2016. Conditional Computation in Neural Networks for Faster Models. DOI:arxiv:1511.06297 [cs].
    [9]
    Tolga Bolukbasi, Joseph Wang, Ofer Dekel, and Venkatesh Saligrama. 2017. Adaptive neural networks for efficient inference. In Proceedings of the 34th International Conference on Machine Learning. PMLR, 527–536.
    [10]
    Tolga Bolukbasi, Joseph Wang, Ofer Dekel, and Venkatesh Saligrama. 2017. Adaptive neural networks for fast test-time prediction. (Feb.2017).
    [11]
    Mark Buckler, Philip Bedoukian, Suren Jayasuriya, and Adrian Sampson. 2018. EVA \(^2\) : Exploiting temporal redundancy in live computer vision. In 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). 533–546. DOI:
    [12]
    Paola Busia, Ilias Theodorakopoulos, Vasileios Pothos, Nikos Fragoulis, and Paolo Meloni. 2022. Dynamic pruning for parsimonious CNN inference on embedded systems. In Design and Architecture for Signal and Image Processing (Lecture Notes in Computer Science), Karol Desnos and Sergio Pertuz (Eds.). Springer International Publishing, Cham, 45–56. DOI:
    [13]
    Shaofeng Cai, Yao Shu, and Wei Wang. 2021. Dynamic routing networks. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 3588–3597.
    [14]
    Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, and Shih-Fu Chang. 2018. Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks. DOI:arxiv:1708.06834 [cs].
    [15]
    Christopher Canel, Thomas Kim, Giulio Zhou, Conglong Li, Hyeontaek Lim, David G. Andersen, Michael Kaminsky, and Subramanya Dulloor. 2019. Scaling video analytics on constrained edge nodes. Proceedings of Machine Learning and Systems 1 (April2019), 406–417.
    [16]
    Shijie Cao, Lingxiao Ma, Wencong Xiao, Chen Zhang, Yunxin Liu, Lintao Zhang, Lanshun Nie, and Zhi Yang. 2019. SeerNet: Predicting convolutional neural network feature-map sparsity through low-bit quantization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11216–11225.
    [17]
    Lukas Cavigelli and Luca Benini. 2020. CBinfer: Exploiting frame-to-frame locality for faster convolutional network inference on video streams. IEEE Transactions on Circuits and Systems for Video Technology 30, 5 (May2020), 1451–1465. DOI:
    [18]
    Lukas Cavigelli, Philippe Degen, and Luca Benini. 2017. CBinfer: Change-based inference for convolutional neural networks on video data. In Proceedings of the 11th International Conference on Distributed Smart Cameras (ICDSC 2017). Association for Computing Machinery, New York, NY, USA, 1–8. DOI:
    [19]
    Jinting Chen, Zhaocheng Zhu, Cheng Li, and Yuming Zhao. 2019. Self-adaptive network pruning. In Neural Information Processing (Lecture Notes in Computer Science), Tom Gedeon, Kok Wai Wong, and Minho Lee (Eds.). Springer International Publishing, Cham, 175–186. DOI:
    [20]
    Jou-An Chen, Wei Niu, Bin Ren, Yanzhi Wang, and Xipeng Shen. 2023. Survey: Exploiting data redundancy for optimization of deep learning. Comput. Surveys 55, 10 (Feb.2023), 212:1–212:38. DOI:
    [21]
    Tiffany Yu-Han Chen, Lenin Ravindranath, Shuo Deng, Paramvir Bahl, and Hari Balakrishnan. 2015. Glimpse: Continuous, real-time object recognition on mobile devices. In Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems (SenSys’15). Association for Computing Machinery, New York, NY, USA, 155–168. DOI:
    [22]
    Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Lu Yuan, and Zicheng Liu. 2020. Dynamic ReLU. In Computer Vision – ECCV 2020 (Lecture Notes in Computer Science), Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.). Springer International Publishing, Cham, 351–367. DOI:
    [23]
    Zhourong Chen, Yang Li, Samy Bengio, and Si Si. 2019. You look twice: GaterNet for dynamic filter selection in CNNs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9172–9180.
    [24]
    An-Chieh Cheng, Chieh Hubert Lin, Da-Cheng Juan, Wei Wei, and Min Sun. 2020. InstaNAS: Instance-aware neural architecture search. Proceedings of the AAAI Conference on Artificial Intelligence 34, 04 (April2020), 3577–3584. DOI:
    [25]
    Chang-Han Chiang, Pangfeng Liu, Da-Wei Wang, Ding-Yong Hong, and Jan-Jan Wu. 2021. Optimal branch location for cost-effective inference on Branchynet. In 2021 IEEE International Conference on Big Data (Big Data). 5071–5080. DOI:
    [26]
    Flavio Chierichetti, Ravi Kumar, and Sergei Vassilvitskii. 2009. Similarity caching. In Proceedings of the Twenty-Eighth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’09). Association for Computing Machinery, New York, NY, USA, 127–136. DOI:
    [27]
    Jean-Baptiste Cordonnier, Aravindh Mahendran, Alexey Dosovitskiy, Dirk Weissenborn, Jakob Uszkoreit, and Thomas Unterthiner. 2021. Differentiable patch selection for image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2351–2360.
    [28]
    Bart Cox, Robert Birke, and Lydia Y. Chen. 2022. Memory-aware and context-aware multi-DNN inference on the edge. Pervasive and Mobile Computing 83 (July2022), 101594. DOI:
    [29]
    Yarens J. Cruz, Marcelino Rivas, Ramón Quiza, Rodolfo E. Haber, Fernando Castaño, and Alberto Villalonga. 2022. A two-step machine learning approach for dynamic model selection: A case study on a micro milling process. Computers in Industry 143 (Dec.2022), 103764. DOI:
    [30]
    Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and Łukasz Kaiser. 2019. Universal Transformers. DOI:arxiv:1807.03819 [cs, stat].
    [31]
    Xuanyi Dong, Junshi Huang, Yi Yang, and Shuicheng Yan. 2017. More is less: A more complicated network with less inference complexity. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5840–5848.
    [32]
    Utsav Drolia, Katherine Guo, and Priya Narasimhan. 2017. Precog: Prefetching for image recognition applications at the edge. In Proceedings of the Second ACM/IEEE Symposium on Edge Computing (SEC’17). Association for Computing Machinery, New York, NY, USA, 1–13.
    [33]
    Utsav Drolia, Katherine Guo, Jiaqi Tan, Rajeev Gandhi, and Priya Narasimhan. 2017. Cachier: Edge-caching for recognition applications. In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS). 276–286. DOI:
    [34]
    Ali Ehteshami Bejnordi and Ralf Krestel. 2020. Dynamic channel and layer gating in convolutional neural networks. In KI 2020: Advances in Artificial Intelligence (Lecture Notes in Computer Science), Ute Schmid, Franziska Klügl, and Diedrich Wolter (Eds.). Springer International Publishing, Cham, 33–45. DOI:
    [35]
    Maha Elbayad, Jiatao Gu, Edouard Grave, and Michael Auli. 2020. Depth-Adaptive Transformer. DOI:arxiv:1910.10073 [cs].
    [36]
    Sara Elkerdawy, Mostafa Elhoushi, Hong Zhang, and Nilanjan Ray. 2022. Fire together wire together: A dynamic pruning approach with self-supervised mask prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12454–12463.
    [37]
    OpenAI et al.2023. GPT-4 Technical Report. Technical Report arXiv:2303.08774. DOI:arxiv:2303.08774 [cs].
    [38]
    Fabrizio Falchi, Claudio Lucchese, Salvatore Orlando, Raffaele Perego, and Fausto Rabitti. 2008. A metric cache for similarity search. In Proceedings of the 2008 ACM Workshop on Large-Scale Distributed Systems for Information Retrieval (LSDS-IR’08). Association for Computing Machinery, New York, NY, USA, 43–50. DOI:
    [39]
    Fabrizio Falchi, Claudio Lucchese, Salvatore Orlando, Raffaele Perego, and Fausto Rabitti. 2009. Caching content-based queries for robust and efficient image retrieval. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology (EDBT’09). Association for Computing Machinery, New York, NY, USA, 780–790. DOI:
    [40]
    Fabrizio Falchi, Claudio Lucchese, Salvatore Orlando, Raffaele Perego, and Fausto Rabitti. 2012. Similarity caching in large-scale image retrieval. Information Processing & Management 48, 5 (Sept.2012), 803–818. DOI:
    [41]
    H. Fan, Z. Xu, L. Zhu, C. Yan, J. Ge, and Y. Yang. 2018. Watching a Small Portion Could Be as Good as Watching All: Towards Efficient Video Classification.
    [42]
    Biyi Fang, Xiao Zeng, Faen Zhang, Hui Xu, and Mi Zhang. 2020. FlexDNN: Input-adaptive on-device deep learning for efficient mobile vision. In 2020 IEEE/ACM Symposium on Edge Computing (SEC). 84–95. DOI:
    [43]
    Yihao Fang, Shervin Manzuri Shalmani, and Rong Zheng. 2020. CacheNet: A Model Caching Framework for Deep Learning Inference on the Edge. DOI:arxiv:2007.01793 [cs, eess].
    [44]
    Mohsen Fayyaz, Soroush Abbasi Koohpayegani, Farnoush Rezaei Jafari, Sunando Sengupta, Hamid Reza Vaezi Joze, Eric Sommerlade, Hamed Pirsiavash, and Jürgen Gall. 2022. Adaptive token sampling for efficient vision transformers. In Computer Vision – ECCV 2022 (Lecture Notes in Computer Science), Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, and Tal Hassner (Eds.). Springer Nature Switzerland, Cham, 396–414. DOI:
    [45]
    Michael Figurnov, Maxwell D. Collins, Yukun Zhu, Li Zhang, Jonathan Huang, Dmitry Vetrov, and Ruslan Salakhutdinov. 2017. Spatially adaptive computation time for residual networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1039–1048.
    [46]
    Alessandro Finamore, James Roberts, Massimo Gallo, and Dario Rossi. 2022. Accelerating deep learning classification with error-controlled approximate-key caching. In IEEE INFOCOM 2022 - IEEE Conference on Computer Communications. 2118–2127. DOI:
    [47]
    Jianlong Fu, Heliang Zheng, and Tao Mei. 2017. Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4438–4446.
    [48]
    Tsu-Jui Fu and Wei-Yun Ma. 2018. Speed reading: Learning to read forbackward via shuttle. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Brussels, Belgium, 4439–4448. DOI:
    [49]
    Xitong Gao, Yiren Zhao, Łukasz Dudziak, Robert Mullins, and Cheng-zhong Xu. 2019. Dynamic Channel Pruning: Feature Boosting and Suppression. DOI:arxiv:1810.05331 [cs].
    [50]
    Nikhil P. Ghanathe and Steve Wilton. 2022. T-RECX: Tiny-Resource Efficient Convolutional Neural Networks with Early-Exit. DOI:arxiv:2207.06613 [cs, eess].
    [51]
    Amir Ghodrati, Babak Ehteshami Bejnordi, and Amirhossein Habibian. 2021. FrameExit: Conditional early exiting for efficient video recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 15608–15618.
    [52]
    Guin R. Gilman, Samuel S. Ogden, Robert J. Walls, and Tian Guo. 2019. Challenges and opportunities of DNN model execution caching. In Proceedings of the Workshop on Distributed Infrastructures for Deep Learning (DIDL’19). Association for Computing Machinery, New York, NY, USA, 7–12. DOI:
    [53]
    Chao Gong, Fuhong Lin, Xiaowen Gong, and Yueming Lu. 2020. Intelligent cooperative edge computing in internet of things. IEEE Internet of Things Journal 7, 10 (Oct.2020), 9372–9382. DOI:
    [54]
    Hongyu Gong, Xian Li, and Dmitriy Genzel. 2022. Adaptive Sparse Transformer for Multilingual Translation. DOI:arxiv:2104.07358 [cs].
    [55]
    Alex Graves. 2017. Adaptive Computation Time for Recurrent Neural Networks. DOI:arxiv:1603.08983 [cs].
    [56]
    Peizhen Guo and Wenjun Hu. 2018. Potluck: Cross-application approximate deduplication for computation-intensive mobile applications. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’18). Association for Computing Machinery, New York, NY, USA, 271–284. DOI:
    [57]
    Peizhen Guo, Rui Li, Bo Hu, and Wenjun Hu. 2018. FoggyCache: Cross-device approximate computation reuse. Living on the Edge (2018), 16.
    [58]
    Qiushan Guo, Zhipeng Yu, Yichao Wu, Ding Liang, Haoyu Qin, and Junjie Yan. 2019. Dynamic recursive neural network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5147–5156.
    [59]
    Yunhui Guo. 2018. A Survey on Methods and Theories of Quantized Neural Networks. DOI:arxiv:1808.04752 [cs, stat].
    [60]
    Amir Hadifar, Johannes Deleu, Chris Develder, and Thomas Demeester. 2021. Exploration of block-wise dynamic sparseness. Pattern Recognition Letters 151 (Nov.2021), 187–192. DOI:
    [61]
    Seungyeop Han, Haichen Shen, Matthai Philipose, Sharad Agarwal, Alec Wolman, and Arvind Krishnamurthy. 2016. MCDNN: An approximation-based execution framework for deep stream processing under resource constraints. In Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys’16). Association for Computing Machinery, New York, NY, USA, 123–136. DOI:
    [62]
    Yizeng Han, Gao Huang, Shiji Song, Le Yang, Honghui Wang, and Yulin Wang. 2021. Dynamic neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021), 1–1. DOI:
    [63]
    Christian Hansen, Casper Hansen, Stephen Alstrup, Jakob Grue Simonsen, and Christina Lioma. 2019. Neural Speed Reading with Structural-Jump-LSTM. DOI:arxiv:1904.00761 [cs, stat].
    [64]
    Mirazul Haque, Anki Chauhan, Cong Liu, and Wei Yang. 2020. ILFO: Adversarial attack on adaptive neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 14264–14273.
    [65]
    Hussein Hazimeh, Natalia Ponomareva, Petros Mol, Zhenyu Tan, and Rahul Mazumder. 2020. The tree ensemble layer: Differentiability meets conditional computation. In Proceedings of the 37th International Conference on Machine Learning. PMLR, 4138–4148.
    [66]
    Charles Herrmann, Richard Strong Bowen, and Ramin Zabih. 2020. Channel selection using Gumbel Softmax. In Computer Vision – ECCV 2020 (Lecture Notes in Computer Science), Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.). Springer International Publishing, Cham, 241–257. DOI:
    [67]
    Sanghyun Hong, Yiğitcan Kaya, Ionuţ-Vlad Modoranu, and Tudor Dumitraş. 2021. A Panda? No, It’s a Sloth: Slowdown Attacks on Adaptive Multi-Exit Neural Network Inference. DOI:arxiv:2010.02432 [cs].
    [68]
    Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, and Qun Liu. 2020. DynaBERT: Dynamic BERT with adaptive width and depth. In Advances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc., 9782–9793.
    [69]
    Kevin Hsieh, Ganesh Ananthanarayanan, Peter Bodik, Shivaram Venkataraman, Paramvir Bahl, Matthai Philipose, Phillip B. Gibbons, and Onur Mutlu. 2018. Focus: Querying large video datasets with low latency and low cost. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). 269–286.
    [70]
    Hanzhang Hu, Debadeepta Dey, Martial Hebert, and J. Andrew Bagnell. 2018. Learning Anytime Predictions in Neural Networks via Adaptive Loss Balancing. DOI:arxiv:1708.06832 [cs].
    [71]
    Ting-Kuei Hu, Tianlong Chen, Haotao Wang, and Zhangyang Wang. 2020. Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference. DOI:arxiv:2002.10025 [cs].
    [72]
    Zilong Hu, Jinshan Tang, Ziming Wang, Kai Zhang, Ling Zhang, and Qingling Sun. 2018. Deep learning for image-based cancer detection and diagnosis - a survey. Pattern Recognition 83 (Nov.2018), 134–149. DOI:
    [73]
    Weizhe Hua, Yuan Zhou, Christopher De Sa, Zhiru Zhang, and G. Edward Suh. 2019. Boosting the performance of CNN accelerators with dynamic fine-grained channel gating. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’52). Association for Computing Machinery, New York, NY, USA, 139–150. DOI:
    [74]
    Weizhe Hua, Yuan Zhou, Christopher M. De Sa, Zhiru Zhang, and G. Edward Suh. 2019. Channel gating neural networks. In Advances in Neural Information Processing Systems, Vol. 32. Curran Associates, Inc.
    [75]
    Gao Huang, Danlu Chen, Tianhong Li, Felix Wu, Laurens van der Maaten, and Kilian Q. Weinberger. 2018. Multi-Scale Dense Networks for Resource Efficient Image Classification. DOI:arxiv:1703.09844 [cs].
    [76]
    Gao Huang, Yulin Wang, Kangchen Lv, Haojun Jiang, Wenhui Huang, Pengfei Qi, and Shiji Song. 2022. Glance and Focus Networks for Dynamic Visual Recognition. DOI:arxiv:2201.03014 [cs].
    [77]
    Zhengjie Huang, Zi Ye, Shuangyin Li, and Rong Pan. 2017. Length adaptive recurrent model for text classification. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM’17). Association for Computing Machinery, New York, NY, USA, 1019–1027. DOI:
    [78]
    Loc N. Huynh, Youngki Lee, and Rajesh Krishna Balan. 2017. DeepMon: Mobile GPU-based deep learning framework for continuous vision applications. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys’17). Association for Computing Machinery, New York, NY, USA, 82–95. DOI:
    [79]
    Yani Ioannou, Duncan Robertson, Darko Zikic, Peter Kontschieder, Jamie Shotton, Matthew Brown, and Antonio Criminisi. 2016. Decision Forests, Convolutional Networks and the Models in-Between. DOI:arxiv:1603.01250 [cs].
    [80]
    Samvit Jain, Xun Zhang, Yuhao Zhou, Ganesh Ananthanarayanan, Junchen Jiang, Yuanchao Shu, Paramvir Bahl, and Joseph Gonzalez. 2020. Spatula: Efficient cross-camera video analytics on large camera networks. In 2020 IEEE/ACM Symposium on Edge Computing (SEC). 110–124. DOI:
    [81]
    Samvit Jain, Xun Zhang, Yuhao Zhou, Ganesh Ananthanarayanan, Junchen Jiang, Yuanchao Shu, and Joseph Gonzalez. 2019. ReXCam: Resource-Efficient, Cross-Camera Video Analytics at Scale. DOI:arxiv:1811.01268 [cs].
    [82]
    Yacine Jernite, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Variable Computation in Recurrent Neural Networks. DOI:arxiv:1611.06188 [cs, stat].
    [83]
    Junchen Jiang, Ganesh Ananthanarayanan, Peter Bodik, Siddhartha Sen, and Ion Stoica. 2018. Chameleon: Scalable adaptation of video analytics. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication (SIGCOMM’18). Association for Computing Machinery, New York, NY, USA, 253–266. DOI:
    [84]
    Zutao Jiang, Changlin Li, Xiaojun Chang, Jihua Zhu, and Yi Yang. 2021. Dynamic Slimmable Denoising Network. DOI:arxiv:2110.08940 [cs, eess].
    [85]
    Qing Jin, Linjie Yang, and Zhenyu Liao. 2020. AdaBits: Neural network quantization with adaptive bit-widths. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2146–2156.
    [86]
    Daniel Kang, John Emmons, Firas Abuzaid, Peter Bailis, and Matei Zaharia. 2017. NoScope: Optimizing Neural Network Queries over Video at Scale. DOI:arxiv:1703.02529 [cs].
    [87]
    Yigitcan Kaya, Sanghyun Hong, and Tudor Dumitras. 2019. Shallow-deep networks: Understanding and mitigating network overthinking. In Proceedings of the 36th International Conference on Machine Learning. PMLR, 3301–3310.
    [88]
    Gyuwan Kim and Kyunghyun Cho. 2021. Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search. DOI:arxiv:2010.07003 [cs].
    [89]
    Alexander Kirillov, Yuxin Wu, Kaiming He, and Ross Girshick. 2020. PointRend: Image segmentation as rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9799–9808.
    [90]
    Shu Kong and Charless Fowlkes. 2019. Pixel-wise attentional gating for scene parsing. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). 1024–1033. DOI:
    [91]
    Peter Kontschieder, Madalina Fiterau, Antonio Criminisi, and Samuel Rota Bulo. 2015. Deep neural decision forests. In Proceedings of the IEEE International Conference on Computer Vision. 1467–1475.
    [92]
    Alexandros Kouris, Stylianos I. Venieris, Stefanos Laskaridis, and Nicholas D. Lane. 2022. Multi-Exit Semantic Segmentation Networks. arxiv:2106.03527 [cs].
    [93]
    Tarun Krishna, Ayush K. Rai, Yasser A. D. Djilali, Alan F. Smeaton, Kevin McGuinness, and Noel E. O’Connor. 2022. Dynamic Channel Selection in Self-Supervised Learning. DOI:arxiv:2207.12065 [cs].
    [94]
    Jason Kuen, Xiangfei Kong, Zhe Lin, Gang Wang, Jianxiong Yin, Simon See, and Yap-Peng Tan. 2018. Stochastic downsampling for cost-adjustable inference and improved regularization in convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7929–7938.
    [95]
    Stefanos Laskaridis, Alexandros Kouris, and Nicholas D. Lane. 2021. Adaptive inference through early-exit networks: Design, challenges and directions. In Proceedings of the 5th International Workshop on Embedded and Mobile Deep Learning (EMDL’21). Association for Computing Machinery, New York, NY, USA, 1–6. DOI:
    [96]
    Stefanos Laskaridis, Stylianos I. Venieris, Mario Almeida, Ilias Leontiadis, and Nicholas D. Lane. 2020. SPINN: Synergistic progressive inference of neural networks over device and cloud. In Proceedings of the 26th Annual International Conference on Mobile Computing and Networking (MobiCom’20). Association for Computing Machinery, New York, NY, USA, 1–15. DOI:
    [97]
    Changsik Lee, Seungwoo Hong, Sungback Hong, and Taeyeon Kim. 2020. Performance analysis of local exit for distributed deep neural networks over cloud and edge computing. ETRI Journal 42, 5 (2020), 658–668. DOI:
    [98]
    Hankook Lee and Jinwoo Shin. 2018. Anytime Neural Prediction via Slicing Networks Vertically. DOI:arxiv:1807.02609 [cs, stat].
    [99]
    Royson Lee, Stylianos I. Venieris, Lukasz Dudziak, Sourav Bhattacharya, and Nicholas D. Lane. 2019. MobiSR: Efficient on-device super-resolution through heterogeneous mobile processors. In The 25th Annual International Conference on Mobile Computing and Networking (MobiCom’19). Association for Computing Machinery, New York, NY, USA, 1–16. DOI:
    [100]
    Sam Leroux, Steven Bohez, Cedric De Boom, Elias De Coninck, Tim Verbelen, Bert Vankeirsbilck, Pieter Simoens, and Bart Dhoedt. 2016. Lazy Evaluation of Convolutional Filters. DOI:arxiv:1605.08543 [cs].
    [101]
    Sam Leroux, Steven Bohez, Elias De Coninck, Tim Verbelen, Bert Vankeirsbilck, Pieter Simoens, and Bart Dhoedt. 2017. The cascading neural network: Building the internet of smart things. Knowledge and Information Systems 52, 3 (Sept.2017), 791–814. DOI:
    [102]
    Sam Leroux, Pavlo Molchanov, Pieter Simoens, Bart Dhoedt, Thomas Breuel, and Jan Kautz. 2018. IamNN: Iterative and Adaptive Mobile Neural Network for Efficient Image Classification. DOI:arxiv:1804.10123 [cs].
    [103]
    Changlin Li, Guangrun Wang, Bing Wang, Xiaodan Liang, Zhihui Li, and Xiaojun Chang. 2021. DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers. DOI:arxiv:2109.10060 [cs].
    [104]
    Changlin Li, Guangrun Wang, Bing Wang, Xiaodan Liang, Zhihui Li, and Xiaojun Chang. 2021. Dynamic slimmable network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8607–8617.
    [105]
    Hengduo Li, Zuxuan Wu, Abhinav Shrivastava, and Larry S. Davis. 2021. 2D or not 2D? Adaptive 3D convolution selection for efficient video recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6155–6164.
    [106]
    Hao Li, Hong Zhang, Xiaojuan Qi, Ruigang Yang, and Gao Huang. 2019. Improved techniques for training adaptive deep networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 1891–1900.
    [107]
    Liangzhi Li, Kaoru Ota, and Mianxiong Dong. 2018. Deep learning for smart industry: Efficient manufacture inspection system with fog computing. IEEE Transactions on Industrial Informatics 14, 10 (Oct.2018), 4665–4673. DOI:
    [108]
    Xiaoxiao Li, Ziwei Liu, Ping Luo, Chen Change Loy, and Xiaoou Tang. 2017. Not all pixels are equal: Difficulty-aware semantic segmentation via deep layer cascade. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3193–3202.
    [109]
    Yanwei Li, Lin Song, Yukang Chen, Zeming Li, Xiangyu Zhang, Xingang Wang, and Jian Sun. 2020. Learning dynamic routing for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8553–8562.
    [110]
    Zhichao Li, Yi Yang, Xiao Liu, Feng Zhou, Shilei Wen, and Wei Xu. 2017. Dynamic computational time for visual attention. In Proceedings of the IEEE International Conference on Computer Vision Workshops. 1199–1209.
    [111]
    Robert LiKamWa and Lin Zhong. 2015. Starfish: Efficient concurrency support for computer vision applications. In Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys’15). Association for Computing Machinery, New York, NY, USA, 213–226. DOI:
    [112]
    Ji Lin, Yongming Rao, Jiwen Lu, and Jie Zhou. 2017. Runtime neural pruning. In Advances in Neural Information Processing Systems, Vol. 30. Curran Associates, Inc.
    [113]
    Yingyan Lin, Charbel Sakr, Yongjune Kim, and Naresh Shanbhag. 2017. PredictiveNet: An energy-efficient convolutional neural network via zero prediction. In 2017 IEEE International Symposium on Circuits and Systems (ISCAS). 1–4. DOI:
    [114]
    Chuanjian Liu, Yunhe Wang, Kai Han, Chunjing Xu, and Chang Xu. 2019. Learning Instance-wise Sparsity for Accelerating Deep Models. DOI:arxiv:1907.11840 [cs].
    [115]
    Lanlan Liu and Jia Deng. 2018. Dynamic deep neural networks: Optimizing accuracy-efficiency trade-offs by selective execution. Proceedings of the AAAI Conference on Artificial Intelligence 32, 1 (April2018). DOI:
    [116]
    Luyang Liu, Hongyu Li, and Marco Gruteser. 2019. Edge assisted real-time object detection for mobile augmented reality. In The 25th Annual International Conference on Mobile Computing and Networking (MobiCom’19). Association for Computing Machinery, New York, NY, USA, 1–16. DOI:
    [117]
    Miaomiao Liu, Xianzhong Ding, and Wan Du. 2020. Continuous, real-time object detection on mobile devices without offloading. In 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS). 976–986. DOI:
    [118]
    Sicong Liu, Yingyan Lin, Zimu Zhou, Kaiming Nan, Hui Liu, and Junzhao Du. 2018. On-demand deep model compression for mobile devices: A usage-driven model selection framework. In Proceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys’18). Association for Computing Machinery, New York, NY, USA, 389–400. DOI:
    [119]
    Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, and Qi Ju. 2020. FastBERT: A Self-distilling BERT with Adaptive Inference Time. DOI:arxiv:2004.02178 [cs].
    [120]
    Xianggen Liu, Lili Mou, Haotian Cui, Zhengdong Lu, and Sen Song. 2020. Finding decision jumps in text classification. Neurocomputing 371 (Jan.2020), 177–187. DOI:
    [121]
    Chi Lo, Yu-Yi Su, Chun-Yi Lee, and Shih-Chieh Chang. 2017. A dynamic deep neural network design for efficient workload allocation in edge computing. In 2017 IEEE International Conference on Computer Design (ICCD). 273–280. DOI:
    [122]
    Wei Lou, Lei Xun, Amin Sabet, Jia Bi, Jonathon Hare, and Geoff V. Merrett. 2021. Dynamic-OFA: Runtime DNN architecture switching for performance scaling on heterogeneous embedded platforms. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3110–3118.
    [123]
    Luca Lovagnini, Wenxiao Zhang, Farshid Hassani Bijarbooneh, and Pan Hui. 2018. CIRCE: Real-time caching for instance recognition on cloud environments and multi-core architectures. In Proceedings of the 26th ACM International Conference on Multimedia (MM’18). Association for Computing Machinery, New York, NY, USA, 346–354. DOI:
    [124]
    Jiachen Mao, Qing Yang, Ang Li, Kent W. Nixon, Hai Li, and Yiran Chen. 2022. Toward efficient and adaptive design of video detection system with deep neural networks. ACM Transactions on Embedded Computing Systems 21, 3 (July2022), 33:1–33:21. DOI:
    [125]
    Vicent Sanz Marco, Ben Taylor, Zheng Wang, and Yehia Elkhatib. 2020. Optimizing deep learning inference on embedded systems through adaptive model selection. ACM Transactions on Embedded Computing Systems 19, 1 (Feb.2020), 2:1–2:28. DOI:
    [126]
    Yoshitomo Matsubara, Marco Levorato, and Francesco Restuccia. 2022. Split computing and early exiting for deep learning applications: Survey and research challenges. Comput. Surveys 55, 5 (Dec.2022), 90:1–90:30. DOI:
    [127]
    Mason McGill and Pietro Perona. 2017. Deciding how to decide: Dynamic routing in artificial neural networks. In Proceedings of the 34th International Conference on Machine Learning. PMLR, 2363–2372.
    [128]
    Lingchen Meng, Hengduo Li, Bor-Chun Chen, Shiyi Lan, Zuxuan Wu, Yu-Gang Jiang, and Ser-Nam Lim. 2022. AdaViT: Adaptive vision transformers for efficient image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12309–12318.
    [129]
    Yue Meng, Chung-Ching Lin, Rameswar Panda, Prasanna Sattigeri, Leonid Karlinsky, Aude Oliva, Kate Saenko, and Rogerio Feris. 2020. AR-Net: Adaptive frame resolution for efficient action recognition. In Computer Vision – ECCV 2020 (Lecture Notes in Computer Science), Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.). Springer International Publishing, Cham, 86–104. DOI:
    [130]
Yue Meng, Rameswar Panda, Chung-Ching Lin, Prasanna Sattigeri, Leonid Karlinsky, Kate Saenko, Aude Oliva, and Rogerio Feris. 2021. AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition. arXiv:2102.05775 [cs].
    [131]
    Volodymyr Mnih, Nicolas Heess, Alex Graves, and Koray Kavukcuoglu. 2014. Recurrent models of visual attention. In Advances in Neural Information Processing Systems, Vol. 27. Curran Associates, Inc.
    [132]
    Ravi Teja Mullapudi, William R. Mark, Noam Shazeer, and Kayvon Fatahalian. 2018. HydraNets: Specialized dynamic architectures for efficient inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8080–8089.
    [133]
    Keivan Nalaie, Renjie Xu, and Rong Zheng. 2022. DeepScale: Online frame size adaptation for multi-object tracking on smart cameras and edge servers. In 2022 IEEE/ACM Seventh International Conference on Internet-of-Things Design and Implementation (IoTDI). 67–79. DOI:
    [134]
    Srikanth Namuduri, Barath Narayanan Narayanan, Venkata Salini Priyamvada Davuluru, Lamar Burton, and Shekhar Bhansali. 2020. Review—Deep learning methods for sensor based predictive maintenance and future perspectives for electrochemical sensors. Journal of The Electrochemical Society 167, 3 (Jan.2020), 037552. DOI:
    [135]
Mark Neumann, Pontus Stenetorp, and Sebastian Riedel. 2016. Learning to Reason with Adaptive Computation. arXiv:1610.07647 [cs, stat].
    [136]
Peter O’Connor and Max Welling. 2016. Sigma Delta Quantized Networks. arXiv:1611.02024 [cs].
    [137]
Augustus Odena, Dieterich Lawson, and Christopher Olah. 2017. Changing Model Behavior at Test-Time Using Reinforcement Learning. arXiv:1702.07780 [cs, stat].
    [138]
Samuel S. Ogden and Tian Guo. 2018. MODI: Mobile deep inference made efficient by edge computing. In USENIX Workshop on Hot Topics in Edge Computing (HotEdge 18).
    [139]
    Bowen Pan, Wuwei Lin, Xiaolin Fang, Chaoqin Huang, Bolei Zhou, and Cewu Lu. 2018. Recurrent residual module for fast inference in videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1536–1545.
    [140]
Bowen Pan, Rameswar Panda, Camilo Fosco, Chung-Ching Lin, Alex Andonian, Yue Meng, Kate Saenko, Aude Oliva, and Rogerio Feris. 2021. VA-RED²: Video Adaptive Redundancy Reduction. arXiv:2102.07887 [cs].
    [141]
Bowen Pan, Rameswar Panda, Yifan Jiang, Zhangyang Wang, Rogerio Feris, and Aude Oliva. 2021. IA-RED²: Interpretability-aware redundancy reduction for vision transformers. In Advances in Neural Information Processing Systems, Vol. 34. Curran Associates, Inc., 24898–24911.
    [142]
    Priyadarshini Panda, Aayush Ankit, Parami Wijesinghe, and Kaushik Roy. 2017. FALCON: Feature driven selective classification for energy-efficient image recognition. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 36, 12 (Dec.2017), 2017–2029. DOI:
    [143]
    Priyadarshini Panda, Abhronil Sengupta, and Kaushik Roy. 2016. Conditional deep learning for energy-efficient and enhanced pattern recognition. In 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE). 475–480.
    [144]
    Priyadarshini Panda, Abhronil Sengupta, and Kaushik Roy. 2017. Energy-efficient and improved image recognition with conditional deep learning. ACM Journal on Emerging Technologies in Computing Systems 13, 3 (Feb.2017), 33:1–33:21. DOI:
    [145]
    Mathias Parger, Chengcheng Tang, Christopher D. Twigg, Cem Keskin, Robert Wang, and Markus Steinberger. 2022. DeltaCNN: End-to-End CNN inference of sparse frame differences in videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12497–12506.
    [146]
    Eunhyeok Park, Dongyoung Kim, Soobeom Kim, Yong-Deok Kim, Gunhee Kim, Sungroh Yoon, and Sungjoo Yoo. 2015. Big/little deep neural network for ultra low power inference. In 2015 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS). 124–132. DOI:
    [147]
Yongming Rao, Zuyan Liu, Wenliang Zhao, Jie Zhou, and Jiwen Lu. 2022. Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks. arXiv:2207.01580 [cs].
    [148]
    Yongming Rao, Jiwen Lu, and Jie Zhou. 2017. Attention-aware deep reinforcement learning for video face recognition. In Proceedings of the IEEE International Conference on Computer Vision. 3931–3940.
    [149]
    Yongming Rao, Wenliang Zhao, Benlin Liu, Jiwen Lu, Jie Zhou, and Cho-Jui Hsieh. 2021. DynamicViT: Efficient vision transformers with dynamic token sparsification. In Advances in Neural Information Processing Systems, Vol. 34. Curran Associates, Inc., 13937–13949.
    [150]
    Nafiul Rashid, Berken Utku Demirel, Mohanad Odema, and Mohammad Abdullah Al Faruque. 2022. Template matching based early exit CNN for energy-efficient myocardial infarction detection on low-power wearable devices. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6, 2 (July2022), 68:1–68:22. DOI:
    [151]
    Mengye Ren, Andrei Pokrovsky, Bin Yang, and Raquel Urtasun. 2018. SBNet: Sparse blocks network for fast inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8711–8720.
    [152]
Clemens Rosenbaum, Tim Klinger, and Matthew Riemer. 2017. Routing Networks: Adaptive Selection of Non-linear Functions for Multi-Task Learning. arXiv:1711.01239 [cs].
    [153]
    Samuel Rota Bulo and Peter Kontschieder. 2014. Neural decision forests for semantic image labelling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 81–88.
    [154]
    Mohammadamin Sabetsarvestani, Jonathon Hare, Bashir Al-Hashimi, and Geoff Merrett. 2021. Similarity-aware CNN for efficient video recognition at the edge. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (Dec.2021). DOI:
    [155]
    Muhammad Sabih, Frank Hannig, and Jürgen Teich. 2022. DyFiP: Explainable AI-based dynamic filter pruning of convolutional neural networks. In Proceedings of the 2nd European Workshop on Machine Learning and Systems (EuroMLSys’22). Association for Computing Machinery, New York, NY, USA, 109–115. DOI:
    [156]
    Tareq Si Salem, Giovanni Neglia, and Damiano Carra. 2021. AÇAI: Ascent similarity caching with approximate indexes. In 2021 33rd International Teletraffic Congress (ITC-33). 1–9.
    [157]
    Simone Scardapane, Michele Scarpiniti, Enzo Baccarelli, and Aurelio Uncini. 2020. Why should we add early exits to neural networks? Cognitive Computation 12, 5 (Sept.2020), 954–966. DOI:
    [158]
Jordan Schmerge, Daniel Mawhirter, Connor Holmes, Jedidiah McClurg, and Bo Wu. 2021. ELIχR: Eliminating computation redundancy in CNN-based video processing. In 2021 IEEE/ACM Redefining Scalability for Diversely Heterogeneous Architectures Workshop (RSDHA). 34–44. DOI:
    [159]
    Tal Schuster, Adam Fisch, Jai Gupta, Mostafa Dehghani, Dara Bahri, Vinh Tran, Yi Tay, and Donald Metzler. 2022. Confident adaptive language modeling. Advances in Neural Information Processing Systems 35 (Dec.2022), 17456–17472.
    [160]
Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, and Noah A. Smith. 2020. The Right Tool for the Job: Matching Model and Instance Complexities. arXiv:2004.07453 [cs].
    [161]
Minjoon Seo, Sewon Min, Ali Farhadi, and Hannaneh Hajishirzi. 2018. Neural Speed Reading via Skim-RNN. arXiv:1711.02085 [cs].
    [162]
Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. 2017. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. arXiv:1701.06538 [cs, stat].
    [163]
    Jianghao Shen, Yue Wang, Pengfei Xu, Yonggan Fu, Zhangyang Wang, and Yingyan Lin. 2020. Fractional skipping: Towards finer-grained dynamic CNN inference. Proceedings of the AAAI Conference on Artificial Intelligence 34, 04 (April2020), 5700–5708. DOI:
    [164]
Mengnan Shi, Chang Liu, Qixiang Ye, and Jianbin Jiao. 2021. Feature-Gate Coupling for Dynamic Network Pruning. arXiv:2111.14302 [cs].
    [165]
    Martin Simonovsky and Nikos Komodakis. 2017. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3693–3702.
    [166]
    Zhuoran Song, Feiyang Wu, Xueyuan Liu, Jing Ke, Naifeng Jing, and Xiaoyao Liang. 2020. VR-DANN: Real-time video recognition via decoder-assisted neural network acceleration. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 698–710. DOI:
    [167]
    Dimitrios Stamoulis, Ting-Wu Rudy Chin, Anand Krishnan Prakash, Haocheng Fang, Sribhuvan Sajja, Mitchell Bognar, and Diana Marculescu. 2018. Designing adaptive neural networks for energy-constrained image classification. In 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE Press, San Diego, CA, USA, 1–8. DOI:
    [168]
    Yu-Chuan Su and Kristen Grauman. 2016. Leaving some stones unturned: Dynamic feature prioritization for activity detection in streaming video. In Computer Vision – ECCV 2016 (Lecture Notes in Computer Science), Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling (Eds.). Springer International Publishing, Cham, 783–800. DOI:
    [169]
Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, and Armand Joulin. 2019. Adaptive Attention Span in Transformers. arXiv:1905.07799 [cs, stat].
    [170]
    Ximeng Sun, Rameswar Panda, Chun-Fu (Richard) Chen, Aude Oliva, Rogerio Feris, and Kate Saenko. 2021. Dynamic network quantization for efficient video inference. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7375–7385.
    [171]
    Zafar Takhirov, Joseph Wang, Venkatesh Saligrama, and Ajay Joshi. 2016. Energy-efficient adaptive classifier design for mobile systems. In Proceedings of the 2016 International Symposium on Low Power Electronics and Design (ISLPED’16). Association for Computing Machinery, New York, NY, USA, 52–57. DOI:
    [172]
    Tianxiang Tan and Guohong Cao. 2021. Efficient execution of deep neural networks on mobile devices with NPU. In Proceedings of the 20th International Conference on Information Processing in Sensor Networks (Co-Located with CPS-IoT Week 2021) (IPSN’21). Association for Computing Machinery, New York, NY, USA, 283–298. DOI:
    [173]
    Chen Tang, Wenyu Sun, Wenxun Wang, and Yongpan Liu. 2022. Dynamic CNN accelerator supporting efficient filter generator with kernel enhancement and online channel pruning. In 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC). 436–441. DOI:
    [174]
Chen Tang, Haoyu Zhai, Kai Ouyang, Zhi Wang, Yifei Zhu, and Wenwu Zhu. 2022. Arbitrary Bit-width Network: A Joint Layer-Wise Quantization and Adaptive Inference Approach. arXiv:2204.09992 [cs].
    [175]
    Yansong Tang, Yi Tian, Jiwen Lu, Peiyang Li, and Jie Zhou. 2018. Deep progressive reinforcement learning for skeleton-based action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5323–5332.
    [176]
    Hokchhay Tann, Soheil Hashemi, R. Iris Bahar, and Sherief Reda. 2016. Runtime configurable deep neural networks for energy-accuracy trade-off. In 2016 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS). 1–10.
    [177]
    Ryutaro Tanno, Kai Arulkumaran, Daniel Alexander, Antonio Criminisi, and Aditya Nori. 2019. Adaptive neural trees. In Proceedings of the 36th International Conference on Machine Learning. PMLR, 6166–6175.
    [178]
    Ben Taylor, Vicent Sanz Marco, Willy Wolff, Yehia Elkhatib, and Zheng Wang. 2018. Adaptive deep learning model selection on embedded systems. ACM SIGPLAN Notices 53, 6 (June2018), 31–43. DOI:
    [179]
    Surat Teerapittayanon, Bradley McDanel, and H. T. Kung. 2016. BranchyNet: Fast inference via early exiting from deep neural networks. In 2016 23rd International Conference on Pattern Recognition (ICPR). 2464–2469. DOI:
    [180]
    Surat Teerapittayanon, Bradley McDanel, and H. T. Kung. 2017. Distributed deep neural networks over the cloud, the edge and end devices. In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS). 328–339. DOI:
    [181]
    Guillaume Vaudaux-Ruth, Adrien Chan-Hon-Tong, and Catherine Achard. 2021. ActionSpotter: Deep reinforcement learning framework for temporal action spotting in videos. In 2020 25th International Conference on Pattern Recognition (ICPR). 631–638. DOI:
    [182]
    Andreas Veit and Serge Belongie. 2018. Convolutional networks with adaptive inference graphs. In Proceedings of the European Conference on Computer Vision (ECCV). 3–18.
    [183]
Srikumar Venugopal, Michele Gazzetti, Yiannis Gkoufas, and Kostas Katrinis. 2018. Shadow puppets: Cloud-level accurate AI inference at the speed and economy of edge. In USENIX Workshop on Hot Topics in Edge Computing (HotEdge 18).
    [184]
    Thomas Verelst and Tinne Tuytelaars. 2020. Dynamic convolutions: Exploiting spatial sparsity for faster inference. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2320–2329.
    [185]
    Thomas Verelst and Tinne Tuytelaars. 2021. BlockCopy: High-resolution video processing with block-sparse feature propagation and online policies. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 5158–5167.
    [186]
    Huiyu Wang, Aniruddha Kembhavi, Ali Farhadi, Alan L. Yuille, and Mohammad Rastegari. 2019. ELASTIC: Improving CNNs With dynamic scaling policies. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2258–2267.
    [187]
    Junjue Wang, Ziqiang Feng, Zhuo Chen, Shilpa George, Mihir Bala, Padmanabhan Pillai, Shao-Wen Yang, and Mahadev Satyanarayanan. 2018. Bandwidth-efficient live video analytics for drones via edge computing. In 2018 IEEE/ACM Symposium on Edge Computing (SEC). 159–173. DOI:
    [188]
    Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool. 2016. Temporal segment networks: Towards good practices for deep action recognition. In Computer Vision – ECCV 2016 (Lecture Notes in Computer Science), Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling (Eds.). Springer International Publishing, Cham, 20–36. DOI:
    [189]
    Qilong Wang, Banggu Wu, Pengfei Zhu, Peihua Li, Wangmeng Zuo, and Qinghua Hu. 2020. ECA-Net: Efficient channel attention for deep convolutional neural networks. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 11531–11539. DOI:
    [190]
    Xin Wang, Fisher Yu, Zi-Yi Dou, Trevor Darrell, and Joseph E. Gonzalez. 2018. SkipNet: Learning dynamic routing in convolutional networks. In Proceedings of the European Conference on Computer Vision (ECCV). 409–424.
    [191]
    Xin Wang, Fisher Yu, Lisa Dunlap, Yi-An Ma, Ruth Wang, Azalia Mirhoseini, Trevor Darrell, and Joseph E. Gonzalez. 2020. Deep mixture of experts via shallow embedding. In Proceedings of The 35th Uncertainty in Artificial Intelligence Conference. PMLR, 552–562.
    [192]
    Yulin Wang, Zhaoxi Chen, Haojun Jiang, Shiji Song, Yizeng Han, and Gao Huang. 2021. Adaptive focus for efficient video recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 16249–16258.
    [193]
Yulin Wang, Rui Huang, Shiji Song, Zeyi Huang, and Gao Huang. 2021. Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition. arXiv:2105.15075 [cs].
    [194]
    Yue Wang, Jianghao Shen, Ting-Kuei Hu, Pengfei Xu, Tan Nguyen, Richard Baraniuk, Zhangyang Wang, and Yingyan Lin. 2020. Dual dynamic inference: Enabling more efficient, adaptive, and controllable deep inference. IEEE Journal of Selected Topics in Signal Processing 14, 4 (May2020), 623–633. DOI:
    [195]
    Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, and Shilei Wen. 2019. Multi-agent reinforcement learning based frame sampling for effective untrimmed video recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 6222–6231.
    [196]
    Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, Yi Yang, and Shilei Wen. 2020. Dynamic inference: A new approach toward efficient video action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 676–677.
    [197]
    Zuxuan Wu, Tushar Nagarajan, Abhishek Kumar, Steven Rennie, Larry S. Davis, Kristen Grauman, and Rogerio Feris. 2018. BlockDrop: Dynamic inference paths in residual networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8817–8826.
    [198]
    Zuxuan Wu, Caiming Xiong, Yu-Gang Jiang, and Larry S. Davis. 2019. LiteEval: A coarse-to-fine framework for resource efficient video recognition. In Advances in Neural Information Processing Systems, Vol. 32. Curran Associates, Inc.
    [199]
    Zuxuan Wu, Caiming Xiong, Chih-Yao Ma, Richard Socher, and Larry S. Davis. 2019. AdaFrame: Adaptive frame selection for fast video recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1278–1287.
    [200]
    Zhaofeng Wu, Ding Zhao, Qiao Liang, Jiahui Yu, Anmol Gulati, and Ruoming Pang. 2021. Dynamic sparsity neural networks for automatic speech recognition. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 6014–6018. DOI:
    [201]
    Wenhan Xia, Hongxu Yin, Xiaoliang Dai, and Niraj K. Jha. 2022. Fully dynamic inference with deep neural networks. IEEE Transactions on Emerging Topics in Computing 10, 2 (April2022), 962–972. DOI:
    [202]
    Zhenda Xie, Zheng Zhang, Xizhou Zhu, Gao Huang, and Stephen Lin. 2020. Spatially adaptive inference with stochastic feature sampling and interpolation. In Computer Vision – ECCV 2020 (Lecture Notes in Computer Science), Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.). Springer International Publishing, Cham, 531–548. DOI:
    [203]
Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, and Jimmy Lin. 2020. DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference. arXiv:2004.12993 [cs].
    [204]
    Ji Xin, Raphael Tang, Yaoliang Yu, and Jimmy Lin. 2021. BERxiT: Early exiting for BERT with better fine-tuning and extension to regression. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Paola Merlo, Jorg Tiedemann, and Reut Tsarfaty (Eds.). Association for Computational Linguistics, Online, 91–104. DOI:
    [205]
    Dianlei Xu, Tong Li, Yong Li, Xiang Su, Sasu Tarkoma, Tao Jiang, Jon Crowcroft, and Pan Hui. 2021. Edge intelligence: Empowering intelligence to the edge of network. Proc. IEEE 109, 11 (Nov.2021), 1778–1837. DOI:
    [206]
    Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of the 32nd International Conference on Machine Learning. PMLR, 2048–2057.
    [207]
    Lanyu Xu, Arun Iyengar, and Weisong Shi. 2020. CHA: A caching framework for home-based voice assistant systems. In 2020 IEEE/ACM Symposium on Edge Computing (SEC). 293–306. DOI:
    [208]
    Mengwei Xu, Xuanzhe Liu, Yunxin Liu, and Felix Xiaozhu Lin. 2017. Accelerating convolutional neural networks for continuous mobile vision via cache reuse. CoRR abs/1712.01670 (2017). arXiv:1712.01670. http://arxiv.org/abs/1712.01670
    [209]
    Mengwei Xu, Mengze Zhu, Yunxin Liu, Felix Xiaozhu Lin, and Xuanzhe Liu. 2018. DeepCache: Principled cache for mobile deep vision. In Proceedings of the 24th Annual International Conference on Mobile Computing and Networking (MobiCom’18). Association for Computing Machinery, New York, NY, USA, 129–144. DOI:
    [210]
    Lei Xun, Long Tran-Thanh, Bashir M. Al-Hashimi, and Geoff V. Merrett. 2019. Incremental training and group convolution pruning for runtime DNN performance scaling on heterogeneous embedded platforms. In 2019 ACM/IEEE 1st Workshop on Machine Learning for CAD (MLCAD). 1–6. DOI:
    [211]
    Lei Xun, Long Tran-Thanh, Bashir M. Al-Hashimi, and Geoff V. Merrett. 2020. Optimising resource management for embedded machine learning. In 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE). 1556–1561. DOI:
    [212]
    Zhicheng Yan, Hao Zhang, Robinson Piramuthu, Vignesh Jagadeesh, Dennis DeCoste, Wei Di, and Yizhou Yu. 2015. HD-CNN: Hierarchical deep convolutional neural networks for large scale visual recognition. In Proceedings of the IEEE International Conference on Computer Vision. 2740–2748.
    [213]
    Kang Yang, Tianzhang Xing, Yang Liu, Zhenjiang Li, Xiaoqing Gong, Xiaojiang Chen, and Dingyi Fang. 2019. cDeepArch: A compact deep neural network architecture for mobile sensing. IEEE/ACM Transactions on Networking 27, 5 (Oct.2019), 2043–2055. DOI:
    [214]
    Kichang Yang, Juheon Yi, Kyungjin Lee, and Youngki Lee. 2022. FlexPatch: Fast and accurate object detection for on-device high-resolution live video analytics. In IEEE INFOCOM 2022 - IEEE Conference on Computer Communications. 1898–1907. DOI:
    [215]
    Le Yang, Yizeng Han, Xi Chen, Shiji Song, Jifeng Dai, and Gao Huang. 2020. Resolution adaptive networks for efficient inference. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2369–2378.
    [216]
    Yu Yang, Di Liu, Hui Fang, Yi-Xiong Huang, Ying Sun, and Zhi-Yuan Zhang. 2022. Once for all skip: Efficient adaptive deep neural networks. In 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE). 568–571. DOI:
    [217]
    Zerui Yang, Yuhui Xu, Wenrui Dai, and Hongkai Xiong. 2019. Dynamic-stride-net: Deep convolutional neural network with dynamic stride. In Optoelectronic Imaging and Multimedia Technology VI, Vol. 11187. SPIE, 42–53. DOI:
    [218]
    Serena Yeung, Olga Russakovsky, Greg Mori, and Li Fei-Fei. 2016. End-to-end learning of action detection from frame glimpses in videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2678–2687.
    [219]
    Hongxu Yin, Arash Vahdat, Jose M. Alvarez, Arun Mallya, Jan Kautz, and Pavlo Molchanov. 2022. A-ViT: Adaptive tokens for efficient vision transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10809–10818.
    [220]
    Amirreza Yousefzadeh and Manolis Sifalakis. 2022. Delta Activation Layer Exploits Temporal Sparsity for Efficient Embedded Video Processing.
    [221]
Adams Wei Yu, Hongrae Lee, and Quoc V. Le. 2017. Learning to Skim Text. arXiv:1704.06877 [cs].
    [222]
    Haichao Yu, Haoxiang Li, Humphrey Shi, Thomas S. Huang, and Gang Hua. 2021. Any-precision deep neural networks. Proceedings of the AAAI Conference on Artificial Intelligence 35, 12 (May2021), 10763–10771. DOI:
    [223]
    Jiahui Yu and Thomas S. Huang. 2019. Universally slimmable networks and improved training techniques. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 1803–1811.
    [224]
Jiahui Yu, Linjie Yang, Ning Xu, Jianchao Yang, and Thomas Huang. 2018. Slimmable Neural Networks. arXiv:1812.08928 [cs].
    [225]
    Keyi Yu, Yang Liu, Alexander G. Schwing, and Jian Peng. 2022. Fast and accurate text classification: Skimming, rereading and early stopping. (Feb.2022).
    [226]
    Kun Yuan, Quanquan Li, Shaopeng Guo, Dapeng Chen, Aojun Zhou, Fengwei Yu, and Ziwei Liu. 2021. Differentiable dynamic wirings for neural networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 327–336.
    [227]
    Zhihang Yuan, Bingzhe Wu, Guangyu Sun, Zheng Liang, Shiwan Zhao, and Weichen Bi. 2020. S2DNAS: Transforming static CNN model for dynamic inference via neural architecture search. In Computer Vision – ECCV 2020 (Lecture Notes in Computer Science), Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.). Springer International Publishing, Cham, 175–192. DOI:
    [228]
    Liekang Zeng, En Li, Zhi Zhou, and Xu Chen. 2019. Boomerang: On-demand cooperative deep neural network inference for edge intelligence on the industrial internet of things. IEEE Network 33, 5 (Sept.2019), 96–103. DOI:
    [229]
    Chen Zhang, Qiang Cao, Hong Jiang, Wenhui Zhang, Jingjun Li, and Jie Yao. 2018. FFS-VA: A fast filtering system for large-scale video analytics. In Proceedings of the 47th International Conference on Parallel Processing (ICPP 2018). Association for Computing Machinery, New York, NY, USA, 1–10. DOI:
    [230]
    Chen Zhang, Qiang Cao, Hong Jiang, Wenhui Zhang, Jingjun Li, and Jie Yao. 2020. A fast filtering mechanism to improve efficiency of large-scale video analytics. IEEE Trans. Comput. 69, 6 (June2020), 914–928. DOI:
    [231]
Jinrui Zhang, Deyu Zhang, Huan Yang, Yunxin Liu, Ju Ren, Xiaohui Xu, Fucheng Jia, and Yaoxue Zhang. 2022. MVPose: Realtime multi-person pose estimation using motion vector on mobile devices. IEEE Transactions on Mobile Computing (2022), 1–1. DOI:
    [232]
    Linfeng Zhang, Zhanhong Tan, Jiebo Song, Jingwei Chen, Chenglong Bao, and Kaisheng Ma. 2019. SCAN: A scalable neural networks framework towards compact and efficient models. In Advances in Neural Information Processing Systems, Vol. 32. Curran Associates, Inc.
    [233]
    Pei Zhang, Tailin Liang, John Glossner, Lei Wang, Shaobo Shi, and Xiaotong Zhang. 2021. Dynamic runtime feature map pruning. In Pattern Recognition and Computer Vision (Lecture Notes in Computer Science), Huimin Ma, Liang Wang, Changshui Zhang, Fei Wu, Tieniu Tan, Yaonan Wang, Jianhuang Lai, and Yao Zhao (Eds.). Springer International Publishing, Cham, 411–422. DOI:
    [234]
    Wuyang Zhang, Zhezhi He, Luyang Liu, Zhenhua Jia, Yunxin Liu, Marco Gruteser, Dipankar Raychaudhuri, and Yanyong Zhang. 2021. Elf: Accelerate high-resolution mobile deep vision with content-aware parallel offloading. In Proceedings of the 27th Annual International Conference on Mobile Computing and Networking (MobiCom’21). Association for Computing Machinery, New York, NY, USA, 201–214. DOI:
    [235]
    Yu Zhang, Dajiang Liu, and Yongkang Xing. 2021. Dynamic convolution pruning using pooling characteristic in convolution neural networks. In Neural Information Processing (Communications in Computer and Information Science), Teddy Mantoro, Minho Lee, Media Anugerah Ayu, Kok Wai Wong, and Achmad Nizar Hidayanto (Eds.). Springer International Publishing, Cham, 558–565. DOI:
    [236]
    Yin-Dong Zheng, Zhaoyang Liu, Tong Lu, and Limin Wang. 2020. Dynamic sampling networks for efficient action recognition in videos. IEEE Transactions on Image Processing 29 (2020), 7970–7983. DOI:
    [237]
    Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, and Furu Wei. 2020. BERT loses patience: Fast and robust inference with early exit. In Advances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc., 18330–18341.
    [238]
    Mohammadreza Zolfaghari, Kamaljeet Singh, and Thomas Brox. 2018. ECO: Efficient convolutional network for online video understanding. In Proceedings of the European Conference on Computer Vision (ECCV). 695–712.
    [239]
    Get Your Footage. 2021. Hands Up Waving Hello Green Screen Effect | Gesture Say Hi Chroma Key in HD 4K.
    [240]
    PCV. 2022. Vehicle Detection Dataset. https://universe.roboflow.com/pcv-wndzh/vehicle-detection-bq16s. Visited on 2024-04-09.

Published In

ACM Computing Surveys, Volume 56, Issue 10 (October 2024), 954 pages
ISSN: 0360-0300; EISSN: 1557-7341
DOI: 10.1145/3613652
Editors: David Atienza and Michela Milano

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 14 May 2024
      Online AM: 10 April 2024
      Accepted: 02 April 2024
      Revised: 08 March 2024
      Received: 18 January 2023
      Published in CSUR Volume 56, Issue 10


      Author Tags

      1. Deep learning
      2. at-runtime adaption
      3. dynamic neural networks
      4. spatio-temporal correlation

      Qualifiers

      • Survey

      Funding Sources

      • German Federal Ministry of Education and Research (BMBF)
