DOI: 10.1145/3495243.3560545

research-article
Mandheling: mixed-precision on-device DNN training with DSP offloading

Published: 14 October 2022

Abstract

This paper proposes Mandheling, the first system that enables highly resource-efficient on-device training by orchestrating mixed-precision training with on-chip Digital Signal Processor (DSP) offloading. Mandheling fully exploits the DSP's advantages in integer-based numerical computation through four novel techniques: (1) a CPU-DSP co-scheduling scheme that situationally mitigates the overhead of DSP-unfriendly operators; (2) a self-adaptive rescaling algorithm that reduces the overhead of dynamic rescaling in backward propagation; (3) a batch-splitting algorithm that improves DSP cache efficiency; and (4) a DSP compute-subgraph reuse mechanism that eliminates the graph preparation overhead on the DSP. We have fully implemented Mandheling and demonstrated its effectiveness through extensive experiments. The results show that, compared to the state-of-the-art DNN engines TFLite and MNN, Mandheling reduces per-batch training time by 5.5X and energy consumption by 8.9X on average. In end-to-end training tasks, Mandheling reduces convergence time by up to 10.7X and energy consumption by 13.1X, with only a 1.9%--2.7% accuracy loss relative to the FP32-precision setting.
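To make the second technique concrete, the sketch below illustrates the general idea behind self-adaptive rescaling for INT8 gradients: instead of recomputing a quantization scale on every backward step, a cached scale is reused until the gradient's dynamic range drifts past a threshold. This is a minimal plain-Python/NumPy sketch of the general technique only; the names (AdaptiveRescaler, drift_threshold) are illustrative assumptions, not Mandheling's actual API or algorithm.

    # Illustrative sketch only -- not Mandheling's implementation.
    import numpy as np

    def choose_scale(x):
        # Pick a per-tensor scale so the max magnitude maps near INT8's range.
        return max(float(np.abs(x).max()), 1e-8) / 127.0

    def quantize_int8(x, scale):
        # Symmetric per-tensor quantization of a float tensor to INT8.
        return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

    class AdaptiveRescaler:
        """Reuse a cached gradient scale until the observed dynamic range
        drifts past a threshold, avoiding a rescale on every backward step."""
        def __init__(self, drift_threshold=2.0):
            self.scale = None
            self.drift_threshold = drift_threshold

        def scale_for(self, grad):
            fresh = choose_scale(grad)
            if (self.scale is None
                    or fresh > self.scale * self.drift_threshold
                    or fresh < self.scale / self.drift_threshold):
                self.scale = fresh  # rescale only on significant drift
            return self.scale

    # Usage: quantize a gradient tensor for integer-only backward kernels.
    rescaler = AdaptiveRescaler()
    grad = np.random.randn(256).astype(np.float32)
    q_grad = quantize_int8(grad, rescaler.scale_for(grad))

Under such a policy, the max-reduction and requantization cost is paid only when the gradient's range actually shifts, which is the kind of backward-pass rescaling overhead the paper targets.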



      Published In

      MobiCom '22: Proceedings of the 28th Annual International Conference on Mobile Computing And Networking
      October 2022
      932 pages
      ISBN:9781450391818
      DOI:10.1145/3495243

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 14 October 2022


      Author Tags

      1. DSP offloading
      2. deep learning
      3. mobile device

      Qualifiers

      • Research-article

      Funding Sources

      • the National Key R&D Program of China
      • Beijing Nova Program
      • the National Natural Science Foundation of China
      • Young Elite Scientists Sponsorship Program by CAST
      • the PKU-Baidu Fund Project
      • NSFC
      • the National Natural Science Fund for the Excellent Young Scientists Fund Program (Overseas)

      Conference

      ACM MobiCom '22

      Acceptance Rates

      Overall Acceptance Rate 440 of 2,972 submissions, 15%

      Article Metrics

      • Downloads (Last 12 months)324
      • Downloads (Last 6 weeks)23
      Reflects downloads up to 26 Jan 2025


      Cited By

      • (2024) FAMOS. Proceedings of the 33rd USENIX Conference on Security Symposium, 10.5555/3698900.3698917, pp. 289-306. Online publication date: 14-Aug-2024.
      • (2024) TinyTrain. Proceedings of the 41st International Conference on Machine Learning, 10.5555/3692070.3693103, pp. 25812-25843. Online publication date: 21-Jul-2024.
      • (2024) FwdLLM. Proceedings of the 2024 USENIX Annual Technical Conference, 10.5555/3691992.3692028, pp. 579-596. Online publication date: 10-Jul-2024.
      • (2024) More is different. Proceedings of the 2024 USENIX Annual Technical Conference, 10.5555/3691992.3692009, pp. 285-302. Online publication date: 10-Jul-2024.
      • (2024) On-device Training: A First Overview on Existing Systems. ACM Transactions on Sensor Networks, 10.1145/3696003, 20(6):1-39. Online publication date: 14-Sep-2024.
      • (2024) Artificial Intelligence of Things: A Survey. ACM Transactions on Sensor Networks, 10.1145/3690639, 21(1):1-75. Online publication date: 30-Aug-2024.
      • (2024) Pluto and Charon: A Time and Memory Efficient Collaborative Edge AI Framework for Personal LLMs Fine-tuning. Proceedings of the 53rd International Conference on Parallel Processing, 10.1145/3673038.3673043, pp. 762-771. Online publication date: 12-Aug-2024.
      • (2024) AdaShadow: Responsive Test-time Model Adaptation in Non-stationary Mobile Environments. Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems, 10.1145/3666025.3699339, pp. 295-308. Online publication date: 4-Nov-2024.
      • (2024) PieBridge: Fast and Parameter-Efficient On-Device Training via Proxy Networks. Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems, 10.1145/3666025.3699327, pp. 126-140. Online publication date: 4-Nov-2024.
      • (2024) WiP: Efficient LLM Prefilling with Mobile NPU. Proceedings of the Workshop on Edge and Mobile Foundation Models, 10.1145/3662006.3662066, pp. 33-35. Online publication date: 3-Jun-2024.
