Flamingo: A User-Centric System for Fast and Energy-Efficient DNN Training on Smartphones

DOI: 10.1145/3630048.3630183
Published: 05 December 2023

Abstract

Training DNNs on a smartphone system-on-a-chip (SoC) without carefully considering its resource constraints leads to suboptimal training performance and significantly degrades user experience. To this end, we present Flamingo, a system for smartphones that optimizes DNN training for time and energy under dynamic resource availability by scaling parallelism and exploiting compute heterogeneity in real time. As AI becomes part of the mainstream smartphone experience, on-device training becomes crucial for fine-tuning predictive models while preserving data privacy. Our experiments show that Flamingo reduces on-device training time by 12× and energy consumption by 8×, while nearly eliminating the detrimental impact on user experience. Extensive large-scale evaluations show that Flamingo improves end-to-end training performance by 1.2--23.3× and energy efficiency by 1.6--7× over the state of the art.
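To make concrete what "scaling parallelism and exploiting compute heterogeneity" can look like on a big.LITTLE smartphone SoC, the sketch below adjusts training parallelism and pins the training process to a chosen core set using Linux CPU-affinity calls. This is an illustrative sketch only, not Flamingo's actual policy or implementation: the core indices, thresholds, and the foreground/temperature signals are hypothetical placeholders.

    import os

    # Hypothetical core layout for a big.LITTLE SoC (indices vary per device).
    BIG_CORES = {4, 5, 6, 7}
    LITTLE_CORES = {0, 1, 2, 3}

    def choose_cores(foreground_app_active, battery_temp_c):
        """Pick a core set from (hypothetical) user-experience signals."""
        if foreground_app_active or battery_temp_c > 40.0:
            # Back off to LITTLE cores to keep the device responsive and cool.
            return LITTLE_CORES
        # Device is idle and cool: use the big cores for faster training.
        return BIG_CORES

    def apply_parallelism(cores):
        """Pin this process to the chosen cores and scale thread parallelism."""
        os.sched_setaffinity(0, cores)                    # Linux-only affinity call
        os.environ["OMP_NUM_THREADS"] = str(len(cores))   # assumed OpenMP-backed training runtime
        return len(cores)

    threads = apply_parallelism(choose_cores(foreground_app_active=False, battery_temp_c=35.0))
    print("training with", threads, "threads on pinned cores")

In a real system the thread count would also have to be reconfigured inside the training framework itself (most runtimes read such settings only at startup), and the decision would be re-evaluated periodically as foreground activity, load, and temperature change.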


Published In

DistributedML '23: Proceedings of the 4th International Workshop on Distributed Machine Learning
December 2023
112 pages
ISBN: 9798400704475
DOI: 10.1145/3630048
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. energy efficiency
  2. federated learning
  3. training latency
  4. user experience

Qualifiers

  • Research-article

Conference

CoNEXT 2023

Acceptance Rates

Overall acceptance rate: 5 of 10 submissions (50%)

