
DaCapo: An On-Device Learning Scheme for Memory-Constrained Embedded Systems

Published: 09 September 2023

Abstract

The use of deep neural network (DNN) applications in microcontroller unit (MCU) embedded systems is becoming increasingly popular. However, the DNN models in such systems frequently suffer from accuracy loss due to the dataset shift problem. On-device learning resolves this problem by updating the model parameters on-site with real-world data, thus localizing the model to its surroundings. However, the backpropagation step of on-device learning requires the output of every layer computed during the forward pass to be stored in memory. This is usually infeasible in MCU devices, as they are equipped with only a few KBs of SRAM. Given their energy limitations and timeliness requirements, using flash memory to store the output of every layer is not practical either. Although a few approaches have been proposed to enable on-device learning under such stringent memory conditions, they require either modification of the target models or the use of non-conventional gradient computation strategies. This paper proposes DaCapo, a backpropagation scheme that enables on-device learning in memory-constrained embedded systems. DaCapo stores only the outputs of certain layers, known as checkpoints, in SRAM, and discards the others. The discarded outputs are recomputed during backpropagation from the nearest checkpoint preceding them. To minimize recomputation, DaCapo optimally plans which checkpoints to keep in SRAM at each phase of backpropagation, and accordingly replaces the checkpoints stored in memory as backpropagation progresses. We implemented the proposed scheme on an STM32F429ZI board and evaluated it with five representative DNN models. Our evaluation showed that DaCapo improved backpropagation time by up to 22% and reduced energy consumption by up to 28% compared to AIfES, a machine learning platform optimized for MCU devices. In addition, our approach enabled the training of MobileNet, which the MCU device had previously been unable to train.
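
To make the checkpoint-and-recompute idea concrete, the following is a minimal C sketch of the mechanism the abstract describes: the forward pass keeps only every few layers' input activation as a checkpoint, and the backward pass recomputes each discarded activation from its nearest preceding checkpoint. The toy layer kernels, the buffer sizes, and the fixed, evenly spaced checkpoint placement are illustrative assumptions, not DaCapo's actual planner, which selects and replaces the checkpoints held in SRAM dynamically as backpropagation progresses.

/*
 * Checkpoint-and-recompute backpropagation, minimal sketch.
 * Assumptions (hypothetical, for illustration only): an 8-layer
 * sequential model whose "layers" are elementwise scalings, and a
 * fixed checkpoint stride; a real port would call the framework's
 * own forward/backward kernels instead.
 */
#include <stdio.h>
#include <string.h>

#define NUM_LAYERS 8                                   /* toy model depth    */
#define ACT_SIZE   4                                   /* activation width   */
#define STRIDE     3                                   /* checkpoint spacing */
#define NUM_CKPTS  ((NUM_LAYERS + STRIDE - 1) / STRIDE)

/* The only activations kept resident (stands in for the SRAM budget). */
static float ckpt[NUM_CKPTS][ACT_SIZE];

/* Toy layer: out = in * w[l]; stands in for a real layer kernel. */
static const float w[NUM_LAYERS] = {2, 1, 3, 1, 2, 1, 2, 1};

static void forward_layer(int l, const float *in, float *out) {
    for (int i = 0; i < ACT_SIZE; i++) out[i] = in[i] * w[l];
}

/* Toy backward step: grad_in = grad_out * w[l]. A real layer would
 * also use act_in (the recomputed activation) for weight gradients. */
static void backward_layer(int l, const float *act_in, float *grad) {
    (void)act_in;
    for (int i = 0; i < ACT_SIZE; i++) grad[i] *= w[l];
}

/* Forward pass: keep every STRIDE-th layer input as a checkpoint and
 * discard all other activations as soon as they are consumed. */
static void forward(const float *x, float *y) {
    float cur[ACT_SIZE], nxt[ACT_SIZE];
    memcpy(cur, x, sizeof cur);
    for (int l = 0; l < NUM_LAYERS; l++) {
        if (l % STRIDE == 0)
            memcpy(ckpt[l / STRIDE], cur, sizeof cur);
        forward_layer(l, cur, nxt);
        memcpy(cur, nxt, sizeof cur);
    }
    memcpy(y, cur, sizeof cur);
}

/* Backward pass: for each layer, rebuild its input activation by
 * re-running the forward kernels from the nearest preceding
 * checkpoint, then apply the layer's backward step. */
static void backward(float *grad /* dL/dy in, dL/dx out */) {
    for (int l = NUM_LAYERS - 1; l >= 0; l--) {
        int c = l / STRIDE;          /* nearest checkpoint at or before l */
        float act[ACT_SIZE], nxt[ACT_SIZE];
        memcpy(act, ckpt[c], sizeof act);
        for (int k = c * STRIDE; k < l; k++) {  /* recompute discarded outputs */
            forward_layer(k, act, nxt);
            memcpy(act, nxt, sizeof act);
        }
        backward_layer(l, act, grad);
    }
}

int main(void) {
    float x[ACT_SIZE] = {1, 1, 1, 1};
    float y[ACT_SIZE];
    float g[ACT_SIZE] = {1, 1, 1, 1};  /* dL/dy = 1 */
    forward(x, y);
    backward(g);
    printf("dL/dx[0] = %g\n", g[0]);   /* product of all w[l] = 24 */
    return 0;
}

In this sketch, peak activation memory falls from NUM_LAYERS live buffers to NUM_CKPTS checkpoints plus two working buffers, at the cost of re-running at most STRIDE - 1 forward kernels per backward step; the contribution the abstract claims for DaCapo is making that trade-off optimally under a fixed SRAM budget rather than with a fixed stride.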

References

[1] 2022. AIfES: Artificial Intelligence for Embedded Systems. https://github.com/Fraunhofer-IMS/AIfES_for_Arduino
[3] Colby Banbury et al. 2021. MLPerf Tiny benchmark. Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (2021).
[4] Colby Banbury, Chuteng Zhou, Igor Fedorov, Ramon Matas, Urmish Thakker, Dibakar Gope, Vijay Janapa Reddi, Matthew Mattina, and Paul Whatmough. 2021. MicroNets: Neural network architectures for deploying TinyML applications on commodity microcontrollers. Proceedings of Machine Learning and Systems 3 (2021), 517–532.
[5] Joseph L. Betthauser, John T. Krall, Rahul R. Kaliki, Matthew S. Fifer, and Nitish V. Thakor. 2019. Stable electromyographic sequence prediction during movement transitions using temporal convolutional networks. In 2019 9th International IEEE/EMBS Conference on Neural Engineering (NER’19). IEEE, 1046–1049.
[6] Alessio Burrello, Marcello Zanghieri, Cristian Sarti, Leonardo Ravaglia, Simone Benatti, and Luca Benini. 2021. Tackling time-variability in sEMG-based gesture recognition with on-device incremental learning and temporal convolutional networks. In 2021 IEEE Sensors Applications Symposium (SAS’21). IEEE, 1–6.
[7] Han Cai, Chuang Gan, Ligeng Zhu, and Song Han. 2020. TinyTL: Reduce memory, not parameters for efficient on-device learning. In Advances in Neural Information Processing Systems, Vol. 33. 11285–11297.
[8] Punarjay Chakravarty, Klaas Kelchtermans, Tom Roussel, Stijn Wellens, Tinne Tuytelaars, and Luc Van Eycken. 2017. CNN-based single image obstacle avoidance on a quadrotor. In 2017 IEEE International Conference on Robotics and Automation (ICRA’17). IEEE, 6369–6374.
[9] Tianqi Chen, Bing Xu, Chiyuan Zhang, and Carlos Guestrin. 2016. Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174 (2016).
[10] L. Crippa, R. Micheloni, I. Motta, and M. Sangalli. 2008. Nonvolatile memories: NOR vs. NAND architectures. In Memories in Wireless Systems. Springer, 29–53.
[11] Robert David, Jared Duke, Advait Jain, Vijay Janapa Reddi, Nat Jeffries, Jian Li, Nick Kreeger, Ian Nappier, Meghna Natraj, Rocky Rhodes, Tiezhen Wang, and Pete Warden. 2021. TensorFlow Lite Micro: Embedded machine learning for TinyML systems. Proceedings of Machine Learning and Systems 3 (2021), 800–811.
[12] Fabrizio De Vita, Giorgio Nocera, Dario Bruneo, Valeria Tomaselli, and Mirko Falchetto. 2022. On-device training of deep learning models on edge microcontrollers. In 2022 IEEE International Conferences on Internet of Things (iThings) and IEEE Green Computing & Communications (GreenCom) and IEEE Cyber, Physical & Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on Cybermatics (Cybermatics). IEEE, 62–69.
[13] Sauptik Dhar, Junyao Guo, Jiayi Liu, Samarth Tripathi, Unmesh Kurup, and Mohak Shah. 2021. A survey of on-device machine learning: An algorithms and learning theory perspective. ACM Transactions on Internet of Things 2, 3 (2021), 1–49.
[14] Juan P. Dominguez-Morales, Lourdes Duran-Lopez, Daniel Gutierrez-Galan, Antonio Rios-Navarro, Alejandro Linares-Barranco, and Angel Jimenez-Fernandez. 2021. Wildlife monitoring on the edge: A performance evaluation of embedded neural networks on microcontrollers for animal behavior classification. Sensors 21, 9 (2021), 2975.
[15] Liqi Feng, Yaqin Zhao, Yichao Sun, Wenxuan Zhao, and Jiaxi Tang. 2021. Action recognition using a spatial-temporal network for wild felines. Animals 11, 2 (2021), 485.
[16] Aidan N. Gomez, Mengye Ren, Raquel Urtasun, and Roger B. Grosse. 2017. The reversible residual network: Backpropagation without storing activations. Advances in Neural Information Processing Systems 30 (2017).
[17] Audrunas Gruslys, Rémi Munos, Ivo Danihelka, Marc Lanctot, and Alex Graves. 2016. Memory-efficient backpropagation through time. Advances in Neural Information Processing Systems 29 (2016), 4125–4133.
[18] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.
[19] Zhiwen Hu, Zixuan Bai, Yuzhe Yang, Zijie Zheng, Kaigui Bian, and Lingyang Song. 2019. UAV aided aerial-ground IoT for air quality sensing in smart city: Architecture, technologies, and implementation. IEEE Network 33, 2 (2019), 14–22.
[20] Max Jaderberg, Wojciech Marian Czarnecki, Simon Osindero, Oriol Vinyals, Alex Graves, David Silver, and Koray Kavukcuoglu. 2017. Decoupled neural interfaces using synthetic gradients. In International Conference on Machine Learning. PMLR, 1627–1635.
[21] Paras Jain, Ajay Jain, Aniruddha Nrusimha, Amir Gholami, Pieter Abbeel, Joseph Gonzalez, Kurt Keutzer, and Ion Stoica. 2020. Checkmate: Breaking the memory wall with optimal tensor rematerialization. In Proceedings of Machine Learning and Systems, I. Dhillon, D. Papailiopoulos, and V. Sze (Eds.), Vol. 2. 497–511.
[22] Jing Jiang and ChengXiang Zhai. 2007. Instance weighting for domain adaptation in NLP. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. 264–271.
[23] Kavya Kopparapu, Eric Lin, John G. Breslin, and Bharath Sudharsan. 2022. TinyFedTL: Federated transfer learning on ubiquitous tiny IoT devices. In 2022 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops’22). IEEE, 79–81.
[24] Ravi Kumar, Manish Purohit, Zoya Svitkina, Erik Vee, and Joshua Wang. 2019. Efficient rematerialization for deep networks. Advances in Neural Information Processing Systems 32 (2019).
[25] Mitsuru Kusumoto, Takuya Inoue, Gentaro Watanabe, Takuya Akiba, and Masanori Koyama. 2019. A graph theoretic framework of recomputation algorithms for memory-efficient backpropagation. Advances in Neural Information Processing Systems 32 (2019).
[26] Seulki Lee and Shahriar Nirjon. 2019. Neuro.ZERO: A zero-energy neural network accelerator for embedded sensing and inference systems. In Proceedings of the 17th Conference on Embedded Networked Sensor Systems. 138–152.
[27] Seulki Lee and Shahriar Nirjon. 2020. Learning in the wild: When, how, and what to learn for on-device dataset adaptation. In Proceedings of the 2nd International Workshop on Challenges in Artificial Intelligence and Machine Learning for Internet of Things. 34–40.
[28] Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, and Song Han. 2022. On-device training under 256KB memory. arXiv preprint arXiv:2206.15472 (2022).
[29] Bojan Milosevic, Elisabetta Farella, and Simone Benatti. 2018. Exploring arm posture and temporal variability in myoelectric hand gesture recognition. In 2018 7th IEEE International Conference on Biomedical Robotics and Biomechatronics (Biorob’18). 1032–1037.
[30] Pramod Kaushik Mudrakarta, Mark Sandler, Andrey Zhmoginov, and Andrew Howard. 2019. K for the price of 1: Parameter-efficient multi-task and transfer learning. arXiv:1810.10703 [cs.LG]
[31] Shishir G. Patil, Paras Jain, Prabal Dutta, Ion Stoica, and Joseph Gonzalez. 2022. POET: Training neural networks on tiny devices with integrated rematerialization and paging. In International Conference on Machine Learning. PMLR, 17573–17583.
[32] Sameer Qazi, Bilal A. Khawaja, and Qazi U. Farooq. 2022. IoT-equipped and AI-enabled next generation smart agriculture: A critical review, current challenges and future trends. IEEE Access (2022).
[33] Haoyu Ren, Darko Anicic, and Thomas A. Runkler. 2021. TinyOL: TinyML with online-learning on microcontrollers. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN’21). IEEE, 1–8.
[34] Anuj Sehgal, Vladislav Perelman, Siarhei Kuryla, and Jürgen Schönwälder. 2012. Management of resource constrained devices in the Internet of Things. IEEE Communications Magazine 50, 12 (2012), 144–149.
[35] STMicroelectronics. 2008. STM32F427xx, STM32F429xx Datasheet, DocID024030 Rev. 10. STMicroelectronics.
[36] STMicroelectronics. 2021. STM32F405/415, STM32F407/417, STM32F427/437 and STM32F429/439 Advanced Arm®-based 32-bit MCUs Reference Manual, RM0090 Rev. 19. STMicroelectronics. p. 77.
[37] Bharath Sudharsan, John G. Breslin, and Muhammad Intizar Ali. 2020. Edge2Train: A framework to train machine learning models (SVMs) on resource-constrained IoT edge devices. In Proceedings of the 10th International Conference on the Internet of Things. 1–8.
[38] Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel S. Emer. 2017. Efficient processing of deep neural networks: A tutorial and survey. Proc. IEEE 105, 12 (2017), 2295–2329.
[39] Makoto Yamada, Leonid Sigal, and Michalis Raptis. 2013. Covariate shift adaptation for discriminative 3D pose estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence 36, 2 (2013), 235–247.
[40] Makoto Yamada, Masashi Sugiyama, and Tomoko Matsui. 2010. Semi-supervised speaker identification under covariate shift. Signal Processing 90, 8 (2010), 2353–2361.

Cited By

  • (2024) On-device Online Learning and Semantic Management of TinyML Systems. ACM Transactions on Embedded Computing Systems 23, 4 (2024), 1–32. https://doi.org/10.1145/3665278
  • (2024) DACAPO: Accelerating Continuous Learning in Autonomous Systems for Video Analytics. In 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 1246–1261. https://doi.org/10.1109/ISCA59077.2024.00093

Published In

ACM Transactions on Embedded Computing Systems, Volume 22, Issue 5s
Special Issue ESWEEK 2023
October 2023
1394 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/3614235
  • Editor: Tulika Mitra

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 September 2023
Accepted: 10 July 2023
Revised: 02 June 2023
Received: 23 March 2023
Published in TECS Volume 22, Issue 5s

Author Tags

  1. On-device learning
  2. embedded systems
  3. backpropagation
  4. machine learning
  5. Internet-of-Things

Qualifiers

  • Research-article

Funding Sources

  • Institute of Information and Communications Technology Planning and Evaluation
  • Development of Core Technology for Autonomous Energy-driven Computing System SW in Power-Instable Environment
  • National Research Foundation of Korea

Article Metrics

  • Downloads (last 12 months): 681
  • Downloads (last 6 weeks): 82
Reflects downloads up to 03 Sep 2024
