MinUn: Accurate ML Inference on Microcontrollers

Published: 13 June 2023
DOI: 10.1145/3589610.3596278

Abstract

Running machine learning inference on tiny devices, known as TinyML, is an emerging research area. It requires generating inference code that uses memory frugally, a task for which standard ML frameworks are ill-suited. A deployment framework for TinyML must (a) be parametric in the number representation, so it can take advantage of emerging representations like posits, (b) carefully assign high precision to a few tensors so that most tensors can be kept in low precision while still maintaining model accuracy, and (c) avoid memory fragmentation. We describe MinUn, the first TinyML framework that holistically addresses these issues to generate efficient code for ARM microcontrollers (e.g., Arduino Uno, Due, and STM32H747) that outperforms prior TinyML frameworks.
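To make requirement (b) concrete, the following is a minimal, hypothetical sketch of the kind of mixed-precision fixed-point arithmetic a TinyML code generator emits: a dot product whose weight tensor stays at 16 bits while the activation tensor is kept at 8 bits, each with its own fixed-point scale. This is an illustration only, not MinUn's generated code; the function names, bitwidths, and scale parameters are assumptions made for the example.

```c
/* Illustrative sketch only (not MinUn's generated code): a mixed-precision
 * fixed-point dot product with 16-bit weights and 8-bit activations.
 * Bitwidths, scales, and names below are hypothetical. */
#include <stdint.h>
#include <stdio.h>

/* Quantize a real value to fixed point with 'frac_bits' fractional bits. */
static int16_t quantize16(float v, int frac_bits) {
    return (int16_t)(v * (float)(1 << frac_bits));
}
static int8_t quantize8(float v, int frac_bits) {
    return (int8_t)(v * (float)(1 << frac_bits));
}

/* Dot product of a 16-bit weight vector and an 8-bit activation vector.
 * Each product carries (w_frac + x_frac) fractional bits; the accumulator
 * is shifted back so the result has 'out_frac' fractional bits.
 * Assumes an arithmetic right shift for negative values, as on typical
 * ARM toolchains. */
static int32_t dot_q16_q8(const int16_t *w, const int8_t *x, int n,
                          int w_frac, int x_frac, int out_frac) {
    int32_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc += (int32_t)w[i] * (int32_t)x[i];
    }
    return acc >> (w_frac + x_frac - out_frac);
}

int main(void) {
    /* Hypothetical weights kept in high precision (Q1.14) and
     * activations in low precision (Q1.6). */
    const float wf[4] = {0.5f, -0.25f, 0.75f, 0.125f};
    const float xf[4] = {0.5f,  0.5f, -0.5f,  1.0f};

    int16_t w[4];
    int8_t  x[4];
    for (int i = 0; i < 4; i++) {
        w[i] = quantize16(wf[i], 14);
        x[i] = quantize8(xf[i], 6);
    }

    /* Result in Q1.14; exact answer is -0.125. */
    int32_t y = dot_q16_q8(w, x, 4, 14, 6, 14);
    printf("fixed-point result: %f\n", (double)y / (1 << 14));
    return 0;
}
```

Keeping only the weights at the higher bitwidth halves the activation buffer relative to a uniform 16-bit assignment; deciding which few tensors deserve the extra precision, while the rest stay small, is exactly the assignment problem the abstract describes.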


Cited By

  • (2024) Decoupled Access-Execute Enabled DVFS for TinyML Deployments on STM32 Microcontrollers. In 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1–6. https://doi.org/10.23919/DATE58400.2024.10546540
  • (2024) A Machine Learning-Oriented Survey on Tiny Machine Learning. IEEE Access, 12, 23406–23426. https://doi.org/10.1109/ACCESS.2024.3365349

Published In

LCTES 2023: Proceedings of the 24th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems
June 2023
147 pages
ISBN:9798400701740
DOI:10.1145/3589610

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 June 2023

Author Tags

  1. Compilers
  2. Embedded Devices
  3. Memory Management
  4. Number Representations
  5. Programming Languages
  6. TinyML

Qualifiers

  • Research-article

Conference

LCTES '23

Acceptance Rates

Overall Acceptance Rate 116 of 438 submissions, 26%


