DOI: 10.1145/3446999.3447008
research-article

embARC MLI based Design and Implementation of Real-time Keyword Spotting

Published: 09 April 2021

Abstract

Efficient inference is essential for neural network applications on edge devices. This paper presents a neural-network-based Keyword Spotting (KWS) system built with the embARC MLI Library and the ARC EM9D microprocessor. The embARC MLI Library is an open-source, highly optimized machine learning inference library for IoT edge devices. With its XY architecture, the EM9D processor achieves high efficiency when executing continuous multiply-accumulate (MAC) instructions. The performance of this combination is analyzed in detail and compared with that of other processors. Because edge devices generally have limited computing and memory resources, many optimization tasks are required, and these are also presented in the paper. The paper shows that, with code highly optimized for the particular hardware, AI applications can meet real-time requirements even on low-cost edge devices.
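
To give a rough sense of what a run of continuous MAC operations looks like in fixed-point inference, the following is a minimal sketch of a dot product over 16-bit fixed-point values. It is not the embARC MLI API and not the paper's implementation: the Q15 format, the rounding and saturation choices, and the function names `to_q15` and `q15_dot` are all assumptions made for illustration.

```python
# Illustrative sketch only: a Q15 fixed-point dot product.
# The Q15 format and both function names are assumptions,
# not part of the embARC MLI Library.

Q15_MIN, Q15_MAX = -32768, 32767


def to_q15(x: float) -> int:
    """Quantize a float in [-1, 1) to a signed 16-bit Q15 integer, saturating."""
    q = int(round(x * 32768))
    return max(Q15_MIN, min(Q15_MAX, q))


def q15_dot(a, b) -> int:
    """Accumulate products of two Q15 vectors, then scale back to Q15."""
    acc = 0  # wide accumulator; each product is in Q30
    for x, y in zip(a, b):
        acc += x * y  # one multiply-accumulate (MAC) per element
    acc = (acc + (1 << 14)) >> 15  # round and shift from Q30 back to Q15
    return max(Q15_MIN, min(Q15_MAX, acc))


if __name__ == "__main__":
    weights = [to_q15(0.5), to_q15(0.5)]
    inputs = [to_q15(0.5), to_q15(0.5)]
    print(q15_dot(weights, inputs) / 32768)  # 0.5*0.5 + 0.5*0.5
```

The inner loop is the part that a DSP-oriented core would execute as back-to-back MAC instructions; everything else is bookkeeping around the number format.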

Cited By

  • (2022) Design and Exploration of an ARC-Coprocessor for LSTM Based Audio Applications. 2022 IEEE Nordic Circuits and Systems Conference (NorCAS), pp. 1-7. DOI: 10.1109/NorCAS57515.2022.9934553. Online publication date: 25-Oct-2022

Published In

ICIT '20: Proceedings of the 2020 8th International Conference on Information Technology: IoT and Smart City
December 2020
266 pages
ISBN:9781450388559
DOI:10.1145/3446999
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Embedded Neural Networks
  2. Keyword Spotting
  3. XY architecture
  4. embARC MLI Library

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICIT 2020
ICIT 2020: IoT and Smart City
December 25 - 27, 2020
Xi'an, China
