A Deep Learning-Powered Voice-Enabled Math Tutor for Kids

  • Conference paper
Recent Trends in Image Processing and Pattern Recognition (RTIP2R 2022)

Abstract

In this study, we propose a voice-enabled math tutor system that lets children practice math problems on their own. For this purpose, we developed a numeral sound dataset targeting the application. When the system is turned on, it generates a math problem, and the child responds to it verbally. The system classifies the audio data (the user-provided answer) into a text numeral, which is then further analysed to produce an output of either a correct or an erroneous answer. The proposed system can be fitted into any toy, allowing a child to practice problems while engaging with it. The dataset, named JUDVLP-BCRP: numeralSound.v1, contains 2315 audio samples of numerals in the range 0 to 9. The audio data were collected, in a typical setting, from people aged 10 to 60 from West Bengal, Jharkhand, Delhi, Assam, Bihar, and Orissa. After pre-processing the audio, Mel spectrograms were produced, which serve as input to the deep neural networks. The audio data were classified using several well-known deep learning architectures: DenseNet-121, VGG-16, a modified DenseNet-121 (DenseNet-41), and a modified VGG-16 (VGG-12). DenseNet-121, VGG-16, DenseNet-41, and VGG-12 achieved accuracies of 94.60%, 98.70%, 98.27%, and 98.48%, respectively. The networks were trained for 100 epochs with a learning rate of 0.0001 and the categorical cross-entropy loss function. VGG-16 produced the highest precision, 98.9%, and VGG-12 the second-best, 98.6%. The results are encouraging and support a workable system design.
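The interaction loop described in the abstract (generate a problem, classify the spoken answer into a numeral, report correct or erroneous) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names are hypothetical, the classifier is replaced by a hand-written 10-way probability vector, and the restriction to addition and subtraction with single-digit answers is an assumption based only on the dataset covering the numerals 0 to 9. The loss helper shows categorical cross-entropy for one one-hot-labelled sample.

```python
import math
import random


def make_problem(rng):
    """Return (question_text, answer) whose answer is a single digit 0-9.

    Assumption: only + and - are used, and operands are chosen so the
    result stays in the 0-9 range covered by the numeral classifier.
    """
    op = rng.choice(["+", "-"])
    a = rng.randint(0, 9)
    if op == "+":
        b = rng.randint(0, 9 - a)   # keeps a + b <= 9
        return f"{a} + {b} = ?", a + b
    b = rng.randint(0, a)           # keeps a - b >= 0
    return f"{a} - {b} = ?", a - b


def predicted_digit(probs):
    """Map a 10-way class-probability vector to the most likely digit."""
    if len(probs) != 10:
        raise ValueError("expected one probability per digit 0-9")
    return max(range(10), key=lambda i: probs[i])


def categorical_cross_entropy(probs, true_digit, eps=1e-12):
    """Cross-entropy for one sample with a one-hot label at true_digit."""
    return -math.log(max(probs[true_digit], eps))


def check_answer(probs, expected):
    """Report whether the recognised digit matches the expected answer."""
    return "correct" if predicted_digit(probs) == expected else "incorrect"


# Demo with a stand-in for the classifier output (hypothetical values).
rng = random.Random(0)
question, answer = make_problem(rng)
fake_probs = [0.01] * 10
fake_probs[answer] = 0.91           # pretend the model is confident
print(question, "->", check_answer(fake_probs, answer))
```

In a real system the probability vector would come from a network fed with the Mel spectrogram of the child's utterance; that part is deliberately left out here.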

Supported by Dr. B. C. Roy Polytechnic and Jadavpur University.

Notes

  1. The JUDVLP-BCRP: numeralSounddb.v1 dataset is available on request.

Author information

Corresponding author

Correspondence to Arnab Banerjee.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Banerjee, A., Paul, S., Priya, T., Rohit, A., Das, N. (2023). A Deep Learning-Powered Voice-Enabled Math Tutor for Kids. In: Santosh, K., Goyal, A., Aouada, D., Makkar, A., Chiang, YY., Singh, S.K. (eds) Recent Trends in Image Processing and Pattern Recognition. RTIP2R 2022. Communications in Computer and Information Science, vol 1704. Springer, Cham. https://doi.org/10.1007/978-3-031-23599-3_31

  • DOI: https://doi.org/10.1007/978-3-031-23599-3_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23598-6

  • Online ISBN: 978-3-031-23599-3

  • eBook Packages: Computer Science, Computer Science (R0)
