Abstract
This study proposes a voice-enabled math tutor system that lets children practice math problems on their own. To support the application, we developed a dataset of spoken numerals. When the system is turned on, it generates a math problem and the child responds to it verbally. The system classifies the audio response (the user-provided answer) into a numeral in text form, which is then checked and reported as either a correct or an incorrect answer. The proposed system can be fitted into any toy, allowing a child to practice problems while engaging with it. The dataset, named JUDVLP-BCRP: numeralSound.v1, contains 2315 audio samples of the numerals 0 to 9. In a typical setting, the audio data were collected from people aged 10 to 60 from West Bengal, Jharkhand, Delhi, Assam, Bihar, and Orissa. After pre-processing, Mel spectrograms were produced from the audio and used as input to the deep neural networks. The audio data were classified using several well-known deep learning architectures: DenseNet-121, VGG-16, a modified DenseNet-121 (DenseNet-41), and a modified VGG-16 (VGG-12). DenseNet-121, VGG-16, DenseNet-41, and VGG-12 achieved accuracies of 94.60%, 98.70%, 98.27%, and 98.48%, respectively. The networks were trained for 100 epochs with a learning rate of 0.0001 and the categorical cross-entropy loss function. VGG-16 produced the highest precision, 98.9%, and VGG-12 the second-best, 98.6%. These results are encouraging and inform a workable system design.
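The interaction loop described in the abstract (generate a problem, classify the spoken reply into a numeral, report correct or incorrect) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the classifier is represented by a plain list of ten class probabilities rather than a trained network, and restricting answers to 0 through 9 is an assumption made here because the dataset covers only those numerals.

```python
import random
from typing import Sequence, Tuple


def generate_problem(rng: random.Random) -> Tuple[str, int]:
    """Return a question and its answer; the answer is kept in 0..9 (assumption)."""
    a = rng.randint(0, 9)
    if rng.choice(["+", "-"]) == "+":
        b = rng.randint(0, 9 - a)  # keeps a + b at or below 9
        return f"{a} + {b} = ?", a + b
    b = rng.randint(0, a)  # keeps a - b at or above 0
    return f"{a} - {b} = ?", a - b


def predicted_digit(class_probs: Sequence[float]) -> int:
    """Pick the numeral with the highest score from a 10-way classifier output."""
    if len(class_probs) != 10:
        raise ValueError("expected one probability per numeral 0..9")
    return max(range(10), key=lambda d: class_probs[d])


def check_answer(expected: int, class_probs: Sequence[float]) -> str:
    """Compare the classified spoken answer with the expected result."""
    return "correct" if predicted_digit(class_probs) == expected else "incorrect"
```

In a full system, `class_probs` would come from one of the networks applied to the Mel spectrogram of the child's recorded answer; here it stands in for that step so the checking logic can be read on its own.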
Supported by Dr. B. C. Roy Polytechnic and Jadavpur University.
Notes
1. Access to the JUDVLP-BCRP: numeralSounddb.v1 dataset is available on request.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Banerjee, A., Paul, S., Priya, T., Rohit, A., Das, N. (2023). A Deep Learning-Powered Voice-Enabled Math Tutor for Kids. In: Santosh, K., Goyal, A., Aouada, D., Makkar, A., Chiang, YY., Singh, S.K. (eds) Recent Trends in Image Processing and Pattern Recognition. RTIP2R 2022. Communications in Computer and Information Science, vol 1704. Springer, Cham. https://doi.org/10.1007/978-3-031-23599-3_31
DOI: https://doi.org/10.1007/978-3-031-23599-3_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23598-6
Online ISBN: 978-3-031-23599-3
eBook Packages: Computer Science, Computer Science (R0)