A Deep Learning-Powered Voice-Enabled Math Tutor for Kids

Banerjee, Arnab; Paul, Srijoy; Priya, Tisu; Rohit, Anamika; Das, Nibaran

doi:10.1007/978-3-031-23599-3_31

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1704)

Included in the following conference series:

International Conference on Recent Trends in Image Processing and Pattern Recognition

Abstract

In this study, we propose a voice-enabled math tutor system that lets children practice math problems on their own. For this purpose, we developed a dataset of spoken numerals targeted at the application. When the system is turned on, it generates a math problem and the child responds verbally. The system classifies the audio (the child's spoken answer) into a numeral in text form, which is then checked against the expected result and reported as either a correct or an incorrect answer. The proposed system can be fitted into any toy, allowing a child to practice problems while engaging with it. The dataset, named JUDVLP-BCRP: numeralSound.v1, contains 2315 audio samples of numerals in the range 0 to 9. The audio data were collected in a typical setting from people aged 10 to 60 from West Bengal, Jharkhand, Delhi, Assam, Bihar, and Orissa. After pre-processing the audio, Mel spectrograms were produced, which serve as input to the deep neural networks. The audio data were classified using several well-known deep learning architectures: DenseNet-121, VGG-16, a modified DenseNet-121 (DenseNet-41), and a modified VGG-16 (VGG-12). DenseNet-121, VGG-16, DenseNet-41, and VGG-12 achieved accuracies of 94.60%, 98.70%, 98.27%, and 98.48%, respectively. The networks were trained for 100 epochs with a learning rate of 0.0001 and the categorical cross-entropy loss function. VGG-16 produced the highest precision, 98.9%, and VGG-12 the second-best, 98.6%. The results are encouraging and support a workable system design.
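The question-and-check loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`make_problem`, `check_answer`, `run_round`) are hypothetical, the choice of addition problems is an assumption (the abstract does not state the problem types), and the trained spectrogram classifier is replaced by a placeholder callable. The one constraint taken from the abstract is that the spoken answer must be a single numeral from 0 to 9.

```python
import random


def make_problem(rng):
    """Return (question, answer) with the answer limited to 0..9.

    Assumption: addition only. The answer is drawn first so that it
    always falls inside the numeral range the classifier covers.
    """
    answer = rng.randint(0, 9)
    a = rng.randint(0, answer)
    b = answer - a
    return f"What is {a} + {b}?", answer


def check_answer(expected, predicted_digit):
    """Compare the classifier's predicted numeral with the expected one."""
    return "correct" if predicted_digit == expected else "incorrect"


def run_round(rng, classifier, audio):
    """One tutor round: pose a problem, classify the reply, give a verdict.

    `classifier` is a hypothetical stand-in for the trained network; it
    takes the recorded audio and returns an integer from 0 to 9.
    """
    question, expected = make_problem(rng)
    predicted = classifier(audio)
    return question, check_answer(expected, predicted)


if __name__ == "__main__":
    # Placeholder classifier that always "hears" the numeral 5.
    question, verdict = run_round(random.Random(), lambda audio: 5, b"")
    print(question, verdict)
```

In a real system, `classifier` would wrap the audio pre-processing, the Mel spectrogram computation, and the network's prediction step; those parts are omitted here.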

Supported by Dr. B. C. Roy Polytechnic and Jadavpur University.

Notes

1. The JUDVLP-BCRP: numeralSounddb.v1 dataset is available on request.

Author information

Authors and Affiliations

Dr. B. C. Roy Polytechnic, Durgapur, 713206, West Bengal, India
Arnab Banerjee, Srijoy Paul, Tisu Priya & Anamika Rohit
Jadavpur University, Kolkata, 700032, West Bengal, India
Arnab Banerjee & Nibaran Das

Corresponding author

Correspondence to Arnab Banerjee.

Editor information

Editors and Affiliations

University of South Dakota, Vermillion, SD, USA
KC Santosh
Texas A&M University, College Station, TX, USA
Ayush Goyal
University of Luxembourg, Luxembourg, Luxembourg
Djamila Aouada
University of Derby, Derby, UK
Aaisha Makkar
University of Minnesota, Minneapolis, MN, USA
Yao-Yi Chiang
IIIT Allahabad, Allahabad, India
Satish K Singh

About this paper

Cite this paper

Banerjee, A., Paul, S., Priya, T., Rohit, A., Das, N. (2023). A Deep Learning-Powered Voice-Enabled Math Tutor for Kids. In: Santosh, K., Goyal, A., Aouada, D., Makkar, A., Chiang, YY., Singh, S.K. (eds) Recent Trends in Image Processing and Pattern Recognition. RTIP2R 2022. Communications in Computer and Information Science, vol 1704. Springer, Cham. https://doi.org/10.1007/978-3-031-23599-3_31

DOI: https://doi.org/10.1007/978-3-031-23599-3_31
Published: 11 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23598-6
Online ISBN: 978-3-031-23599-3
eBook Packages: Computer Science, Computer Science (R0)
