Voice-Controlled Robotics in Early Education: Implementing and Validating Child-Directed Interactions Using a Collaborative Robot and Artificial Intelligence
Abstract
1. Introduction
- We present a proof of concept for a robotic system responsive to children’s verbal instructions. This system not only advances the field of educational robotics with its novel capabilities but also serves as a foundational platform for further empirical exploration within the domain.
- Our comprehensive evaluation of the critical technologies used to process and interpret children’s instructions, including speech-to-text, large language models, and object detection models, highlights their potential to enhance educational experiences and pinpoints challenges for future research (a minimal pipeline sketch follows these highlights).
- To the best of our knowledge, this research represents the first attempt to utilize collaborative robots, similar to those used in industrial settings, in ways that are both suitable for and beneficial to early childhood education. This marks a significant step forward in educational robotics and opens new opportunities for development within educational contexts.
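To make the pipeline behind these highlights concrete, the sketch below shows one way the three stages (speech-to-text, LLM-based semantic analysis, and object detection) could be wired together. The specific libraries (openai-whisper, openai, ultralytics), model names, weights file, and prompt are illustrative assumptions on our part, not the authors’ exact implementation.

```python
# Hypothetical end-to-end sketch of a voice-command pipeline:
# speech-to-text -> LLM instruction parsing -> object detection.
import json

import whisper                      # pip install openai-whisper
from openai import OpenAI           # pip install openai
from ultralytics import YOLO        # pip install ultralytics


def transcribe(audio_path: str) -> str:
    """Transcribe a child's spoken instruction (Whisper 'small', Spanish assumed)."""
    model = whisper.load_model("small")
    return model.transcribe(audio_path, language="es")["text"]


def parse_instruction(text: str, client: OpenAI) -> dict:
    """Ask an LLM to map free-form speech to a structured pick command."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": ('Extract the target object from this instruction as JSON, '
                        'e.g. {"shape": "cube"}. Instruction: ' + text),
        }],
    )
    return json.loads(resp.choices[0].message.content)


def locate(shape: str, image_path: str):
    """Find the requested shape in a workspace image with a fine-tuned YOLOv8."""
    result = YOLO("yolov8n_shapes.pt")(image_path)[0]   # weights name is assumed
    for box in result.boxes:
        if result.names[int(box.cls)] == shape:
            return box.xyxy[0].tolist()   # pixel bbox, to be mapped to robot coords
    return None


if __name__ == "__main__":
    client = OpenAI()                                   # requires OPENAI_API_KEY
    text = transcribe("instruction.wav")
    shape = parse_instruction(text, client)["shape"]
    print(shape, locate(shape, "workspace.jpg"))
```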
2. System Architecture and Methods
2.1. Experimental Setup: An Overview
2.2. System Setup Details
2.3. Implementation Details
2.3.1. Voice Recognition Module
2.3.2. Semantic Analysis Module
2.3.3. Vision-Based Object Detection Module
2.4. Safety and Privacy Measures
3. Results
3.1. Datasets
3.1.1. Geometric Figures Dataset
3.1.2. Voice Instructions Dataset (VID)
3.2. Object Detection Results
3.3. Speech-to-Text Results
3.4. Semantic Analysis Results
3.5. System Repeatability Results
3.6. Evaluation of the System with Children Aged 4 to 6
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Castro, A.; Medina, J.; Aguilera, C.A.; Ramirez, M.; Aguilera, C. Robotics Education in STEM Units: Breaking Down Barriers in Rural Multigrade Schools. Sensors 2023, 23, 387. [Google Scholar] [CrossRef] [PubMed]
- Sisman, B.; Kucuk, S. An Educational Robotics Course: Examination of Educational Potentials and Pre-service Teachers’ Experiences. Int. J. Res. Educ. Sci. 2019, 5, 510–531. [Google Scholar]
- Misirli, A.; Komis, V. Robotics and Programming Concepts in Early Childhood Education: A Conceptual Framework for Designing Educational Scenarios. In Research on e-Learning and ICT in Education: Technological, Pedagogical and Instructional Perspectives; Karagiannidis, C., Politis, P., Karasavvidis, I., Eds.; Springer: New York, NY, USA, 2014; pp. 99–118. [Google Scholar] [CrossRef]
- Garvis, S.; Keane, T. A Literature Review of Educational Robotics and Early Childhood Education. In Technological Innovations in Education: Applications in Education and Teaching; Garvis, S., Keane, T., Eds.; Springer Nature: Singapore, 2023; pp. 71–83. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All You Need. In Proceedings of the Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Radford, A.; Kim, J.W.; Xu, T.; Brockman, G.; McLeavey, C.; Sutskever, I. Robust Speech Recognition via Large-Scale Weak Supervision. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023. [Google Scholar]
- Lv, W.; Xu, S.; Zhao, Y.; Wang, G.; Wei, J.; Cui, C.; Du, Y.; Dang, Q.; Liu, Y. DETRs Beat YOLOs on Real-time Object Detection. arXiv 2023, arXiv:2304.08069. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar] [CrossRef]
- Cherti, M.; Beaumont, R.; Wightman, R.; Wortsman, M.; Ilharco, G.; Gordon, C.; Schuhmann, C.; Schmidt, L.; Jitsev, J. Reproducible Scaling Laws for Contrastive Language-Image Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 2818–2829. [Google Scholar]
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the ICML, Virtual Event, 18–24 July 2021. [Google Scholar]
- Williams, R.; Park, H.W.; Oh, L.; Breazeal, C. PopBots: Designing an Artificial Intelligence Curriculum for Early Childhood Education. AAAI Conf. Artif. Intell. 2019, 33, 9729–9736. [Google Scholar] [CrossRef]
- Calo Mosquera, N.; García-Rodeja Gayoso, I.; Sesto Varela, V. Construyendo conceptos sobre electricidad en infantil mediante actividades de indagación [Building Concepts about Electricity in Early Childhood through Inquiry Activities]. Enseñanza Cienc. Rev. Investig. Exp. Didácticas 2021, 39, 223–240. [Google Scholar] [CrossRef]
- Kambouri-Danos, M.; Ravanis, K.; Jameau, A.; Boilevin, J.M. Precursor models and early years science learning: A case study related to the water state changes. Early Child. Educ. J. 2019, 47, 475–488. [Google Scholar] [CrossRef]
- Mendez, E.; Ochoa, O.; Olivera-Guzman, D.; Soto-Herrera, V.H.; Luna-Sánchez, J.A.; Lucas-Dophe, C.; Lugo-del Real, E.; Ayala-Garcia, I.N.; Alvarado Perez, M.; González, A. Integration of Deep Learning and Collaborative Robot for Assembly Tasks. Appl. Sci. 2024, 14, 839. [Google Scholar] [CrossRef]
- Aguilera-Carrasco, C.A.; González-Böhme, L.F.; Valdes, F.; Quitral-Zapata, F.J.; Raducanu, B. A Hand-Drawn Language for Human–Robot Collaboration in Wood Stereotomy. IEEE Access 2023, 11, 100975–100985. [Google Scholar] [CrossRef]
- Leopard-Picovoice Speech-to-Text Engine. Available online: https://picovoice.ai/docs/leopard/ (accessed on 29 January 2024).
- Vosk Speech Recognition Toolkit: Offline Speech Recognition API for Android, iOS, Raspberry Pi and Servers with Python, Java, C# and Node. Available online: https://github.com/alphacep/vosk-api (accessed on 29 January 2024).
- Google Cloud Speech-to-Text. Available online: https://cloud.google.com/speech-to-text/ (accessed on 29 January 2024).
- Gemini Team; Anil, R.; Borgeaud, S.; Wu, Y.; Alayrac, J.B.; Yu, J.; Soricut, R.; Schalkwyk, J.; Dai, A.M.; Hauth, A.; et al. Gemini: A Family of Highly Capable Multimodal Models. arXiv 2023, arXiv:2312.11805. [Google Scholar]
- OpenAI. Introducing ChatGPT. 2022. Available online: https://openai.com/blog/chatgpt (accessed on 10 January 2024).
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.E.; Fu, C.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37. [Google Scholar]
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into High Quality Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162. [Google Scholar]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R.B. Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 386–397. [Google Scholar] [CrossRef] [PubMed]
- Soviany, P.; Ionescu, R.T. Optimizing the Trade-off between Single-Stage and Two-Stage Object Detectors using Image Difficulty Prediction. arXiv 2018, arXiv:1803.08707. [Google Scholar]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; Kwon, Y.; Michael, K.; Fang, J.; Yifu, Z.; Wong, C.; Montes, D.; et al. Ultralytics/YOLOv5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation; Zenodo, 2022. [Google Scholar]
- Tsai, R.; Lenz, R. A new technique for fully autonomous and efficient 3D robotics hand/eye calibration. IEEE Trans. Robot. Autom. 1989, 5, 345–358. [Google Scholar] [CrossRef]
- Unity Technologies. Unity Perception Package. 2020. Available online: https://github.com/Unity-Technologies/com.unity.perception (accessed on 9 March 2024).
- Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLOv8. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 9 March 2024).
- Lin, T.; Maire, M.; Belongie, S.J.; Bourdev, L.D.; Girshick, R.B.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014. [Google Scholar]
| Subset | Number of Images |
|---|---|
| Training | 3500 |
| Validation | 500 |
| Testing | 1001 |
| Total | 5001 |
| Model | Class | Instances | mAP50 | mAP50-95 | GFLOPs |
|---|---|---|---|---|---|
| YOLOv8N | all | 400 | 0.994 | 0.834 | 8.1 |
| | cube | 161 | 0.993 | 0.836 | |
| | cylinder | 89 | 0.995 | 0.882 | |
| | star | 150 | 0.995 | 0.783 | |
| YOLOv8M | all | 400 | 0.983 | 0.818 | 78.7 |
| | cube | 161 | 0.970 | 0.777 | |
| | cylinder | 89 | 0.995 | 0.881 | |
| | star | 150 | 0.983 | 0.795 | |
| YOLOv8X | all | 400 | 0.983 | 0.812 | 257.4 |
| | cube | 161 | 0.959 | 0.768 | |
| | cylinder | 89 | 0.995 | 0.862 | |
| | star | 150 | 0.995 | 0.805 | |
| RT-DETR | all | 400 | 0.776 | 0.501 | 103.4 |
| | cube | 161 | 0.819 | 0.497 | |
| | cylinder | 89 | 0.951 | 0.632 | |
| | star | 150 | 0.558 | 0.372 | |
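For reference, mAP50 and mAP50-95 are the figures the Ultralytics validation API reports, and the sketch below shows how such numbers could be reproduced. The dataset paths, data.yaml layout, and training hyperparameters (epochs, image size) are assumptions for illustration, not the authors’ exact configuration.

```python
# Minimal sketch: train and evaluate YOLOv8 on the geometric-figures dataset.
from ultralytics import YOLO

# Assumed data.yaml layout (paths and class ids are illustrative):
#   path: datasets/geometric-figures
#   train: images/train      # 3500 images
#   val: images/val          # 500 images
#   test: images/test        # 1001 images
#   names: {0: cube, 1: cylinder, 2: star}

model = YOLO("yolov8n.pt")                           # also yolov8m.pt / yolov8x.pt
model.train(data="data.yaml", epochs=100, imgsz=640)

metrics = model.val(data="data.yaml", split="test")  # evaluate on the test split
print(f"mAP50:    {metrics.box.map50:.3f}")
print(f"mAP50-95: {metrics.box.map:.3f}")
```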
| Dataset | WB [6] | WT [6] | WS [6] | WM [6] | WL [6] | VS [17] | VN [17] | G [18] | GC [18] | L [16] |
|---|---|---|---|---|---|---|---|---|---|---|
| VID | 1.20 | 1.47 | 0.67 | 0.90 | 0.37 | 0.64 | 0.42 | 0.45 | 0.33 | 0.73 |
| VID1S | 0.87 | 0.97 | 0.82 | 0.64 | 0.39 | 0.68 | 0.44 | 0.44 | 0.26 | 0.66 |

Column abbreviations correspond to the cited speech-to-text engines: Whisper variants [6] (WB–WL), Vosk [17] (VS, VN), Google Cloud Speech-to-Text [18] (G, GC), and Leopard [16] (L).
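The table does not state the exact error metric, but scoring speech-to-text output against reference transcriptions is commonly done with word error rate (WER). A minimal sketch using the jiwer package follows, with made-up example strings; treat it as an illustration of the procedure rather than the authors’ exact evaluation.

```python
# Hedged sketch: score an STT hypothesis against a reference transcription.
from jiwer import wer  # pip install jiwer

reference  = "pon el cubo en la caja"   # ground-truth instruction (example)
hypothesis = "pon el cuba en la caja"   # STT output (example)

score = wer(reference, hypothesis)      # 1 substitution over 6 reference words
print(f"WER: {score:.3f}")              # -> 0.167
```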
| | STT Correct | Gemini Correct | Gemini Runtime | GPT-3.5 Turbo Correct | GPT-3.5 Turbo Runtime | GPT-4 Correct | GPT-4 Runtime |
|---|---|---|---|---|---|---|---|
| Total | 184 | 200 | 1.2 s | 192 | 1.1 s | 186 | 2.5 s |
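The Correct/Runtime comparison above could be produced by sending each transcribed instruction to each model and timing the call. The sketch below uses the OpenAI chat completions API; the prompt, model name, single-word answer format, and example data are our assumptions for illustration.

```python
# Sketch: classify transcribed instructions with an LLM and time each call.
import time
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY


def classify(instruction: str) -> tuple[str, float]:
    """Return the LLM's predicted target shape and the call's wall-clock time."""
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": ("Answer with one word (cube, cylinder or star): "
                        "which object does this instruction refer to? " + instruction),
        }],
    )
    return resp.choices[0].message.content.strip().lower(), time.perf_counter() - start


dataset = [("pon el cubo en la caja", "cube")]   # (instruction, label) examples
correct, runtime = 0, 0.0
for text, label in dataset:
    pred, dt = classify(text)
    correct += pred == label
    runtime += dt
print(f"correct: {correct}/{len(dataset)}, mean runtime: {runtime / len(dataset):.1f} s")
```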
| Dataset | Total Attempts | Successful Attempts | Errors by STT | Errors by Object Detection | Errors by LLM |
|---|---|---|---|---|---|
| VID Tasks | 100 | 93 | 7 | 0 | 0 |
Aguilera, C.A.; Castro, A.; Aguilera, C.; Raducanu, B. Voice-Controlled Robotics in Early Education: Implementing and Validating Child-Directed Interactions Using a Collaborative Robot and Artificial Intelligence. Appl. Sci. 2024, 14, 2408. https://doi.org/10.3390/app14062408