An AI-Powered Computer Vision Module for Social Interactive Agents

Authors:

Francesc Xavier Gaya Morey,

Cristina Manresa-Yee,

Jose Maria Buades Rubio

Interacción '24: Proceedings of the XXIV International Conference on Human Computer Interaction

Article No.: 19, Pages 1 - 5

https://doi.org/10.1145/3657242.3658601

Published: 19 June 2024

Abstract

Social interactive agents play a crucial role in various domains, providing intelligent assistance in healthcare, entertainment, and education settings. Recent advancements in Artificial Intelligence (AI) have shown promising potential to enhance the autonomy of these agents. However, the lack of standardization in their development often results in the creation of complex functionalities that are challenging to transfer across different platforms. In this study, we introduce a general-purpose AI-powered computer vision module designed to address this challenge. Our module features a modular structure that enables easy scalability and integration into diverse environments. Currently supporting seven tasks, including face and person detection, facial recognition, facial expression recognition, facial landmarks estimation, age and gender estimation, and background subtraction, the module offers up to 21 computer vision methods. Additionally, we integrate explainability functionalities to enhance user trust in the system. Moving forward, we aim to expand the module by adding new tasks and methods to meet evolving needs. Our goal is to streamline the integration of AI capabilities into social interactive agents, simplifying their development and enhancing their utility across various applications.
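
The abstract does not describe the module's programming interface, so the following is only a minimal sketch of how a modular structure with several interchangeable methods per task could be organized. Every name here (`VisionModule`, `register`, `run`, `build_demo_module`, and the dummy methods) is hypothetical and is not taken from the paper; the registered functions are placeholders, not real computer vision models.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: a method receives an image and returns a task result.
Method = Callable[[Any], Any]


class VisionModule:
    """Illustrative registry that maps task names to interchangeable methods."""

    def __init__(self) -> None:
        # task name -> {method name -> callable}
        self._tasks: Dict[str, Dict[str, Method]] = {}

    def register(self, task: str, method_name: str, fn: Method) -> None:
        """Add a method under a task; new tasks are created on first use."""
        self._tasks.setdefault(task, {})[method_name] = fn

    def tasks(self) -> List[str]:
        """Return the names of all registered tasks, sorted."""
        return sorted(self._tasks)

    def methods(self, task: str) -> List[str]:
        """Return the names of the methods registered for one task, sorted."""
        return sorted(self._tasks[task])

    def run(self, task: str, method_name: str, image: Any) -> Any:
        """Dispatch an image to the selected method of the selected task."""
        try:
            fn = self._tasks[task][method_name]
        except KeyError:
            raise ValueError(f"unknown task/method: {task}/{method_name}") from None
        return fn(image)


def build_demo_module() -> VisionModule:
    """Build a module with placeholder methods (assumptions, not real models)."""
    module = VisionModule()
    # Placeholder: report one box covering the whole frame as (x, y, width, height).
    module.register(
        "face_detection",
        "dummy_full_frame",
        lambda img: [(0, 0, len(img[0]), len(img))],
    )
    # Placeholder: a second, interchangeable method for the same task.
    module.register("face_detection", "dummy_empty", lambda img: [])
    # Placeholder: return the image unchanged.
    module.register("background_subtraction", "dummy_identity", lambda img: img)
    return module


if __name__ == "__main__":
    demo = build_demo_module()
    frame = [[0, 0, 0], [0, 0, 0]]  # 2 rows x 3 columns, stands in for an image
    print(demo.tasks())
    print(demo.run("face_detection", "dummy_full_frame", frame))
```

Under this assumed design, adding a task or swapping a method only requires another `register` call, which is one way the scalability described above could be achieved.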

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Vision for robotics
2. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interactive systems and tools
      1. User interface programming

Published In

Interacción '24: Proceedings of the XXIV International Conference on Human Computer Interaction

June 2024

155 pages

ISBN:9798400717871

DOI:10.1145/3657242

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper
Research
Refereed limited

Funding Sources

MCIN/AEI/10.13039/501100011033

Conference

INTERACCION 2024

INTERACCION 2024: XXIV Congreso Internacional de Interacción Persona-Ordenador \ XXIV International Conference on Human Computer Interaction

June 19 - 21, 2024

A Coruña, Spain

Acceptance Rates

Overall Acceptance Rate 109 of 163 submissions, 67%
