Abstract
In this paper, we present an Embodied Conversational Agent (ECA) enriched with automatic image understanding, using vision data derived from state-of-the-art machine learning techniques to advance autonomous interaction with elderly or infirm users. The agent is designed to monitor the health and emotional well-being of the elderly. It not only conducts question answering through speech-based interaction, but also analyses the user's surroundings, company, emotional state, hazards and falls from visual data using deep learning techniques. The agent is accessible from a web browser and supports voice interaction, with a webcam required for the visual analysis functionality. The system has been evaluated with diverse real-life images to demonstrate its effectiveness.
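The abstract does not specify the implementation stack, but to make the browser-plus-webcam architecture concrete, the following is a minimal sketch of one plausible arrangement: a Python/Tornado-style endpoint that receives a webcam frame and a speech transcript from the browser and returns a reply together with an image description. The handler name, the describe_image stub and the canned responses are hypothetical illustrations, not the authors' implementation.

```python
# Hypothetical sketch: a backend endpoint that accepts a webcam frame and a
# spoken query from the browser and returns a reply plus an image description.
import base64

import tornado.ioloop
import tornado.web


def describe_image(image_bytes: bytes) -> str:
    """Placeholder for a deep-learning image-understanding pipeline
    (scene, company, emotion, hazard and fall analysis).  A real system
    would forward the frame to trained CNN models."""
    return "A person is sitting alone in a living room and appears calm."


class AnalyseHandler(tornado.web.RequestHandler):
    def post(self) -> None:
        # The browser is assumed to POST a base64-encoded webcam frame and the
        # transcript produced by its speech-recognition front end.
        frame_b64 = self.get_body_argument("frame", default="")
        transcript = self.get_body_argument("transcript", default="")
        description = describe_image(base64.b64decode(frame_b64)) if frame_b64 else ""
        # Tornado serialises a dict to JSON automatically.
        self.write({"reply": f"You said: {transcript}", "description": description})


def make_app() -> tornado.web.Application:
    return tornado.web.Application([(r"/analyse", AnalyseHandler)])


if __name__ == "__main__":
    make_app().listen(8888)
    tornado.ioloop.IOLoop.current().start()
```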
Acknowledgments
We appreciate the funding support received from the Higher Education Innovation Fund and RPPTV Ltd.