Abstract
In this paper, we present an Embodied Conversational Agent (ECA) enriched with automatic image understanding, using vision data derived from state-of-the-art machine learning techniques to advance autonomous interaction with elderly or infirm users. The agent is designed to monitor the health and emotional well-being of the elderly. It not only conducts question answering through speech-based interaction, but also analyses the user's surroundings, company, emotional state, hazards and falls from visual data using deep learning techniques. The agent is accessible from a web browser and supports voice interaction, with a webcam required for the visual analysis functionality. The system has been evaluated with diverse real-life images to demonstrate its effectiveness.
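The abstract does not specify the implementation stack, but to make the browser-plus-webcam architecture concrete, the following is a minimal sketch of one plausible arrangement: a Python/Tornado-style endpoint that receives a webcam frame and a speech transcript from the browser and returns a reply together with an image description. The handler name, the describe_image stub and the canned responses are hypothetical illustrations, not the authors' implementation.

```python
# Hypothetical sketch: a backend endpoint that accepts a webcam frame and a
# spoken query from the browser and returns a reply plus an image description.
import base64

import tornado.ioloop
import tornado.web


def describe_image(image_bytes: bytes) -> str:
    """Placeholder for a deep-learning image-understanding pipeline
    (scene, company, emotion, hazard and fall analysis).  A real system
    would forward the frame to trained CNN models."""
    return "A person is sitting alone in a living room and appears calm."


class AnalyseHandler(tornado.web.RequestHandler):
    def post(self) -> None:
        # The browser is assumed to POST a base64-encoded webcam frame and the
        # transcript produced by its speech-recognition front end.
        frame_b64 = self.get_body_argument("frame", default="")
        transcript = self.get_body_argument("transcript", default="")
        description = describe_image(base64.b64decode(frame_b64)) if frame_b64 else ""
        # Tornado serialises a dict to JSON automatically.
        self.write({"reply": f"You said: {transcript}", "description": description})


def make_app() -> tornado.web.Application:
    return tornado.web.Application([(r"/analyse", AnalyseHandler)])


if __name__ == "__main__":
    make_app().listen(8888)
    tornado.ioloop.IOLoop.current().start()
```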
Acknowledgments
We appreciate the funding support received from the Higher Education Innovation Fund and RPPTV Ltd.