DOI: 10.1145/3308532.3329462
research-article

A Generic Machine Learning Based Approach for Addressee Detection In Multiparty Interaction

Published: 01 July 2019

Abstract

Addressee detection is one of the most fundamental tasks for seamless dialogue management and turn-taking in human-agent interaction. Whereas the addressee is implicit in dyadic interaction, detecting it becomes challenging in multiparty interaction, where more than two participants are involved. Existing work employs either rule-based or statistical approaches to addressee detection. However, most of these approaches have either been tested on a single data set or support only a fixed number of participants. In this article, we propose a model based on generic features that predicts the addressee in data sets with varying numbers of participants. Results on two different corpora show that the proposed model outperforms existing baselines.



Published In

IVA '19: Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents
July 2019
282 pages
ISBN:9781450366724
DOI:10.1145/3308532
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. human-agent interaction
  2. machine learning
  3. mixed communities
  4. multimodal interaction
  5. multiparty interaction

Qualifiers

  • Research-article

Conference

IVA '19

Acceptance Rates

IVA '19 Paper Acceptance Rate 15 of 63 submissions, 24%;
Overall Acceptance Rate 53 of 196 submissions, 27%

Cited By

  • (2023) Addressee Detection Using Facial and Audio Features in Mixed Human–Human and Human–Robot Settings: A Deep Learning Framework. IEEE Systems, Man, and Cybernetics Magazine 9:2 (25-38). DOI: 10.1109/MSMC.2022.3224843. Online publication date: Apr-2023.
  • (2021) A novel focus encoding scheme for addressee detection in multiparty interaction using machine learning algorithms. Journal on Multimodal User Interfaces 15:2 (175-188). DOI: 10.1007/s12193-020-00361-9. Online publication date: 17-Jan-2021.
  • (2020) Gaze, Dominance and Dialogue Role in the MULTISIMO Corpus. 2020 11th IEEE International Conference on Cognitive Infocommunications (CogInfoCom) (000083-000088). DOI: 10.1109/CogInfoCom50765.2020.9237833. Online publication date: 23-Sep-2020.
  • (2020) Feature Selection-Based Approach for Generalized Physical Contradiction Recognition. Systematic Complex Problem Solving in the Age of Digitalization and Open Innovation (321-339). DOI: 10.1007/978-3-030-61295-5_26. Online publication date: 9-Oct-2020.
