Task-independent Recognition of Communication Skills in Group Interaction Using Time-series Modeling

Published: 12 November 2021

Abstract

Case studies of group discussions are considered an effective way to assess communication skills (CS). This method helps researchers evaluate participants' engagement with each other in a specific, realistic context. In this article, multimodal analysis was performed to estimate CS indices using a three-task-type group discussion dataset, the MATRICS corpus. The research investigated the effectiveness of both static and time-series modeling, especially in task-independent settings. This investigation aimed to clarify three main points: first, the effectiveness of time-series modeling compared to nonsequential modeling; second, multimodal analysis in a task-independent setting; and third, the important differences between task-dependent and task-independent settings, specifically in terms of modalities and prediction models. Several modalities were extracted (e.g., acoustics, speaking turns, linguistic features, dialog tags, head motions, and face feature sets) for inferring the CS indices as a regression task. Three predictive models were considered: support vector regression (SVR), long short-term memory (LSTM), and an enhanced time-series model (an LSTM combining static and time-series features). The evaluation used the R² score in a cross-validation scheme. The experimental results suggested that time-series modeling can significantly improve the performance of multimodal analysis in the task-dependent setting (best R² = 0.797 for the total CS index), with word2vec being the most prominent feature. However, highly context-related features did not transfer well to the task-independent setting. We therefore propose an enhanced LSTM model for the task-independent setting, which outperformed the conventional SVR and LSTM models (best R² = 0.602 for the total CS index). In other words, this study shows that appropriate time-series modeling can outperform traditional nonsequential modeling for automatically estimating the CS indices of a participant in a group discussion, with regard to task dependency.
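The abstract's nonsequential baseline (an SVR regressor over static session-level features, evaluated with the R² score under cross-validation) can be illustrated with a minimal scikit-learn sketch. All feature names, dimensions, and data below are synthetic and hypothetical — they stand in for the multimodal statistics described in the paper and are not drawn from the MATRICS corpus.

```python
# Hedged sketch: an SVR baseline scored with R^2 in k-fold cross-validation,
# in the spirit of the paper's nonsequential model. Data are synthetic.
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_participants, n_features = 200, 20  # hypothetical sizes

# Static (session-level) multimodal features, e.g. acoustic statistics,
# speaking-turn counts, head-motion summaries -- random placeholders here.
X = rng.normal(size=(n_participants, n_features))
# Synthetic CS index with a learnable dependence on the first few features.
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=n_participants)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"mean R^2 over 5 folds: {scores.mean():.3f}")
```

The enhanced model in the paper additionally feeds static features into an LSTM alongside the per-frame time series; the cross-validated R² protocol above stays the same, only the estimator changes.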



Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 4
November 2021
529 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3492437

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 November 2021
Accepted: 01 February 2021
Revised: 01 November 2020
Received: 01 May 2020
Published in TOMM Volume 17, Issue 4

Author Tags

  1. Multimodal analysis
  2. time-series modeling
  3. task independent
  4. communication skills
  5. group discussion

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • Japan Society for the Promotion of Science (JSPS) KAKENHI
  • JST AIP Trilateral AI Research
