research-article

Analyzing Free-standing Conversational Groups: A Multimodal Approach

Authors:

Xavier Alameda-Pineda,

Nicu Sebe

MM '15: Proceedings of the 23rd ACM international conference on Multimedia

Pages 5 - 14

https://doi.org/10.1145/2733373.2806238

Published: 13 October 2015

Abstract

During natural social gatherings, humans tend to organize themselves into so-called free-standing conversational groups. In this context, robust head and body pose estimates can facilitate the higher-level description of the ongoing interplay. However, the visual information typically obtained from a distributed camera network may not suffice to achieve the required robustness. Recent advances in wearable sensing technology open the door to richer, multimodal information flows. In this paper, we cast head and body pose estimation as a matrix completion task. We introduce a framework able to fuse multimodal data emanating from a combination of distributed and wearable sensors, taking into account the temporal consistency, the head/body coupling and the noise inherent to the scenario. We report results on the novel and challenging SALSA dataset, containing visual, auditory and infrared recordings of 18 people interacting in a regular indoor environment. We demonstrate the soundness of the proposed method and its usability for higher-level tasks such as the detection of F-formations and the discovery of social attention attractors.
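To make the matrix completion idea concrete, the following is a minimal sketch and not the authors' framework. It fills in the missing entries of a low-rank matrix with a generic iterative soft-thresholded SVD scheme in NumPy. The function name `complete_matrix`, the threshold `tau`, and the synthetic data are all assumptions made for illustration; the temporal consistency, head/body coupling and noise modelling mentioned in the abstract are deliberately left out.

```python
import numpy as np


def complete_matrix(observed, mask, tau=0.1, n_iters=500, tol=1e-7):
    """Fill the missing entries of a matrix assumed to be low rank.

    observed: 2-D array; values where mask is False are ignored.
    mask:     boolean array, True where an entry was actually observed.
    tau:      soft threshold on singular values (hypothetical default).
    """
    estimate = np.where(mask, observed, 0.0)
    for _ in range(n_iters):
        # Keep observed entries, use the current estimate elsewhere.
        filled = np.where(mask, observed, estimate)
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        # Shrinking singular values is the proximal step of the nuclear norm.
        s = np.maximum(s - tau, 0.0)
        new_estimate = (u * s) @ vt
        change = np.linalg.norm(new_estimate - estimate)
        estimate = new_estimate
        if change <= tol * max(1.0, np.linalg.norm(estimate)):
            break
    return estimate


# Synthetic stand-in for a feature/label matrix: rank 2, 70% of entries seen.
rng = np.random.default_rng(0)
truth = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 30))
mask = rng.random(truth.shape) < 0.7

completed = complete_matrix(truth, mask)

# Relative error measured only on the entries that were never observed.
rel_err = np.linalg.norm((completed - truth)[~mask]) / np.linalg.norm(truth[~mask])
print(f"relative error on missing entries: {rel_err:.3f}")
```

In a pose-estimation setting, the unobserved entries would presumably correspond to the unknown head and body pose labels, while the observed entries hold sensor features and any available annotations; that mapping is an interpretation of the abstract, not a detail it spells out.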

Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision tasks → Vision for robotics


Published In

MM '15: Proceedings of the 23rd ACM international conference on Multimedia

October 2015

1402 pages

ISBN:9781450334594

DOI:10.1145/2733373

General Chairs: Xiaofang Zhou (The University of Queensland, Australia), Alan F. Smeaton (Dublin City University, Ireland), Qi Tian (The University of Texas at San Antonio, USA)

Program Chairs: Dick C.A. Bulterman (FXPAL, USA), Heng Tao Shen (The University of Queensland, Australia), Ketan Mayer-Patel (The University of North Carolina, USA), Shuicheng Yan (National University of Singapore, Singapore)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

MM '15

Sponsor:

SIGMM

MM '15: ACM Multimedia Conference

October 26 - 30, 2015

Brisbane, Australia

Acceptance Rates

MM '15 Paper Acceptance Rate 56 of 252 submissions, 22%;

Overall Acceptance Rate 995 of 4,171 submissions, 24%

Bibliometrics

Total Citations: 64
Total Downloads: 901
Downloads (Last 12 months): 22
Downloads (Last 6 weeks): 3

Reflects downloads up to 16 Oct 2024

Citations

Cited By

Barua H, Mg T, Pramanick P, Sarkar C (2024). Enabling Social Robots to Perceive and Join Socially Interacting Groups using F-formation: A Comprehensive Overview. ACM Transactions on Human-Robot Interaction. Online publication date: 29-Jul-2024.
https://dl.acm.org/doi/10.1145/3682072
Gillet S, Vázquez M, Andrist S, Leite I, Sebo S (2024). Interaction-Shaping Robotics: Robots That Influence Interactions between Other Agents. ACM Transactions on Human-Robot Interaction 13:1 (1-23). Online publication date: 2-Feb-2024.
https://dl.acm.org/doi/10.1145/3643803
Xia Z, Feng Z, Yang X, Kong D, Cui H (2023). MFIRA: Multimodal Fusion Intent Recognition Algorithm for AR Chemistry Experiments. Applied Sciences 13:14 (8200). Online publication date: 14-Jul-2023.
https://doi.org/10.3390/app13148200
Beyan C, Vinciarelli A, Bue A (2023). Co-Located Human–Human Interaction Analysis Using Nonverbal Cues: A Survey. ACM Computing Surveys 56:5 (1-41). Online publication date: 25-Nov-2023.
https://dl.acm.org/doi/10.1145/3626516
Mavrogiannis C, Baldini F, Wang A, Zhao D, Trautman P, Steinfeld A, Oh J (2023). Core Challenges of Social Robot Navigation: A Survey. ACM Transactions on Human-Robot Interaction 12:3 (1-39). Online publication date: 24-Apr-2023.
https://dl.acm.org/doi/10.1145/3583741
Li R, Seyed T, Marquardt N, Ofek E, Hodges S, Sinclair M, Romat H, Pahud M, Sharma J, Buxton W, Hinckley K, Riche N (2023). AdHocProx: Sensing Mobile, Ad-Hoc Collaborative Device Formations using Dual Ultra-Wideband Radios. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (1-18). Online publication date: 19-Apr-2023.
https://dl.acm.org/doi/10.1145/3544548.3581300
Sosa-León V, Schwering A (2022). Evaluating Automatic Body Orientation Detection for Indoor Location from Skeleton Tracking Data to Detect Socially Occupied Spaces Using the Kinect v2, Azure Kinect and Zed 2i. Sensors 22:10 (3798). Online publication date: 17-May-2022.
https://doi.org/10.3390/s22103798
Tan S, Tax D, Hung H (2022). Conversation Group Detection With Spatio-Temporal Context. Proceedings of the 2022 International Conference on Multimodal Interaction (170-180). Online publication date: 7-Nov-2022.
https://dl.acm.org/doi/10.1145/3536221.3556611
Karale S, Saini J (2022). Usage of Parallelization Techniques for Video Summarisation: State-of-the-art, Open Issues, and Future Research Avenues. 2022 IEEE 7th International conference for Convergence in Technology (I2CT) (1-10). Online publication date: 7-Apr-2022.
https://doi.org/10.1109/I2CT54291.2022.9824044
Raman C, Hung H, Loog M (2022). Social Processes: Self-supervised Meta-learning Over Conversational Groups for Forecasting Nonverbal Social Cues. Computer Vision – ECCV 2022 Workshops (639-659). Online publication date: 23-Oct-2022.
https://dl.acm.org/doi/10.1007/978-3-031-25066-8_37
