DOI: 10.1145/2733373.2806238
research-article

Analyzing Free-standing Conversational Groups: A Multimodal Approach

Published: 13 October 2015

Abstract

During natural social gatherings, people tend to organize themselves into so-called free-standing conversational groups. In this context, robust head and body pose estimates can facilitate a higher-level description of the ongoing interplay. Importantly, visual information, typically obtained with a distributed camera network, might not suffice to achieve the robustness sought. Recent advances in wearable sensing technology help here by opening the door to richer, multimodal information flows. In this paper, we propose to cast the head and body pose estimation problem as a matrix completion task. We introduce a framework that fuses multimodal data from a combination of distributed and wearable sensors, taking into account temporal consistency, head/body coupling, and the noise inherent to the scenario. We report results on the novel and challenging SALSA dataset, which contains visual, auditory, and infrared recordings of 18 people interacting in a regular indoor environment. We demonstrate the soundness of the proposed method and its usability for higher-level tasks such as the detection of F-formations and the discovery of social attention attractors.
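
To make the matrix completion idea in the abstract more concrete, the following is a minimal sketch of a generic low-rank completion routine (iterative singular-value soft-thresholding, often called soft-impute). It is not the framework proposed in the paper, which also models temporal consistency, head/body coupling, and sensor noise. The function name `soft_impute`, the parameters `lam` and `n_iters`, and the synthetic rank-2 data are all assumptions made for illustration only.

```python
import numpy as np


def soft_impute(observed, mask, lam=0.5, n_iters=200):
    """Estimate a low-rank matrix from partially observed entries.

    observed: 2-D array; values where mask is False are ignored.
    mask:     boolean array, True where an entry was actually measured.
    lam:      soft-threshold applied to the singular values (hypothetical default).
    """
    estimate = np.where(mask, observed, 0.0)
    for _ in range(n_iters):
        # Keep the measured entries, fill the gaps with the current estimate.
        filled = np.where(mask, observed, estimate)
        # Shrink the singular values to encourage a low-rank solution.
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)
        estimate = (u * s) @ vt
    return estimate


# Synthetic stand-in for a feature/label matrix with missing entries.
rng = np.random.default_rng(0)
truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))  # rank 2 by construction
mask = rng.random(truth.shape) < 0.6                          # about 60% observed

estimate = soft_impute(truth, mask)

# Error on the entries that were never observed, versus filling them with zeros.
missing_rmse = np.sqrt(np.mean((estimate - truth)[~mask] ** 2))
baseline_rmse = np.sqrt(np.mean(truth[~mask] ** 2))
print(f"RMSE on missing entries: {missing_rmse:.3f} (zero-fill baseline: {baseline_rmse:.3f})")
```

In a pose estimation setting, one could imagine the rows and columns holding per-sample features and pose labels from different sensors, with unlabeled or unobserved cells left for the completion step to fill; how the paper actually arranges its matrix is not stated in the abstract.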



    Published In

    MM '15: Proceedings of the 23rd ACM international conference on Multimedia
    October 2015
    1402 pages
    ISBN:9781450334594
    DOI:10.1145/2733373


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. free-standing conversational groups
    2. head/body pose estimation
    3. matrix completion
    4. multi-view processing
    5. sensor fusion
    6. wearable sensors

    Qualifiers

    • Research-article

    Conference

    MM '15: ACM Multimedia Conference
    October 26 - 30, 2015
    Brisbane, Australia

    Acceptance Rates

    MM '15 Paper Acceptance Rate 56 of 252 submissions, 22%;
    Overall Acceptance Rate 995 of 4,171 submissions, 24%


    Article Metrics

    • Downloads (last 12 months): 22
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 16 Oct 2024

    Cited By

    • (2024) Enabling Social Robots to Perceive and Join Socially Interacting Groups using F-formation: A Comprehensive Overview. ACM Transactions on Human-Robot Interaction. DOI: 10.1145/3682072. Online publication date: 29-Jul-2024
    • (2024) Interaction-Shaping Robotics: Robots That Influence Interactions between Other Agents. ACM Transactions on Human-Robot Interaction, 13:1 (1-23). DOI: 10.1145/3643803. Online publication date: 2-Feb-2024
    • (2023) MFIRA: Multimodal Fusion Intent Recognition Algorithm for AR Chemistry Experiments. Applied Sciences, 13:14 (8200). DOI: 10.3390/app13148200. Online publication date: 14-Jul-2023
    • (2023) Co-Located Human–Human Interaction Analysis Using Nonverbal Cues: A Survey. ACM Computing Surveys, 56:5 (1-41). DOI: 10.1145/3626516. Online publication date: 25-Nov-2023
    • (2023) Core Challenges of Social Robot Navigation: A Survey. ACM Transactions on Human-Robot Interaction, 12:3 (1-39). DOI: 10.1145/3583741. Online publication date: 24-Apr-2023
    • (2023) AdHocProx: Sensing Mobile, Ad-Hoc Collaborative Device Formations using Dual Ultra-Wideband Radios. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (1-18). DOI: 10.1145/3544548.3581300. Online publication date: 19-Apr-2023
    • (2022) Evaluating Automatic Body Orientation Detection for Indoor Location from Skeleton Tracking Data to Detect Socially Occupied Spaces Using the Kinect v2, Azure Kinect and Zed 2i. Sensors, 22:10 (3798). DOI: 10.3390/s22103798. Online publication date: 17-May-2022
    • (2022) Conversation Group Detection With Spatio-Temporal Context. Proceedings of the 2022 International Conference on Multimodal Interaction (170-180). DOI: 10.1145/3536221.3556611. Online publication date: 7-Nov-2022
    • (2022) Usage of Parallelization Techniques for Video Summarisation: State-of-the-art, Open Issues, and Future Research Avenues. 2022 IEEE 7th International conference for Convergence in Technology (I2CT) (1-10). DOI: 10.1109/I2CT54291.2022.9824044. Online publication date: 7-Apr-2022
    • (2022) Social Processes: Self-supervised Meta-learning Over Conversational Groups for Forecasting Nonverbal Social Cues. Computer Vision – ECCV 2022 Workshops (639-659). DOI: 10.1007/978-3-031-25066-8_37. Online publication date: 23-Oct-2022
