Boosting-Based Multimodal Speaker Detection for Distributed Meeting Videos

Published: 01 December 2008

Abstract

Identifying the active speaker in a video of a distributed meeting can help remote participants understand the dynamics of the meeting. A straightforward application of such analysis is to stream a high-resolution video of the speaker to the remote participants. In this paper, we present the challenges we met while designing a speaker detector for the Microsoft RoundTable distributed meeting device, and propose a novel boosting-based multimodal speaker detection (BMSD) algorithm. Instead of performing sound source localization (SSL) and multiperson detection (MPD) separately and then fusing their individual results, the proposed algorithm fuses audio and visual information at the feature level, using boosting to select features from a combined pool of audio and visual features simultaneously. The result is a very accurate speaker detector with extremely high efficiency. In experiments that include hundreds of real-world meetings, the proposed BMSD algorithm reduces the error rate of the SSL-only approach by 24.6%, and that of the SSL and MPD fusion approach by 20.9%. To the best of our knowledge, this is the first real-time multimodal speaker detection algorithm to be deployed in commercial products.
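The feature-level fusion idea can be illustrated with a minimal sketch. It is not the BMSD implementation: the abstract does not specify the boosting variant or the features, so the sketch assumes discrete AdaBoost over single-feature threshold stumps, and the feature names (`audio_ssl_score`, `visual_motion_score`) and the toy data are hypothetical. It only shows how one boosting loop can pick from a combined pool of audio and visual features.

```python
import math

# Hypothetical combined feature pool (names are assumptions, not from the paper):
# index 0 is an audio cue, index 1 is a visual cue.
FEATURES = ["audio_ssl_score", "visual_motion_score"]

# Toy data on a 3x3 grid: a candidate is the speaker (+1) only when the
# audio and visual evidence together are strong enough (a + v >= 2).
X = [(a, v) for a in range(3) for v in range(3)]
y = [1 if a + v >= 2 else -1 for a, v in X]


def stump(x, j, thr, pol):
    """Weak classifier: threshold a single feature of the pool."""
    return pol if x[j] >= thr else -pol


def train_adaboost(X, y, n_rounds):
    """Discrete AdaBoost; each round searches every feature of both modalities."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    model = []  # entries: (feature index, threshold, polarity, alpha)
    for _ in range(n_rounds):
        best = None
        for j in range(d):
            for thr in sorted({x[j] for x in X}):
                for pol in (1, -1):
                    err = sum(wi for xi, yi, wi in zip(X, y, w)
                              if stump(xi, j, thr, pol) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol)
        err, j, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((j, thr, pol, alpha))
        # Re-weight: misclassified samples gain weight, then normalize.
        w = [wi * math.exp(-alpha * yi * stump(xi, j, thr, pol))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of all selected stumps."""
    score = sum(alpha * stump(x, j, thr, pol) for j, thr, pol, alpha in model)
    return 1 if score >= 0 else -1


def accuracy(model, X, y):
    return sum(predict(model, xi) == yi for xi, yi in zip(X, y)) / len(X)


single = train_adaboost(X, y, 1)  # one stump, hence one modality only
fused = train_adaboost(X, y, 3)   # three stumps drawn from the joint pool

print("single stump uses:", [FEATURES[j] for j, _, _, _ in single])
print("fused model uses: ", [FEATURES[j] for j, _, _, _ in fused])
print("accuracy single vs fused:", accuracy(single, X, y), accuracy(fused, X, y))
```

On this toy grid the three-round model draws stumps from both modalities and classifies more training points correctly than any single stump can. That is the qualitative point of selecting from a joint pool, although it says nothing about the error rates reported for real meetings.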

Published In

IEEE Transactions on Multimedia, Volume 10, Issue 8
December 2008
279 pages

Publisher

IEEE Press

Author Tags

  1. Audiovisual fusion
  2. boosting
  3. speaker detection

Qualifiers

  • Research-article

Cited By

  • (2024) Multimodal Boosting: Addressing Noisy Modalities and Identifying Modality Contribution. IEEE Transactions on Multimedia, vol. 26, pp. 3018-3033. DOI: 10.1109/TMM.2023.3306489. Online publication date: 1-Jan-2024
  • (2016) Look who's talking. Proceedings of the 2nd Workshop on Advancements in Social Signal Processing for Multimodal Interaction, pp. 22-27. DOI: 10.1145/3005467.3005470. Online publication date: 12-Nov-2016
  • (2016) Visual Voice Activity Detection in the Wild. IEEE Transactions on Multimedia, vol. 18, no. 6, pp. 967-977. DOI: 10.1109/TMM.2016.2535357. Online publication date: 13-May-2016
  • (2015) Who's Speaking? Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pp. 87-90. DOI: 10.1145/2818346.2820780. Online publication date: 9-Nov-2015
  • (2011) Robust user context analysis for multimodal interfaces. Proceedings of the 13th international conference on multimodal interfaces, pp. 81-88. DOI: 10.1145/2070481.2070498. Online publication date: 14-Nov-2011
  • (2010) Client and speech detection system for intelligent infokiosk. Proceedings of the 13th international conference on Text, speech and dialogue, pp. 560-567. DOI: 10.5555/1887176.1887251. Online publication date: 6-Sep-2010
  • (2009) Multimodal collaboration and human-computer interaction. Proceedings of the 2009 IEEE international conference on Multimedia and Expo, pp. 1596-1599. DOI: 10.5555/1698924.1699327. Online publication date: 28-Jun-2009
  • (2009) Audio-visual analysis for event understanding. Proceedings of the 2009 workshop on Ambient media computing, pp. 47-48. DOI: 10.1145/1631005.1631016. Online publication date: 23-Oct-2009
