DOI: 10.1145/1180995.1181004

Speaker localization for microphone array-based ASR: the effects of accuracy on overlapping speech

Published: 02 November 2006

Abstract

Accurate speaker location is essential for optimal performance of distant speech acquisition systems using microphone array techniques. However, to the best of our knowledge, no comprehensive study exists of how automatic speech recognition (ASR) degrades as a function of speaker location accuracy in a multi-party scenario. In this paper, we describe a framework for evaluating the effects of speaker location errors on a microphone array-based ASR system, in the context of meetings in multi-sensor rooms comprising multiple cameras and microphones. Speakers are manually annotated in videos from different camera views, and triangulation is used to determine an accurate speaker location. Errors are then induced in the speaker location in a systematic manner to observe their influence on speech recognition performance. The system is evaluated on real overlapping speech data collected with simultaneous speakers in a meeting room. The results are compared with those obtained from close-talking headset microphones, from lapel microphones, and from speaker locations estimated by audio-only and audio-visual approaches.
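
The abstract does not give the triangulation or error-induction procedures in detail, so the following is only a minimal sketch of the general idea under stated assumptions. It locates a speaker as the midpoint of the shortest segment between two camera viewing rays (one simple form of triangulation, not necessarily the one used in the paper), then displaces that location by a known error vector and reports how far the relative microphone delays drift from their correct values. All function names, the camera and microphone coordinates, and the speed of sound are hypothetical choices made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 °C


def _add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _scale(a, s):
    return tuple(x * s for x in a)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + t1*d1 and c2 + t2*d2.

    c1, c2 are camera centres; d1, d2 are viewing directions towards the
    annotated speaker (assumed to come from a calibrated camera model).
    """
    w = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("viewing rays are parallel; cannot triangulate")
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    p1 = _add(c1, _scale(d1, t1))
    p2 = _add(c2, _scale(d2, t2))
    return _scale(_add(p1, p2), 0.5)


def relative_delays(source, mics):
    """Propagation delay (seconds) to each microphone, relative to the first."""
    dists = [math.dist(source, m) for m in mics]
    return [(r - dists[0]) / SPEED_OF_SOUND for r in dists]


def max_delay_mismatch(true_loc, error_vec, mics):
    """Largest change in relative delay caused by displacing the location."""
    exact = relative_delays(true_loc, mics)
    wrong = relative_delays(_add(true_loc, error_vec), mics)
    return max(abs(e - w) for e, w in zip(exact, wrong))


if __name__ == "__main__":
    # Hypothetical geometry: two cameras and a four-microphone array (metres).
    speaker = triangulate_midpoint((0, 0, 0), (1, 2, 1), (4, 0, 0), (-3, 2, 1))
    mics = [(2.0, 1.0, 0.8), (2.2, 1.0, 0.8), (2.0, 1.2, 0.8), (2.2, 1.2, 0.8)]
    for offset in (0.0, 0.1, 0.2, 0.5):  # induced error along x, in metres
        mismatch = max_delay_mismatch(speaker, (offset, 0.0, 0.0), mics)
        print(f"error {offset:.1f} m -> max delay mismatch {mismatch * 1e6:.1f} us")
```

Sweeping the induced offset in this way mirrors the systematic error induction described in the abstract; in a full system the mismatched delays would steer the beamformer away from the speaker, and the effect would be measured as a change in recognition performance rather than in delay.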


Published In

ICMI '06: Proceedings of the 8th international conference on Multimodal interfaces
November 2006
404 pages
ISBN: 159593541X
DOI: 10.1145/1180995
Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. audio-visual speaker tracking
2. microphone array ASR

Conference

ICMI06

Overall acceptance rate: 453 of 1,080 submissions (42%)

Cited By

  • (2019) Using artificial intelligence to assess clinicians’ communication skills. BMJ. DOI: 10.1136/bmj.l161 (l161). Online publication date: 18-Jan-2019
  • (2018) Virtual speaker tracking by camera using a sound source localisation with two microphones. International Journal of Networking and Virtual Organisations 12:2 (85-110). DOI: 10.1504/IJNVO.2013.053733. Online publication date: 19-Dec-2018
  • (2015) Speaker detection in a virtual classroom using 3D triangulation. 2015 Asia Pacific Conference on Multimedia and Broadcasting (1-6). DOI: 10.1109/APMediaCast.2015.7210280. Online publication date: Apr-2015
  • (2012) Automatic Speaker Localization and Tracking. Advancing the Next-Generation of Mobile Computing (164-181). DOI: 10.4018/978-1-4666-0119-2.ch011. Online publication date: 2012
  • (2011) Speaker localization using stereo-based sound source localization. International Workshop on Systems, Signal Processing and their Applications, WOSSPA (231-234). DOI: 10.1109/WOSSPA.2011.5931459. Online publication date: May-2011
  • (2010) Automatic Speaker Localization and Tracking. International Journal of Mobile Computing and Multimedia Communications 2:3 (15-33). DOI: 10.4018/jmcmc.2010070102. Online publication date: 1-Jul-2010
  • (2010) A new method of speaker localization using the filtered correlation. 2010 The 2nd International Conference on Industrial Mechatronics and Automation (46-49). DOI: 10.1109/ICINDMA.2010.5538372. Online publication date: May-2010
  • (2008) Visual Focus of Attention in Dynamic Meeting Scenarios. Proceedings of the 5th international workshop on Machine Learning for Multimodal Interaction (1-13). DOI: 10.1007/978-3-540-85853-9_1. Online publication date: 8-Sep-2008
  • (2007) Computer-supported human-human multilingual communication. 50 years of artificial intelligence (271-287). DOI: 10.5555/1806115.1806145. Online publication date: 1-Jan-2007
  • (2007) Computer-Supported Human-Human Multilingual Communication. 50 Years of Artificial Intelligence (271-287). DOI: 10.1007/978-3-540-77296-5_25. Online publication date: 2007
