DOI: 10.5555/645524.656651

Multimodal Speaker Detection Using Input/Output Dynamic Bayesian Networks

Published: 14 October 2000

Abstract

Inferring users' actions and intentions forms an integral part of design and development of any human-computer interface. The presence of noisy and at times ambiguous sensory data makes this problem challenging. We formulate a framework for temporal fusion of multiple sensors using input-output dynamic Bayesian networks (IODBNs). We find that contextual information about the state of the computer interface, used as an input to the DBN, and sensor distributions learned from data are crucial for good detection performance. Nevertheless, classical DBN learning methods can cause such models to fail when the data exhibits complex behavior. To further improve the detection rate we formulate an error-feedback learning strategy for DBNs. We apply this framework to the problem of audio/visual speaker detection in an interactive kiosk application using "off-the-shelf" visual and audio sensors (face, skin, texture, mouth motion, and silence detectors). Detection results obtained in this setup demonstrate numerous benefits of our learning-based framework.
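
The abstract does not give the model equations, so the following is only a minimal sketch of the general idea, assuming the simplest input-output DBN: an input-output HMM with a binary hidden state (speaking or not), a binary context input from the interface that selects the transition table, and binary sensors treated as conditionally independent given the state. All names (TRANSITION, SENSOR, forward_filter) and all probability values are hypothetical and chosen for illustration; they are not taken from the paper, and the error-feedback learning step is not shown.

```python
# Hypothetical sketch: forward filtering in an input-output HMM.
# All probabilities are made-up illustrative numbers, not values from the paper.

def normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]

# Hidden state: 0 = not speaking, 1 = speaking.
# TRANSITION[u][i][j] = P(state_t = j | state_{t-1} = i, input_t = u),
# where u is an assumed binary context input (1 = the interface expects a reply).
TRANSITION = {
    0: [[0.95, 0.05], [0.30, 0.70]],
    1: [[0.60, 0.40], [0.10, 0.90]],
}

# SENSOR[name][state] = P(sensor fires | state); assumed values.
SENSOR = {
    "face": [0.30, 0.90],
    "mouth_motion": [0.10, 0.80],
    "silence": [0.90, 0.20],
}

def likelihood(obs, state):
    """P(all sensor readings | state), assuming sensors are independent given the state."""
    p = 1.0
    for name, fired in obs.items():
        q = SENSOR[name][state]
        p *= q if fired else 1.0 - q
    return p

def forward_filter(inputs, observations, prior=(0.5, 0.5)):
    """Return the filtered belief [P(not speaking), P(speaking)] at each time step."""
    belief = list(prior)
    beliefs = []
    for u, obs in zip(inputs, observations):
        table = TRANSITION[u]
        # Predict: propagate the previous belief through the input-dependent transition.
        predicted = [sum(belief[i] * table[i][j] for i in range(2)) for j in range(2)]
        # Update: weight by the fused sensor likelihood and renormalize.
        belief = normalize([predicted[j] * likelihood(obs, j) for j in range(2)])
        beliefs.append(belief)
    return beliefs
```

Calling forward_filter with one context input and one dictionary of sensor readings per frame returns a belief per frame. Replacing the hand-set tables with distributions estimated from labeled data would correspond to the learned sensor distributions the abstract refers to.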


Cited By

  • (2006) Mobility detection using everyday GSM traces. Proceedings of the 8th international conference on Ubiquitous Computing, pp. 212-224. DOI: 10.1007/11853565_13. Online publication date: 17-Sep-2006
  • (2005) Multimodal multispeaker probabilistic tracking in meetings. Proceedings of the 7th international conference on Multimodal interfaces, pp. 183-190. DOI: 10.1145/1088463.1088496. Online publication date: 4-Oct-2005
Published In

ICMI '00: Proceedings of the Third International Conference on Advances in Multimodal Interfaces
October 2000, 673 pages
ISBN: 3540411801
Publisher: Springer-Verlag, Berlin, Heidelberg
