
Probabilistic integration of sparse audio-visual cues for identity tracking

Published: 26 October 2008

Abstract

In the context of smart environments, the ability to track and identify persons is a key factor that determines the scope and flexibility of the analytical components and intelligent services that can be provided. While a considerable amount of work has been done on camera-based tracking of multiple users in a variety of scenarios, technologies for acoustic and visual identification, such as face or voice ID, are still subject to severe limitations when distantly placed sensors must be used. Reliable identification cues can therefore be hard to obtain without user cooperation, especially when multiple users are involved.
In this paper, we present a novel technique for tracking and identifying multiple persons in a smart environment using distantly placed audio-visual sensors. The technique builds on the opportunistic integration of tracking, face-identification, and voice-identification cues gained from several cameras and microphones whenever these cues can be captured with a sufficient degree of confidence. A probabilistic model keeps track of identified persons and updates the belief in their identities whenever new observations can be made. The technique has been systematically evaluated on the CLEAR Interactive Seminar database, a large audio-visual corpus of realistic meeting scenarios captured in a variety of smart rooms.
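The core idea of the abstract, maintaining a belief over each tracked person's identity and updating it whenever a sparse ID cue arrives, can be illustrated with a minimal Bayesian-update sketch. This is a hypothetical illustration, not the authors' implementation; the identity names, the uniform prior, and the single `confidence` parameter (the assumed probability that a cue names the right person) are all assumptions for the example.

```python
# Minimal sketch (not the paper's code): Bayesian belief update over
# person identities from sparse, noisy audio-visual ID cues.

def normalize(belief):
    """Scale a belief dict so its probabilities sum to 1."""
    total = sum(belief.values())
    return {k: v / total for k, v in belief.items()}

def update_belief(belief, observed_id, confidence):
    """Update P(identity) for one tracked person after an ID cue.

    `confidence` is the assumed probability that the cue is correct;
    the remaining mass is spread uniformly over the other identities.
    """
    n = len(belief)
    posterior = {}
    for identity, prior in belief.items():
        likelihood = confidence if identity == observed_id else (1 - confidence) / (n - 1)
        posterior[identity] = prior * likelihood
    return normalize(posterior)

# Uniform prior over three hypothetical enrolled identities,
# then two consistent face-ID cues sharpen the belief.
belief = {"alice": 1/3, "bob": 1/3, "carol": 1/3}
belief = update_belief(belief, "alice", 0.8)
belief = update_belief(belief, "alice", 0.8)
print(max(belief, key=belief.get))  # → alice
```

Because the update is multiplicative, even low-confidence cues accumulate over time, which is what makes opportunistic integration of sparse observations workable.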




      Published In

      MM '08: Proceedings of the 16th ACM international conference on Multimedia
      October 2008
      1206 pages
      ISBN:9781605583037
      DOI:10.1145/1459359


      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Author Tags

      1. human perception
      2. modality fusion
      3. sensor fusion
      4. smart environments

      Qualifiers

      • Research-article

      Conference

      MM08
      Sponsor:
      MM08: ACM Multimedia Conference 2008
      October 26 - 31, 2008
Vancouver, British Columbia, Canada

      Acceptance Rates

      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


      Cited By

      • An embedded audio-visual tracking and speech purification system on a dual-core processor platform. Microprocessors & Microsystems 34(7-8), 274-284. DOI: 10.1016/j.micpro.2010.05.004
      • Context-based environmental audio event recognition for scene understanding. Multimedia Systems 21(5), 507-524. DOI: 10.1007/s00530-014-0424-7
      • References. Similarity Measures for Face Recognition, 99-106, 2015. DOI: 10.2174/9781681080444115010014
      • Spatio-Temporal Reasoning in Biometrics Based Smart Environments. Procedia Computer Science 5, 378-385, 2011. DOI: 10.1016/j.procs.2011.07.049
      • Audio-Visual Fusion and Tracking With Multilevel Iterative Decoding: Framework and Experimental Evaluation. IEEE Journal of Selected Topics in Signal Processing 4(5), 882-894, Oct. 2010. DOI: 10.1109/JSTSP.2010.2057890
      • Audiovisual Information Fusion in Human-Computer Interfaces and Intelligent Environments: A Survey. Proceedings of the IEEE 98(10), 1692-1715, Oct. 2010. DOI: 10.1109/JPROC.2010.2057231
      • Multimodal identification and tracking in smart environments. Personal and Ubiquitous Computing 14(8), 685-694, Dec. 2010. DOI: 10.1007/s00779-010-0288-6
      • Hierarchical audio-visual cue integration framework for activity analysis in intelligent meeting rooms. 2009 IEEE CVPR Workshops, 107-114, June 2009. DOI: 10.1109/CVPRW.2009.5204224
