DOI: 10.1145/2461466.2461484

Fusing matching and biometric similarity measures for face diarization in video

Published: 16 April 2013

Abstract

This paper addresses face diarization in videos, that is, deciding which face appears in the video and when. To achieve this face-track clustering task, we propose a hierarchical approach combining the strengths of two complementary measures: (i) a pairwise matching similarity relying on local interest points, which allows the accurate clustering of face tracks captured in similar conditions, a situation typically found in temporally close shots of broadcast videos or in talk shows; (ii) a biometric cross-likelihood ratio similarity measure relying on Gaussian Mixture Models (GMMs) that model the distribution of densely sampled local features (Discrete Cosine Transform (DCT) coefficients), which better handles appearance variability. Experiments carried out on a public video dataset and on the data from the French REPERE challenge demonstrate the effectiveness of our approach in comparison with state-of-the-art methods.
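
The abstract does not say how the two measures are combined, so the following is only a minimal sketch of one plausible reading: a first agglomerative pass merges face tracks using the matching similarity with a strict threshold, and a second pass merges the resulting clusters using the biometric similarity. The function names, the average-linkage rule, the order of the two stages, the thresholds, and the toy similarity values are all assumptions made for illustration; none of them come from the paper.

```python
from itertools import combinations


def average_similarity(sim, cluster_a, cluster_b):
    """Average-linkage similarity between two clusters of track indices."""
    total = sum(sim[i][j] for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(clusters, sim, threshold):
    """Merge the most similar pair of clusters repeatedly, stopping when
    the best pair scores below `threshold` (hypothetical stopping rule)."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best_score, (a, b) = max(
            (average_similarity(sim, clusters[a], clusters[b]), (a, b))
            for a, b in combinations(range(len(clusters)), 2)
        )
        if best_score < threshold:
            break
        # combinations() guarantees a < b, so deleting b keeps a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def two_stage_clustering(matching_sim, biometric_sim, t_match, t_bio):
    """Stage 1 uses the matching similarity, stage 2 the biometric one.
    Both matrices are assumed precomputed, symmetric, and track-indexed."""
    singletons = [[i] for i in range(len(matching_sim))]
    stage1 = agglomerate(singletons, matching_sim, t_match)
    stage2 = agglomerate(stage1, biometric_sim, t_bio)
    return sorted(sorted(c) for c in stage2)


# Toy data (invented): tracks 0 and 1 come from similar conditions,
# track 2 is the same person under different conditions, track 3 is
# someone else.
matching_sim = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.1],
    [0.0, 0.1, 0.1, 1.0],
]
biometric_sim = [
    [1.0, 0.6, 0.7, 0.1],
    [0.6, 1.0, 0.8, 0.2],
    [0.7, 0.8, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]

print(two_stage_clustering(matching_sim, biometric_sim, 0.8, 0.5))
# → [[0, 1, 2], [3]]
```

In this toy run the matching stage only joins tracks 0 and 1, and the biometric stage then attaches track 2 to that cluster, which mirrors the complementarity the abstract describes. In practice the two similarity matrices would come from interest-point matching and from GMM cross-likelihood ratios, neither of which is implemented here.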

    Published In

    ICMR '13: Proceedings of the 3rd ACM conference on International conference on multimedia retrieval
    April 2013
    362 pages
    ISBN:9781450320337
    DOI:10.1145/2461466

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. clustering
    2. face diarization
    3. similarity measures

    Qualifiers

    • Research-article

    Conference

    ICMR'13

    Acceptance Rates

    ICMR '13 Paper Acceptance Rate 38 of 96 submissions, 40%;
    Overall Acceptance Rate 254 of 830 submissions, 31%

    Cited By

    • (2022) Multimodal Diarization Systems by Training Enrollment Models as Identity Representations. Applied Sciences 12:3 (1141). DOI: 10.3390/app12031141. Online publication date: 21-Jan-2022
    • (2017) Towards large scale multimedia indexing. Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing (1-6). DOI: 10.1145/3095713.3095732. Online publication date: 19-Jun-2017
    • (2017) Exploiting scene maps and spatial relationships in quasi-static scenes for video face clustering. Image and Vision Computing 57:C (25-43). DOI: 10.1016/j.imavis.2016.11.005. Online publication date: 1-Jan-2017
    • (2015) Opinion Question Answering by Sentiment Clip Localization. ACM Transactions on Multimedia Computing, Communications, and Applications 12:2 (1-19). DOI: 10.1145/2818711. Online publication date: 2-Nov-2015
    • (2015) Space-time Histograms And Their Application To Person Re-identification In TV Shows. Proceedings of the 5th ACM on International Conference on Multimedia Retrieval (91-97). DOI: 10.1145/2671188.2749332. Online publication date: 22-Jun-2015
    • (2014) Total Cluster. Proceedings of the 2014 Indian Conference on Computer Vision Graphics and Image Processing (1-8). DOI: 10.1145/2683483.2683490. Online publication date: 14-Dec-2014
    • (2014) A conditional random field approach for face identification in broadcast news using overlaid text. 2014 IEEE International Conference on Image Processing (ICIP) (318-322). DOI: 10.1109/ICIP.2014.7025063. Online publication date: Oct-2014
    • (2014) Comparison of two methods for unsupervised person identification in TV shows. 2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI) (1-6). DOI: 10.1109/CBMI.2014.6849828. Online publication date: Jun-2014
