Fusing matching and biometric similarity measures for face diarization in video

Authors:

Jean-Marc Odobez

ICMR '13: Proceedings of the 3rd ACM conference on International conference on multimedia retrieval

Pages 97 - 104

https://doi.org/10.1145/2461466.2461484

Published: 16 April 2013

Abstract

This paper addresses face diarization in videos, that is, deciding which face appears in the video and when. To perform this face-track clustering task, we propose a hierarchical approach that combines the strengths of two complementary measures: (i) a pairwise matching similarity relying on local interest points, which allows accurate clustering of face tracks captured in similar conditions, a situation typically found in temporally close shots of broadcast videos or in talk shows; and (ii) a biometric cross-likelihood ratio similarity measure relying on Gaussian Mixture Models (GMMs) that model the distribution of densely sampled local features (Discrete Cosine Transform (DCT) coefficients), which better handles appearance variability. Experiments carried out on a public video dataset and on data from the French REPERE challenge demonstrate the effectiveness of our approach in comparison with state-of-the-art methods.
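The abstract does not give the exact formulas or the fusion rule, so the following is only a minimal sketch of the general idea, under stated assumptions. It uses a single diagonal Gaussian per track as a one-component stand-in for a GMM, a commonly used symmetric form of the cross-likelihood ratio (CLR) from speaker diarization, and a two-stage greedy agglomerative clustering in which a matching similarity is applied first and the CLR second. The function names, the thresholds, and the `matching_sim` callable are hypothetical and are not taken from the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Track = List[Vector]                     # local feature vectors pooled over a face track
Model = Tuple[List[float], List[float]]  # (per-dimension mean, per-dimension variance)
Similarity = Callable[[Track, Track], float]


def fit_diag_gaussian(samples: Track, var_floor: float = 1e-3) -> Model:
    """Fit one diagonal Gaussian: a 1-component stand-in for a GMM (assumption)."""
    n, dim = len(samples), len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dim)]
    var = [max(sum((s[d] - mean[d]) ** 2 for s in samples) / n, var_floor)
           for d in range(dim)]
    return mean, var


def avg_log_likelihood(samples: Track, model: Model) -> float:
    """Average per-sample log-likelihood under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for s in samples:
        for x, m, v in zip(s, mean, var):
            total -= 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(samples)


def cross_likelihood_ratio(a: Track, b: Track, background: Model) -> float:
    """Symmetric CLR: how much better each track fits the other track's model
    than a shared background model."""
    model_a, model_b = fit_diag_gaussian(a), fit_diag_gaussian(b)
    return (avg_log_likelihood(a, model_b) - avg_log_likelihood(a, background)
            + avg_log_likelihood(b, model_a) - avg_log_likelihood(b, background))


def agglomerate(clusters: List[Track], similarity: Similarity,
                threshold: float) -> List[Track]:
    """Greedily merge the most similar pair until no pair reaches the threshold."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = similarity(clusters[i], clusters[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        if best[0] < threshold:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # pool the features of both clusters
        del clusters[j]
    return clusters


def two_stage_clustering(tracks: List[Track], matching_sim: Similarity,
                         biometric_sim: Similarity,
                         t_match: float, t_bio: float) -> List[Track]:
    """Hypothetical fusion: merge with the matching similarity first,
    then merge the resulting clusters with the biometric similarity."""
    stage1 = agglomerate(tracks, matching_sim, t_match)
    return agglomerate(stage1, biometric_sim, t_bio)
```

In this sketch the background model would be fitted once on features pooled from all tracks, and `biometric_sim` would be a closure such as `lambda x, y: cross_likelihood_ratio(x, y, background)`. Running the stricter matching stage first is one plausible way to realize a hierarchical combination: tracks captured in near-identical conditions are grouped early, so the statistical models in the second stage are estimated from more data.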

Index Terms

Fusing matching and biometric similarity measures for face diarization in video
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Cluster analysis

Information & Contributors

Information

Published In

ICMR '13: Proceedings of the 3rd ACM conference on International conference on multimedia retrieval

April 2013

362 pages

ISBN:9781450320337

DOI:10.1145/2461466

General Chairs:
Ramesh Jain, University of California, Irvine, USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA

Program Chairs:
Marcel Worring, University of Amsterdam, The Netherlands
John Smith, IBM Research, New York, USA
Tat-Seng Chua, National University of Singapore

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 April 2013

Qualifiers

Research-article

Conference

ICMR'13: International Conference on Multimedia Retrieval

Sponsor: SIGMM

April 16 - 20, 2013

Dallas, Texas, USA

Acceptance Rates

ICMR '13 Paper Acceptance Rate 38 of 96 submissions, 40%

Overall Acceptance Rate 254 of 830 submissions, 31%

Bibliometrics

Total Citations: 8
Total Downloads: 133
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 26 Jan 2025

Citations

Cited By

Mingote V, Viñals I, Gimeno P, Miguel A, Ortega A, Lleida E (2022). Multimodal Diarization Systems by Training Enrollment Models as Identity Representations. Applied Sciences 12:3 (1141). Online publication date: 21-Jan-2022.
https://doi.org/10.3390/app12031141
Le N, Bredin H, Sargent G, India M, Lopez-Otero P, Barras C, Guinaudeau C, Gravier G, da Fonseca G, Freire I, Patrocínio Z, Guimarães S, Martí G, Morros J, Hernando J, Docio-Fernandez L, Garcia-Mateo C, Meignier S, Odobez J (2017). Towards large scale multimedia indexing. Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing (1-6). Online publication date: 19-Jun-2017.
https://dl.acm.org/doi/10.1145/3095713.3095732
Bazzica A, Liem C, Hanjalic A (2017). Exploiting scene maps and spatial relationships in quasi-static scenes for video face clustering. Image and Vision Computing 57:C (25-43). Online publication date: 1-Jan-2017.
https://dl.acm.org/doi/10.1016/j.imavis.2016.11.005
Pang L, Ngo C (2015). Opinion Question Answering by Sentiment Clip Localization. ACM Transactions on Multimedia Computing, Communications, and Applications 12:2 (1-19). Online publication date: 2-Nov-2015.
https://dl.acm.org/doi/10.1145/2818711
Auguste R, Martinet J, Tirilly P, Hauptmann A, Ngo C, Xue X, Jiang Y, Snoek C, Vasconcelos N (2015). Space-time Histograms And Their Application To Person Re-identification In TV Shows. Proceedings of the 5th ACM on International Conference on Multimedia Retrieval (91-97). Online publication date: 22-Jun-2015.
https://dl.acm.org/doi/10.1145/2671188.2749332
Tapaswi M, Parkhi O, Rahtu E, Sommerlade E, Stiefelhagen R, Zisserman A, Ramakrishnan A, Malik J, Efros A, Jawahar C, Varma M (2014). Total Cluster. Proceedings of the 2014 Indian Conference on Computer Vision Graphics and Image Processing (1-8). Online publication date: 14-Dec-2014.
https://dl.acm.org/doi/10.1145/2683483.2683490
Paul G, Elie K, Sylvain M, Jean-Marc O, Paul D (2014). A conditional random field approach for face identification in broadcast news using overlaid text. 2014 IEEE International Conference on Image Processing (ICIP) (318-322). Online publication date: Oct-2014.
https://doi.org/10.1109/ICIP.2014.7025063
Gay P, Dupuy G, Lailler C, Odobez J, Meignier S, Deleglise P (2014). Comparison of two methods for unsupervised person identification in TV shows. 2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI) (1-6). Online publication date: Jun-2014.
https://doi.org/10.1109/CBMI.2014.6849828
