research-article

Beyond distance measurement: constructing neighborhood similarity for video annotation

Authors:

Xian-Sheng Hua,

Richang Hong

IEEE Transactions on Multimedia, Volume 11, Issue 3

Pages 465 - 476

https://doi.org/10.1109/TMM.2009.2012919

Published: 01 April 2009

Abstract

In the past few years, video annotation has benefited greatly from progress in machine learning techniques. Recently, graph-based semi-supervised learning has gained much attention in this domain. However, the estimation of pairwise similarity, a crucial factor in these algorithms, has not been sufficiently studied. Generally, the similarity of two samples is estimated from the Euclidean distance between them. We show that the similarity between two samples depends not only on their distance but also on the distribution of the surrounding samples and labels, and that the traditional distance-based similarity measure may lead to high classification error rates even on several simple datasets. To address this issue, we propose a novel neighborhood similarity measure, which explores the local sample and label distributions. The neighborhood similarity between two samples simultaneously takes into account three characteristics: 1) their distance; 2) the distribution difference of the surrounding samples; and 3) the distribution difference of the surrounding labels. Extensive experiments demonstrate the superiority of neighborhood similarity over the existing distance-based similarity.
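
The distance-based baseline that the abstract argues against can be illustrated with a small sketch. This is not the paper's neighborhood similarity, whose formula the abstract does not give. It assumes the commonly used Gaussian kernel on Euclidean distance and a simple iterative label propagation in which labeled scores stay fixed. The function names, the bandwidth `sigma`, and the toy points are all hypothetical choices made for illustration.

```python
import math


def gaussian_similarity(x, y, sigma=1.0):
    """Distance-based similarity: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def similarity_matrix(points, sigma=1.0):
    """Symmetric pairwise similarity matrix with a zero diagonal."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = gaussian_similarity(points[i], points[j], sigma)
    return w


def propagate_labels(w, labels, iterations=200):
    """Repeatedly set each unlabeled score to the similarity-weighted mean of
    the other scores. Labeled entries (1.0 or 0.0) stay fixed; None = unlabeled."""
    scores = [0.5 if label is None else float(label) for label in labels]
    for _ in range(iterations):
        for i, label in enumerate(labels):
            if label is None:
                total = sum(w[i])
                if total > 0.0:
                    scores[i] = sum(wij * s for wij, s in zip(w[i], scores)) / total
    return scores


# Toy data (assumed): one positive and one negative labeled sample,
# each with a nearby unlabeled sample.
points = [(0.0, 0.0), (0.2, 0.0), (1.8, 0.0), (2.0, 0.0)]
labels = [1.0, None, None, 0.0]
W = similarity_matrix(points, sigma=0.5)
scores = propagate_labels(W, labels)

# Two pairs at the same distance get the same similarity, whatever
# samples or labels surround them.
same_distance = (
    gaussian_similarity((0.0, 0.0), (1.0, 0.0))
    == gaussian_similarity((5.0, 5.0), (5.0, 6.0))
)
```

In this sketch the similarity of a pair is a function of their distance alone, which is the limitation the abstract describes. A neighborhood-aware measure would also have to compare the sample and label distributions around each point.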

Cited By

Hu H, Wang K, Lv C, Wu J, Yang Z (2019). Semi-Supervised Metric Learning-Based Anchor Graph Hashing for Large-Scale Image Retrieval. IEEE Transactions on Image Processing, 28:2 (739-754). DOI: 10.1109/TIP.2018.2860898. Online publication date: 1-Feb-2019
https://dl.acm.org/doi/10.1109/TIP.2018.2860898
Aote S, Potnurwar A (2019). An automatic video annotation framework based on two level keyframe extraction mechanism. Multimedia Tools and Applications, 78:11 (14465-14484). DOI: 10.1007/s11042-018-6826-3. Online publication date: 1-Jun-2019
https://dl.acm.org/doi/10.1007/s11042-018-6826-3
Wei S, Zhao Y, Yang T, Zhou Z, Ge S (2018). Enhancing heterogeneous similarity estimation via neighborhood reversibility. Multimedia Tools and Applications, 77:1 (1437-1452). DOI: 10.1007/s11042-017-4347-0. Online publication date: 1-Jan-2018
https://dl.acm.org/doi/10.1007/s11042-017-4347-0

Index Terms

Computing methodologies > Machine learning > Learning paradigms > Unsupervised learning

Information

Published In

IEEE Transactions on Multimedia Volume 11, Issue 3

Special section on communities and media computing

April 2009

236 pages

ISSN:1520-9210

Copyright © 2009.

Publisher

IEEE Press

Publication History

Published: 01 April 2009

Revised: 22 October 2008

Received: 08 May 2008


Bibliometrics

Total citations: 99
Total downloads: 0
Downloads (last 12 months): 0
Downloads (last 6 weeks): 0

Reflects downloads up to 13 Feb 2025

Citations

Cited By

Li Y, Cao L, Zhu J, Luo J (2017). Mining Fashion Outfit Composition Using an End-to-End Deep Learning Approach on Set Data. IEEE Transactions on Multimedia, 19:8 (1946-1955). DOI: 10.1109/TMM.2017.2690144. Online publication date: 17-Jul-2017
https://dl.acm.org/doi/10.1109/TMM.2017.2690144
Singh A, Saini S, Shah R, Narayanan P, Batra D, Brown M, Natarajan V (2016). Learning to hash-tag videos with Tag2Vec. Proceedings of the Tenth Indian Conference on Computer Vision, Graphics and Image Processing (1-8). DOI: 10.1145/3009977.3010035. Online publication date: 18-Dec-2016
https://dl.acm.org/doi/10.1145/3009977.3010035
Li Y, Yao T, Mei T, Chao H, Rui Y, Hanjalic A, Snoek C, Worring M, Bulterman D, Huet B, Kelliher A, Kompatsiaris Y, Li J (2016). Share-and-Chat. Proceedings of the 24th ACM international conference on Multimedia (928-937). DOI: 10.1145/2964284.2964320. Online publication date: 1-Oct-2016
https://dl.acm.org/doi/10.1145/2964284.2964320
Yan H, Yang J, Yang J (2016). Robust Joint Feature Weights Learning Framework. IEEE Transactions on Knowledge and Data Engineering, 28:5 (1327-1339). DOI: 10.1109/TKDE.2016.2515613. Online publication date: 1-May-2016
https://dl.acm.org/doi/10.1109/TKDE.2016.2515613
Liu Y, Feng X, Zhou Z (2016). Multimodal video classification with stacked contractive autoencoders. Signal Processing, 120:C (761-766). DOI: 10.1016/j.sigpro.2015.01.001. Online publication date: 1-Mar-2016
https://dl.acm.org/doi/10.1016/j.sigpro.2015.01.001
Lu P, Peng X, Zhu X, Li R (2016). An EL-LDA based general color harmony model for photo aesthetics assessment. Signal Processing, 120:C (731-745). DOI: 10.1016/j.sigpro.2014.12.008. Online publication date: 1-Mar-2016
https://dl.acm.org/doi/10.1016/j.sigpro.2014.12.008
Wang Z, Feng Y, Qi T, Yang X, Zhang J (2016). Adaptive multi-view feature selection for human motion retrieval. Signal Processing, 120:C (691-701). DOI: 10.1016/j.sigpro.2014.11.015. Online publication date: 1-Mar-2016
https://dl.acm.org/doi/10.1016/j.sigpro.2014.11.015
