research-article

Beyond distance measurement: constructing neighborhood similarity for video annotation

Published: 01 April 2009

Abstract

In the past few years, video annotation has benefited greatly from progress in machine learning. Recently, graph-based semi-supervised learning has gained much attention in this domain. However, the estimation of pairwise similarity, a crucial factor in these algorithms, has not been sufficiently studied. The similarity of two samples is generally estimated from the Euclidean distance between them. We show, however, that the similarity between two samples depends not only on their distance but also on the distribution of the surrounding samples and labels, and that the traditional distance-based similarity measure can lead to high classification error rates even on several simple datasets. To address this issue, we propose a novel neighborhood similarity measure that explores the local sample and label distributions. The neighborhood similarity between two samples simultaneously takes into account three characteristics: 1) their distance; 2) the difference between the distributions of their surrounding samples; and 3) the difference between the distributions of their surrounding labels. Extensive experiments demonstrate the superiority of neighborhood similarity over existing distance-based similarity.
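The abstract contrasts distance-based similarity with a neighborhood similarity that also weighs local sample and label distributions. The Python sketch below only illustrates that idea under our own assumptions and is not the authors' formulation. The distance term is a Gaussian kernel on Euclidean distance. Each sample's neighborhood is the sample plus its k nearest neighbors, summarized by an isotropic Gaussian (samples) and a Laplace-smoothed positive-label rate (labels). The two distribution differences are measured with symmetric KL divergence, and the resulting affinity matrix is plugged into the local-and-global-consistency iteration of Zhou et al. for binary labels. All function names and the parameters `k`, `sigma`, and `alpha` are hypothetical.

```python
import math

# Hypothetical sketch; NOT the paper's exact neighborhood similarity.

def sq_dist(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def distance_similarity(x, y, sigma=1.0):
    """Traditional distance-based similarity: exp(-||x - y||^2 / (2 * sigma^2))."""
    return math.exp(-sq_dist(x, y) / (2.0 * sigma ** 2))

def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: sq_dist(points[i], points[j]))[:k]

def fit_isotropic_gaussian(vectors):
    """Mean and shared per-dimension variance of a set of vectors."""
    d = len(vectors[0])
    mu = [sum(v[t] for v in vectors) / len(vectors) for t in range(d)]
    var = sum(sq_dist(v, mu) for v in vectors) / (len(vectors) * d)
    return mu, max(var, 1e-6)  # variance floor avoids division by zero

def kl_isotropic(g0, g1):
    """Closed-form KL(N(mu0, var0 * I) || N(mu1, var1 * I))."""
    (mu0, var0), (mu1, var1) = g0, g1
    d = len(mu0)
    return 0.5 * (d * var0 / var1 + sq_dist(mu0, mu1) / var1 - d
                  + d * math.log(var1 / var0))

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), with 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def positive_rate(labels, idx):
    """Laplace-smoothed fraction of positives among labeled samples in idx."""
    pos = sum(1 for j in idx if labels[j] > 0)
    labeled = sum(1 for j in idx if labels[j] != 0)
    return (pos + 1.0) / (labeled + 2.0)

def neighborhood_similarity(points, labels, i, j, k=3, sigma=1.0):
    """Distance term times sample-distribution term times label-distribution term."""
    ni = [i] + knn(points, i, k)
    nj = [j] + knn(points, j, k)
    gi = fit_isotropic_gaussian([points[t] for t in ni])
    gj = fit_isotropic_gaussian([points[t] for t in nj])
    sample_gap = 0.5 * (kl_isotropic(gi, gj) + kl_isotropic(gj, gi))
    pi, pj = positive_rate(labels, ni), positive_rate(labels, nj)
    label_gap = 0.5 * (kl_bernoulli(pi, pj) + kl_bernoulli(pj, pi))
    return distance_similarity(points[i], points[j], sigma) * math.exp(-sample_gap - label_gap)

def propagate(W, labels, alpha=0.9, iters=100):
    """Iterate F <- alpha * S F + (1 - alpha) * Y with S = D^-1/2 W D^-1/2."""
    n = len(W)
    deg = [sum(row) for row in W]
    S = [[W[a][b] / math.sqrt(deg[a] * deg[b]) if deg[a] > 0 and deg[b] > 0 else 0.0
          for b in range(n)] for a in range(n)]
    F = [float(y) for y in labels]
    for _ in range(iters):
        F = [alpha * sum(S[a][b] * F[b] for b in range(n)) + (1 - alpha) * labels[a]
             for a in range(n)]
    return F

# Toy data: two well-separated clusters, one labeled sample in each (0 = unlabeled).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (3.0, 3.0), (3.2, 2.9), (2.9, 3.1)]
labels = [1, 0, 0, -1, 0, 0]
n = len(points)
W = [[neighborhood_similarity(points, labels, a, b, k=2) if a != b else 0.0
      for b in range(n)] for a in range(n)]
scores = propagate(W, labels)  # sign of each score gives the predicted label
```

On this toy data, pairs inside a cluster share the same neighborhood, so their score stays at the Gaussian-kernel value, while cross-cluster pairs are driven to essentially zero and each label stays within its cluster. Summarizing neighborhoods with isotropic Gaussians keeps the KL term in closed form; a richer local density estimate could be substituted at the cost of a numerical divergence estimate.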

    Information

    Published In

    IEEE Transactions on Multimedia, Volume 11, Issue 3
    Special section on communities and media computing
    April 2009
    236 pages

    Publisher

    IEEE Press

    Publication History

    Published: 01 April 2009
    Revised: 22 October 2008
    Received: 08 May 2008

    Author Tags

    1. Neighborhood similarity
    2. semi-supervised learning
    3. video annotation

    Cited By

    • (2019) Semi-Supervised Metric Learning-Based Anchor Graph Hashing for Large-Scale Image Retrieval. IEEE Transactions on Image Processing, 28:2 (739-754). DOI: 10.1109/TIP.2018.2860898. Online publication date: 1-Feb-2019
    • (2019) An automatic video annotation framework based on two level keyframe extraction mechanism. Multimedia Tools and Applications, 78:11 (14465-14484). DOI: 10.1007/s11042-018-6826-3. Online publication date: 1-Jun-2019
    • (2018) Enhancing heterogeneous similarity estimation via neighborhood reversibility. Multimedia Tools and Applications, 77:1 (1437-1452). DOI: 10.1007/s11042-017-4347-0. Online publication date: 1-Jan-2018
    • (2017) Mining Fashion Outfit Composition Using an End-to-End Deep Learning Approach on Set Data. IEEE Transactions on Multimedia, 19:8 (1946-1955). DOI: 10.1109/TMM.2017.2690144. Online publication date: 17-Jul-2017
    • (2016) Learning to hash-tag videos with Tag2Vec. Proceedings of the Tenth Indian Conference on Computer Vision, Graphics and Image Processing (1-8). DOI: 10.1145/3009977.3010035. Online publication date: 18-Dec-2016
    • (2016) Share-and-Chat. Proceedings of the 24th ACM International Conference on Multimedia (928-937). DOI: 10.1145/2964284.2964320. Online publication date: 1-Oct-2016
    • (2016) Robust Joint Feature Weights Learning Framework. IEEE Transactions on Knowledge and Data Engineering, 28:5 (1327-1339). DOI: 10.1109/TKDE.2016.2515613. Online publication date: 1-May-2016
    • (2016) Multimodal video classification with stacked contractive autoencoders. Signal Processing, 120:C (761-766). DOI: 10.1016/j.sigpro.2015.01.001. Online publication date: 1-Mar-2016
    • (2016) An EL-LDA based general color harmony model for photo aesthetics assessment. Signal Processing, 120:C (731-745). DOI: 10.1016/j.sigpro.2014.12.008. Online publication date: 1-Mar-2016
    • (2016) Adaptive multi-view feature selection for human motion retrieval. Signal Processing, 120:C (691-701). DOI: 10.1016/j.sigpro.2014.11.015. Online publication date: 1-Mar-2016
