
Effective transfer tagging from image to video

Published: 10 May 2013

Abstract

Recent years have witnessed an explosion of user-generated videos on the Web. To make video search effective and efficient, modern video search engines must associate videos with semantic keywords automatically. Most existing video tagging methods struggle to achieve reliable performance because training data are scarce. However, abundant well-tagged data are available in other, related types of media (e.g., images). In this article, we propose a novel video tagging framework, termed Cross-Media Tag Transfer (CMTT), which exploits the abundance of well-tagged images to facilitate video tagging. Specifically, we build a “cross-media tunnel” to transfer knowledge from images to videos. To this end, we find an optimal kernel space in which the distribution distance between images and videos is minimized, thereby tackling the domain-shift problem. We propose a novel cross-media video tagging model that infers tags by exploring the intrinsic local structures of both labeled and unlabeled data, and that learns reliable video classifiers. We design an efficient algorithm that optimizes the proposed model in an iterative, alternating manner. Extensive experiments illustrate the superiority of our approach over state-of-the-art algorithms.
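
To make the notion of "distribution distance in a kernel space" concrete, the following is a minimal sketch of one common measure, the maximum mean discrepancy (MMD), computed with a Gaussian kernel. The abstract does not state which distance or kernel family CMTT uses, so the choice of MMD, the RBF kernel, the bandwidth `gamma`, and all function names here are assumptions made for illustration only, not the authors' method.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def mmd_squared(source, target, gamma):
    """Biased estimate of the squared MMD between two feature samples.

    source: (n, d) array, e.g. hypothetical image features.
    target: (m, d) array, e.g. hypothetical video keyframe features.
    """
    k_ss = rbf_kernel(source, source, gamma)
    k_tt = rbf_kernel(target, target, gamma)
    k_st = rbf_kernel(source, target, gamma)
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(200, 5))          # synthetic "image" features
    videos = rng.normal(size=(200, 5)) + 2.0    # synthetic, shifted "video" features
    print("squared MMD:", mmd_squared(images, videos, gamma=0.1))
```

A smaller value indicates that the two samples look more alike in the induced feature space. Note that minimizing this quantity alone over kernels favors degenerate choices (for example, a nearly constant kernel drives it to zero), which is one reason a kernel-learning objective would normally be coupled with a classification loss, as the abstract's joint model suggests.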

Information & Contributors

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 9, Issue 2
May 2013
144 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/2457450
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 May 2013
Accepted: 01 February 2013
Revised: 01 August 2012
Received: 01 June 2012
Published in TOMM Volume 9, Issue 2

Author Tags

  1. Video tagging
  2. cross media
  3. semi-supervised learning
  4. transfer learning

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2024) AED-PADA: Improving Generalizability of Adversarial Example Detection via Principal Adversarial Domain Adaptation. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3706061. 21:2 (1-24). Online publication date: 3-Dec-2024
  • (2021) A Survey of Transfer Learning Applied in Medical Image Recognition. 2021 IEEE International Conference on Advances in Electrical Engineering and Computer Applications (AEECA). DOI: 10.1109/AEECA52519.2021.9574368. (94-97). Online publication date: 27-Aug-2021
  • (2020) Tag Pollution Detection in Web Videos via Cross-Modal Relevance Estimation. 2020 IEEE/ACM 28th International Symposium on Quality of Service (IWQoS). DOI: 10.1109/IWQoS49365.2020.9212971. (1-10). Online publication date: Jun-2020
  • (2019) Learning Click-Based Deep Structure-Preserving Embeddings with Visual Attention. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3328994. 15:3 (1-19). Online publication date: 8-Aug-2019
  • (2019) A Multiview Representation Framework for Micro-Expression Recognition. IEEE Access. DOI: 10.1109/ACCESS.2019.2932784. 7 (120670-120680). Online publication date: 2019
  • (2018) Pseudo Transfer with Marginalized Corrupted Attribute for Zero-shot Learning. Proceedings of the 26th ACM international conference on Multimedia. DOI: 10.1145/3240508.3240715. (1802-1810). Online publication date: 15-Oct-2018
  • (2018) Noise Tolerant Localization for Sensor Networks. IEEE/ACM Transactions on Networking. DOI: 10.1109/TNET.2018.2852754. 26:4 (1701-1714). Online publication date: 1-Aug-2018
  • (2018) Cross-Domain Collaborative Learning via Discriminative Nonparametric Bayesian Model. IEEE Transactions on Multimedia. DOI: 10.1109/TMM.2017.2785227. 20:8 (2086-2099). Online publication date: Aug-2018
  • (2018) Q-FDBA. Multimedia Tools and Applications. DOI: 10.1007/s11042-017-4917-1. 77:9 (10787-10806). Online publication date: 1-May-2018
  • (2017) Unsupervised feature selection for visual classification via feature-representation property. Neurocomputing. DOI: 10.5555/3063411.3063458. 236:C (5-13). Online publication date: 2-May-2017
