Collaborative learning for image and video annotation

Authors:

Xian-Sheng Hua

MIR '08: Proceedings of the 1st ACM international conference on Multimedia information retrieval

Pages 443 - 450

https://doi.org/10.1145/1460096.1460168

Published: 30 October 2008

Abstract

Classical machine learning methods such as Support Vector Machines treat each concept detector as an independent classification problem and, because of overfitting, cannot achieve sound performance on image and video annotation. Some prior knowledge is therefore needed to assist the learning of independent concept detectors, for example the fact that some concepts look much more alike than others. In this paper, we assume that visually similar concepts should share similar detectors. Based on this assumption, we propose Collaborative Learning, which incorporates cross-concept collaboration into the joint learning of similar detectors over related concepts. Besides collaborating, different concepts must also be discriminated from one another. To benefit from different trade-offs between collaboration and discrimination, we propose a Multi-Granularity Boosting strategy, in which each granularity corresponds to a specific balance between collaboration and discrimination in Collaborative Learning. The final concept detector is an additive model that combines the classifiers learned under the different collaboration granularities. Evaluations on both image and video annotation benchmarks demonstrate that our method outperforms independent annotation.
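The additive combination described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not specify the weak learners, how concepts are grouped at each granularity, or how the per-granularity weights are learned. The sketch assumes toy linear scorers, hand-picked groupings, and fixed weights, and every name in it (`linear_scorer`, `Granularity`, `additive_detector`) is hypothetical. It shows only the final step, where a concept's score is the weighted sum of the classifiers it uses at each granularity.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def linear_scorer(weights: Vector, bias: float = 0.0) -> Scorer:
    """Toy linear scorer w.x + b, standing in for a learned weak classifier (assumption)."""
    def score(x: Vector) -> float:
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return score


class Granularity:
    """One granularity: a partition of concepts into groups that share a scorer."""

    def __init__(self, groups: Dict[str, List[str]],
                 scorers: Dict[str, Scorer], weight: float) -> None:
        self.weight = weight  # fixed here; a boosting procedure would learn it
        # Every concept in a group points at the same shared scorer object.
        self.concept_scorer = {
            concept: scorers[group]
            for group, concepts in groups.items()
            for concept in concepts
        }

    def score(self, concept: str, x: Vector) -> float:
        return self.weight * self.concept_scorer[concept](x)


def additive_detector(levels: List[Granularity], concept: str, x: Vector) -> float:
    """Sum the weighted scores for a concept across all granularities."""
    return sum(level.score(concept, x) for level in levels)


# Coarse granularity: "car" and "truck" collaborate through one shared scorer.
coarse = Granularity(
    groups={"vehicle": ["car", "truck"], "sky": ["sky"]},
    scorers={"vehicle": linear_scorer([1.0, 0.0]), "sky": linear_scorer([0.0, 1.0])},
    weight=0.5,
)
# Fine granularity: every concept has its own scorer, so concepts are discriminated.
fine = Granularity(
    groups={"car": ["car"], "truck": ["truck"], "sky": ["sky"]},
    scorers={
        "car": linear_scorer([0.5, 0.5]),
        "truck": linear_scorer([-0.5, 0.5]),
        "sky": linear_scorer([0.0, 1.0]),
    },
    weight=1.0,
)
levels = [coarse, fine]
x = [1.0, 2.0]  # made-up two-dimensional feature vector
scores = {c: additive_detector(levels, c, x) for c in ("car", "truck", "sky")}
```

In this toy setup the coarse level contributes the same term to "car" and "truck", while the fine level separates them, which is the kind of collaboration versus discrimination trade-off the abstract refers to.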

Index Terms

Collaborative learning for image and video annotation
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Video summarization
2. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing

Published In

MIR '08: Proceedings of the 1st ACM international conference on Multimedia information retrieval

October 2008

506 pages

ISBN:9781605583129

DOI:10.1145/1460096

General Chair: Michael S. Lew, Leiden University, The Netherlands

Program Chairs: Alberto del Bimbo, University of Florence, Italy; Erwin M. Bakker, Leiden University, The Netherlands

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM08: ACM Multimedia Conference 2008

October 30 - 31, 2008

Vancouver, British Columbia, Canada

Bibliometrics & Citations

Bibliometrics

Total Citations: 14
Total Downloads: 364
Downloads (Last 12 months): 1
Downloads (Last 6 weeks): 0

Reflects downloads up to 25 Feb 2025

Citations

Cited By

Wu B, Jia F, Liu W, Ghanem B, Lyu S (2018). Multi-label Learning with Missing Labels Using Mixed Dependency Graphs. International Journal of Computer Vision, 126:8, pp. 875-896. Online publication date: 1-Aug-2018
https://dl.acm.org/doi/10.1007/s11263-018-1085-3
Wu B, Lyu S, Ghanem B (2015). ML-MG. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), pp. 4157-4165. Online publication date: 7-Dec-2015
https://dl.acm.org/doi/10.1109/ICCV.2015.473
Zhang X, Zhao X, Li Z, Xia J, Jain R, Chao W (2013). Social image tagging using graph-based reinforcement on multi-type interrelated objects. Signal Processing, 93:8, pp. 2178-2189. Online publication date: 1-Aug-2013
https://dl.acm.org/doi/10.1016/j.sigpro.2012.05.021
Zhang X, Li Z, Chao W (2013). Improving image tags by exploiting web search results. Multimedia Tools and Applications, 62:3, pp. 601-631. Online publication date: 1-Feb-2013
https://dl.acm.org/doi/10.1007/s11042-011-0863-5
Li G, Wang M, Lu Z, Hong R, Chua T (2012). In-video product annotation with web information mining. ACM Transactions on Multimedia Computing, Communications, and Applications, 8:4, pp. 1-19. Online publication date: 30-Nov-2012
https://dl.acm.org/doi/10.1145/2379790.2379797
Geng B, Li Y, Tao D, Wang M, Zha Z, Xu C (2012). Parallel Lasso for Large-Scale Video Concept Detection. IEEE Transactions on Multimedia, 14:1, pp. 55-65. Online publication date: 1-Feb-2012
https://dl.acm.org/doi/10.1109/TMM.2011.2174781
Zhang X, Huang Z, Shen H, Yang Y, Li Z (2012). Automatic tagging by exploring tag information capability and correlation. World Wide Web, 15:3, pp. 233-256. Online publication date: 1-May-2012
https://dl.acm.org/doi/10.1007/s11280-011-0132-6
Zhang X, Li Z, Chao W (2012). Tagging image by merging multiple features in a integrated manner. Journal of Intelligent Information Systems, 39:1, pp. 87-107. Online publication date: 1-Aug-2012
https://dl.acm.org/doi/10.1007/s10844-011-0184-1
Zhang X, Li Z, Long Y (2011). Tagging image by exploring weighted correlation between visual features and tags. Proceedings of the 12th international conference on Web-age information management, pp. 277-289. Online publication date: 14-Sep-2011
https://dl.acm.org/doi/10.5555/2035562.2035596
Zhang X, Huang Z, Shen H, Li Z (2011). Probabilistic image tagging with tags expanded by text-based search. Proceedings of the 16th international conference on Database systems for advanced applications - Volume Part I, pp. 269-283. Online publication date: 22-Apr-2011
https://dl.acm.org/doi/10.5555/1997305.1997333
