DOI: 10.1145/1460096.1460168
research-article

Collaborative learning for image and video annotation

Published: 30 October 2008

Abstract

Classical machine learning methods such as Support Vector Machines treat each concept detector as an independent classification problem and, because of overfitting, cannot achieve sound performance for image and video annotation. Prior knowledge is therefore needed to assist the learning of individual concept detectors, for example the fact that some concepts look much more alike than others. In this paper, we assume that visually similar concepts should share similar detectors. Based on this assumption, we propose Collaborative Learning, which incorporates cross-concept collaboration into the joint learning of similar detectors over related concepts. Besides collaborating, the detectors for different concepts must also discriminate between those concepts. To benefit from different trade-offs between collaboration and discrimination, we propose a Multi-Granularity Boosting strategy, in which each granularity corresponds to a specific balance between collaboration and discrimination for Collaborative Learning. The final concept detector is an additive model that combines classifiers learned under different collaboration granularities. Evaluations on both image and video annotation benchmarks demonstrate that our method outperforms independent annotation.
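The abstract does not give the formulation, so the following is only an illustrative sketch of the general idea, not the paper's algorithm. It swaps in a simple mean-regularized multi-task logistic regression: each concept gets its own linear detector, and a hypothetical `coupling` weight pulls the detectors toward their mean (more collaboration) or leaves them free (more discrimination). Detectors fitted at several coupling values are then combined additively with uniform weights, whereas a boosting procedure would learn those weights. All names and parameter values are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_coupled_detectors(X, Y, coupling, steps=500, lr=0.1):
    """Fit one logistic detector per concept, pulled toward the mean detector.

    X: (n, d) features. Y: (n, k) binary labels, one column per concept.
    coupling: weight of the penalty 0.5 * coupling * sum_c ||w_c - mean(w)||^2.
    Assumption: lr * coupling stays well below 1 so gradient descent is stable.
    """
    n, d = X.shape
    k = Y.shape[1]
    W = np.zeros((d, k))  # column c holds the detector for concept c
    for _ in range(steps):
        P = sigmoid(X @ W)
        grad_loss = X.T @ (P - Y) / n
        # Gradient of the coupling penalty: deviation from the mean detector.
        grad_pen = coupling * (W - W.mean(axis=1, keepdims=True))
        W -= lr * (grad_loss + grad_pen)
    return W


def fit_multi_granularity(X, Y, couplings=(0.0, 0.1, 1.0)):
    """Fit one set of detectors per coupling strength (a stand-in for granularity)."""
    return [fit_coupled_detectors(X, Y, c) for c in couplings]


def predict_scores(models, X):
    """Additive model: uniform average of the per-granularity linear scores."""
    return sum(X @ W for W in models) / len(models)
```

With `coupling=0.0` the detectors are trained independently, which corresponds to the independent-annotation baseline the abstract compares against; larger values make the detectors for different concepts more alike.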



Published In

MIR '08: Proceedings of the 1st ACM international conference on Multimedia information retrieval
October 2008
506 pages
ISBN:9781605583129
DOI:10.1145/1460096


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. concept correlation
  2. image annotation
  3. video annotation

Qualifiers

  • Research-article

Conference

MM08: ACM Multimedia Conference 2008
October 30 - 31, 2008
Vancouver, British Columbia, Canada


Cited By

  • (2018) Multi-label Learning with Missing Labels Using Mixed Dependency Graphs. International Journal of Computer Vision 126:8 (875-896). DOI: 10.1007/s11263-018-1085-3. Online publication date: 1-Aug-2018
  • (2015) ML-MG. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV) (4157-4165). DOI: 10.1109/ICCV.2015.473. Online publication date: 7-Dec-2015
  • (2013) Social image tagging using graph-based reinforcement on multi-type interrelated objects. Signal Processing 93:8 (2178-2189). DOI: 10.1016/j.sigpro.2012.05.021. Online publication date: 1-Aug-2013
  • (2013) Improving image tags by exploiting web search results. Multimedia Tools and Applications 62:3 (601-631). DOI: 10.1007/s11042-011-0863-5. Online publication date: 1-Feb-2013
  • (2012) In-video product annotation with web information mining. ACM Transactions on Multimedia Computing, Communications, and Applications 8:4 (1-19). DOI: 10.1145/2379790.2379797. Online publication date: 30-Nov-2012
  • (2012) Parallel Lasso for Large-Scale Video Concept Detection. IEEE Transactions on Multimedia 14:1 (55-65). DOI: 10.1109/TMM.2011.2174781. Online publication date: 1-Feb-2012
  • (2012) Automatic tagging by exploring tag information capability and correlation. World Wide Web 15:3 (233-256). DOI: 10.1007/s11280-011-0132-6. Online publication date: 1-May-2012
  • (2012) Tagging image by merging multiple features in a integrated manner. Journal of Intelligent Information Systems 39:1 (87-107). DOI: 10.1007/s10844-011-0184-1. Online publication date: 1-Aug-2012
  • (2011) Tagging image by exploring weighted correlation between visual features and tags. Proceedings of the 12th international conference on Web-age information management (277-289). DOI: 10.5555/2035562.2035596. Online publication date: 14-Sep-2011
  • (2011) Probabilistic image tagging with tags expanded by text-based search. Proceedings of the 16th international conference on Database systems for advanced applications - Volume Part I (269-283). DOI: 10.5555/1997305.1997333. Online publication date: 22-Apr-2011