DOI: 10.1145/1291233.1291309
Article

Multi-layer multi-instance kernel for video concept detection

Published: 29 September 2007

Abstract

Most existing methods for video concept detection do not adequately exploit the intrinsic hierarchical structure of video content. Unlike the flat attribute-value data assumed by many of these methods, video is inherently a structured medium with a multi-layer representation. For example, a video can be represented by a hierarchy that includes, from coarse to fine, shots, key-frames, and regions. Moreover, this hierarchy fits the typical Multi-Instance (MI) setting, with a "bag-instance" correspondence embedded between contiguous layers. In this paper, we refer to this multi-layer structure, together with the "bag-instance" relations embedded in it, as the Multi-Layer Multi-Instance (MLMI) setting. We formulate video concept detection as an MLMI learning problem in which each video segment is represented by a rooted tree that embeds the MLMI structure. Furthermore, by fusing information from different layers, we construct a novel MLMI kernel that measures the similarities between instances in the same layer and in different layers. In contrast to traditional MI learning, the proposed kernel leverages both the multi-layer structure and the multi-instance relations simultaneously. We applied the MLMI kernel to the concept detection task on the TRECVID 2005 corpus and report superior performance (+25% in Mean Average Precision) compared with standard Support Vector Machine (SVM) based approaches.
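To make the tree-based formulation more concrete, the following is a minimal Python sketch of a convolution-style kernel over rooted shot, key-frame, region trees. It is not the MLMI kernel defined in the paper. The names `Node`, `node_kernel`, and `mlmi_like_kernel`, the RBF node similarity, the `gamma` and `decay` parameters, and the rule of comparing only nodes on the same layer are all illustrative assumptions. The sketch fuses layers only through recursion: the similarity of two nodes adds a decayed, averaged multi-instance (set) kernel over their children, so a shot-level score absorbs key-frame and region evidence. The kernel described in the abstract additionally scores instances across different layers, which this sketch does not attempt.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical tree node: layer labels such as "shot", "keyframe",
    # "region" are illustrative assumptions, not taken from the paper.
    layer: str
    features: List[float]
    children: List["Node"] = field(default_factory=list)


def rbf(x: List[float], y: List[float], gamma: float = 1.0) -> float:
    """Gaussian (RBF) similarity between two equal-length feature vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)


def node_kernel(n1: Node, n2: Node, gamma: float = 1.0, decay: float = 0.5) -> float:
    """Hypothetical similarity of the subtrees rooted at n1 and n2.

    Nodes on different layers are treated as incomparable (similarity 0).
    Otherwise the score is the nodes' own feature similarity plus a decayed
    multi-instance (set) kernel over their children: the mean, over all
    child pairs, of the recursive subtree similarity.
    """
    if n1.layer != n2.layer:
        return 0.0
    sim = rbf(n1.features, n2.features, gamma)
    if n1.children and n2.children:
        pair_sum = sum(node_kernel(c1, c2, gamma, decay)
                       for c1 in n1.children for c2 in n2.children)
        sim += decay * pair_sum / (len(n1.children) * len(n2.children))
    return sim


def all_nodes(root: Node) -> List[Node]:
    """Collect every node of a tree (iterative depth-first traversal)."""
    stack, out = [root], []
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(node.children)
    return out


def mlmi_like_kernel(t1: Node, t2: Node, gamma: float = 1.0, decay: float = 0.5) -> float:
    """Convolution-style tree kernel: sum of node_kernel over all node pairs."""
    return sum(node_kernel(a, b, gamma, decay)
               for a in all_nodes(t1) for b in all_nodes(t2))


if __name__ == "__main__":
    # Two toy video segments with made-up 2-D features.
    shot_a = Node("shot", [0.0, 1.0],
                  [Node("keyframe", [1.0, 0.0], [Node("region", [0.2, 0.3])])])
    shot_b = Node("shot", [0.5, 0.5],
                  [Node("keyframe", [0.0, 1.0]), Node("keyframe", [1.0, 1.0])])
    print(mlmi_like_kernel(shot_a, shot_b))
```

The sketch is built only from standard positive semi-definite pieces (an RBF kernel, a layer-equality indicator, sums, and averaged set kernels), so a Gram matrix computed with a function like this could, under those assumptions, be passed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.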




Information

Published In

MM '07: Proceedings of the 15th ACM international conference on Multimedia
September 2007
1115 pages
ISBN:9781595937025
DOI:10.1145/1291233
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. kernel machine
  2. multi-layer multi-instance learning
  3. video concept detection

Qualifiers

  • Article

Conference

MM07

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 0
Reflects downloads up to 12 Jan 2025

Citations

Cited By

  • (2014) Stop-Frame Removal Improves Web Video Classification. Proceedings of International Conference on Multimedia Retrieval, pp. 499-502. DOI: 10.1145/2578726.2578803. Online publication date: 1-Apr-2014.
  • (2013) HyDR-MI. Information Sciences: an International Journal, 222, pp. 282-301. DOI: 10.1016/j.ins.2011.01.034. Online publication date: 1-Feb-2013.
  • (2013) Information Network Construction and Alignment from Automatically Acquired Comparable Corpora. Building and Using Comparable Corpora, pp. 243-263. DOI: 10.1007/978-3-642-20128-8_13. Online publication date: 14-Dec-2013.
  • (2012) Semi-supervised multi-instance multi-label learning for video annotation task. Proceedings of the 20th ACM international conference on Multimedia, pp. 737-740. DOI: 10.1145/2393347.2396300. Online publication date: 29-Oct-2012.
  • (2012) Web-Scale Multimedia Information Networks. Proceedings of the IEEE, 100(9), pp. 2688-2704. DOI: 10.1109/JPROC.2012.2201909. Online publication date: Sep-2012.
  • (2010) Enhancing multi-lingual information extraction via cross-media inference and fusion. Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 630-638. DOI: 10.5555/1944566.1944638. Online publication date: 23-Aug-2010.
  • (2010) G3P-MI. Information Sciences: an International Journal, 180(23), pp. 4496-4513. DOI: 10.1016/j.ins.2010.07.031. Online publication date: 1-Dec-2010.
  • (2010) Visual Concept Learning from Weakly Labeled Web Videos. Video Search and Mining, pp. 203-232. DOI: 10.1007/978-3-642-12900-1_8. Online publication date: 2010.
  • (2008) MILC2. Proceedings of the 14th international conference on Advances in multimedia modeling, pp. 24-34. DOI: 10.5555/1785794.1785798. Online publication date: 9-Jan-2008.
  • (2008) Multi-Layer Multi-Instance Learning for Video Concept Detection. IEEE Transactions on Multimedia, 10(8), pp. 1605-1616. DOI: 10.1109/TMM.2008.2007290. Online publication date: 1-Dec-2008.
