short-paper

Towards a universal detector by mining concepts with small semantic gaps

Authors:

Shuicheng Yan

MM '10: Proceedings of the 18th ACM international conference on Multimedia

Pages 1707 - 1710

https://doi.org/10.1145/1873951.1874332

Published: 25 October 2010

Abstract

Can we build a universal detector that recognizes unseen objects when no training exemplars are available? Such a detector would be highly desirable, because human vocabulary contains hundreds of thousands of object concepts, yet few labeled image examples are available. In this study, we attempt to build such a universal detector to predict concepts in the absence of training data. First, by considering both semantic relatedness and visual variance, we mine a set of realistic small-semantic-gap (SSG) concepts from a large-scale image corpus. Detectors for these concepts deliver reasonably satisfactory recognition accuracy. Building on these distinctive visual models, we then leverage semantic ontology knowledge and concept co-occurrence statistics to extend visual recognition to unseen concepts. To the best of our knowledge, this work is the first attempt to measure the semantic gaps of a large number of concepts and to leverage visually learnable concepts to predict those with no training images available. Tests on the NUS-WIDE dataset demonstrate that the selected concepts with small semantic gaps can be modeled well, and that the prediction of unseen concepts delivers promising results, with accuracy comparable to that of preliminary training-based methods.
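The two-stage pipeline described in the abstract can be illustrated with a short sketch. The abstract does not give the actual formulas, so everything below is an assumption for illustration only and not the authors' method: the gap score (a weighted sum of normalized visual variance and one minus a precomputed semantic-coherence value), the Jaccard tag co-occurrence measure, the blending weights `alpha` and `beta`, and all function names (`visual_variance`, `mine_ssg_concepts`, `cooccurrence`, `predict_unseen`) are hypothetical. Stage one keeps the k concepts with the smallest gap score; stage two scores an unseen concept as a weighted average of the SSG detector outputs, where each weight blends ontology similarity with tag co-occurrence.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sketch of an SSG-style pipeline; not the method from the paper.


def visual_variance(features):
    """Mean squared Euclidean distance of feature vectors to their centroid."""
    n = len(features)
    dim = len(features[0])
    centroid = [sum(f[d] for f in features) / n for d in range(dim)]
    return sum(
        sum((f[d] - centroid[d]) ** 2 for d in range(dim)) for f in features
    ) / n


def mine_ssg_concepts(concept_features, semantic_coherence, k, alpha=0.5):
    """Keep the k concepts with the smallest (assumed) semantic-gap score.

    concept_features: concept -> list of feature vectors of its training images
    semantic_coherence: concept -> value in [0, 1], assumed precomputed
    """
    variances = {c: visual_variance(f) for c, f in concept_features.items()}
    vmax = max(variances.values()) or 1.0  # avoid dividing by zero
    gap = {
        c: alpha * variances[c] / vmax + (1 - alpha) * (1 - semantic_coherence[c])
        for c in concept_features
    }
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(gap, key=lambda c: (gap[c], c))[:k]


def cooccurrence(tag_sets):
    """Return a Jaccard co-occurrence function over a corpus of tagged images."""
    single = Counter()
    pair = Counter()
    for tags in tag_sets:
        tags = set(tags)
        single.update(tags)
        pair.update(frozenset(p) for p in combinations(sorted(tags), 2))

    def jaccard(a, b):
        both = pair[frozenset((a, b))]
        union = single[a] + single[b] - both
        return both / union if union else 0.0

    return jaccard


def predict_unseen(detector_scores, ontology_sim, cooc, unseen, beta=0.5):
    """Score an unseen concept from SSG detector outputs on one image.

    detector_scores: SSG concept -> detector confidence for the image
    ontology_sim: (unseen, concept) -> similarity in [0, 1], assumed given
    cooc: function (a, b) -> co-occurrence strength in [0, 1]
    """
    weights = {
        c: beta * ontology_sim.get((unseen, c), 0.0) + (1 - beta) * cooc(unseen, c)
        for c in detector_scores
    }
    total = sum(weights.values())
    if total == 0:
        return 0.0  # no related SSG concept: no evidence for the unseen concept
    return sum(weights[c] * s for c, s in detector_scores.items()) / total
```

Any ontology measure, for example a WordNet-based similarity, could be supplied through `ontology_sim`; the sketch keeps it as a plain dictionary so that it runs without external data. Normalizing by the total weight keeps the predicted score on the same scale as the detector outputs.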

Cited By

Oh S, Mccloskey S, Kim I, Vahdat A, Cannons K, Hajimirsadeghi H, Mori G, Perera A, Pandey M, Corso J (2014). Multimedia event detection with multimodal feature fusion and temporal concept localization. Machine Vision and Applications, 25(1), 49-69. DOI: 10.1007/s00138-013-0525-x. Online publication date: 1-Jan-2014.
https://dl.acm.org/doi/10.1007/s00138-013-0525-x
Lang C, Feng J, Zheng Y (2012). Short communication. Expert Systems with Applications: An International Journal, 39(12), 11312-11320. DOI: 10.1016/j.eswa.2012.03.012. Online publication date: 1-Sep-2012.
https://dl.acm.org/doi/10.1016/j.eswa.2012.03.012
Wan K, Zheng Y, Chaisorn L, Candan K, Panchanathan S, Prabhakaran B, Sundaram H, Feng W, Sebe N (2011). Known-item video search via query-to-modality mapping. Proceedings of the 19th ACM international conference on Multimedia, 1133-1136. DOI: 10.1145/2072298.2071957. Online publication date: 28-Nov-2011.
https://dl.acm.org/doi/10.1145/2072298.2071957

Index Terms

Towards a universal detector by mining concepts with small semantic gaps
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks

Information

Published In

ACM Conferences

MM '10: Proceedings of the 18th ACM international conference on Multimedia

October 2010

1836 pages

ISBN:9781605589336

DOI:10.1145/1873951

General Chairs:
Alberto del Bimbo, University of Florence, Italy
Shih-Fu Chang, Columbia University, USA

Program Chair:
Arnold Smeulders, University of Amsterdam, NL

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Conference

MM '10

Sponsor:

SIGMM

MM '10: ACM Multimedia Conference

October 25 - 29, 2010

Firenze, Italy

Acceptance Rates

Overall Acceptance Rate 995 of 4,171 submissions, 24%

Bibliometrics & Citations

Article Metrics

Total Citations: 3
Total Downloads: 224
Downloads (last 12 months): 1
Downloads (last 6 weeks): 0

Reflects downloads up to 27 Jul 2024.
