DOI: 10.1145/2187836.2187876
research-article

Understanding web images by object relation network

Published: 16 April 2012

Abstract

This paper presents an automatic method for understanding and interpreting the semantics of unannotated web images. We observe that the relations between objects in an image carry important semantics about the image. To capture and describe such semantics, we propose Object Relation Network (ORN), a graph model representing the most probable meaning of the objects and their relations in an image. Guided and constrained by an ontology, ORN transfers the rich semantics in the ontology to image objects and the relations between them, while maintaining semantic consistency (e.g., a soccer player can kick a soccer ball, but cannot ride it). We present an automatic system which takes a raw image as input and creates an ORN based on image visual appearance and the guide ontology. We demonstrate various useful web applications enabled by ORNs, such as automatic image tagging, automatic image description generation, and image search by image.
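
To make the idea of an ontology-constrained relation graph concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the system described in the paper: the class names, the `ALLOWED_RELATIONS` and `IS_A` tables, and the `ORN` class are all hypothetical, and the sketch ignores the probabilistic inference and the visual detection stage that the abstract mentions. It only shows how a guide ontology could reject a semantically inconsistent relation such as a soccer player riding a soccer ball.

```python
from dataclasses import dataclass, field

# Hypothetical toy ontology (assumption, not from the paper):
# (subject class, relation) -> object classes that the relation may target.
ALLOWED_RELATIONS = {
    ("soccer_player", "kick"): {"soccer_ball"},
    ("person", "ride"): {"horse", "bicycle"},
}

# Hypothetical is-a hierarchy: child class -> parent class (None marks a root).
IS_A = {
    "soccer_player": "person",
    "person": None,
    "soccer_ball": "ball",
    "ball": None,
    "horse": None,
    "bicycle": None,
}


def ancestors(cls):
    """Yield cls followed by each of its ancestors in the is-a hierarchy."""
    while cls is not None:
        yield cls
        cls = IS_A.get(cls)


def is_consistent(subject_cls, relation, object_cls):
    """Return True if the toy ontology permits subject --relation--> object.

    A relation declared for a parent class is inherited by its subclasses.
    """
    object_chain = set(ancestors(object_cls))
    for s in ancestors(subject_cls):
        allowed = ALLOWED_RELATIONS.get((s, relation), set())
        if allowed & object_chain:
            return True
    return False


@dataclass
class ORN:
    """Toy relation graph: nodes are detected objects, edges are relations."""
    nodes: dict = field(default_factory=dict)   # node id -> ontology class
    edges: list = field(default_factory=list)   # (subject id, relation, object id)

    def add_object(self, node_id, cls):
        self.nodes[node_id] = cls

    def add_relation(self, subj, relation, obj):
        # Refuse any edge the ontology does not allow.
        if not is_consistent(self.nodes[subj], relation, self.nodes[obj]):
            raise ValueError(f"inconsistent relation: {subj} {relation} {obj}")
        self.edges.append((subj, relation, obj))


if __name__ == "__main__":
    g = ORN()
    g.add_object("o1", "soccer_player")
    g.add_object("o2", "soccer_ball")
    g.add_relation("o1", "kick", "o2")       # accepted by the toy ontology
    try:
        g.add_relation("o1", "ride", "o2")   # rejected: a ball cannot be ridden
    except ValueError as err:
        print(err)
    print(g.edges)
```

In this sketch consistency is a hard yes/no check. A system that selects the most probable network, as the abstract describes, would presumably also score candidate labels and relations rather than only filtering them.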


Published In

WWW '12: Proceedings of the 21st international conference on World Wide Web
April 2012
1078 pages
ISBN:9781450312295
DOI:10.1145/2187836

Sponsors

  • Univ. de Lyon: Universite de Lyon

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. detection
  2. image semantics
  3. image understanding
  4. ontology

Qualifiers

  • Research-article

Conference

WWW 2012
Sponsor:
  • Univ. de Lyon
WWW 2012: 21st World Wide Web Conference 2012
April 16 - 20, 2012
Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


Cited By

  • (2024) Enhancing scene‐text visual question answering with relational reasoning, attention and dynamic vocabulary integration. Computational Intelligence 40:1. DOI: 10.1111/coin.12635. Online publication date: 20-Feb-2024
  • (2023) RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering. IEEE Transactions on Multimedia 25 (1-12). DOI: 10.1109/TMM.2021.3120194. Online publication date: 2023
  • (2023) Harnessing Prior Knowledge for Explainable Machine Learning: An Overview. 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) (450-463). DOI: 10.1109/SaTML54575.2023.00038. Online publication date: Feb-2023
  • (2022) Readability of Graphical Contents on World Wide Web (WWW). 2022 17th Iberian Conference on Information Systems and Technologies (CISTI) (1-4). DOI: 10.23919/CISTI54924.2022.9820011. Online publication date: 22-Jun-2022
  • (2022) Web Images Relevance and Quality: User Evaluation. Proceedings of the 5th International Conference on Computer Science and Software Engineering (66-69). DOI: 10.1145/3569966.3569984. Online publication date: 21-Oct-2022
  • (2022) TD-Road: Top-Down Road Network Extraction with Holistic Graph Construction. Computer Vision – ECCV 2022 (562-577). DOI: 10.1007/978-3-031-20077-9_33. Online publication date: 6-Nov-2022
  • (2022) Image Relevance on Websites and Readability. Information Systems and Technologies (286-295). DOI: 10.1007/978-3-031-04826-5_28. Online publication date: 11-May-2022
  • (2020) Research on Visual Relation Detection Based on Computer Vision. 2020 3rd International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE) (342-345). DOI: 10.1109/AEMCSE50948.2020.00080. Online publication date: Apr-2020
  • (2019) Compensating Supervision Incompleteness with Prior Knowledge in Semantic Image Interpretation. 2019 International Joint Conference on Neural Networks (IJCNN) (1-8). DOI: 10.1109/IJCNN.2019.8852413. Online publication date: Jul-2019
  • (2019) Image region label refinement using spatial position relation graph. Knowledge-Based Systems 166 (82-91). DOI: 10.1016/j.knosys.2018.12.010. Online publication date: Feb-2019
