Understanding web images by object relation network

Authors:

Viktor Prasanna

WWW '12: Proceedings of the 21st international conference on World Wide Web

Pages 291 - 300

https://doi.org/10.1145/2187836.2187876

Published: 16 April 2012

Abstract

This paper presents an automatic method for understanding and interpreting the semantics of unannotated web images. We observe that the relations between objects in an image carry important semantics about the image. To capture and describe such semantics, we propose Object Relation Network (ORN), a graph model representing the most probable meaning of the objects and their relations in an image. Guided and constrained by an ontology, ORN transfers the rich semantics in the ontology to image objects and the relations between them, while maintaining semantic consistency (e.g., a soccer player can kick a soccer ball, but cannot ride it). We present an automatic system which takes a raw image as input and creates an ORN based on image visual appearance and the guide ontology. We demonstrate various useful web applications enabled by ORNs, such as automatic image tagging, automatic image description generation, and image search by image.
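The consistency idea in the abstract can be illustrated with a minimal sketch. This is not the paper's system: the toy ontology `ALLOWED`, the function `best_orn`, and all scores below are hypothetical, and brute-force enumeration stands in for whatever inference the authors actually use. It only shows how an ontology can rule out a labeling that looks more probable from visual appearance alone.

```python
from itertools import product

# Hypothetical toy ontology: relations permitted between two object classes.
ALLOWED = {
    ("soccer_player", "soccer_ball"): {"kick", "head"},
    ("person", "horse"): {"ride"},
}


def best_orn(node_scores, relation_scores):
    """Return the most probable ontology-consistent (subject, relation, object).

    node_scores: two dicts (class -> probability), one per detected object.
    relation_scores: dict (relation -> probability) from visual appearance.
    All scores are assumed inputs; the paper's actual scoring is not given here.
    """
    best, best_p = None, 0.0
    for subj, obj, rel in product(node_scores[0], node_scores[1], relation_scores):
        if rel not in ALLOWED.get((subj, obj), set()):
            continue  # skip labelings the ontology forbids
        p = node_scores[0][subj] * node_scores[1][obj] * relation_scores[rel]
        if p > best_p:
            best, best_p = (subj, rel, obj), p
    return best, best_p


# Made-up detector outputs for one image with two objects.
nodes = [
    {"soccer_player": 0.6, "person": 0.4},
    {"soccer_ball": 0.7, "horse": 0.3},
]
relations = {"ride": 0.55, "kick": 0.45}

result, score = best_orn(nodes, relations)
print(result, score)
```

With these made-up scores, "ride" is the single most likely relation, but the toy ontology forbids a soccer player riding a soccer ball, so the sketch selects the "kick" labeling instead.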

Index Terms

Understanding web images by object relation network
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision

Published In

WWW '12: Proceedings of the 21st international conference on World Wide Web

April 2012

1078 pages

ISBN:9781450312295

DOI:10.1145/2187836

General Chairs: Alain Mille (Université de Lyon, France), Fabien Gandon (INRIA, France), Jacques Misselis (HP, France)

Program Chairs: Michael Rabinovich (Case Western Reserve University, USA), Steffen Staab (University of Koblenz-Landau, Germany)

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Univ. de Lyon: Universite de Lyon

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

WWW 2012

Sponsor:

Univ. de Lyon

WWW 2012: 21st World Wide Web Conference 2012

April 16 - 20, 2012

Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Article Metrics

Total Citations: 24
Total Downloads: 466
Downloads (last 12 months): 5
Downloads (last 6 weeks): 1

Reflects downloads up to 16 Oct 2024

Cited By

Agrawal M, Jalal A, Sharma H (2024). Enhancing scene-text visual question answering with relational reasoning, attention and dynamic vocabulary integration. Computational Intelligence 40:1. Online publication date: 20-Feb-2024.
https://doi.org/10.1111/coin.12635
Jin Z, Wu H, Yang C, Zhou F, Qin J, Xiao L, Yin X (2023). RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering. IEEE Transactions on Multimedia 25 (1-12). Online publication date: 2023.
https://doi.org/10.1109/TMM.2021.3120194
Beckh K, Müller S, Jakobs M, Toborek V, Tan H, Fischer R, Welke P, Houben S, von Rueden L (2023). Harnessing Prior Knowledge for Explainable Machine Learning: An Overview. 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) (450-463). Online publication date: Feb-2023.
https://doi.org/10.1109/SaTML54575.2023.00038
Elahi E, Iglesias A, Morato J (2022). Readability of Graphical Contents on World Wide Web (WWW). 2022 17th Iberian Conference on Information Systems and Technologies (CISTI) (1-4). Online publication date: 22-Jun-2022.
https://doi.org/10.23919/CISTI54924.2022.9820011
Elahi E, Iglesias A, Morato J (2022). Web Images Relevance and Quality: User Evaluation. Proceedings of the 5th International Conference on Computer Science and Software Engineering (66-69). Online publication date: 21-Oct-2022.
https://dl.acm.org/doi/10.1145/3569966.3569984
He Y, Garg R, Chowdhury A (2022). TD-Road: Top-Down Road Network Extraction with Holistic Graph Construction. Computer Vision – ECCV 2022 (562-577). Online publication date: 6-Nov-2022.
https://doi.org/10.1007/978-3-031-20077-9_33
Elahi E, Lara J, Maqueda A (2022). Image Relevance on Websites and Readability. Information Systems and Technologies (286-295). Online publication date: 11-May-2022.
https://doi.org/10.1007/978-3-031-04826-5_28
Liu M, Wang H, Li Y, Bian Y (2020). Research on Visual Relation Detection Based on Computer Vision. 2020 3rd International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE) (342-345). Online publication date: Apr-2020.
https://doi.org/10.1109/AEMCSE50948.2020.00080
Donadello I, Serafini L (2019). Compensating Supervision Incompleteness with Prior Knowledge in Semantic Image Interpretation. 2019 International Joint Conference on Neural Networks (IJCNN) (1-8). Online publication date: Jul-2019.
https://doi.org/10.1109/IJCNN.2019.8852413
Zhang J, Wang Z, Mu Y, Wang Z (2019). Image region label refinement using spatial position relation graph. Knowledge-Based Systems 166 (82-91). Online publication date: Feb-2019.
https://doi.org/10.1016/j.knosys.2018.12.010