Improving the representation of image descriptions for semantic image retrieval with RDF

Authors: Wenlong Ni and Yan Cheng

Volume 73, Issue C

https://doi.org/10.1016/j.jvcir.2020.102934

Published: 25 June 2024

Abstract

The past few years have seen a surge of interest in topics at the intersection of natural language processing and computer vision. In particular, representing images or interpreting language through objects together with their attributes and relations has proven useful across a wide variety of applications. The goal of this work is to provide an improved RDF-based model for representing images in order to enhance text-based image retrieval. We use natural language processing tools to extract a set of objects, attributes, and relations from image descriptions, and then model them as graph structures using an RDF-based model. We also conduct preliminary experiments showing how to handle text-based image retrieval for complex or multilingual queries. The experimental results show that our approach improves the representation of image descriptions, making it suitable for enhancing image retrieval with high-level semantics.
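As a rough illustration of the idea in the abstract, the sketch below stores the objects, attributes, and relations of an image description as subject-predicate-object triples and answers a simple relational query over them. It is not the authors' model: the `ex:` prefix, the predicate names `contains`, `type`, and `hasAttribute`, and the functions `description_to_triples` and `find_images` are all hypothetical, and plain Python tuples stand in for a real RDF store and query language.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
EX = "ex:"  # hypothetical namespace prefix, not taken from the paper


def description_to_triples(
    image_id: str,
    objects: Iterable[str],
    attributes: Iterable[Tuple[str, str]],
    relations: Iterable[Tuple[str, str, str]],
) -> Set[Triple]:
    """Turn one parsed image description into a set of triples.

    Object nodes are scoped to the image ("img1#dog") so that the dog in
    one image is not confused with the dog in another.
    """
    def node(name: str) -> str:
        return f"{image_id}#{name}"

    triples: Set[Triple] = set()
    for obj in objects:
        triples.add((image_id, EX + "contains", node(obj)))
        triples.add((node(obj), EX + "type", EX + obj))
    for obj, attr in attributes:
        triples.add((node(obj), EX + "hasAttribute", EX + attr))
    for subj, pred, obj in relations:
        triples.add((node(subj), EX + pred, node(obj)))
    return triples


def find_images(
    graph: Set[Triple], subj_type: str, predicate: str, obj_type: str
) -> List[str]:
    """Return ids of images in which a `subj_type` node is linked to an
    `obj_type` node by `predicate`."""
    hits = set()
    for s, p, o in graph:
        if p != EX + predicate:
            continue
        subj_ok = (s, EX + "type", EX + subj_type) in graph
        obj_ok = (o, EX + "type", EX + obj_type) in graph
        if subj_ok and obj_ok:
            hits.add(s.split("#", 1)[0])  # recover the image id from the node
    return sorted(hits)


# Two toy descriptions: "a brown dog chases a ball" and "a dog on a sofa".
graph = description_to_triples(
    "img1", ["dog", "ball"], [("dog", "brown")], [("dog", "chases", "ball")]
) | description_to_triples(
    "img2", ["dog", "sofa"], [], [("dog", "on", "sofa")]
)

print(find_images(graph, "dog", "chases", "ball"))
```

Both toy images contain a dog, but only `img1` satisfies the relational query, which is the kind of distinction a keyword match on "dog" and "ball" alone would miss.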



Published In

Journal of Visual Communication and Image Representation, Volume 73, Issue C

Nov 2020

236 pages

ISSN: 1047-3203

Elsevier Inc.

Publisher: Academic Press, Inc., United States
