Is Visual Context Really Helpful for Knowledge Graph? A Representation Learning Perspective

Authors:

Guilin Qi

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 2735 - 2743

https://doi.org/10.1145/3474085.3475470

Published: 17 October 2021

Abstract

The visual modality has recently attracted extensive attention in the knowledge graph and multimedia fields because much real-world knowledge is multi-modal in nature. However, it remains unclear to what extent the visual modality can improve the performance of knowledge graph tasks over unimodal models, and treating structural and visual features equally may encode too much irrelevant information from images. In this paper, we probe the utility of auxiliary visual context from a knowledge graph representation learning perspective by designing a Relation Sensitive Multi-modal Embedding model, RSME for short. RSME can automatically encourage or filter the influence of visual context during representation learning. We also examine the effect of different visual feature encoders. Experimental results validate the superiority of our approach compared to state-of-the-art methods. On the basis of in-depth analysis, we conclude that under appropriate circumstances models are capable of leveraging the visual input to generate better knowledge graph embeddings and vice versa.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Knowledge representation and reasoning
      1. Semantic networks
2. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Multimedia and multimodal retrieval

Information

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

5796 pages

ISBN:9781450386517

DOI:10.1145/3474085

General Chairs:
Heng Tao Shen, University of Electronic Science and Technology of China, China
Yueting Zhuang, Zhejiang University, China
John R. Smith, IBM, USA

Program Chairs:
Yang Yang, University of Electronic Science and Technology of China, China
Pablo Cesar, CWI & TU Delft, The Netherlands
Florian Metze, FACEBOOK, Inc., USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
CCF-Baidu Open Fund
the Fundamental Research Funds for the Central Universities

Conference

MM '21

Sponsor:

SIGMM

MM '21: ACM Multimedia Conference

October 20 - 24, 2021

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Total Citations: 36
Total Downloads: 1,269
Downloads (Last 12 months): 267
Downloads (Last 6 weeks): 27

Reflects downloads up to 03 Mar 2025

Cited By

Chen T, Wang T, Zhang H, Xu J (2025) M2KGRL: A semantic-matching based framework for multimodal knowledge graph representation learning. Expert Systems with Applications 269 (126388). Online publication date: Apr-2025
https://doi.org/10.1016/j.eswa.2025.126388
Zhang S, Huang H, Lin X, Zheng C, Zheng Z, Wang J (2025) Multimodal Knowledge Graph Completion Model Based on Modal Hierarchical Fusion. Computer Supported Cooperative Work and Social Computing (381-395). Online publication date: 5-Mar-2025
https://doi.org/10.1007/978-981-96-2376-1_28
Shang Y, Fu K, Zhang Z, Jin L, Liu Z, Wang S, Li S (2024) MERGE: A Modal Equilibrium Relational Graph Framework for Multi-Modal Knowledge Graph Completion. Sensors 24:23 (7605). Online publication date: 28-Nov-2024
https://doi.org/10.3390/s24237605
Li Q, Chen Z, Ji C, Jiang S, Li J, Larson K (2024) LLM-based multi-level knowledge generation for few-shot knowledge graph completion. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (2135-2143). Online publication date: 3-Aug-2024
https://dl.acm.org/doi/10.24963/ijcai.2024/236
Yang J, Yang S, Gao Y, Yang J, Yang L, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D (2024) Multimodal Contextual Interactions of Entities: A Modality Circular Fusion Approach for Link Prediction. Proceedings of the 32nd ACM International Conference on Multimedia (8374-8382). Online publication date: 28-Oct-2024
https://dl.acm.org/doi/10.1145/3664647.3681696
Xu Y, Li Y, Xu M, Zhu Z, Zhao Y (2024) HKA: A Hierarchical Knowledge Alignment Framework for Multimodal Knowledge Graph Completion. ACM Transactions on Multimedia Computing, Communications, and Applications 20:8 (1-19). Online publication date: 29-Jun-2024
https://dl.acm.org/doi/10.1145/3664288
Rong H, Qian M, Ma T, Jin D, Sheng V (2024) CoBjeason: Reasoning Covered Object in Image by Multi-Agent Collaboration Based on Informed Knowledge Graph. ACM Transactions on Knowledge Discovery from Data 18:5 (1-56). Online publication date: 28-Feb-2024
https://dl.acm.org/doi/10.1145/3643565
Zhao Y, Zhang Y, Zhou B, Qian X, Song K, Cai X, Hui Yang G, Wang H, Han S, Hauff C, Zuccon G, Zhang Y (2024) Contrast then Memorize: Semantic Neighbor Retrieval-Enhanced Inductive Multimodal Knowledge Graph Completion. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (102-111). Online publication date: 10-Jul-2024
https://dl.acm.org/doi/10.1145/3626772.3657838
Zhang Y, Chen Z, Guo L, Xu Y, Hu B, Liu Z, Zhang W, Chen H, Hui Yang G, Wang H, Han S, Hauff C, Zuccon G, Zhang Y (2024) NativE: Multi-modal Knowledge Graph Completion in the Wild. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (91-101). Online publication date: 11-Jul-2024
https://doi.org/10.1145/3626772.3657800
Walker J, Koutsiana E, Nwachukwu M, Meroño Peñuela A, Simperl E (2024) The Promise and Challenge of Large Language Models for Knowledge Engineering: Insights from a Hackathon. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (1-9). Online publication date: 11-May-2024
https://dl.acm.org/doi/10.1145/3613905.3650844
