DOI: 10.1145/3474085.3475470

Is Visual Context Really Helpful for Knowledge Graph? A Representation Learning Perspective

Published: 17 October 2021

Abstract

The visual modality has recently attracted extensive attention in the knowledge graph and multimedia communities because much real-world knowledge is multi-modal in nature. However, it remains unclear to what extent the visual modality improves performance on knowledge graph tasks over unimodal models, and treating structural and visual features equally may encode too much irrelevant information from images. In this paper, we probe the utility of auxiliary visual context from a knowledge graph representation learning perspective by designing a Relation Sensitive Multi-modal Embedding model, RSME for short. RSME can automatically encourage or filter the influence of visual context during representation learning. We also examine the effect of different visual feature encoders. Experimental results validate the superiority of our approach over state-of-the-art methods. Based on in-depth analysis, we conclude that under appropriate circumstances models can leverage visual input to generate better knowledge graph embeddings, and vice versa.
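
The idea of encouraging or filtering visual context can be illustrated with a simple gate. The sketch below is only an illustration under stated assumptions and is not the RSME formulation: it assumes a scalar sigmoid gate per relation, and the names `gated_fusion` and `relation_gate_logits`, the relation labels, and all numeric values are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(structural, visual, gate_logit):
    """Blend a structural and a visual embedding with a scalar gate.

    A gate near 1 lets the visual context through; a gate near 0
    filters it out and keeps the structural embedding.
    """
    if len(structural) != len(visual):
        raise ValueError("embeddings must have the same dimension")
    g = sigmoid(gate_logit)
    return [(1.0 - g) * s + g * v for s, v in zip(structural, visual)]


# Hypothetical per-relation gate logits; in a trained model these
# would be learned parameters rather than hand-set constants.
relation_gate_logits = {"has_color": 4.0, "born_in": -4.0}

structural = [1.0, 0.0]  # toy structural embedding of an entity
visual = [0.0, 1.0]      # toy visual embedding of the same entity

# A visually grounded relation keeps most of the visual signal.
fused_visual = gated_fusion(structural, visual, relation_gate_logits["has_color"])
# A relation with little visual grounding filters most of it out.
fused_struct = gated_fusion(structural, visual, relation_gate_logits["born_in"])
```

With these toy values, the fused vector for `has_color` is dominated by the visual component and the one for `born_in` by the structural component, which is the qualitative behaviour the abstract describes.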

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. knowledge graph
  2. multi-modal
  3. representation learning

Qualifiers

  • Research-article

Conference

MM '21: ACM Multimedia Conference
October 20 - 24, 2021
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Cited By

  • (2025) M2KGRL: A semantic-matching based framework for multimodal knowledge graph representation learning. Expert Systems with Applications 269 (126388). DOI: 10.1016/j.eswa.2025.126388. Online publication date: Apr-2025
  • (2024) MERGE: A Modal Equilibrium Relational Graph Framework for Multi-Modal Knowledge Graph Completion. Sensors 24:23 (7605). DOI: 10.3390/s24237605. Online publication date: 28-Nov-2024
  • (2024) LLM-based multi-level knowledge generation for few-shot knowledge graph completion. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (2135-2143). DOI: 10.24963/ijcai.2024/236. Online publication date: 3-Aug-2024
  • (2024) Multimodal Contextual Interactions of Entities: A Modality Circular Fusion Approach for Link Prediction. Proceedings of the 32nd ACM International Conference on Multimedia (8374-8382). DOI: 10.1145/3664647.3681696. Online publication date: 28-Oct-2024
  • (2024) HKA: A Hierarchical Knowledge Alignment Framework for Multimodal Knowledge Graph Completion. ACM Transactions on Multimedia Computing, Communications, and Applications 20:8 (1-19). DOI: 10.1145/3664288. Online publication date: 29-Jun-2024
  • (2024) CoBjeason: Reasoning Covered Object in Image by Multi-Agent Collaboration Based on Informed Knowledge Graph. ACM Transactions on Knowledge Discovery from Data 18:5 (1-56). DOI: 10.1145/3643565. Online publication date: 28-Feb-2024
  • (2024) Contrast then Memorize: Semantic Neighbor Retrieval-Enhanced Inductive Multimodal Knowledge Graph Completion. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (102-111). DOI: 10.1145/3626772.3657838. Online publication date: 10-Jul-2024
  • (2024) NativE: Multi-modal Knowledge Graph Completion in the Wild. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (91-101). DOI: 10.1145/3626772.3657800. Online publication date: 11-Jul-2024
  • (2024) The Promise and Challenge of Large Language Models for Knowledge Engineering: Insights from a Hackathon. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (1-9). DOI: 10.1145/3613905.3650844. Online publication date: 11-May-2024
  • (2024) A Survey of Knowledge Graph Reasoning on Graph Types: Static, Dynamic, and Multi-Modal. IEEE Transactions on Pattern Analysis and Machine Intelligence 46:12 (9456-9478). DOI: 10.1109/TPAMI.2024.3417451. Online publication date: Dec-2024
