DOI: 10.1145/3474085.3475470
research-article

Is Visual Context Really Helpful for Knowledge Graph? A Representation Learning Perspective

Published: 17 October 2021
    Abstract

    The visual modality has recently attracted extensive attention in the fields of knowledge graphs and multimedia, because much real-world knowledge is multi-modal in nature. However, it is currently unclear to what extent the visual modality can improve the performance of knowledge graph tasks over unimodal models, and treating structural and visual features equally may encode too much irrelevant information from images. In this paper, we probe the utility of auxiliary visual context from a knowledge graph representation learning perspective by designing a Relation Sensitive Multi-modal Embedding model, RSME for short. RSME can automatically encourage or filter the influence of visual context during representation learning. We also examine the effect of different visual feature encoders. Experimental results validate the superiority of our approach over state-of-the-art methods. Based on in-depth analysis, we conclude that, under appropriate circumstances, models are capable of leveraging visual input to generate better knowledge graph embeddings, and vice versa.
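    The abstract says RSME can automatically encourage or filter the influence of visual context, but this page does not give the model's equations. The following is therefore only a minimal sketch of the general idea under stated assumptions, not RSME itself: it assumes "relation sensitive" means a learned gate indexed by relation, uses a hypothetical per-relation gate logit squashed by a sigmoid to mix each entity's structural and visual vectors, and scores the triple with a DistMult-style trilinear product, which is a swapped-in scoring function chosen for brevity. All function names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a logit to [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe for very negative x: underflows to 0.0 instead of overflowing
    return z / (1.0 + z)


def fuse(structural: Vector, visual: Vector, gate: float) -> Vector:
    """Convex mix of the two views: gate=1 keeps only visual, gate=0 only structural."""
    if len(structural) != len(visual):
        raise ValueError("structural and visual vectors must have the same dimension")
    return [gate * v + (1.0 - gate) * s for s, v in zip(structural, visual)]


def distmult_score(head: Vector, relation: Vector, tail: Vector) -> float:
    """DistMult-style trilinear score sum_i h_i * r_i * t_i (stand-in scorer, an assumption)."""
    if not (len(head) == len(relation) == len(tail)):
        raise ValueError("head, relation and tail must have the same dimension")
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


def gated_triple_score(
    head_struct: Vector,
    head_vis: Vector,
    relation: Vector,
    tail_struct: Vector,
    tail_vis: Vector,
    gate_logit: float,
) -> float:
    """Score (h, r, t) after mixing visual context with a hypothetical per-relation gate.

    A large negative gate_logit filters out the visual vectors almost entirely;
    a large positive one lets them dominate.
    """
    gate = sigmoid(gate_logit)
    head = fuse(head_struct, head_vis, gate)
    tail = fuse(tail_struct, tail_vis, gate)
    return distmult_score(head, relation, tail)
```

    With a strongly negative gate logit the score collapses to the purely structural one, which mirrors, in toy form, the abstract's point that irrelevant image information should be filterable. In a trained model the gate logits would be parameters learned jointly with the embeddings; here they are passed in by hand.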




    Information

    Published In

    MM '21: Proceedings of the 29th ACM International Conference on Multimedia
    October 2021
    5796 pages
    ISBN:9781450386517
    DOI:10.1145/3474085
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. knowledge graph
    2. multi-modal
    3. representation learning

    Qualifiers

    • Research-article

    Conference

    MM '21: ACM Multimedia Conference
    October 20-24, 2021
    Virtual Event, China

    Acceptance Rates

    Overall Acceptance Rate 995 of 4,171 submissions, 24%

    Article Metrics

    • Downloads (last 12 months): 273
    • Downloads (last 6 weeks): 10
    Reflects downloads up to 26 Jul 2024

    Cited By

    • (2024) HKA: A Hierarchical Knowledge Alignment Framework for Multimodal Knowledge Graph Completion. ACM Transactions on Multimedia Computing, Communications, and Applications 20:8, pp. 1-19. DOI: 10.1145/3664288. Online publication date: 29-Jun-2024.
    • (2024) CoBjeason: Reasoning Covered Object in Image by Multi-Agent Collaboration Based on Informed Knowledge Graph. ACM Transactions on Knowledge Discovery from Data 18:5, pp. 1-56. DOI: 10.1145/3643565. Online publication date: 28-Feb-2024.
    • (2024) Contrast then Memorize: Semantic Neighbor Retrieval-Enhanced Inductive Multimodal Knowledge Graph Completion. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 102-111. DOI: 10.1145/3626772.3657838. Online publication date: 10-Jul-2024.
    • (2024) NativE: Multi-modal Knowledge Graph Completion in the Wild. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 91-101. DOI: 10.1145/3626772.3657800. Online publication date: 11-Jul-2024.
    • (2024) The Promise and Challenge of Large Language Models for Knowledge Engineering: Insights from a Hackathon. Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems, pp. 1-9. DOI: 10.1145/3613905.3650844. Online publication date: 11-May-2024.
    • (2024) Building Multimodal Knowledge Bases With Multimodal Computational Sequences and Generative Adversarial Networks. IEEE Transactions on Multimedia 26, pp. 2027-2040. DOI: 10.1109/TMM.2023.3291503. Online publication date: 1-Jan-2024.
    • (2024) A Text-Enhanced Transformer Fusion Network for Multimodal Knowledge Graph Completion. IEEE Intelligent Systems 39:3, pp. 54-62. DOI: 10.1109/MIS.2024.3378921. Online publication date: May-2024.
    • (2024) Multi-Modal Siamese Network for Few-Shot Knowledge Graph Completion. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 719-732. DOI: 10.1109/ICDE60146.2024.00061. Online publication date: 13-May-2024.
    • (2024) RSTIE-KGC: A Relation Sensitive Textual Information Enhanced Knowledge Graph Completion Model. 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD), pp. 2991-2998. DOI: 10.1109/CSCWD61410.2024.10580566. Online publication date: 8-May-2024.
    • (2024) Multi-hop neighbor fusion enhanced hierarchical transformer for multi-modal knowledge graph completion. World Wide Web 27:5. DOI: 10.1007/s11280-024-01289-w. Online publication date: 1-Sep-2024.
