Abstract
Most knowledge representation learning (KRL) methods use only structured knowledge graphs (KGs), leaving much multi-modal (textual and visual) knowledge unused. To address this challenge, we propose a novel solution called multi-modal knowledge representation learning (MMKRL), which takes advantage of multi-source (structured, textual, and visual) knowledge. Instead of simply integrating multi-modal knowledge with structured knowledge in a unified space, we introduce a component alignment scheme and combine it with translation methods to accomplish multi-modal KRL. Specifically, MMKRL first reconstructs multi-source knowledge by summing different plausibility functions and then aligns multi-source knowledge using specific norm constraints to reduce reconstruction errors. We also adopt an adversarial training strategy to enhance the robustness of MMKRL, which is rarely considered in existing multi-modal KRL methods. Experimental results show that MMKRL can effectively utilize multi-modal knowledge to achieve better link prediction and triple classification results than other baselines on two widely used datasets. Further, when relying on structured knowledge or limited multi-source knowledge, MMKRL still achieves competitive results in link prediction, demonstrating our model’s superiority.
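The abstract gives no formulas, so the following is only a minimal sketch of the two ideas it names: summing translation-style plausibility functions across knowledge sources, and a norm-based alignment term between views of the same entity. It assumes a TransE-style energy ||h + r - t||_2 and an L2 alignment penalty anchored on the structured view. The function names, the view labels ("structured", "textual", "visual"), and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a plain Python list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def translation_energy(head, relation, tail):
    """TransE-style energy ||h + r - t||_2; lower means more plausible."""
    return l2_norm([h + r - t for h, r, t in zip(head, relation, tail)])


def summed_energy(head_views, relation, tail_views):
    """Sum the translation energy over every (head view, tail view) pair.

    Assumption: each entity has one embedding per knowledge source and all
    views share a single relation vector.
    """
    return sum(
        translation_energy(h, relation, t)
        for h in head_views.values()
        for t in tail_views.values()
    )


def alignment_penalty(views):
    """Sum of L2 distances between the structured view and every other view.

    Assumption: this stands in for the norm constraints mentioned in the
    abstract; the paper's actual constraint may differ.
    """
    anchor = views["structured"]
    return sum(
        l2_norm([a - b for a, b in zip(anchor, vec)])
        for name, vec in views.items()
        if name != "structured"
    )


# Toy two-dimensional embeddings (hypothetical values).
head = {"structured": [0.0, 0.0], "textual": [0.0, 1.0], "visual": [1.0, 0.0]}
tail = {"structured": [1.0, 1.0], "textual": [1.0, 1.0], "visual": [1.0, 1.0]}
relation = [1.0, 1.0]

energy = summed_energy(head, relation, tail)
penalty = alignment_penalty(head)
print(energy, penalty)
```

In a training loop, a combined objective would typically add the alignment penalty to a margin-based ranking loss over the summed energy; that weighting is a design choice the abstract does not specify.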
Acknowledgments
This work was supported by the Henan Key Laboratory for Big Data Processing & Analytics of Electronic Commerce (2020-KF-9).
Cite this article
Lu, X., Wang, L., Jiang, Z. et al. MMKRL: A robust embedding approach for multi-modal knowledge graph representation learning. Appl Intell 52, 7480–7497 (2022). https://doi.org/10.1007/s10489-021-02693-9
DOI: https://doi.org/10.1007/s10489-021-02693-9