DOI: 10.1145/3318464.3380599
research-article
Open access

Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study

Published: 31 May 2020

Abstract

    In the active research area of employing embedding models for knowledge graph completion, particularly for the task of link prediction, most prior studies evaluated such models on two benchmark datasets, FB15k and WN18. Most triples in these and other datasets used in such studies belong to reverse and duplicate relations, which exhibit high data redundancy due to semantic duplication, correlation, or data incompleteness. This is a case of excessive data leakage: a model is trained using features that would otherwise not be available when the model needs to be applied for real prediction. There are also Cartesian product relations, for which every triple formed by the Cartesian product of applicable subjects and objects is a true fact. Link prediction on the aforementioned relations is easy and can be achieved with even better accuracy using straightforward rules instead of sophisticated embedding models. A more fundamental defect is that the link prediction scenario implied by such data does not exist in the real world. This paper is the first systematic study with the main objective of assessing the true effectiveness of embedding models when the unrealistic triples are removed. Our experimental results show that these models are much less accurate than previously perceived. Their poor accuracy renders link prediction a task without a truly effective automated solution. Hence, we call for a re-investigation of possible effective approaches.
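    The redundancy the abstract describes can be made concrete on a toy graph. The following is a minimal Python sketch, not the authors' released source code: it assumes training triples are (head, relation, tail) string tuples, flags two relations as duplicates or reverses when their entity-pair sets overlap by more than an assumed threshold `theta` (0.8 here) relative to both relations, flags a relation as a Cartesian product relation when its triples fill more than `theta` of its subject-by-object grid, and then answers a link prediction query with a plain reverse-relation rule instead of an embedding model. All function names and the toy relations (`directed`, `directed_by`, `climate_in`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pairs_by_relation(triples):
    """Map each relation to the set of (head, tail) entity pairs it links."""
    pairs = defaultdict(set)
    for head, rel, tail in triples:
        pairs[rel].add((head, tail))
    return pairs


def redundant_relations(pairs, theta=0.8):
    """Return (duplicates, reverses): relation pairs whose entity pairs
    overlap by more than theta, measured against BOTH relations.
    theta = 0.8 is an assumed threshold for this sketch."""
    duplicates, reverses = [], []
    for r1, r2 in combinations(sorted(pairs), 2):
        p1, p2 = pairs[r1], pairs[r2]
        flipped_p2 = {(t, h) for h, t in p2}
        for overlap, bucket in ((p1 & p2, duplicates), (p1 & flipped_p2, reverses)):
            if len(overlap) / len(p1) > theta and len(overlap) / len(p2) > theta:
                bucket.append((r1, r2))
    return duplicates, reverses


def cartesian_relations(pairs, theta=0.8):
    """Relations whose triples fill more than theta of the grid formed by
    all their subjects and all their objects (theta is again assumed)."""
    result = []
    for rel in sorted(pairs):
        subjects = {h for h, _ in pairs[rel]}
        objects = {t for _, t in pairs[rel]}
        if len(pairs[rel]) / (len(subjects) * len(objects)) > theta:
            result.append(rel)
    return result


def rule_predict_tails(head, rel, pairs, reverse_of):
    """Answer (head, rel, ?) with no embedding model: if rel has a known
    reverse relation r', return every t such that (t, r', head) was seen."""
    rev = reverse_of.get(rel)
    if rev is None:
        return set()
    return {h for h, t in pairs[rev] if t == head}


# Hypothetical toy graph: "directed" and "directed_by" mirror each other,
# and "climate_in" links every city to every month.
directors = {"film_a": "alice", "film_b": "alice", "film_c": "bob",
             "film_d": "bob", "film_e": "carol", "film_f": "carol"}
train = [(d, "directed", f) for f, d in directors.items()]
train += [(f, "directed_by", d) for f, d in directors.items() if f != "film_a"]
train += [(c, "climate_in", m) for c in ("city_1", "city_2") for m in ("jan", "feb")]
test_triple = ("film_a", "directed_by", "alice")  # held out of train

pairs = pairs_by_relation(train)
duplicates, reverses = redundant_relations(pairs)
reverse_of = {a: b for r1, r2 in reverses for a, b in ((r1, r2), (r2, r1))}

print(reverses)                    # [('directed', 'directed_by')]
print(cartesian_relations(pairs))  # ['climate_in']
print(rule_predict_tails("film_a", "directed_by", pairs, reverse_of))  # {'alice'}
```

    On this toy graph the held-out test triple is recovered exactly from its reversed copy in training, which illustrates the data leakage described above. Requiring the overlap to exceed `theta` relative to both relations is one reasonable design choice; it keeps a small relation from being flagged merely because it is contained in a much larger one.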

    Supplementary Material

    Source Code (3318464.3380599_source_code.zip)
    Read me (3318464.3380599_readme.pdf)

    Cited By

    • (2024) Uncovering CWE-CVE-CPE Relations with Threat Knowledge Graphs. ACM Transactions on Privacy and Security 27:1 (1-26). DOI: 10.1145/3641819. Online publication date: 5-Feb-2024
    • (2024) Tracing the Impact of Bias in Link Prediction. Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing (1626-1633). DOI: 10.1145/3605098.3635912. Online publication date: 8-Apr-2024
    • (2024) A joint knowledge representation learning of sentence vectors weighting and primary neighbor constraints. Knowledge and Information Systems. DOI: 10.1007/s10115-024-02174-8. Online publication date: 16-Jul-2024

    Information

    Published In

    SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
    June 2020
    2925 pages
    ISBN:9781450367356
    DOI:10.1145/3318464
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 31 May 2020

    Author Tags

    1. embedding models
    2. knowledge graph completion
    3. link prediction

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS '20

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 467
    • Downloads (last 6 weeks): 34

    Citations

    • (2023) Vineyard: Optimizing Data Sharing in Data-Intensive Analytics. Proceedings of the ACM on Management of Data 1:2 (1-27). DOI: 10.1145/3589780. Online publication date: 20-Jun-2023
    • (2023) Popularity Ratio Maximization: Surpassing Competitors through Influence Propagation. Proceedings of the ACM on Management of Data 1:2 (1-26). DOI: 10.1145/3589309. Online publication date: 20-Jun-2023
    • (2023) EARLY: Efficient and Reliable Graph Neural Network for Dynamic Graphs. Proceedings of the ACM on Management of Data 1:2 (1-28). DOI: 10.1145/3589308. Online publication date: 20-Jun-2023
    • (2023) Data Stream Clustering: An In-depth Empirical Study. Proceedings of the ACM on Management of Data 1:2 (1-26). DOI: 10.1145/3589307. Online publication date: 20-Jun-2023
    • (2023) Deep Active Alignment of Knowledge Graph Entities and Schemata. Proceedings of the ACM on Management of Data 1:2 (1-26). DOI: 10.1145/3589304. Online publication date: 20-Jun-2023
    • (2023) GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats by Example. Proceedings of the ACM on Management of Data 1:2 (1-26). DOI: 10.1145/3589265. Online publication date: 20-Jun-2023
    • (2023) Practical Differentially Private and Byzantine-resilient Federated Learning. Proceedings of the ACM on Management of Data 1:2 (1-26). DOI: 10.1145/3589264. Online publication date: 20-Jun-2023