research-article

ProMvSD: Towards unsupervised knowledge graph anomaly detection via prior knowledge integration and multi-view semantic-driven estimation

Published: 18 July 2024 Publication History

    Abstract

    Knowledge graphs (KGs) are widely used in intelligent systems such as information retrieval. Most research has focused on completing missing knowledge, with little attention paid to examining errors. Yet diverse, unpredictable errors are almost inevitably introduced when KGs are customized, and these anomalies significantly degrade the performance of applications. Detecting erroneous knowledge is challenging because ground-truth labels are costly to acquire. In this work, we develop an unsupervised anomaly detection framework named ProMvSD, which adapts to KGs of varying scales via serialization components. To compensate for the insufficient contextual information provided by the topological structure, we introduce a large language model as a reasoner to extract prior knowledge from the extensive text it was pre-trained on, thereby enhancing the understanding of KGs. An anomalous triple may result in a larger semantic gap between the head and tail neighborhoods. To uncover latent anomalies effectively, we propose a multi-view semantic-driven model (MvSD) based on the assumptions of self-consistency and information stability. MvSD jointly estimates the suspiciousness of triples from three hyperviews: node-view semantic contradiction, triple-view semantic gap, and pathway-view semantic gap. Extensive experiments on three English benchmark KGs and a Chinese medical KG demonstrate that, among the top 1% most suspicious triples, real anomalies are detected with up to 99.9% accuracy. Furthermore, ProMvSD significantly outperforms state-of-the-art representation learning baselines, achieving a 29.2% improvement in detecting all anomalies.
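
The abstract reports detection accuracy among the top 1% most suspicious triples. The following is a minimal sketch of how such a top-fraction precision could be computed, assuming every triple already has a suspiciousness score and a ground-truth anomaly label. The function name `precision_at_top_fraction`, the rounding rule, and the toy data are illustrative assumptions and are not taken from the paper.

```python
import math
from typing import Sequence


def precision_at_top_fraction(
    scores: Sequence[float],
    is_anomaly: Sequence[bool],
    fraction: float = 0.01,
) -> float:
    """Share of true anomalies among the `fraction` most suspicious triples.

    Assumption: a higher score means a more suspicious triple.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if len(scores) != len(is_anomaly):
        raise ValueError("scores and is_anomaly must have the same length")
    if not scores:
        raise ValueError("at least one triple is required")

    # Assumed rounding rule: always inspect at least one triple.
    k = max(1, math.ceil(len(scores) * fraction))

    # Rank triple indices from most to least suspicious.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = sum(1 for i in ranked[:k] if is_anomaly[i])
    return hits / k


# Toy data (hypothetical): four triples, two of them injected anomalies.
toy_scores = [0.9, 0.1, 0.8, 0.3]
toy_labels = [True, False, False, True]
top_half_precision = precision_at_top_fraction(toy_scores, toy_labels, fraction=0.5)
```

With a real KG, `scores` would come from whatever detector is being evaluated and `is_anomaly` from the injected or annotated errors; how the paper itself rounds the cut-off is not stated in the abstract.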

    Highlights

    An unsupervised anomaly detection framework is developed for knowledge graphs.
    A multi-view semantic-driven model is proposed to detect anomalies.
    The LLM is integrated as a reasoner to extract prior knowledge.
    Extensive experiments are conducted to evaluate the effectiveness and robustness.

    Published In

    Information Processing and Management: an International Journal, Volume 61, Issue 4
    Jul 2024
    1167 pages

    Publisher

    Pergamon Press, Inc.

    United States

    Author Tags

    1. Knowledge graph
    2. Anomaly detection
    3. Pre-trained language models
    4. Semantics
    5. Unsupervised learning

    Qualifiers

    • Research-article
