DOI: 10.1145/3661304.3661901

A Benchmark to Understand the Role of Knowledge Graphs on Large Language Model's Accuracy for Question Answering on Enterprise SQL Databases

Published: 09 June 2024

    Abstract

    Enterprise applications of Large Language Models (LLMs) hold promise for question answering on enterprise SQL databases. However, the extent to which LLMs can accurately answer enterprise questions over such databases remains unclear, given the absence of suitable Text-to-SQL benchmarks tailored to enterprise settings. Additionally, the potential of Knowledge Graphs (KGs) to enhance LLM-based question answering by providing business context is not well understood. This study aims to evaluate the accuracy of LLM-powered question answering systems in the context of enterprise questions and SQL databases, while also exploring the role of knowledge graphs in improving accuracy. To achieve this, we introduce a benchmark comprising an enterprise SQL schema in the insurance domain, a range of enterprise queries spanning reporting to metrics, and a contextual layer incorporating an ontology and mappings that define a knowledge graph. Our primary finding is that question answering using GPT-4, with zero-shot prompts directly on SQL databases, achieves an accuracy of 16%. Notably, this accuracy increases to 54% when questions are posed over a Knowledge Graph representation of the enterprise SQL database. Therefore, investing in a Knowledge Graph provides higher accuracy for LLM-powered question answering systems.
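
The abstract reports accuracy figures but does not say how an answer is scored. As an illustration only, the sketch below shows one common way such a figure could be computed for the SQL setting: run each generated query and its reference query, and count a question as correct when both return the same rows. The `policy` table, its data, the query pairs, and the function names are all hypothetical and are not taken from the benchmark; the knowledge-graph setting would need an analogous comparison over graph query results.

```python
import sqlite3
from collections import Counter


def rows_match(conn, generated_sql, reference_sql):
    """Return True if both queries yield the same multiset of rows."""
    try:
        got = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        return False  # a generated query that fails to run counts as incorrect
    expected = conn.execute(reference_sql).fetchall()
    return Counter(got) == Counter(expected)


def execution_accuracy(conn, pairs):
    """Fraction of (generated, reference) query pairs whose results match."""
    if not pairs:
        return 0.0
    correct = sum(rows_match(conn, g, r) for g, r in pairs)
    return correct / len(pairs)


# Hypothetical toy schema and data, not taken from the benchmark.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE policy (policy_id INTEGER PRIMARY KEY, premium REAL);
    INSERT INTO policy VALUES (1, 100.0), (2, 250.0), (3, 75.0);
""")

pairs = [
    # correct: same result as the reference
    ("SELECT COUNT(*) FROM policy", "SELECT COUNT(policy_id) FROM policy"),
    # wrong: different aggregate
    ("SELECT MAX(premium) FROM policy", "SELECT SUM(premium) FROM policy"),
    # invalid: references a column that does not exist
    ("SELECT total FROM policy", "SELECT SUM(premium) FROM policy"),
    # correct: same rows returned in a different order
    ("SELECT policy_id FROM policy ORDER BY policy_id DESC",
     "SELECT policy_id FROM policy"),
]

accuracy = execution_accuracy(conn, pairs)
print(f"execution accuracy: {accuracy:.0%}")
```

Comparing rows as a multiset ignores row order, which suits questions that do not ask for a ranking; a stricter variant would compare ordered lists when the question implies a sort.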

    Cited By

    • (2024) Large language models: Expectations for semantics-driven systems engineering. Data & Knowledge Engineering 152 (102324). DOI: 10.1016/j.datak.2024.102324. Online publication date: Jul-2024

    Published In

    GRADES-NDA '24: Proceedings of the 7th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)
    June 2024
    62 pages
    ISBN: 9798400706530
    DOI: 10.1145/3661304
    Editors: Olaf Hartig, Zoi Kaoudi
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    SIGMOD/PODS '24

    Acceptance Rates

    Overall Acceptance Rate 29 of 61 submissions, 48%

    Article Metrics

    • Downloads (Last 12 months): 16
    • Downloads (Last 6 weeks): 16
