A Benchmark to Understand the Role of Knowledge Graphs on Large Language Model's Accuracy for Question Answering on Enterprise SQL Databases

Authors:

Dean Allemang and Bryon Jacob

GRADES-NDA '24: Proceedings of the 7th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)

June 2024

Article No.: 5, Pages 1 - 12

https://doi.org/10.1145/3661304.3661901

Published: 09 June 2024

Abstract

Enterprise applications of Large Language Models (LLMs) hold promise for question answering on enterprise SQL databases. However, the extent to which LLMs can accurately answer enterprise questions over such databases remains unclear, given the absence of suitable Text-to-SQL benchmarks tailored to enterprise settings. Additionally, the potential of Knowledge Graphs (KGs) to enhance LLM-based question answering by providing business context is not well understood. This study aims to evaluate the accuracy of LLM-powered question answering systems in the context of enterprise questions and SQL databases, while also exploring the role of knowledge graphs in improving accuracy. To achieve this, we introduce a benchmark comprising an enterprise SQL schema in the insurance domain, a range of enterprise queries spanning reporting to metrics, and a contextual layer incorporating an ontology and mappings that define a knowledge graph. Our primary finding is that question answering using GPT-4, with zero-shot prompts directly on SQL databases, achieves an accuracy of 16%. Notably, this accuracy increases to 54% when questions are posed over a Knowledge Graph representation of the enterprise SQL database. Therefore, investing in a Knowledge Graph provides higher accuracy for LLM-powered question answering systems.
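
To make the two reported figures easier to compare, the following is a minimal sketch of how an accuracy score and the gap between the two settings could be computed. It assumes accuracy means the fraction of benchmark questions answered correctly; the function names `accuracy` and `compare` are hypothetical and are not taken from the paper, and only the 16% and 54% values come from the abstract.

```python
def accuracy(results):
    """Fraction of questions answered correctly (assumed definition).

    `results` is a sequence of booleans, one per benchmark question.
    """
    if not results:
        raise ValueError("results must contain at least one question")
    return sum(1 for correct in results if correct) / len(results)


def compare(baseline, treatment):
    """Absolute gain in percentage points and relative improvement factor."""
    return {
        "absolute_gain_points": (treatment - baseline) * 100,
        "relative_factor": treatment / baseline,
    }


# Accuracies reported in the abstract: GPT-4 zero-shot on SQL vs. on the KG.
sql_accuracy = 0.16
kg_accuracy = 0.54

summary = compare(sql_accuracy, kg_accuracy)
print(f"gain: {summary['absolute_gain_points']:.0f} percentage points")
print(f"relative factor: {summary['relative_factor']:.3f}x")
```

Under these assumptions, moving from 16% to 54% is a gain of 38 percentage points, or roughly 3.4 times the SQL-only accuracy.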

Published In

GRADES-NDA '24: Proceedings of the 7th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)

June 2024

62 pages

ISBN: 9798400706530

DOI: 10.1145/3661304

Editors: Olaf Hartig (Amazon Web Services & Linköping University, Sweden), Zoi Kaoudi (IT University of Copenhagen, Denmark)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

SIGMOD/PODS '24

Sponsor:

SIGMOD

SIGMOD/PODS '24: International Conference on Management of Data

June 14, 2024

AA, Santiago, Chile

Acceptance Rates

Overall Acceptance Rate 29 of 61 submissions, 48%
