research-article

RHB-Net: A Relation-aware Historical Bridging Network for Text2SQL Auto-Completion

Authors:

Christian S. Jensen

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 1458 - 1467

https://doi.org/10.1145/3539618.3591759

Published: 18 July 2023

Abstract

Text2SQL, a natural language interface to database querying, has seen considerable improvement, in part due to advances in deep learning. However, existing Text2SQL proposals accept input only in the form of complete questions. This leaves behind users who struggle to formulate complete questions, e.g., because they lack database expertise or are unfamiliar with the underlying database schema. To address this shortcoming, we study the novel problem of Text2SQL Auto-Completion (TSAC), which extends Text2SQL to also take partial or incomplete questions as input. Specifically, the TSAC problem is to predict the complete, executable SQL query from such a question. To solve the problem, we propose a novel Relation-aware Historical Bridging Network (RHB-Net) that consists of a relation-aware union encoder and an extraction-generation sensitive decoder. RHB-Net models relations between questions and database schemas and predicts the ambiguous intents expressed in partial queries. We also propose two optimization strategies: historical query bridging, which fuses historical database queries, and dynamic context construction, which prevents repeated generation of the same SQL elements. Extensive experiments with real-world data offer evidence that RHB-Net is capable of outperforming baseline algorithms.
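
As a rough illustration of what "preventing repeated generation of the same SQL elements" can mean during decoding, the sketch below filters already-emitted schema elements out of the candidate set at each step. It is not RHB-Net's dynamic context construction, whose details are not given in this abstract; the function name `greedy_select_columns`, the per-step scores, and the table `person` are all hypothetical.

```python
from typing import Dict, List


def greedy_select_columns(step_scores: List[Dict[str, float]]) -> List[str]:
    """Pick one schema element per decoding step, never repeating a choice.

    step_scores[i] maps each candidate element to a (hypothetical) model
    score at step i. Elements already emitted are excluded from later steps.
    """
    emitted: List[str] = []
    for scores in step_scores:
        # Rebuild the candidate set for this step, dropping emitted elements.
        candidates = {name: s for name, s in scores.items() if name not in emitted}
        if not candidates:
            break  # nothing left to generate
        best = max(candidates, key=candidates.get)
        emitted.append(best)
    return emitted


# Toy scores (assumed): without the filter, "name" would win every step.
steps = [
    {"name": 0.9, "age": 0.4, "city": 0.2},
    {"name": 0.8, "age": 0.5, "city": 0.1},
    {"name": 0.7, "age": 0.1, "city": 0.3},
]
selected = greedy_select_columns(steps)
print("SELECT " + ", ".join(selected) + " FROM person")
```

With the toy scores above, an unfiltered greedy choice would emit `name` three times; the filtered version yields three distinct columns.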

Index Terms

RHB-Net: A Relation-aware Historical Bridging Network for Text2SQL Auto-Completion
1. Information systems
  1. Data management systems
    1. Query languages

Published In

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2023

3567 pages

ISBN:9781450394086

DOI:10.1145/3539618

General Chairs: Hsin-Hsi Chen (National Taiwan University), Wei-Jou (Edward) Duh (National Taiwan University), Hen-Hsen Huang (Academia Sinica)

Program Chairs: Makoto P. Kato (Spotify), Josiane Mothe (Universite de Toulouse), Barbara Poblete (University of Chile and Amazon Visiting Academic)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGIR '23

Sponsor:

SIGIR

SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 23 - 27, 2023

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
