DOI: 10.1145/3539618.3591759
research-article

RHB-Net: A Relation-aware Historical Bridging Network for Text2SQL Auto-Completion

Published: 18 July 2023

    Abstract

    Text2SQL, a natural language interface to database querying, has improved considerably, in part due to advances in deep learning. However, existing Text2SQL proposals accept only complete questions as input. This leaves behind users who struggle to formulate complete questions, e.g., because they lack database expertise or are unfamiliar with the underlying database schema. To address this shortcoming, we study the novel problem of Text2SQL Auto-Completion (TSAC), which extends Text2SQL to also take partial or incomplete questions as input. Specifically, the TSAC problem is to predict the complete, executable SQL query from such input. To solve the problem, we propose a novel Relation-aware Historical Bridging Network (RHB-Net) that consists of a relation-aware union encoder and an extraction-generation sensitive decoder. RHB-Net models relations between questions and database schemas and predicts the ambiguous intents expressed in partial queries. We also propose two optimization strategies: historical query bridging, which fuses historical database queries, and dynamic context construction, which prevents repeated generation of the same SQL elements. Extensive experiments with real-world data offer evidence that RHB-Net is capable of outperforming baseline algorithms.
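
    The abstract does not specify how dynamic context construction is implemented. As a rough illustration of the general idea only (keeping a decoder from emitting the same SQL element twice), the sketch below removes already-generated elements from the candidate set at each decoding step. The function name, the greedy selection, and the score values are all assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List, Set


def greedy_decode_without_repeats(
    step_scores: List[Dict[str, float]],
) -> List[str]:
    """Pick the best-scoring element at each step, skipping ones already emitted.

    Hypothetical helper: a simplified stand-in for "preventing repeated
    generation of the same SQL elements", not the RHB-Net decoder itself.
    """
    emitted: Set[str] = set()
    output: List[str] = []
    for scores in step_scores:
        # Rebuild the candidate context for this step: drop everything
        # that has already been generated.
        candidates = {tok: s for tok, s in scores.items() if tok not in emitted}
        if not candidates:
            break  # nothing new left to generate
        best = max(candidates, key=candidates.get)
        emitted.add(best)
        output.append(best)
    return output


# Hypothetical per-step scores over column names (illustrative values only).
steps = [
    {"name": 0.9, "age": 0.4, "city": 0.2},
    {"name": 0.8, "age": 0.5, "city": 0.1},
    {"name": 0.7, "age": 0.6, "city": 0.3},
]
print(greedy_decode_without_repeats(steps))
```

    Without the filtering step, a purely greedy decoder would select "name" at every step in this toy input, which is the kind of repetition the strategy is described as avoiding.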


      Published In

      SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
      July 2023
      3567 pages
      ISBN:9781450394086
      DOI:10.1145/3539618
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. auto-completion
      2. database
      3. query language
      4. text2sql

      Conference

      SIGIR '23

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%
