Type- and content-driven synthesis of SQL queries from natural language

N Yaghmazadeh, Y Wang, I Dillig, T Dillig - arXiv preprint arXiv:1702.01168, 2017 - arxiv.org
This paper presents a new technique for automatically synthesizing SQL queries from natural language. Our technique is fully automated, works for any database without requiring additional customization, and does not require users to know the underlying database schema. Our method achieves these goals by combining natural language processing, program synthesis, and automated program repair. Given the user's English description, our technique first uses semantic parsing to generate a query sketch, which is subsequently completed using type-directed program synthesis and assigned a confidence score using database contents. However, since the user's description may not accurately reflect the actual database schema, our approach also performs fault localization and repairs the erroneous part of the sketch. This synthesize-repair loop is repeated until the algorithm infers a query with a sufficiently high confidence score. We have implemented the proposed technique in a tool called Sqlizer and evaluate it on three different databases. Our experiments show that the desired query is ranked within the top 5 candidates in close to 90% of the cases.
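The synthesize-repair loop described in the abstract can be illustrated with a small toy sketch. This is not Sqlizer's implementation: the in-memory database `DB`, the `Sketch` fields, the string-similarity scoring, the threshold, and the repair rule (simply dropping the table hint) are all hypothetical simplifications chosen for illustration. The real system uses semantic parsing, type-directed synthesis, and fault localization, none of which are reproduced here.

```python
from dataclasses import dataclass, replace
from difflib import SequenceMatcher
from itertools import product
from typing import Optional

# Hypothetical toy database: table name -> non-empty list of rows.
DB = {
    "publication": [
        {"title": "Sqlizer", "year": 2017},
        {"title": "Hades", "year": 2016},
    ],
    "author": [
        {"name": "Wang", "homepage": "example.org"},
    ],
}


@dataclass(frozen=True)
class Sketch:
    """A query sketch: name hints taken from the user's words, plus a constant."""
    table_hint: Optional[str]  # None means "no opinion about the table"
    select_hint: str
    filter_hint: str
    constant: object


def similarity(hint: Optional[str], name: str) -> float:
    """String similarity in [0, 1]; a missing hint gets a neutral 0.5."""
    if hint is None:
        return 0.5
    return SequenceMatcher(None, hint.lower(), name.lower()).ratio()


def complete(sketch: Sketch) -> list:
    """Enumerate well-typed completions of the sketch, best score first."""
    candidates = []
    for table, rows in DB.items():
        columns = list(rows[0].keys())
        for select_col, filter_col in product(columns, columns):
            values = [row[filter_col] for row in rows]
            # Type-directed step: the constant must match the column's type.
            if not isinstance(sketch.constant, type(values[0])):
                continue
            # Content-driven step: reward constants that occur in the column.
            content = 1.0 if sketch.constant in values else 0.1
            score = (
                similarity(sketch.table_hint, table)
                * similarity(sketch.select_hint, select_col)
                * similarity(sketch.filter_hint, filter_col)
                * content
            )
            sql = (
                f"SELECT {select_col} FROM {table} "
                f"WHERE {filter_col} = {sketch.constant!r}"
            )
            candidates.append((score, sql))
    return sorted(candidates, reverse=True)


def repair(sketch: Sketch) -> Optional[Sketch]:
    """Naive stand-in for fault localization: give up on the table hint."""
    if sketch.table_hint is not None:
        return replace(sketch, table_hint=None)
    return None  # nothing left to relax


def synthesize(sketch: Sketch, threshold: float = 0.3,
               max_rounds: int = 3, top_k: int = 5) -> list:
    """Alternate completion and repair until a candidate is confident enough."""
    candidates = []
    for _ in range(max_rounds):
        candidates = complete(sketch)
        if candidates and candidates[0][0] >= threshold:
            break
        repaired = repair(sketch)
        if repaired is None:
            break
        sketch = repaired
    return candidates[:top_k]


if __name__ == "__main__":
    # The user says "papers", but the schema calls the table "publication".
    for score, sql in synthesize(Sketch("papers", "title", "year", 2017)):
        print(f"{score:.2f}  {sql}")
```

In this toy, the user's word "papers" does not match any table name well, so the first round may score below the threshold and the table hint is relaxed before the candidates are ranked again. Returning the top few candidates mirrors the abstract's evaluation criterion of ranking the desired query within the top 5.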