SQL-Encoder: Improving NL2SQL In-Context Learning Through a Context-Aware Encoder

Pourreza, Mohammadreza; Rafiei, Davood; Feng, Yuxi; Li, Raymond; Fan, Zhenan; Zhang, Weiwei

arXiv:2403.16204 (cs)

[Submitted on 24 Mar 2024]

Abstract: Detecting structural similarity between queries is essential for selecting examples in in-context learning models. However, assessing structural similarity based solely on the natural language expressions of queries, without considering SQL queries, presents a significant challenge. This paper explores the significance of this similarity metric and proposes a model for accurately estimating it. To achieve this, we leverage a dataset comprising 170k question pairs, meticulously curated to train a similarity prediction model. Our comprehensive evaluation demonstrates that the proposed model adeptly captures the structural similarity between questions, as evidenced by improvements in Kendall-Tau distance and precision@k metrics. Notably, our model outperforms strong competitive embedding models from OpenAI and Cohere. Furthermore, compared to these competitive models, our proposed encoder enhances the downstream performance of NL2SQL models in 1-shot in-context learning scenarios by 1-2% for GPT-3.5-turbo, 4-8% for CodeLlama-7B, and 2-3% for CodeLlama-13B.
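The abstract names two ranking metrics, Kendall-Tau distance and precision@k, without defining them. The following is a minimal sketch of one common formulation of each, comparing a predicted ranking of candidate questions against a reference ranking. The function names, the normalization to [0, 1], the top-k overlap form of precision@k, and the question ids `q1`..`q4` are all assumptions made for illustration; the paper's exact definitions and evaluation setup may differ.

```python
from itertools import combinations


def kendall_tau_distance(ranking_a, ranking_b):
    """Fraction of item pairs that the two rankings order differently.

    0.0 means identical order, 1.0 means fully reversed order.
    Assumes both rankings contain the same items, with no ties.
    """
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    pairs = list(combinations(pos_a, 2))
    # A pair is discordant when the two rankings disagree on its relative order.
    discordant = sum(
        1
        for x, y in pairs
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )
    return discordant / len(pairs)


def precision_at_k(predicted, reference, k):
    """Fraction of the top-k predicted items that are also in the top-k reference items."""
    return len(set(predicted[:k]) & set(reference[:k])) / k


# Hypothetical candidate questions, ranked by structural similarity to a target question.
reference = ["q1", "q2", "q3", "q4"]  # ranking derived from the SQL queries
predicted = ["q2", "q1", "q3", "q4"]  # ranking produced by an embedding model

distance = kendall_tau_distance(reference, predicted)
p_at_1 = precision_at_k(predicted, reference, 1)
p_at_2 = precision_at_k(predicted, reference, 2)
```

In this toy case only the pair (`q1`, `q2`) is ordered differently, so a lower distance and a higher precision@k both indicate that the predicted ranking tracks the reference more closely.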

Subjects:	Computation and Language (cs.CL); Databases (cs.DB); Human-Computer Interaction (cs.HC)
Cite as:	arXiv:2403.16204 [cs.CL]
	(or arXiv:2403.16204v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2403.16204

Submission history

From: Mohammadreza Pourreza
[v1] Sun, 24 Mar 2024 15:57:24 UTC (7,600 KB)
