Structure-Grounded Pretraining for Text-to-SQL

Deng, Xiang; Awadallah, Ahmed Hassan; Meek, Christopher; Polozov, Oleksandr; Sun, Huan; Richardson, Matthew

arXiv:2010.12773v1 (cs)

[Submitted on 24 Oct 2020 (this version), latest version 31 Aug 2022 (v3)]

Abstract: Learning to capture text-table alignment is essential for table-related tasks like text-to-SQL. The model needs to correctly recognize natural language references to columns and values and to ground them in the given database schema. In this paper, we present a novel weakly supervised Structure-Grounded pretraining framework (StruG) for text-to-SQL that can effectively learn to capture text-table alignment based on a parallel text-table corpus. We identify a set of novel prediction tasks: column grounding, value grounding, and column-value mapping, and train them using weak supervision without requiring complex SQL annotation. Additionally, to evaluate the model under a more realistic setting, we create a new evaluation set, Spider-Realistic, based on Spider with explicit mentions of column names removed, and adopt two existing single-database text-to-SQL datasets. StruG significantly outperforms BERT-LARGE on Spider and the realistic evaluation sets, while bringing consistent improvement on the large-scale WikiSQL benchmark.

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2010.12773 [cs.CL] (or arXiv:2010.12773v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2010.12773

Submission history

From: Xiang Deng
[v1] Sat, 24 Oct 2020 04:35:35 UTC (1,492 KB)
[v2] Sun, 20 Jun 2021 21:12:39 UTC (1,479 KB)
[v3] Wed, 31 Aug 2022 00:19:41 UTC (1,479 KB)
