Zero-shot Neural Passage Retrieval via Domain-targeted Synthetic Question Generation

Ma, Ji; Korotkov, Ivan; Yang, Yinfei; Hall, Keith; McDonald, Ryan

arXiv:2004.14503 (cs)

[Submitted on 29 Apr 2020 (v1), last revised 27 Jan 2021 (this version, v3)]

Abstract: A major obstacle to the widespread adoption of neural retrieval models is that they require large supervised training sets to surpass traditional term-based techniques, which are constructed from raw corpora. In this paper, we propose an approach to zero-shot learning for passage retrieval that uses synthetic question generation to close this gap. The question generation system is trained on general-domain data, but is applied to documents in the targeted domain. This allows us to create arbitrarily large, yet noisy, sets of question-passage relevance pairs that are domain specific. Furthermore, when this is coupled with a simple hybrid term-neural model, first-stage retrieval performance can be improved further. Empirically, we show that this is an effective strategy for building neural passage retrieval models in the absence of large training corpora. Depending on the domain, this technique can even approach the accuracy of supervised models.
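
As a rough illustration of the data flow the abstract describes, the following sketch pairs each target-domain passage with a generated question and combines a term-based score with a neural score. It is not the authors' implementation: `toy_question_generator`, `term_overlap`, and `hybrid_score` are hypothetical names, the generator is a trivial placeholder for a trained model, and the linear interpolation in `hybrid_score` is an assumption, since the abstract does not say how the hybrid model combines its two components.

```python
from typing import Callable, List, Tuple


def toy_question_generator(passage: str) -> str:
    """Hypothetical stand-in for a question generator trained on general-domain data."""
    # A real system would be a trained sequence-to-sequence model; this
    # placeholder just turns the first five words into a pseudo-question.
    words = passage.split()[:5]
    return "what about " + " ".join(words).lower().rstrip(".") + "?"


def build_synthetic_pairs(
    passages: List[str],
    generate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair every target-domain passage with one generated (noisy) question."""
    return [(generate(p), p) for p in passages]


def term_overlap(question: str, passage: str) -> float:
    """Crude term-based score: fraction of question tokens found in the passage."""
    q = set(question.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


def hybrid_score(term: float, neural: float, weight: float = 0.5) -> float:
    """Assumed hybrid: linear interpolation of a term score and a neural score."""
    return weight * term + (1.0 - weight) * neural


# Target-domain passages (made-up examples, not data from the paper).
passages = [
    "Aspirin reduces fever and relieves mild pain.",
    "Insulin regulates blood glucose levels.",
]
pairs = build_synthetic_pairs(passages, toy_question_generator)
```

The resulting `pairs` list would serve as training data for a neural retriever in place of human-labeled question-passage pairs; the retriever itself is omitted here because the abstract does not describe its architecture.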

Comments:	14 pages, 4 figures
Subjects:	Information Retrieval (cs.IR); Computation and Language (cs.CL)
Cite as:	arXiv:2004.14503 [cs.IR]
	(or arXiv:2004.14503v3 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2004.14503

Submission history

From: Ji Ma
[v1] Wed, 29 Apr 2020 22:21:31 UTC (103 KB)
[v2] Sat, 23 Jan 2021 13:29:55 UTC (7,300 KB)
[v3] Wed, 27 Jan 2021 16:04:12 UTC (7,300 KB)
