CMT in TREC-COVID Round 2: Mitigating the Generalization Gaps from Web to Special Domain Search

Xiong, Chenyan; Liu, Zhenghao; Sun, Si; Dai, Zhuyun; Zhang, Kaitao; Yu, Shi; Liu, Zhiyuan; Poon, Hoifung; Gao, Jianfeng; Bennett, Paul

arXiv:2011.01580v1 (cs)

COVID-19 e-print

Important: e-prints posted on arXiv are not peer-reviewed by arXiv; they should not be relied upon without context to guide clinical practice or health-related behavior and should not be reported in news media as established information without consulting multiple experts in the field.

[Submitted on 3 Nov 2020]

Abstract: Neural rankers based on deep pretrained language models (LMs) have been shown to improve many information retrieval benchmarks. However, these methods are sensitive to the correlation between the pretraining domain and the target domain, and they rely on massive amounts of relevance labels for fine-tuning. Directly applying pretraining methods to specific domains, such as the COVID domain, may result in suboptimal search quality because of domain adaptation problems. This paper presents a search system to alleviate the special-domain adaptation problem. The system uses domain-adaptive pretraining and few-shot learning to help neural rankers mitigate the domain discrepancy and label scarcity problems. We also integrate dense retrieval to alleviate the vocabulary mismatch obstacle of traditional sparse retrieval. Our system performs the best among the non-manual runs in Round 2 of the TREC-COVID task, which aims to retrieve useful information from scientific literature related to COVID-19. Our code is publicly available at this https URL.
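
To make the vocabulary mismatch point concrete, here is a minimal toy sketch in Python. It is not the system described in the paper: the documents, the two-dimensional "embeddings", and the use of reciprocal rank fusion to merge the two candidate lists are all assumptions chosen for illustration, and the abstract does not say how the sparse and dense results are actually combined.

```python
import math

def sparse_score(query, doc):
    """Toy lexical match: count query terms that literally occur in the document."""
    doc_terms = set(doc.lower().split())
    return sum(1 for term in query.lower().split() if term in doc_terms)

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(scores):
    """Return document ids sorted by descending score."""
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each list contributes 1 / (k + position) per document."""
    fused = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + position)
    return rank(fused)

# Hypothetical mini-collection; d2 is on topic but shares no term with the query.
docs = {
    "d1": "coronavirus transmission in hospitals",
    "d2": "sars-cov-2 spread among healthcare workers",
    "d3": "influenza vaccine schedule",
}
query = "coronavirus transmission"

# Hand-written stand-ins for learned embeddings (assumption, not model output).
doc_vecs = {"d1": [0.9, 0.1], "d2": [0.8, 0.2], "d3": [0.1, 0.9]}
query_vec = [1.0, 0.0]

# Sparse retrieval only returns documents with at least one matching term.
sparse_scores = {d: sparse_score(query, text) for d, text in docs.items()}
sparse_ranking = rank({d: s for d, s in sparse_scores.items() if s > 0})

# Dense retrieval scores every document by embedding similarity.
dense_scores = {d: cosine(query_vec, vec) for d, vec in doc_vecs.items()}
dense_ranking = rank(dense_scores)

fused = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
```

In this toy setup the sparse list misses d2 entirely because the query says "coronavirus" while the document says "sars-cov-2", whereas the dense list still places d2 second. Fusing the two lists keeps the lexical match on top and recovers the mismatched document.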

Comments:	5 pages, 3 figures, 2 tables
Subjects:	Information Retrieval (cs.IR); Computation and Language (cs.CL)
Cite as:	arXiv:2011.01580 [cs.IR]
	(or arXiv:2011.01580v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2011.01580

Submission history

From: Si Sun [view email]
[v1] Tue, 3 Nov 2020 09:10:48 UTC (928 KB)
