C3: Continued Pretraining with Contrastive Weak Supervision for Cross Language Ad-Hoc Retrieval

Yang, Eugene; Nair, Suraj; Chandradevan, Ramraj; Iglesias-Flores, Rebecca; Oard, Douglas W.

doi:10.1145/3477495.3531886

arXiv:2204.11989 (cs)

[Submitted on 25 Apr 2022]

Abstract: Pretrained language models have improved effectiveness on numerous tasks, including ad-hoc retrieval. Recent work has shown that continuing to pretrain a language model with auxiliary objectives before fine-tuning on the retrieval task can further improve retrieval effectiveness. Unlike monolingual retrieval, designing an appropriate auxiliary task for cross-language mappings is challenging. To address this challenge, we use comparable Wikipedia articles in different languages to further pretrain off-the-shelf multilingual pretrained models before fine-tuning on the retrieval task. We show that our approach yields improvements in retrieval effectiveness.

Comments:	6 pages, 2 figures, accepted as a SIGIR 2022 Short Paper
Subjects:	Information Retrieval (cs.IR); Computation and Language (cs.CL)
Cite as:	arXiv:2204.11989 [cs.IR]
	(or arXiv:2204.11989v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2204.11989
Related DOI:	https://doi.org/10.1145/3477495.3531886

Submission history

From: Eugene Yang
[v1] Mon, 25 Apr 2022 23:12:05 UTC (128 KB)
