Unsupervised Pre-training for Biomedical Question Answering

Kommaraju, Vaishnavi; Gunasekaran, Karthick; Li, Kun; Bansal, Trapit; McCallum, Andrew; Williams, Ivana; Istrate, Ana-Maria

arXiv:2009.12952 (cs)

[Submitted on 27 Sep 2020]

Abstract: We explore the suitability of unsupervised representation learning methods on biomedical text (BioBERT, SciBERT, and BioSentVec) for biomedical question answering. To further improve unsupervised representations for biomedical QA, we introduce a new pre-training task, built from unlabeled data, that is designed to reason about biomedical entities in context. Our pre-training method corrupts a given context by randomly replacing a mention of a biomedical entity with a random entity mention, and then queries the model with the correct entity mention so that it must locate the corrupted part of the context. This de-noising task lets the model learn good representations from abundant, unlabeled biomedical text that help QA tasks. It also minimizes the train-test mismatch between the pre-training task and the downstream QA tasks, because it requires the model to predict spans. Our experiments show that pre-training BioBERT on the proposed task significantly boosts performance and outperforms the previous best model from the 7th BioASQ Task 7b-Phase B challenge.
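To make the corruption task concrete, here is a minimal sketch of how one training example could be built. The abstract does not say how entity mentions are detected or how the replacement entity is sampled. The sketch therefore assumes two things. First, mention character spans are already available, for example from a biomedical NER tool. Second, the replacement is drawn uniformly from a hypothetical pool of entity strings. The names `make_corruption_example` and `SpanExample` are illustrative and are not the authors' code. The output is framed as an extractive QA example. The query is the original mention, and the answer is the span where the random replacement now sits.

```python
import random
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class SpanExample:
    """Hypothetical SQuAD-style extractive example: query, context, answer span."""
    question: str      # the correct (original) entity mention
    context: str       # the corrupted context
    answer_start: int  # character offset of the corrupted span in `context`
    answer_text: str   # the random entity mention that was inserted


def make_corruption_example(
    context: str,
    mentions: Sequence[Tuple[int, int]],
    entity_pool: Sequence[str],
    rng: random.Random,
) -> SpanExample:
    """Replace one entity mention with a different random entity.

    Assumption: `mentions` holds (start, end) character offsets of entity
    mentions in `context`, and `entity_pool` holds candidate entity strings.
    """
    start, end = rng.choice(list(mentions))
    original = context[start:end]
    # Only sample replacements that actually differ from the original mention.
    candidates = [e for e in entity_pool if e != original]
    if not candidates:
        raise ValueError("entity_pool has no entity different from the mention")
    replacement = rng.choice(candidates)
    corrupted = context[:start] + replacement + context[end:]
    # The target span starts where the original mention started.
    return SpanExample(
        question=original,
        context=corrupted,
        answer_start=start,
        answer_text=replacement,
    )


if __name__ == "__main__":
    rng = random.Random(0)
    ctx = "Imatinib inhibits BCR-ABL in chronic myeloid leukemia."
    mentions = [(0, 8), (18, 25)]  # "Imatinib", "BCR-ABL"
    pool = ["Imatinib", "BCR-ABL", "EGFR", "gefitinib"]
    print(make_corruption_example(ctx, mentions, pool, rng))
```

The answer start equals the original mention's start, so the target span can be located with plain slicing. A span-prediction head of the kind used for downstream extractive QA can then be trained on these examples without modification, which is the train-test alignment the abstract points to.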

Comments:	To appear in BioASQ workshop 2020
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2009.12952 [cs.CL]
	(or arXiv:2009.12952v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2009.12952

Submission history

From: Trapit Bansal
[v1] Sun, 27 Sep 2020 21:07:51 UTC (140 KB)
