Continual Pre-training of Language Models

Ke, Zixuan; Shao, Yijia; Lin, Haowei; Konishi, Tatsuya; Kim, Gyuhak; Liu, Bing

arXiv:2302.03241 (cs)

[Submitted on 7 Feb 2023 (v1), last revised 12 Apr 2023 (this version, v4)]

Abstract: Language models (LMs) have been instrumental in the rapid advance of natural language processing. This paper studies continual pre-training of LMs, in particular continual domain-adaptive pre-training (continual DAP-training). Existing research has shown that further pre-training an LM on a domain corpus to adapt it to that domain can improve end-task performance in the domain. This paper proposes a novel method that continually DAP-trains an LM on a sequence of unlabeled domain corpora, adapting the LM to these domains to improve their end-task performance. The key novelty of the method is a soft-masking mechanism that directly controls the update to the LM. A novel proxy is also proposed to preserve the general knowledge in the original LM. Additionally, the method contrasts the representations of the previously learned domain knowledge (including the general knowledge in the pre-trained LM) with those of the knowledge from the current full network to achieve knowledge integration. The method not only overcomes catastrophic forgetting but also achieves knowledge transfer that improves end-task performance. Empirical evaluation demonstrates the effectiveness of the proposed method.
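As a rough illustration of what "a soft-masking mechanism that directly controls the update" could look like, the sketch below scales each unit's gradient by one minus an importance score before a plain gradient-descent step. The abstract does not say how importance is computed or which optimizer is used. The function names, the per-unit importance scores in [0, 1], and the plain SGD update are therefore assumptions made for illustration only, not the authors' implementation.

```python
def soft_mask_gradients(gradients, importance):
    """Scale each gradient by (1 - importance).

    Hypothetical helper: `importance` holds one score in [0, 1] per unit.
    A score of 1.0 blocks the update entirely, 0.0 leaves it unchanged,
    and values in between only dampen it (hence "soft" masking).
    """
    if len(gradients) != len(importance):
        raise ValueError("gradients and importance must have the same length")
    masked = []
    for grad, score in zip(gradients, importance):
        if not 0.0 <= score <= 1.0:
            raise ValueError("importance scores must lie in [0, 1]")
        masked.append((1.0 - score) * grad)
    return masked


def sgd_step(weights, gradients, importance, lr=0.1):
    """One plain SGD update using the soft-masked gradients (assumed optimizer)."""
    masked = soft_mask_gradients(gradients, importance)
    return [w - lr * g for w, g in zip(weights, masked)]


if __name__ == "__main__":
    weights = [1.0, 1.0, 1.0]
    gradients = [1.0, 1.0, 1.0]
    importance = [0.0, 0.5, 1.0]  # unimportant, moderately important, fully protected
    print(sgd_step(weights, gradients, importance, lr=0.5))
```

In this toy run the first weight receives the full update, the second receives half of it, and the third is left untouched, which is the sense in which units important to earlier knowledge are protected while the rest of the network remains free to adapt.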

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Cite as:	arXiv:2302.03241 [cs.CL]
	(or arXiv:2302.03241v4 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2302.03241
Journal reference:	ICLR 2023

Submission history

From: Zixuan Ke [view email]
[v1] Tue, 7 Feb 2023 03:57:55 UTC (425 KB)
[v2] Fri, 10 Feb 2023 02:14:32 UTC (425 KB)
[v3] Thu, 2 Mar 2023 19:38:17 UTC (425 KB)
[v4] Wed, 12 Apr 2023 10:36:44 UTC (425 KB)
