AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning

Yang, Tao; Deng, Jinghao; Quan, Xiaojun; Wang, Qifan; Nie, Shaoliang

Computer Science > Computation and Language

arXiv:2210.05883 (cs)

[Submitted on 12 Oct 2022]

Abstract: Fine-tuning large pre-trained language models on downstream tasks is prone to overfitting when training data is limited. Dropout, which randomly drops a proportion of units, has proved an effective antidote, but existing research has not examined its effect on the self-attention mechanism. In this paper, we investigate this problem through self-attention attribution and find that dropping attention positions with low attribution scores can accelerate training and increase the risk of overfitting. Motivated by this observation, we propose Attribution-Driven Dropout (AD-DROP), which randomly discards some high-attribution positions. This encourages the model to rely more on low-attribution positions when making predictions, which reduces overfitting. We also develop a cross-tuning strategy that alternates between fine-tuning and AD-DROP so that high-attribution positions are not dropped excessively. Extensive experiments on various benchmarks show that AD-DROP yields consistent improvements over baselines. Further analysis confirms that AD-DROP serves as a strategic regularizer that prevents overfitting during fine-tuning.
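The abstract describes AD-DROP only at a high level. Below is a minimal, pure-Python sketch of one possible reading of it for a single row of an attention map. It is not the authors' implementation, and several parts are assumptions. Attribution is approximated as attention weight times gradient, where the gradient is taken with respect to the predicted label's logit. The hypothetical fraction `p` of highest-attribution positions are treated as drop candidates, and each candidate is masked with hypothetical probability `q` by adding -inf before the softmax. Cross-tuning is modeled as plain fine-tuning and AD-DROP alternating every other epoch. All function and parameter names here are illustrative.

```python
import math
import random
from typing import List, Sequence


def attribution_scores(attn: Sequence[float], grad: Sequence[float]) -> List[float]:
    """Assumed gradient-times-attention attribution for one attention row.

    `grad` is taken to be d(logit of the predicted label) / d(attn); this
    first-order approximation is an assumption of this sketch.
    """
    return [a * g for a, g in zip(attn, grad)]


def ad_drop_mask(scores: Sequence[float], p: float, q: float,
                 rng: random.Random) -> List[float]:
    """Additive attention mask for one row: 0.0 keeps a position, -inf drops it.

    The top ceil(p * n) positions by attribution (hypothetical parameter p)
    form the candidate set; each candidate is dropped independently with
    probability q (hypothetical parameter q). Low-attribution positions are
    never dropped, and at least one position always survives.
    """
    n = len(scores)
    k = min(math.ceil(p * n), n - 1)  # keep >= 1 position so softmax stays defined
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    mask = [0.0] * n
    for i in ranked[:k]:
        if rng.random() < q:
            mask[i] = float("-inf")
    return mask


def masked_softmax(logits: Sequence[float], mask: Sequence[float]) -> List[float]:
    """Softmax over attention logits after adding the additive mask."""
    z = [l + m for l, m in zip(logits, mask)]
    top = max(z)
    exps = [math.exp(v - top) for v in z]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def cross_tuning_schedule(num_epochs: int) -> List[str]:
    """Alternate plain fine-tuning and AD-DROP epoch by epoch (assumed period of 2)."""
    return ["fine-tune" if epoch % 2 == 0 else "ad-drop" for epoch in range(num_epochs)]


if __name__ == "__main__":
    rng = random.Random(0)
    scores = attribution_scores([0.1, 0.4, 0.3, 0.2], [0.5, 1.0, -0.2, 0.8])
    mask = ad_drop_mask(scores, p=0.5, q=1.0, rng=rng)
    print(mask)                                          # the two highest-attribution positions get -inf
    print(masked_softmax([1.0, 2.0, 0.5, 1.5], mask))   # dropped positions receive zero weight
    print(cross_tuning_schedule(4))
```

Dropping only among high-attribution candidates is what separates this from ordinary attention dropout, which masks positions uniformly at random. The alternating schedule is one simple way to keep the model from spending every epoch without its most informative positions.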

Comments:	Accepted to NeurIPS 2022
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2210.05883 [cs.CL]
	(or arXiv:2210.05883v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2210.05883

Submission history

From: Xiaojun Quan
[v1] Wed, 12 Oct 2022 02:54:41 UTC (635 KB)