DOI: 10.1145/3397271.3401195

A Pairwise Probe for Understanding BERT Fine-Tuning on Machine Reading Comprehension

Published: 25 July 2020
    Abstract

    Pre-trained models have brought significant improvements to many NLP tasks and have been extensively analyzed, but little is known about the effect of fine-tuning on specific tasks. Intuitively, people may agree that a pre-trained model already learns semantic representations of words (e.g., synonyms are closer to each other) and that fine-tuning further improves capabilities that require more complicated reasoning (e.g., coreference resolution, entity boundary detection). However, verifying these arguments analytically and quantitatively is challenging, and few works focus on this topic. In this paper, inspired by the observation that most probing tasks involve identifying matched pairs of phrases (e.g., coreference requires matching an entity and a pronoun), we propose a pairwise probe to understand BERT fine-tuning on the machine reading comprehension (MRC) task. Specifically, we identify five phenomena in MRC. Using pairwise probing tasks, we compare the performance of each layer's hidden representations in pre-trained and fine-tuned BERT. The proposed pairwise probe alleviates distraction from inaccurate model training and enables a robust, quantitative comparison. Our experimental analysis leads to highly confident conclusions: (1) Fine-tuning has little effect on fundamental, low-level information and general semantic tasks. (2) For the specific abilities required by downstream tasks, fine-tuned BERT outperforms pre-trained BERT, and the gap becomes obvious after the fifth layer.
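
    As an illustration of the probing setup, the following is a minimal sketch, not the authors' released code, of how per-layer hidden representations can be extracted from a pre-trained and a fine-tuned BERT with the HuggingFace transformers library; the fine-tuned checkpoint path, the span indices, and the mean-pooling over subword tokens are assumptions made for the example.

    import torch
    from transformers import BertModel, BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    # Pre-trained BERT; all intermediate layer outputs are returned.
    pretrained = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
    # Hypothetical path to a checkpoint fine-tuned on an MRC dataset.
    finetuned = BertModel.from_pretrained("path/to/mrc-finetuned-bert", output_hidden_states=True)

    def layer_embeddings(model, text, span):
        # Return one vector per layer for the token span (start, end),
        # mean-pooled over the span's subword tokens.
        enc = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**enc)
        # out.hidden_states: tuple of (embedding layer + 12 encoder layers),
        # each of shape (1, seq_len, hidden_size).
        start, end = span
        return [h[0, start:end].mean(dim=0) for h in out.hidden_states]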

    Supplementary Material

    MP4 File (3397271.3401195.mp4)
    Our work focuses on the question-answering task in NLP: what knowledge does a pre-trained model acquire during fine-tuning, and what knowledge lets it perform so well on specific tasks? We first summarize five phenomena in question answering: synonyms, abbreviations, coreference resolution, question type, and boundary detection. These five phenomena have one thing in common: they all appear in the form of pairs. Our analysis method is designed around this pairwise idea. By quantitatively comparing the embedding similarity of positive and negative example pairs at each layer for each phenomenon, and aggregating the statistics, we can see the differences before and after fine-tuning and analyze the knowledge learned at each layer.
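
    The pairwise comparison described above can be sketched as follows. This is an illustrative reading of the procedure, not the paper's exact metric: it reuses the hypothetical layer_embeddings helper from the sketch under the abstract and scores each layer by the cosine-similarity margin between a matched (positive) and a mismatched (negative) pair.

    import torch.nn.functional as F

    def pairwise_margin(model, anchor, positive, negative):
        # Each argument is a (text, (start, end)) phrase description.
        # For every layer, measure how much more similar the anchor is
        # to its matched phrase than to a mismatched one.
        a = layer_embeddings(model, *anchor)
        p = layer_embeddings(model, *positive)
        n = layer_embeddings(model, *negative)
        return [F.cosine_similarity(ai, pi, dim=0).item()
                - F.cosine_similarity(ai, ni, dim=0).item()
                for ai, pi, ni in zip(a, p, n)]

    Averaging such margins over many pairs, separately for the pre-trained and the fine-tuned model, gives one curve per model across layers, which is the kind of layer-wise before/after comparison the video describes.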


    Cited By

    • (2023) "The Biases of Pre-Trained Language Models: An Empirical Study on Prompt-Based Sentiment Analysis and Emotion Detection." IEEE Transactions on Affective Computing 14(3), 1743-1753. DOI: 10.1109/TAFFC.2022.3204972. Online publication date: 1 July 2023.
    • (2022) "Machine Reading at Scale: A Search Engine for Scientific and Academic Research." Systems 10(2), 43. DOI: 10.3390/systems10020043. Online publication date: 5 April 2022.


      Published In

      SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
      July 2020
      2548 pages
      ISBN:9781450380164
      DOI:10.1145/3397271
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 25 July 2020


      Author Tags

      1. BERT
      2. fine-tune
      3. machine reading comprehension
      4. pairwise

      Qualifiers

      • Short-paper

      Funding Sources

      • Ministry of Education
      • National Key Research and Development Program of China

      Conference

      SIGIR '20

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%

