DOI: 10.1145/3397271.3401195

A Pairwise Probe for Understanding BERT Fine-Tuning on Machine Reading Comprehension

Published: 25 July 2020
    Abstract

    Pre-trained models have brought significant improvements to many NLP tasks and have been extensively analyzed, but little is known about the effect of fine-tuning on specific tasks. Intuitively, people may agree that a pre-trained model already learns semantic representations of words (e.g., synonyms are closer to each other) and that fine-tuning further improves capabilities that require more complicated reasoning (e.g., coreference resolution, entity boundary detection). However, verifying these arguments analytically and quantitatively is challenging, and few works focus on this topic. In this paper, inspired by the observation that most probing tasks involve identifying matched pairs of phrases (e.g., coreference requires matching an entity and a pronoun), we propose a pairwise probe to understand BERT fine-tuning on the machine reading comprehension (MRC) task. Specifically, we identify five phenomena in MRC. Using pairwise probing tasks, we compare the performance of each layer's hidden representations in pre-trained and fine-tuned BERT. The proposed pairwise probe alleviates distraction from inaccurate model training and enables a robust, quantitative comparison. Our experimental analysis leads to highly confident conclusions: (1) Fine-tuning has little effect on fundamental, low-level information and general semantic tasks. (2) For the specific abilities required by downstream tasks, fine-tuned BERT outperforms pre-trained BERT, and the gap becomes obvious after the fifth layer.
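
    As an illustration of the probing setup, the following is a minimal sketch, not the authors' released code, of how per-layer hidden representations can be extracted from a pre-trained and a fine-tuned BERT with the HuggingFace transformers library; the fine-tuned checkpoint path, the span indices, and the mean-pooling over subword tokens are assumptions made for the example.

    import torch
    from transformers import BertModel, BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    # Pre-trained BERT; all intermediate layer outputs are returned.
    pretrained = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
    # Hypothetical path to a checkpoint fine-tuned on an MRC dataset.
    finetuned = BertModel.from_pretrained("path/to/mrc-finetuned-bert", output_hidden_states=True)

    def layer_embeddings(model, text, span):
        # Return one vector per layer for the token span (start, end),
        # mean-pooled over the span's subword tokens.
        enc = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**enc)
        # out.hidden_states: tuple of (embedding layer + 12 encoder layers),
        # each of shape (1, seq_len, hidden_size).
        start, end = span
        return [h[0, start:end].mean(dim=0) for h in out.hidden_states]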

    Supplementary Material

    MP4 File (3397271.3401195.mp4)
    Our work focuses on the question-answering task in NLP: what knowledge does a pre-trained model acquire during fine-tuning, and what knowledge lets it perform so well on specific tasks? We first summarize five phenomena in question answering: synonyms, abbreviations, coreference resolution, question type, and boundary detection. These five phenomena have one thing in common: they all appear in the form of pairs. Our analysis method is designed around this pairwise idea. By quantitatively comparing the embedding similarity of positive and negative example pairs at each layer for each phenomenon, and aggregating the statistics, we can see the differences before and after fine-tuning and analyze the knowledge learned at each layer.
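
    The pairwise comparison described above can be sketched as follows. This is an illustrative reading of the procedure, not the paper's exact metric: it reuses the hypothetical layer_embeddings helper from the sketch under the abstract and scores each layer by the cosine-similarity margin between a matched (positive) and a mismatched (negative) pair.

    import torch.nn.functional as F

    def pairwise_margin(model, anchor, positive, negative):
        # Each argument is a (text, (start, end)) phrase description.
        # For every layer, measure how much more similar the anchor is
        # to its matched phrase than to a mismatched one.
        a = layer_embeddings(model, *anchor)
        p = layer_embeddings(model, *positive)
        n = layer_embeddings(model, *negative)
        return [F.cosine_similarity(ai, pi, dim=0).item()
                - F.cosine_similarity(ai, ni, dim=0).item()
                for ai, pi, ni in zip(a, p, n)]

    Averaging such margins over many pairs, separately for the pre-trained and the fine-tuned model, gives one curve per model across layers, which is the kind of layer-wise before/after comparison the video describes.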


    Cited By

    • (2023) "The Biases of Pre-Trained Language Models: An Empirical Study on Prompt-Based Sentiment Analysis and Emotion Detection." IEEE Transactions on Affective Computing 14(3), 1743-1753. DOI: 10.1109/TAFFC.2022.3204972. Online publication date: 1 July 2023.
    • (2022) "Machine Reading at Scale: A Search Engine for Scientific and Academic Research." Systems 10(2), 43. DOI: 10.3390/systems10020043. Online publication date: 5 April 2022.


      Published In

      SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
      July 2020
      2548 pages
      ISBN:9781450380164
      DOI:10.1145/3397271
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 25 July 2020


      Author Tags

      1. BERT
      2. fine-tune
      3. machine reading comprehension
      4. pairwise

      Qualifiers

      • Short-paper

      Funding Sources

      • Ministry of Education
      • National Key Research and Development Program of China

      Conference

      SIGIR '20

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%

