Can BERT Reason? Logically Equivalent Probes for Evaluating the Inference Capabilities of Language Models

Zhou, Pei; Khanna, Rahul; Lin, Bill Yuchen; Ho, Daniel; Ren, Xiang; Pujara, Jay

arXiv:2005.00782v1 (cs)

[Submitted on 2 May 2020 (this version), latest version 10 Sep 2021 (v4)]

Abstract: Pre-trained language models (PTLMs) have greatly improved performance on commonsense inference benchmarks; however, it remains unclear whether they share a human's ability to consistently make correct inferences under perturbations. Prior studies of PTLMs have found inference deficits, but have not provided a systematic means of determining whether these deficits stem from low inference ability or poor inference robustness. In this work, we address this gap by developing a procedure for systematically probing both the inference abilities and the robustness of PTLMs. Our procedure centers on the methodical creation of logically equivalent but syntactically different sets of probes. Using it, we create a corpus of 14,400 probes drawn from 60 logically equivalent sets, which can be used to probe PTLMs in three task settings. We find that, despite the recent success of large PTLMs on commonsense benchmarks, their performance on our probes is no better than random guessing (even with fine-tuning) and is heavily dependent on biases. Unfortunately, this poor overall performance prevents us from studying robustness. We hope our approach and initial probe set will assist future work in improving PTLMs' inference abilities, while also providing a probing set to test robustness under several linguistic variations. Code and data will be released.

Comments:	15 pages, 11 figures. Work in progress
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
Cite as:	arXiv:2005.00782 [cs.CL]
	(or arXiv:2005.00782v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2005.00782

Submission history

From: Pei Zhou
[v1] Sat, 2 May 2020 10:36:55 UTC (420 KB)
[v2] Wed, 23 Sep 2020 04:11:53 UTC (1,608 KB)
[v3] Tue, 13 Apr 2021 23:40:23 UTC (2,377 KB)
[v4] Fri, 10 Sep 2021 01:37:12 UTC (2,396 KB)
