Social Biases in Automatic Evaluation Metrics for NLG

Gao, Mingqi; Wan, Xiaojun

arXiv:2210.08859 (cs)

[Submitted on 17 Oct 2022]

Abstract: Many studies have revealed that word embeddings, language models, and models for specific downstream tasks in NLP are prone to social biases, especially gender bias. Recently, these techniques have increasingly been applied to automatic evaluation metrics for text generation. In this paper, we propose an evaluation method based on the Word Embedding Association Test (WEAT) and the Sentence Encoder Association Test (SEAT) to quantify social biases in evaluation metrics, and we find that social biases are also widely present in some model-based automatic evaluation metrics. Moreover, we construct gender-swapped meta-evaluation datasets to explore the potential impact of gender bias on image captioning and text summarization. Results show that, given gender-neutral references, model-based evaluation metrics may prefer the male hypothesis, and that their performance, i.e., the correlation between the metrics and human judgments, usually varies more significantly after gender swapping.
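
The abstract relies on WEAT and SEAT without restating them. As a rough illustration, the Python sketch below implements the standard WEAT quantities: the per-word association s(w, A, B), the test statistic s(X, Y, A, B), the effect size, and a one-sided exact permutation test. SEAT applies the same statistics to sentence embeddings instead of word embeddings. It assumes NumPy and precomputed embedding vectors. How the paper extracts embeddings from each evaluation metric is not shown here. The function names, the toy vectors, and the choice of the sample standard deviation (ddof=1) are illustrative assumptions, not the authors' code.

```python
import itertools

import numpy as np


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def association(w, A, B):
    """s(w, A, B): mean cosine of w to attribute set A minus mean cosine to B."""
    return float(np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B]))


def weat_statistic(X, Y, A, B):
    """s(X, Y, A, B): summed association of targets X minus that of targets Y."""
    return sum(association(x, A, B) for x in X) - sum(association(y, A, B) for y in Y)


def weat_effect_size(X, Y, A, B):
    """Difference of mean associations of X and Y, normalised by the sample
    standard deviation of the associations over all targets in X and Y."""
    s_X = [association(x, A, B) for x in X]
    s_Y = [association(y, A, B) for y in Y]
    return float((np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1))


def weat_p_value(X, Y, A, B):
    """One-sided exact permutation test: fraction of splits (X_i, Y_i) of the
    pooled targets, with |X_i| = |X|, whose statistic exceeds the observed one.
    Enumerates every split, so it is only practical for small target sets."""
    s_all = [association(t, A, B) for t in list(X) + list(Y)]
    n, k = len(s_all), len(X)
    observed = sum(s_all[:k]) - sum(s_all[k:])
    exceed = total = 0
    for idx in itertools.combinations(range(n), k):
        chosen = set(idx)
        s_i = sum(s_all[i] for i in idx) - sum(s_all[i] for i in range(n) if i not in chosen)
        exceed += s_i > observed
        total += 1
    return exceed / total


# Hypothetical toy 2-D "embeddings": X leans toward attributes A, Y toward B.
A_toy = [[1.0, 0.0], [0.9, 0.1]]
B_toy = [[0.0, 1.0], [0.1, 0.9]]
X_toy = [[1.0, 0.1], [0.8, 0.2]]
Y_toy = [[0.1, 1.0], [0.2, 0.8]]

if __name__ == "__main__":
    print(weat_effect_size(X_toy, Y_toy, A_toy, B_toy))
    print(weat_p_value(X_toy, Y_toy, A_toy, B_toy))
```

A positive effect size means the targets in X are more associated with attributes A than the targets in Y are. For larger target sets, the exact enumeration would typically be replaced by randomly sampled splits.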

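The gender-swapped meta-evaluation datasets are presumably built by replacing gendered words in the hypotheses with their counterparts. The abstract does not give the word list or procedure, so the Python sketch below is only an assumption: a small hypothetical bidirectional word list (SWAP) and a regex-based whole-word replacement (gender_swap) that preserves capitalization. Re-scoring the swapped hypotheses with a metric and recomputing its correlation with human judgments would then show how much the metric's performance shifts.

```python
import re

# Hypothetical, deliberately tiny swap list; a real one would be far larger.
# "her" is ambiguous (his/him); this sketch naively always maps it to "his".
SWAP = {
    "he": "she", "she": "he",
    "him": "her", "his": "her", "her": "his",
    "man": "woman", "woman": "man",
    "men": "women", "women": "men",
    "boy": "girl", "girl": "boy",
    "father": "mother", "mother": "father",
}

# Match whole alphabetic words so "here" is not treated as "her".
_WORD = re.compile(r"[A-Za-z]+")


def gender_swap(text):
    """Replace every word found in SWAP, keeping its original capitalization."""
    def repl(match):
        word = match.group(0)
        swapped = SWAP.get(word.lower())
        if swapped is None:
            return word
        if len(word) > 1 and word.isupper():
            return swapped.upper()
        if word[0].isupper():
            return swapped.capitalize()
        return swapped

    return _WORD.sub(repl, text)


if __name__ == "__main__":
    print(gender_swap("A man rides his bike."))
```
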
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2210.08859 [cs.CL]
	(or arXiv:2210.08859v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2210.08859

Submission history

From: Mingqi Gao
[v1] Mon, 17 Oct 2022 08:55:26 UTC (527 KB)