DOI: 10.1145/3539813.3545123

Do Extractive Summarization Algorithms Amplify Lexical Bias in News Articles?

Published: 25 August 2022

Abstract

    Users who read news summaries on search engine result pages and social media may never access the original news articles. Hence, if the summaries are generated automatically, it is vital that they represent the contents of the original articles accurately and fairly. The present study is concerned with lexical bias in sentences: a sentence is considered lexically biased if it contains expressions that may strongly influence the reader's opinion about a topic, either positively or negatively. More specifically, we are interested in whether extractive summarizers can amplify lexical bias by excessively extracting lexically biased sentences from the original article, thus misrepresenting it. To address this question, we first introduce the Bias Independence Principle (BIP), which states that the probability that a sentence is selected by an extractive summarizer should be independent of whether the sentence is lexically biased. Based on the BIP, we propose an evaluation measure for extractive summarizers called the Bias Independence Criterion (BIC), which compares the distribution of sentence scores for lexically biased sentences with that for non-biased sentences. Moreover, based on the BIC, we define another measure, the Summary Feature Permutation Importance (SFPI), to examine whether a particular feature used by a feature-based extractive summarizer is responsible for amplifying lexical bias. Our experimental results suggest that (a) different extractive summarizers amplify lexical bias to different degrees; (b) the features useful for extracting informative sentences may also be responsible for amplifying lexical bias; and (c) as mean ROUGE scores increase (implying higher informativeness), mean BIC scores also tend to increase (implying a higher concentration of lexically biased sentences).
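    The abstract does not give the exact formula for the BIC. As an illustration only, one common way to compare two score distributions is an AUC-style rank statistic; the function below is a minimal sketch under that assumption, and its name and form are hypothetical, not the authors' definition. Under the BIP, a summarizer's scores for biased and non-biased sentences come from the same distribution, so the statistic should be close to 0.5; values well above 0.5 would indicate that lexically biased sentences are systematically preferred.

```python
def bic_like_score(scores_biased, scores_unbiased):
    """Illustrative BIC-style statistic (hypothetical, not the paper's
    exact definition): the probability that a randomly chosen lexically
    biased sentence receives a higher summarizer score than a randomly
    chosen non-biased one, with ties counted as half wins (AUC-style).
    A value near 0.5 is consistent with the Bias Independence Principle."""
    wins = ties = 0
    for b in scores_biased:
        for u in scores_unbiased:
            if b > u:
                wins += 1
            elif b == u:
                ties += 1
    n_pairs = len(scores_biased) * len(scores_unbiased)
    return (wins + 0.5 * ties) / n_pairs


# Identical score distributions: no preference for biased sentences.
print(bic_like_score([1, 2, 3], [1, 2, 3]))  # 0.5
# Biased sentences always outscore non-biased ones: maximal preference.
print(bic_like_score([4, 5], [1, 2]))        # 1.0
```

    A quadratic pairwise loop suffices at the scale of sentences per article; for large corpora the same statistic can be computed from rank sums (as in the Mann-Whitney U test).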


      Published In

      ICTIR '22: Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval
      August 2022
      289 pages
ISBN: 9781450394123
DOI: 10.1145/3539813

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Author Tags

      1. evaluation
      2. evaluation measures
      3. lexical bias
      4. text summarization

      Qualifiers

• Short paper

      Conference

ICTIR '22

      Acceptance Rates

ICTIR '22 paper acceptance rate: 32 of 80 submissions (40%)
Overall acceptance rate: 235 of 527 submissions (45%)
