DOI: 10.1145/3539813.3545123

Do Extractive Summarization Algorithms Amplify Lexical Bias in News Articles?

Published: 25 August 2022

Abstract

    Users who read news summaries on search engine result pages and social media may never access the original news articles. Hence, if the summaries are generated automatically, it is vital that they represent the contents of the original articles accurately and fairly. The present study is concerned with lexical bias in sentences: a sentence is considered lexically biased if it contains expressions that may strongly influence the reader's opinion about a topic, either positively or negatively. More specifically, we are interested in whether extractive summarizers can amplify lexical bias by excessively extracting lexically biased sentences from the original article, thus misrepresenting it. To address this question, we first introduce the Bias Independence Principle (BIP), which states that the probability that a sentence is selected by an extractive summarizer should be independent of whether the sentence is lexically biased. Based on the BIP, we propose an evaluation measure for extractive summarizers called the Bias Independence Criterion (BIC), which compares the distribution of sentence scores for lexically biased sentences with that for non-biased sentences. Moreover, based on the BIC, we define another measure, the Summary Feature Permutation Importance (SFPI), to examine whether a particular feature used by a feature-based extractive summarizer is responsible for amplifying lexical bias. Our experimental results suggest that (a) different extractive summarizers amplify lexical bias to different degrees; (b) the features useful for extracting informative sentences may also be responsible for amplifying lexical bias; and (c) as mean ROUGE scores increase (implying higher informativeness), mean BIC scores also tend to increase (implying a higher concentration of lexically biased sentences).
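    The abstract does not give the exact formula for the BIC. As an illustration only, one common way to compare two score distributions is an AUC-style rank statistic; the function below is a minimal sketch under that assumption, and its name and form are hypothetical, not the authors' definition. Under the BIP, a summarizer's scores for biased and non-biased sentences come from the same distribution, so the statistic should be close to 0.5; values well above 0.5 would indicate that lexically biased sentences are systematically preferred.

```python
def bic_like_score(scores_biased, scores_unbiased):
    """Illustrative BIC-style statistic (hypothetical, not the paper's
    exact definition): the probability that a randomly chosen lexically
    biased sentence receives a higher summarizer score than a randomly
    chosen non-biased one, with ties counted as half wins (AUC-style).
    A value near 0.5 is consistent with the Bias Independence Principle."""
    wins = ties = 0
    for b in scores_biased:
        for u in scores_unbiased:
            if b > u:
                wins += 1
            elif b == u:
                ties += 1
    n_pairs = len(scores_biased) * len(scores_unbiased)
    return (wins + 0.5 * ties) / n_pairs


# Identical score distributions: no preference for biased sentences.
print(bic_like_score([1, 2, 3], [1, 2, 3]))  # 0.5
# Biased sentences always outscore non-biased ones: maximal preference.
print(bic_like_score([4, 5], [1, 2]))        # 1.0
```

    A quadratic pairwise loop suffices at the scale of sentences per article; for large corpora the same statistic can be computed from rank sums (as in the Mann-Whitney U test).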


      Published In

      ICTIR '22: Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval
      August 2022
      289 pages
ISBN: 9781450394123
DOI: 10.1145/3539813

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Author Tags

      1. evaluation
      2. evaluation measures
      3. lexical bias
      4. text summarization

      Qualifiers

• Short paper

      Conference

ICTIR '22

      Acceptance Rates

ICTIR '22 paper acceptance rate: 32 of 80 submissions (40%)
Overall acceptance rate: 235 of 527 submissions (45%)
