Feature-based Unsupervised Method for Salient Sentence Ranking in Text Summarization Task

Published: 26 July 2024 | DOI: 10.1145/3654522.3654556

Abstract

Salient sentence ranking is an essential task in data mining, and it plays a vital role in unsupervised document summarization. In this paper, we introduce a simple yet effective unsupervised method for extracting salient sentences from a cluster of documents. Our method synthesizes a sentence score from several feature-based scores: position, topic, keyword, semantic, entity, and sentence-centroid scores. The proposed method can generate large-scale pseudo-summaries, which support downstream summarization tasks. To this end, our approach can incorporate the pre-training objectives used in pre-trained language models, mitigating the lack of annotated datasets in low-resource languages such as Vietnamese. We also conducted experiments to verify the effectiveness of the individual feature-based scoring methods and their combinations. Our experimental results on two well-known benchmark datasets, Multi-News and NewSHead, show the superiority of our proposed method over previous unsupervised approaches.
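To make the general scheme concrete, below is a minimal Python sketch of feature-based sentence ranking. It is not the paper's implementation: the two features shown (a TF-IDF centroid-similarity score and an inverse-position score), the min-max normalization, and the equal weights are illustrative assumptions. The paper combines six feature scores (position, topic, keyword, semantic, entity, and centroid), whose exact definitions appear in the full text.

# Illustrative sketch of feature-based sentence ranking (NOT the
# authors' implementation). Feature definitions and weights here
# are assumptions chosen only to demonstrate the overall scheme.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_sentences(sentences, weights=(0.5, 0.5)):
    """Score each sentence by a weighted sum of feature scores and
    return sentence indices ordered from most to least salient."""
    # Centroid score: similarity of each sentence to the document centroid.
    tfidf = TfidfVectorizer().fit_transform(sentences)   # (n_sents, vocab)
    centroid = np.asarray(tfidf.mean(axis=0))            # (1, vocab)
    centroid_scores = cosine_similarity(tfidf, centroid).ravel()

    # Position score: earlier sentences score higher (lead bias).
    n = len(sentences)
    position_scores = np.array([1.0 / (i + 1) for i in range(n)])

    # Min-max normalize each feature to [0, 1] before combining.
    def normalize(x):
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    w_cent, w_pos = weights
    final = w_cent * normalize(centroid_scores) + w_pos * normalize(position_scores)
    return np.argsort(-final)  # best-first ordering

sentences = [
    "The city council approved the new transit budget on Monday.",
    "Funding will expand bus routes to three suburbs.",
    "Critics argued the plan ignores cycling infrastructure.",
    "The vote passed seven to two.",
]
print(rank_sentences(sentences))

Ranking by a weighted sum of independently normalized feature scores keeps the method unsupervised: no labeled summaries are needed, and individual features can be ablated or re-weighted, as the paper's experiments on feature combinations do.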



Published In

ICIIT '24: Proceedings of the 2024 9th International Conference on Intelligent Information Technology
February 2024, 596 pages
ISBN: 9798400716713
DOI: 10.1145/3654522

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. Unsupervised sentence scoring
2. Salient sentence extraction
3. Unsupervised multi-document summarization (MDS)