Feature-based Unsupervised Method for Salient Sentence Ranking in Text Summarization Task

Published: 26 July 2024 | DOI: 10.1145/3654522.3654556

Abstract

Salient sentence ranking is an essential task in data mining, and it plays a vital role in unsupervised document summarization. In this paper, we introduce a simple yet effective unsupervised method for extracting salient sentences from a cluster of documents. Our method synthesizes a sentence score from several feature-based scores: position, topic, keyword, semantic, entity, and sentence-centroid scores. The proposed method can generate large-scale pseudo-summaries, which support downstream summarization tasks. To this end, our approach can incorporate the pre-training objectives used in pre-trained language models, mitigating the lack of annotated datasets in low-resource languages such as Vietnamese. We also conducted experiments to verify the effectiveness of the individual feature-based scoring methods and their combinations. Our experimental results on two well-known benchmark datasets, Multi-News and NewSHead, show the superiority of our proposed method over previous unsupervised approaches.
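To make the general scheme concrete, below is a minimal Python sketch of feature-based sentence ranking. It is not the paper's implementation: the two features shown (a TF-IDF centroid-similarity score and an inverse-position score), the min-max normalization, and the equal weights are illustrative assumptions. The paper combines six feature scores (position, topic, keyword, semantic, entity, and centroid), whose exact definitions appear in the full text.

# Illustrative sketch of feature-based sentence ranking (NOT the
# authors' implementation). Feature definitions and weights here
# are assumptions chosen only to demonstrate the overall scheme.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_sentences(sentences, weights=(0.5, 0.5)):
    """Score each sentence by a weighted sum of feature scores and
    return sentence indices ordered from most to least salient."""
    # Centroid score: similarity of each sentence to the document centroid.
    tfidf = TfidfVectorizer().fit_transform(sentences)   # (n_sents, vocab)
    centroid = np.asarray(tfidf.mean(axis=0))            # (1, vocab)
    centroid_scores = cosine_similarity(tfidf, centroid).ravel()

    # Position score: earlier sentences score higher (lead bias).
    n = len(sentences)
    position_scores = np.array([1.0 / (i + 1) for i in range(n)])

    # Min-max normalize each feature to [0, 1] before combining.
    def normalize(x):
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    w_cent, w_pos = weights
    final = w_cent * normalize(centroid_scores) + w_pos * normalize(position_scores)
    return np.argsort(-final)  # best-first ordering

sentences = [
    "The city council approved the new transit budget on Monday.",
    "Funding will expand bus routes to three suburbs.",
    "Critics argued the plan ignores cycling infrastructure.",
    "The vote passed seven to two.",
]
print(rank_sentences(sentences))

Ranking by a weighted sum of independently normalized feature scores keeps the method unsupervised: no labeled summaries are needed, and individual features can be ablated or re-weighted, as the paper's experiments on feature combinations do.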



Published In

ICIIT '24: Proceedings of the 2024 9th International Conference on Intelligent Information Technology
February 2024, 596 pages
ISBN: 9798400716713
DOI: 10.1145/3654522

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. Unsupervised sentence scoring
2. Salient sentence extraction
3. Unsupervised multi-document summarization (MDS)