
From coarse to fine: Enhancing multi-document summarization with multi-granularity relationship-based extractor

Published: 01 May 2024

Abstract

Multi-Document Summarization (MDS) is challenging because the input documents are not only extremely long in aggregate but may also overlap with, complement, or contradict one another. In this paper, we propose to capture complex cross-document interactions and to handle lengthy inputs for better multi-document summarization. Specifically, we present MDS-MGRE, a coarse-to-fine MDS framework that introduces multi-granularity relationships into an extract-then-summarize pipeline. In the coarse-grained stage, multi-granularity embedding, heterogeneous graph construction, and the MGRExtractor work together to convert redundant multi-document input into compact meta-documents. We first use the pre-trained language model BERT to obtain semantically rich embeddings at four granularities: documents, paragraphs, sentence-sets, and sentences. We then construct a heterogeneous graph with four types of nodes (document, paragraph, sentence-set, and sentence nodes) and the corresponding connecting edges to model rich document relationships. Furthermore, we propose a novel Multi-Granularity Relationship-based Extractor (MGRExtractor) that produces meta-documents by efficiently pruning the heterogeneous graph. It consists of four main modules: noise removal, redundancy removal, multi-granularity scoring, and sentence-set selection. In the fine-grained stage, we employ the large configuration of BART as the abstractive summarizer to generate system summaries from the extracted meta-documents. Experimental results on two benchmark datasets show that our framework significantly outperforms strong baselines with comparable parameter counts, and only slightly underperforms methods with a maximum encoding length of 16,384 tokens.
On Multi-News and WCEP, automatic evaluation shows that MDS-MGRE achieves average performance improvements of 1.75% and 8.77%, respectively, over state-of-the-art systems with comparable parameters. These positive results demonstrate the benefit of generating high-quality meta-documents that model rich document relationships to enhance MDS.
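The coarse-grained stage described above can be sketched in simplified form. This is a hypothetical illustration, not the authors' implementation: the real MDS-MGRE uses BERT embeddings and a learned MGRExtractor, whereas here bag-of-words vectors, cosine-to-centroid scoring, and a fixed pairing of adjacent sentences into "sentence-sets" stand in for those learned components purely to show the graph structure and selection step.

```python
# Toy sketch of the coarse-grained stage: build a heterogeneous graph over
# four granularities (document, paragraph, sentence-set, sentence), score
# sentence-set nodes, and keep the top-k to form a compact meta-document.
from collections import Counter
import math

def bow(text):
    """Bag-of-words vector (stand-in for BERT embeddings)."""
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def build_graph(documents):
    """documents: list of docs; each doc is a list of paragraph strings.
    Returns nodes keyed by (type, id) plus containment edges."""
    nodes, edges = {}, []
    for d, doc in enumerate(documents):
        nodes[("doc", d)] = " ".join(doc)
        for p, para in enumerate(doc):
            nodes[("para", (d, p))] = para
            edges.append((("doc", d), ("para", (d, p))))
            sents = [s.strip() for s in para.split(".") if s.strip()]
            # Here a "sentence-set" is simply a pair of adjacent sentences
            # (an assumption; the paper's grouping is learned).
            for s0 in range(0, len(sents), 2):
                key = ("sset", (d, p, s0))
                nodes[key] = ". ".join(sents[s0:s0 + 2])
                edges.append((("para", (d, p)), key))
                for s, sent in enumerate(sents[s0:s0 + 2]):
                    skey = ("sent", (d, p, s0 + s))
                    nodes[skey] = sent
                    edges.append((key, skey))
    return nodes, edges

def extract_meta_document(documents, k=2):
    """Score sentence-set nodes against the corpus centroid, keep top-k."""
    nodes, _ = build_graph(documents)
    centroid = bow(" ".join(nodes[n] for n in nodes if n[0] == "doc"))
    ssets = [(cosine(bow(nodes[n]), centroid), nodes[n])
             for n in nodes if n[0] == "sset"]
    ssets.sort(key=lambda x: -x[0])
    return " ".join(text for _, text in ssets[:k])
```

In the fine-grained stage, the resulting meta-document would then be passed to an abstractive summarizer (BART-large in the paper) to produce the final summary.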



Published In

Information Processing and Management: an International Journal, Volume 61, Issue 3, May 2024, 1388 pages

Publisher

Pergamon Press, Inc.

United States


Author Tags

  1. Natural language processing
  2. Multi-document summarization
  3. Graph-based summarization
  4. Rich document relationships
  5. Extract-then-summarize pipeline

Qualifiers

  • Research-article
