Comparative Document Summarisation via Classification

Bista, Umanga; Mathews, Alexander; Shin, Minjeong; Menon, Aditya Krishna; Xie, Lexing

doi:10.1609/aaai.v33i01.330120

arXiv:1812.02171 (cs)

[Submitted on 6 Dec 2018 (v1), last revised 2 Jan 2020 (this version, v2)]

Abstract: This paper considers extractive summarisation in a comparative setting: given two or more document groups (e.g., separated by publication time), the goal is to select a small number of documents that are representative of each group, and also maximally distinguishable from other groups. We formulate a set of new objective functions for this problem that connect recent literature on document summarisation, interpretable machine learning, and data subset selection. In particular, by casting the problem as a binary classification amongst different groups, we derive objectives based on the notion of maximum mean discrepancy, as well as a simple yet effective gradient-based optimisation strategy. Our new formulation allows scalable evaluations of comparative summarisation as a classification task, both automatically and via crowd-sourcing. To this end, we evaluate comparative summarisation methods on a newly curated collection of controversial news topics over 13 months. We observe that gradient-based optimisation outperforms discrete and baseline approaches in 14 out of 24 different automatic evaluation settings. In crowd-sourced evaluations, summaries from gradient optimisation elicit 7% more accurate classification from human workers than discrete optimisation. Our result contrasts with recent literature on submodular data subset selection that favours discrete optimisation. We posit that our formulation of comparative summarisation will prove useful in a diverse range of use cases such as comparing content sources, authors, related topics, or distinct viewpoints.
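
As an illustration of the idea named in the abstract, the following is a minimal sketch of a maximum mean discrepancy (MMD) objective for picking a comparative summary. It is not the authors' formulation or their gradient-based strategy: the RBF kernel, the biased MMD estimate, the greedy discrete search, and the names `rbf_kernel`, `mmd2`, `greedy_comparative_summary`, `lam`, and `gamma` are all assumptions made for this example, and documents are assumed to be already embedded as fixed-length vectors.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """RBF kernel matrix between the rows of a and the rows of b."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2(x, y, gamma=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

def greedy_comparative_summary(group, others, k, lam=1.0, gamma=1.0):
    """Greedily pick k row indices of `group` (hypothetical objective).

    Each step adds the document that keeps the summary close to its own
    group (low MMD) and far from the other groups (high MMD).
    """
    chosen = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for i in range(len(group)):
            if i in chosen:
                continue
            s = group[chosen + [i]]
            score = -mmd2(s, group, gamma) + lam * mmd2(s, others, gamma)
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen

# Toy data: two synthetic "document groups" as 5-dimensional vectors.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(30, 5))
group_b = rng.normal(3.0, 1.0, size=(30, 5))
summary_a = greedy_comparative_summary(group_a, group_b, k=3)
```

Here `summary_a` holds the indices of three documents from `group_a`. The weight `lam` trades off representativeness against distinguishability; the greedy loop stands in for a discrete optimiser of the kind the abstract compares against, while a gradient-based variant would instead optimise continuous prototype vectors.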

Comments:	Accepted for AAAI 2019
Subjects:	Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1812.02171 [cs.IR]
	(or arXiv:1812.02171v2 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.1812.02171
Journal reference:	Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 2019
Related DOI:	https://doi.org/10.1609/aaai.v33i01.330120

Submission history

From: Umanga Bista [view email]
[v1] Thu, 6 Dec 2018 04:04:56 UTC (1,144 KB)
[v2] Thu, 2 Jan 2020 08:42:03 UTC (2,506 KB)
