A Modular Approach for Multimodal Summarization of TV Shows

Mahon, Louis; Lapata, Mirella

arXiv:2403.03823 (cs)

[Submitted on 6 Mar 2024 (v1), last revised 22 Aug 2024 (this version, v9)]

Abstract: In this paper we address the task of summarizing television shows, which touches on key areas of AI research: complex reasoning, multiple modalities, and long narratives. We present a modular approach in which separate components perform specialized sub-tasks, which we argue affords greater flexibility than end-to-end methods. Our modules detect scene boundaries, reorder scenes so as to minimize the number of cuts between different events, convert visual information to text, summarize the dialogue in each scene, and fuse the scene summaries into a final summary for the entire episode. We also present a new metric, PRISMA (Precision and Recall EvaluatIon of Summary FActs), which measures both the precision and the recall of generated summaries after decomposing them into atomic facts. Tested on the recently released SummScreen3D dataset, our method produces higher-quality summaries than comparison models, as measured with ROUGE and our new fact-based metric, and as assessed by human evaluators.
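
The idea of a fact-based precision and recall can be illustrated with a minimal Python sketch. This is not the authors' PRISMA implementation. It assumes that precision is the fraction of facts from the generated summary that the reference summary supports, and that recall is the fraction of reference facts that the generated summary supports. The names `fact_precision_recall`, `is_supported`, and `substring_support` are hypothetical, and the verbatim substring check is only a toy stand-in for whatever fact-checking step a real system would use.

```python
from typing import Callable, Sequence, Tuple


def fact_precision_recall(
    generated_facts: Sequence[str],
    reference_facts: Sequence[str],
    generated_summary: str,
    reference_summary: str,
    is_supported: Callable[[str, str], bool],
) -> Tuple[float, float]:
    """Return (precision, recall) over atomic facts.

    Assumed definitions (not taken from the paper):
      precision = share of generated facts supported by the reference summary
      recall    = share of reference facts supported by the generated summary
    """

    def fraction_supported(facts: Sequence[str], summary: str) -> float:
        # An empty fact list yields 0.0 rather than a division by zero.
        if not facts:
            return 0.0
        return sum(is_supported(fact, summary) for fact in facts) / len(facts)

    precision = fraction_supported(generated_facts, reference_summary)
    recall = fraction_supported(reference_facts, generated_summary)
    return precision, recall


def substring_support(fact: str, summary: str) -> bool:
    # Toy support check (hypothetical): the fact must appear verbatim,
    # ignoring case. A real checker would need to handle paraphrase.
    return fact.lower() in summary.lower()


if __name__ == "__main__":
    reference = "Anna leaves town. Ben finds the letter."
    generated = "Anna leaves town. Ben burns the letter."
    scores = fact_precision_recall(
        generated_facts=["Anna leaves town", "Ben burns the letter"],
        reference_facts=["Anna leaves town", "Ben finds the letter"],
        generated_summary=generated,
        reference_summary=reference,
        is_supported=substring_support,
    )
    print(scores)
```

Passing the support check in as a function keeps the scoring logic independent of how facts are extracted or verified, so either step can be swapped out without touching the other.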

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2403.03823 [cs.CL]
	(or arXiv:2403.03823v9 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2403.03823

Submission history

From: Louis Mahon
[v1] Wed, 6 Mar 2024 16:10:01 UTC (2,359 KB)
[v2] Thu, 7 Mar 2024 09:10:42 UTC (2,359 KB)
[v3] Thu, 16 May 2024 15:45:58 UTC (2,360 KB)
[v4] Thu, 13 Jun 2024 20:58:03 UTC (2,364 KB)
[v5] Tue, 2 Jul 2024 11:22:14 UTC (2,364 KB)
[v6] Sat, 6 Jul 2024 07:58:14 UTC (2,364 KB)
[v7] Tue, 6 Aug 2024 14:47:11 UTC (2,366 KB)
[v8] Fri, 9 Aug 2024 14:48:52 UTC (2,365 KB)
[v9] Thu, 22 Aug 2024 10:00:53 UTC (2,365 KB)
