Show, Write, and Retrieve: Entity-aware Article Generation and Retrieval

Zhang, Zhongping; Gu, Yiwen; Plummer, Bryan A.

arXiv:2112.05917v3 (cs)

[Submitted on 11 Dec 2021 (v1), last revised 20 Oct 2023 (this version, v3)]

Abstract: Article comprehension is an important challenge in natural language processing, with applications such as article generation and image-to-article retrieval. Prior work typically encodes all tokens in an article uniformly using pretrained language models. However, in many applications, such as understanding news stories, articles are based on real-world events and may reference many named entities that are difficult for language models to recognize and predict accurately. To address this challenge, we propose an ENtity-aware article GeneratIoN and rEtrieval (ENGINE) framework that explicitly incorporates named entities into language models. ENGINE has two main components: a named-entity extraction module that extracts named entities from both the metadata and the embedded images associated with articles, and an entity-aware mechanism that enhances the model's ability to recognize and predict entity names. We conducted experiments on three public datasets: GoodNews, VisualNews, and WikiText. Our results demonstrate that our model improves both article generation and article retrieval, with a perplexity reduction of 4-5 points in article generation and a 3-4% boost in recall@1 in article retrieval. We release our implementation at this https URL.
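The abstract reports results in two standard metrics: perplexity for generation (lower is better, so an improvement is a reduction) and recall@1 for retrieval. As a hedged illustration of those metric definitions only, and not the authors' evaluation code, the sketch below computes perplexity from per-token natural-log probabilities and recall@k from a query-by-candidate similarity matrix. It assumes that candidate i is the correct match for query i (e.g. image i and article i); the function names `perplexity` and `recall_at_k` are hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(-mean natural-log probability per token); lower is better."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def recall_at_k(similarity: Sequence[Sequence[float]], k: int = 1) -> float:
    """Fraction of queries whose correct candidate ranks in the top k.

    similarity[i][j] scores query i (e.g. an image) against candidate j
    (e.g. an article). Assumption: candidate i is the ground truth for query i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidate indices from highest to lowest similarity.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


if __name__ == "__main__":
    # A model assigning probability 0.5 to every token has perplexity 2.
    print(perplexity([math.log(0.5)] * 4))
    # Query 0 ranks its article first; query 1 does not.
    print(recall_at_k([[0.9, 0.1], [0.8, 0.2]], k=1))
```

With this toy similarity matrix, recall@1 is 0.5 because only the first query retrieves its matching article at rank 1; relaxing to k=2 would count both queries as hits.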

Comments:	Accepted at EMNLP 2023 Findings
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2112.05917 [cs.CL]
	(or arXiv:2112.05917v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2112.05917

Submission history

From: Zhongping Zhang
[v1] Sat, 11 Dec 2021 05:32:09 UTC (17,382 KB)
[v2] Thu, 24 Mar 2022 04:49:39 UTC (16,731 KB)
[v3] Fri, 20 Oct 2023 20:44:27 UTC (3,687 KB)