Large Language Models for Propaganda Span Annotation

Hasanain, Maram; Ahmad, Fatema; Alam, Firoj

arXiv:2311.09812 (cs)

[Submitted on 16 Nov 2023 (v1), last revised 6 Oct 2024 (this version, v3)]

Abstract: The use of propagandistic techniques in online content has increased in recent years, with the aim of manipulating online audiences. Fine-grained propaganda detection and the extraction of the textual spans where propaganda techniques are used are essential for more informed content consumption. Automatic systems targeting this task for lower-resourced languages are limited, usually hindered by the lack of large-scale training datasets. Our study investigates whether Large Language Models (LLMs), such as GPT-4, can effectively extract propagandistic spans. We further study the potential of employing the model to collect more cost-effective annotations. Finally, we examine the effectiveness of labels provided by GPT-4 in training smaller language models for the task. The experiments are performed over a large-scale, in-house, manually annotated dataset. The results suggest that providing more annotation context to GPT-4 within prompts improves its performance compared to human annotators. Moreover, when serving as an expert annotator (consolidator), the model provides labels that have higher agreement with expert annotators and lead to specialized models that achieve state-of-the-art performance on an unseen Arabic test set. Finally, our work is the first to show the potential of utilizing LLMs to develop annotated datasets for the propagandistic span detection task by prompting them with annotations from human annotators with limited expertise. All scripts and annotations will be shared with the community.

Comments:	propaganda, span detection, disinformation, misinformation, fake news, LLMs, GPT-4
Subjects:	Computation and Language (cs.CL)
MSC classes:	68T50
ACM classes:	F.2.2; I.2.7
Cite as:	arXiv:2311.09812 [cs.CL]
	(or arXiv:2311.09812v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2311.09812

Submission history

From: Maram Hasanain
[v1] Thu, 16 Nov 2023 11:37:54 UTC (4,600 KB)
[v2] Sun, 14 Jan 2024 06:32:09 UTC (4,604 KB)
[v3] Sun, 6 Oct 2024 08:46:23 UTC (4,658 KB)