A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution

Hu, Zhengmian; Zheng, Tong; Huang, Heng

arXiv:2410.21716 (cs)

[Submitted on 29 Oct 2024]

Abstract: Authorship attribution aims to identify the origin or author of a document. Traditional approaches have heavily relied on manual features and fail to capture long-range correlations, limiting their effectiveness. Recent advancements leverage text embeddings from pre-trained language models, which require significant fine-tuning on labeled data, posing challenges in data dependency and limited interpretability. Large Language Models (LLMs), with their deep reasoning capabilities and ability to maintain long-range textual associations, offer a promising alternative. This study explores the potential of pre-trained LLMs in one-shot authorship attribution, specifically utilizing Bayesian approaches and probability outputs of LLMs. Our methodology calculates the probability that a text entails previous writings of an author, reflecting a more nuanced understanding of authorship. By utilizing only pre-trained models such as Llama-3-70B, our results on the IMDb and blog datasets show an impressive 85% accuracy in one-shot authorship classification across ten authors. Our findings set new baselines for one-shot authorship analysis using LLMs and expand the application scope of these models in forensic linguistics. This work also includes extensive ablation studies to validate our approach.
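
The Bayesian step described in the abstract can be illustrated with a minimal sketch. It assumes that, for each candidate author, an LLM has already produced a log-probability of the query text conditioned on that author's known writing. The function name `author_posterior`, the uniform prior, and the numeric scores are all hypothetical and are not taken from the paper, whose exact formulation is not given on this page.

```python
import math

def author_posterior(log_likelihoods, log_priors=None):
    """Posterior over candidate authors via Bayes' rule, computed in log space.

    log_likelihoods maps each author to an assumed LLM score
    log P(query text | that author's known text).
    """
    authors = list(log_likelihoods)
    if log_priors is None:
        # Assumption: uniform prior over the candidate authors.
        log_priors = {a: -math.log(len(authors)) for a in authors}
    joint = {a: log_likelihoods[a] + log_priors[a] for a in authors}
    # Log-sum-exp normalisation keeps very negative log-probabilities stable.
    peak = max(joint.values())
    log_z = peak + math.log(sum(math.exp(v - peak) for v in joint.values()))
    return {a: math.exp(v - log_z) for a, v in joint.items()}

# Hypothetical scores for three candidate authors (illustrative only).
scores = {"author_a": -310.2, "author_b": -305.7, "author_c": -318.9}
posterior = author_posterior(scores)
predicted = max(posterior, key=posterior.get)
```

With a uniform prior the predicted author is simply the one with the highest conditional log-probability; the posterior additionally gives a normalised confidence for each candidate.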

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Applications (stat.AP)
Cite as:	arXiv:2410.21716 [cs.CL]
	(or arXiv:2410.21716v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2410.21716

Submission history

From: Zhengmian Hu
[v1] Tue, 29 Oct 2024 04:14:23 UTC (40 KB)