PlagBench: Exploring the Duality of Large Language Models in Plagiarism Generation and Detection

Lee, Jooyoung; Agrawal, Toshini; Uchendu, Adaku; Le, Thai; Chen, Jinghui; Lee, Dongwon

Computer Science > Computation and Language

arXiv:2406.16288 (cs)

[Submitted on 24 Jun 2024]

Abstract: Recent literature has highlighted potential risks to academic integrity associated with large language models (LLMs), as they can memorize parts of training instances and reproduce them in generated texts without proper attribution. In addition, given their ability to generate high-quality text, plagiarists can exploit LLMs to produce realistic paraphrases or summaries that are indistinguishable from original work. In response to this possible malicious use of LLMs for plagiarism, we introduce PlagBench, a comprehensive dataset of 46.5K synthetic plagiarism cases generated using three instruction-tuned LLMs across three writing domains. The quality of PlagBench is ensured through fine-grained automatic evaluation for each type of plagiarism, complemented by human annotation. We then use the proposed dataset to evaluate the plagiarism detection performance of five modern LLMs and three specialized plagiarism checkers. Our findings reveal that GPT-3.5 tends to generate higher-quality paraphrases and summaries than Llama2 and GPT-4. Despite LLMs' weak performance in identifying summary plagiarism, they can surpass current commercial plagiarism detectors. Overall, our results highlight the potential of LLMs to serve as robust plagiarism detection tools.

Comments:	9 pages
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2406.16288 [cs.CL]
	(or arXiv:2406.16288v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.16288

Submission history

From: Jooyoung Lee [view email]
[v1] Mon, 24 Jun 2024 03:29:53 UTC (4,282 KB)
