Large Language Models can be Guided to Evade AI-Generated Text Detection

Lu, Ning; Liu, Shengcai; He, Rui; Wang, Qi; Ong, Yew-Soon; Tang, Ke

arXiv:2305.10847v6 (cs)

[Submitted on 18 May 2023 (v1), last revised 15 May 2024 (this version, v6)]

Abstract: Large language models (LLMs) have shown remarkable performance in various tasks and are widely used by the public. However, growing concerns about the misuse of LLMs, such as plagiarism and spamming, have led to the development of multiple detectors, including fine-tuned classifiers and statistical methods. In this study, we equip LLMs with prompts, rather than relying on an external paraphraser, to evaluate the vulnerability of these detectors. We propose a novel Substitution-based In-Context example Optimization method (SICO) to automatically construct prompts for evading the detectors. SICO is cost-efficient, as it requires only 40 human-written examples and a limited number of LLM inferences to generate a prompt. Moreover, once a task-specific prompt has been constructed, it can be used universally against a wide range of detectors. Extensive experiments across three real-world tasks demonstrate that SICO significantly outperforms the paraphraser baselines and enables GPT-3.5 to successfully evade six detectors, decreasing their AUC by 0.5 on average. Furthermore, a comprehensive human evaluation shows that the SICO-generated text achieves human-level readability and task completion rates while preserving high imperceptibility. Finally, we propose an ensemble approach to enhance the robustness of detectors against the SICO attack. The code is publicly available at this https URL.
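
The abstract reports detector vulnerability as a drop in AUC. As a reading aid, the sketch below shows how such a drop could be measured for a single detector. It is not the authors' evaluation code: the function name `roc_auc` and all scores are hypothetical toy values invented for illustration, and the detector is assumed to output a score where higher means "more likely AI-generated".

```python
def roc_auc(human_scores, ai_scores):
    """AUC as the probability that a random AI-written text receives a
    higher detector score than a random human-written text (ties count half)."""
    wins = 0.0
    for a in ai_scores:
        for h in human_scores:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    return wins / (len(ai_scores) * len(human_scores))


# Toy detector scores, invented for illustration (higher = "more likely AI").
human = [0.10, 0.20, 0.30, 0.40]
ai_before = [0.60, 0.70, 0.80, 0.90]  # detector separates the two classes
ai_after = [0.05, 0.25, 0.35, 0.90]   # rewritten text scores much like human text

auc_before = roc_auc(human, ai_before)
auc_after = roc_auc(human, ai_after)
print(f"AUC before: {auc_before:.4f}")
print(f"AUC after:  {auc_after:.4f}")
print(f"AUC drop:   {auc_before - auc_after:.4f}")
```

An AUC near 1.0 means the detector ranks AI-written text above human-written text almost always, while an AUC near 0.5 means it does no better than chance, so a drop of about 0.5 from a strong detector brings it close to random guessing.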

Comments:	TMLR camera ready
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2305.10847 [cs.CL]
	(or arXiv:2305.10847v6 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.10847

Submission history

From: Ning Lu
[v1] Thu, 18 May 2023 10:03:25 UTC (221 KB)
[v2] Fri, 19 May 2023 11:25:01 UTC (362 KB)
[v3] Mon, 5 Jun 2023 03:54:52 UTC (362 KB)
[v4] Sat, 17 Jun 2023 03:48:41 UTC (448 KB)
[v5] Thu, 14 Dec 2023 12:21:05 UTC (870 KB)
[v6] Wed, 15 May 2024 08:00:09 UTC (891 KB)
