Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller

Cai, Min; Zhang, Yuchen; Zhang, Shichang; Yin, Fan; Zou, Difan; Yue, Yisong; Hu, Ziniu


arXiv:2406.02721v1 (cs)

[Submitted on 4 Jun 2024 (this version), latest version 18 Jun 2024 (v2)]


Abstract: We propose Self-Control, a novel method that uses suffix gradients to control the behavior of large language models (LLMs) without explicit human annotations. Given a guideline expressed as a suffix string and the model's self-assessment of adherence, Self-Control computes the gradient of this self-judgment with respect to the model's hidden states, directly steering the auto-regressive generation process toward desired behaviors. To improve efficiency, we introduce Self-Control_{prefix}, a compact module that encapsulates the learned representations from suffix gradients into a Prefix Controller, facilitating inference-time control of various LLM behaviors. Our experiments demonstrate Self-Control's efficacy across multiple domains, including emotional modulation, ensuring harmlessness, and enhancing complex reasoning. In particular, Self-Control_{prefix} enables plug-and-play control and joint control of multiple attributes, improving model outputs without altering model parameters or increasing inference-time costs.
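
The core idea in the abstract, taking the gradient of a self-assessment score with respect to hidden states and nudging those states rather than the weights, can be illustrated with a toy sketch. This is not the authors' implementation: the linear "judge", the names `self_assessment`, `suffix_gradient`, and `steer`, and the step size are all assumptions made for illustration, and a real LLM would obtain the gradient by backpropagation through the network rather than from a closed form.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def self_assessment(hidden, judge_weights):
    """Toy stand-in (assumption) for P("Yes" | hidden state, suffix guideline)."""
    logit = sum(h * w for h, w in zip(hidden, judge_weights))
    return sigmoid(logit)


def suffix_gradient(hidden, judge_weights):
    """Gradient of log P("Yes") with respect to the hidden state.

    For a sigmoid over a linear logit w.h, the gradient of
    log sigmoid(w.h) with respect to h is (1 - sigmoid(w.h)) * w.
    """
    p = self_assessment(hidden, judge_weights)
    return [(1.0 - p) * w for w in judge_weights]


def steer(hidden, judge_weights, step_size=0.5, num_steps=3):
    """Gradient ascent on the hidden state only; the judge weights stay fixed."""
    h = list(hidden)
    for _ in range(num_steps):
        grad = suffix_gradient(h, judge_weights)
        h = [hi + step_size * gi for hi, gi in zip(h, grad)]
    return h


# Hypothetical numbers chosen only to exercise the functions.
hidden = [0.2, -0.4, 0.1]
judge_weights = [1.0, 2.0, -0.5]

steered = steer(hidden, judge_weights)
before = self_assessment(hidden, judge_weights)
after = self_assessment(steered, judge_weights)
```

In this sketch `after` is larger than `before` because each step moves the hidden state along the direction that raises the toy judge's score, while `judge_weights` is never modified, mirroring the abstract's point that control happens without altering model parameters.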

Comments:	41 pages, 12 figures, 61 tables; Website: this https URL
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2406.02721 [cs.CL]
	(or arXiv:2406.02721v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.02721

Submission history

From: Min Cai
[v1] Tue, 4 Jun 2024 19:05:10 UTC (2,834 KB)
[v2] Tue, 18 Jun 2024 15:58:38 UTC (2,834 KB)
