Accurate LoRA-Finetuning Quantization of LLMs via Information Retention

Qin, Haotong; Ma, Xudong; Zheng, Xingyu; Li, Xiaoyang; Zhang, Yang; Liu, Shouda; Luo, Jie; Liu, Xianglong; Magno, Michele

Computer Science > Machine Learning

arXiv:2402.05445 (cs)

[Submitted on 8 Feb 2024 (v1), last revised 27 May 2024 (this version, v2)]

Title: Accurate LoRA-Finetuning Quantization of LLMs via Information Retention

Authors: Haotong Qin, Xudong Ma, Xingyu Zheng, Xiaoyang Li, Yang Zhang, Shouda Liu, Jie Luo, Xianglong Liu, Michele Magno


Abstract: LoRA-finetuning quantization of LLMs has been studied extensively as a way to obtain accurate yet compact LLMs for deployment on resource-constrained hardware. However, existing methods cause the quantized LLM to degrade severely and can even prevent it from benefiting from LoRA finetuning. This paper proposes IR-QLoRA, a novel method that makes quantized LLMs with LoRA highly accurate through information retention. IR-QLoRA relies mainly on two techniques derived from a unified information perspective: (1) statistics-based Information Calibration Quantization, which allows the quantized parameters of the LLM to retain the original information accurately; and (2) finetuning-based Information Elastic Connection, which lets LoRA use elastic representation transformations with diverse information. Comprehensive experiments show that IR-QLoRA significantly improves accuracy across the LLaMA and LLaMA2 families at 2-4 bit-widths; e.g., 4-bit LLaMA-7B achieves a 1.4% improvement on MMLU over state-of-the-art methods. This gain costs only 0.31% additional time, showing that IR-QLoRA is efficient. IR-QLoRA is also versatile: it is compatible with various quantization frameworks (e.g., NormalFloat and Integer quantization) and brings general accuracy gains. The code is available at this https URL.
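
The abstract describes Information Calibration Quantization only as "statistics-based". The Python sketch that follows illustrates one plausible reading of the idea. It shifts a block of weights by a calibration constant τ before quantizing, then keeps the τ that maximizes the Shannon entropy of the resulting codes, so the quantized weights spread over more levels and retain more information. This is an assumption-laden illustration, not the authors' implementation. It uses plain symmetric absmax integer quantization instead of NormalFloat and searches τ on a grid within ±λ·σ of the weight median. All function names (`quantize_with_shift`, `code_entropy`, `search_calibration`, `dequantize`) and defaults (`bits=4`, `lam=0.1`, `steps=20`) are hypothetical.

```python
# Hypothetical sketch of an entropy-maximizing calibration shift for quantization.
# Assumptions: symmetric absmax INTEGER quantizer (not NormalFloat), entropy of the
# quantized codes as the objective, and a median-centered search window of +/- lam*std.
import math
import random
import statistics
from collections import Counter


def quantize_with_shift(weights, tau, bits=4):
    """Quantize (w - tau) with a symmetric absmax integer quantizer; return (codes, scale)."""
    shifted = [w - tau for w in weights]
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit: codes lie in [-7, 7]
    scale = max(abs(v) for v in shifted) / qmax
    if scale == 0:  # all weights equal tau: a single code, any nonzero scale works
        return [0] * len(weights), 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in shifted]
    return codes, scale


def code_entropy(codes):
    """Shannon entropy (bits) of the empirical distribution of quantized codes."""
    n = len(codes)
    return -sum((c / n) * math.log2(c / n) for c in Counter(codes).values())


def search_calibration(weights, bits=4, lam=0.1, steps=20):
    """Grid-search tau around the median and return (best_tau, best_entropy)."""
    center = statistics.median(weights)
    spread = statistics.pstdev(weights)
    # Always evaluate the median itself, plus a uniform grid over [center - lam*spread, center + lam*spread].
    candidates = [center] + [
        center - lam * spread + 2 * lam * spread * i / steps for i in range(steps + 1)
    ]
    best_tau, best_h = center, -math.inf
    for tau in candidates:
        codes, _ = quantize_with_shift(weights, tau, bits)
        h = code_entropy(codes)
        if h > best_h:
            best_tau, best_h = tau, h
    return best_tau, best_h


def dequantize(codes, scale, tau):
    """Undo the quantization and the calibration shift."""
    return [c * scale + tau for c in codes]


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [rng.gauss(0.02, 0.05) for _ in range(4096)]  # synthetic weight block
    tau, h = search_calibration(weights)
    codes, scale = quantize_with_shift(weights, tau)
    print(f"tau={tau:.4f}, code entropy={h:.3f} bits")
```

Because the search always evaluates the median itself, the returned entropy is never lower than that of quantizing around the median. In this sketch τ must be stored next to each block's scale so that `dequantize` can undo the shift.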

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2402.05445 [cs.LG]
	(or arXiv:2402.05445v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.05445

Submission history

From: Xudong Ma
[v1] Thu, 8 Feb 2024 06:53:31 UTC (1,049 KB)
[v2] Mon, 27 May 2024 09:20:35 UTC (1,051 KB)