MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity

Choi, Kanghyun; Lee, Hye Yoon; Kwon, Dain; Park, SunJong; Kim, Kyuyeun; Park, Noseong; Lee, Jinho

arXiv:2407.20021v3 (cs)

[Submitted on 29 Jul 2024 (v1), last revised 1 Aug 2024 (this version, v3)]

Abstract: Data-free quantization (DFQ) is a technique that creates a lightweight network from its full-precision counterpart without the original training data, often through a synthetic dataset. Although several DFQ methods have been proposed for vision transformer (ViT) architectures, they lose effectiveness in low-bit settings. Examining the existing methods, we identify that their synthetic data produce misaligned attention maps, whereas the attention maps of real samples are highly aligned. Building on this observation, we find that aligning the attention maps of synthetic data helps improve the overall performance of quantized ViTs. Motivated by this finding, we devise MimiQ, a novel DFQ method designed for ViTs that focuses on inter-head attention similarity. First, we generate synthetic data by aligning head-wise attention responses with respect to spatial query patches. Then, we apply head-wise structural attention distillation to align the attention maps of the quantized network with those of the full-precision teacher. Experimental results show that the proposed method significantly outperforms baselines, setting a new state of the art for data-free ViT quantization.
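To make the notion of inter-head attention similarity concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state which similarity measure MimiQ uses, so mean pairwise cosine similarity between flattened per-head attention maps is an assumption here, and the names `inter_head_similarity` and `alignment_loss` are hypothetical. Plain Python lists are used only to keep the example self-contained.

```python
import math
from itertools import combinations


def softmax(scores):
    """Numerically stable softmax over one row of attention logits."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def inter_head_similarity(attn):
    """Mean pairwise cosine similarity between heads (assumed metric).

    attn: list of heads; each head is a list of rows, one row of
    attention probabilities over key patches per query patch.
    """
    flat = [[p for row in head for p in row] for head in attn]
    pairs = list(combinations(range(len(flat)), 2))
    if not pairs:
        raise ValueError("need at least two heads")
    return sum(cosine(flat[i], flat[j]) for i, j in pairs) / len(pairs)


def alignment_loss(attn):
    """Hypothetical objective: lower when heads agree more closely."""
    return 1.0 - inter_head_similarity(attn)


# Three heads, two query patches, three key patches.
focused = [softmax([4.0, 0.0, 0.0]), softmax([0.0, 4.0, 0.0])]
aligned = [focused, focused, focused]  # every head attends to the same keys
misaligned = [
    [softmax([4.0, 0.0, 0.0]), softmax([4.0, 0.0, 0.0])],  # head 0: key 0
    [softmax([0.0, 4.0, 0.0]), softmax([0.0, 4.0, 0.0])],  # head 1: key 1
    [softmax([0.0, 0.0, 4.0]), softmax([0.0, 0.0, 4.0])],  # head 2: key 2
]

print("aligned similarity:   ", inter_head_similarity(aligned))
print("misaligned similarity:", inter_head_similarity(misaligned))
```

Under this assumed metric, heads that attend to the same key patches score close to 1, while heads that attend to disjoint patches score close to 0. A data generator in the spirit of the abstract would adjust the synthetic image to reduce a loss such as `alignment_loss`, although the paper's actual objective may differ.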

Comments:	Author Preprint
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2407.20021 [cs.LG]
	(or arXiv:2407.20021v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2407.20021

Submission history

From: Kanghyun Choi
[v1] Mon, 29 Jul 2024 13:57:40 UTC (9,634 KB)
[v2] Tue, 30 Jul 2024 02:03:06 UTC (9,634 KB)
[v3] Thu, 1 Aug 2024 16:13:45 UTC (9,634 KB)
