Research article
DOI: 10.1145/3649329.3656516

QUQ: Quadruplet Uniform Quantization for Efficient Vision Transformer Inference

Published: 07 November 2024

Abstract

While exhibiting superior performance on many tasks, vision transformers (ViTs) remain challenging to quantize. Many existing low-bit-width quantization techniques cannot effectively cover the whole inference process of ViTs, incurring an additional memory overhead of 22.3%-172.6% compared with corresponding fully quantized models. To address this issue, we propose quadruplet uniform quantization (QUQ) to handle the varied data distributions in ViTs. QUQ divides the entire data range into at most four subranges, each uniformly quantized with its own scale factor. To determine the partition scheme and the quantization parameters, we propose an efficient relaxation algorithm. Moreover, dedicated encoding and decoding strategies are devised to facilitate the design of an efficient accelerator. Experimental results show that QUQ surpasses state-of-the-art quantization techniques; it is the first viable scheme that fully quantizes ViTs to 6 bits with acceptable accuracy. Compared with conventional uniform quantization, QUQ yields not only higher accuracy but also an accelerator with lower area and power.
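To make the partition-and-scale idea concrete, the Python sketch below illustrates multi-subrange uniform quantization in the spirit of QUQ. It is only a minimal sketch under stated assumptions: the function names, breakpoints, scale factors, and Laplace-distributed example data are all hypothetical, and the paper's relaxation algorithm for choosing the partition and scales (as well as its dedicated encoding/decoding scheme) is not reproduced here.

import numpy as np

def quq_quantize(x, breakpoints, scales, n_levels=16):
    """Per-subrange uniform quantization (illustrative, not the paper's code).

    breakpoints : up to three ascending magnitude boundaries, splitting
                  the data range into at most four subranges
    scales      : one scale factor per subrange
    n_levels    : integer levels available within each subrange
    """
    x = np.asarray(x, dtype=np.float32)
    # Assign each element to a subrange by magnitude (0..3 for 3 breakpoints).
    idx = np.digitize(np.abs(x), breakpoints)
    s = np.asarray(scales, dtype=np.float32)[idx]   # per-element scale factor
    # Uniform quantization inside the selected subrange.
    q = np.clip(np.round(x / s), -(n_levels // 2), n_levels // 2 - 1)
    # A 2-bit subrange tag plus a 4-bit code gives a 6-bit representation,
    # loosely mirroring the 6-bit full quantization discussed in the abstract.
    return q.astype(np.int8), idx.astype(np.uint8)

def quq_dequantize(q, idx, scales):
    return q.astype(np.float32) * np.asarray(scales, dtype=np.float32)[idx]

# Example: long-tailed activations handled with four scale factors chosen so
# that each subrange boundary maps to the top of a 4-bit signed code range.
rng = np.random.default_rng(0)
acts = rng.laplace(scale=0.5, size=8).astype(np.float32)
scales = [0.03125, 0.125, 0.25, 1.0]
q, tag = quq_quantize(acts, breakpoints=[0.25, 1.0, 2.0], scales=scales)
print(acts)
print(quq_dequantize(q, tag, scales))

A single global scale factor would either clip the long tail or waste resolution near zero; giving each subrange its own scale, as sketched above, is what lets a few integer levels cover a long-tailed distribution.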


Published In

DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference
June 2024, 2159 pages
ISBN: 9798400706011
DOI: 10.1145/3649329

Publisher

Association for Computing Machinery, New York, NY, United States


Conference

DAC '24
Sponsor:
DAC '24: 61st ACM/IEEE Design Automation Conference
June 23 - 27, 2024
CA, San Francisco, USA

Acceptance Rates

Overall Acceptance Rate: 1,770 of 5,499 submissions, 32%

