
Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers

Published: 07 November 2024

Abstract

Non-linear functions are prevalent in Transformers and their lightweight variants, incurring substantial and frequently underestimated hardware costs. Previous state-of-the-art works optimize these operations with piecewise-linear approximation and store the parameters in look-up tables (LUTs), but most of them rely on hardware-unfriendly high-precision arithmetic such as FP/INT32 and do not consider integer-only INT quantization. This paper proposes a genetic LUT-approximation algorithm, named GQA-LUT, that automatically determines the approximation parameters with quantization awareness. The results demonstrate that GQA-LUT incurs negligible degradation on the challenging semantic segmentation task for both vanilla and linear Transformer models. Moreover, the proposed GQA-LUT enables INT8-based LUT approximation, achieving area savings of 81.3~81.7% and power reductions of 79.3~80.2% compared to the high-precision FP/INT32 alternatives. Code is available at https://github.com/PingchengDong/GQA-LUT.
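As a rough illustration of the idea the abstract describes, the sketch below fits a piecewise-linear LUT to a non-linear function and scores candidate segment breakpoints with a simple genetic (mutate-and-select) loop, where the fitness is measured after the slopes and intercepts have been quantized to an INT8 grid. This is only a minimal sketch, not the authors' GQA-LUT implementation (see the linked repository for that); the choice of GELU as the target function, the 8-segment table, the quantization scale, and the mutation-only search are illustrative assumptions.

# Minimal sketch (not the paper's code): INT8 piecewise-linear LUT approximation of GELU
# with segment breakpoints chosen by a toy genetic search. Function choice, table size,
# quantization scale, and the search loop are assumptions for illustration.
import numpy as np

def gelu(x):
    # tanh-based GELU approximation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def quantize(v, scale=2 ** -5):
    # symmetric INT8 quantization of the stored LUT parameters (slopes / intercepts)
    return np.clip(np.round(v / scale), -128, 127) * scale

def fit_lut(breakpoints, xs):
    # least-squares slope/intercept per segment, then quantize the parameters
    edges = np.unique(np.concatenate(([xs.min()], np.sort(breakpoints), [xs.max()])))
    slopes, intercepts = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = xs[(xs >= lo) & (xs <= hi)]
        if seg.size < 2:
            seg = np.array([lo, hi])
        k, b = np.polyfit(seg, gelu(seg), 1)
        slopes.append(k)
        intercepts.append(b)
    return edges, quantize(np.array(slopes)), quantize(np.array(intercepts))

def lut_eval(edges, slopes, intercepts, xs):
    # pick the segment for each input and evaluate its linear piece
    idx = np.clip(np.searchsorted(edges, xs, side="right") - 1, 0, slopes.size - 1)
    return slopes[idx] * xs + intercepts[idx]

def fitness(breakpoints, xs):
    # quantization-aware: the error is measured with the already-quantized parameters
    edges, k, b = fit_lut(breakpoints, xs)
    return -np.mean((lut_eval(edges, k, b, xs) - gelu(xs)) ** 2)

rng = np.random.default_rng(0)
xs = np.linspace(-4.0, 4.0, 2048)
population = [np.sort(rng.uniform(-3.9, 3.9, size=7)) for _ in range(32)]  # 7 breakpoints = 8 segments
for _ in range(50):
    parents = sorted(population, key=lambda bp: fitness(bp, xs), reverse=True)[:8]
    children = [np.sort(np.clip(p + rng.normal(0, 0.1, size=7), -3.9, 3.9))
                for p in parents for _ in range(3)]
    population = parents + children
best = max(population, key=lambda bp: fitness(bp, xs))
print("breakpoints:", np.round(best, 3), "MSE:", -fitness(best, xs))

The actual GQA-LUT targets integer-only arithmetic end to end (per the author tags below); the toy quantization-aware fitness above only gestures at that constraint.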


Cited By

  • (2024) Boosting Weakly-Supervised Image Segmentation via Representation, Transform, and Compensator. IEEE Transactions on Circuits and Systems for Video Technology, 34(11):11013-11025. DOI: 10.1109/TCSVT.2024.3413778. Online publication date: Nov-2024.
  • (2024) Hardware-oriented algorithms for softmax and layer normalization of large language models. Science China Information Sciences, 67(10). DOI: 10.1007/s11432-024-4137-4. Online publication date: 12-Sep-2024.



Published In

DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference
June 2024
2159 pages
ISBN:9798400706011
DOI:10.1145/3649329
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 November 2024

Author Tags

  1. non-linear function
  2. quantization-aware training
  3. integer-only arithmetic
  4. transformer
  5. look-up table
  6. genetic algorithm

Qualifiers

  • Research-article

Conference

DAC '24: 61st ACM/IEEE Design Automation Conference
June 23 - 27, 2024
San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

