
Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers

Published: 07 November 2024

Abstract

Non-linear functions are prevalent in Transformers and their lightweight variants, incurring substantial and frequently underestimated hardware costs. Previous state-of-the-art works optimize these operations with piecewise-linear approximation and store the parameters in look-up tables (LUTs), but most of them rely on hardware-unfriendly high-precision arithmetic such as FP/INT32 and do not consider integer-only INT quantization. This paper proposes a genetic LUT-approximation algorithm, named GQA-LUT, that automatically determines the approximation parameters with quantization awareness. The results demonstrate that GQA-LUT incurs negligible degradation on the challenging semantic segmentation task for both vanilla and linear Transformer models. Moreover, the proposed GQA-LUT enables INT8-based LUT approximation, achieving area savings of 81.3~81.7% and power reductions of 79.3~80.2% compared to the high-precision FP/INT32 alternatives. Code is available at https://github.com/PingchengDong/GQA-LUT.
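As a rough illustration of the idea the abstract describes, the sketch below fits a piecewise-linear LUT to a non-linear function and scores candidate segment breakpoints with a simple genetic (mutate-and-select) loop, where the fitness is measured after the slopes and intercepts have been quantized to an INT8 grid. This is only a minimal sketch, not the authors' GQA-LUT implementation (see the linked repository for that); the choice of GELU as the target function, the 8-segment table, the quantization scale, and the mutation-only search are illustrative assumptions.

# Minimal sketch (not the paper's code): INT8 piecewise-linear LUT approximation of GELU
# with segment breakpoints chosen by a toy genetic search. Function choice, table size,
# quantization scale, and the search loop are assumptions for illustration.
import numpy as np

def gelu(x):
    # tanh-based GELU approximation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def quantize(v, scale=2 ** -5):
    # symmetric INT8 quantization of the stored LUT parameters (slopes / intercepts)
    return np.clip(np.round(v / scale), -128, 127) * scale

def fit_lut(breakpoints, xs):
    # least-squares slope/intercept per segment, then quantize the parameters
    edges = np.unique(np.concatenate(([xs.min()], np.sort(breakpoints), [xs.max()])))
    slopes, intercepts = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = xs[(xs >= lo) & (xs <= hi)]
        if seg.size < 2:
            seg = np.array([lo, hi])
        k, b = np.polyfit(seg, gelu(seg), 1)
        slopes.append(k)
        intercepts.append(b)
    return edges, quantize(np.array(slopes)), quantize(np.array(intercepts))

def lut_eval(edges, slopes, intercepts, xs):
    # pick the segment for each input and evaluate its linear piece
    idx = np.clip(np.searchsorted(edges, xs, side="right") - 1, 0, slopes.size - 1)
    return slopes[idx] * xs + intercepts[idx]

def fitness(breakpoints, xs):
    # quantization-aware: the error is measured with the already-quantized parameters
    edges, k, b = fit_lut(breakpoints, xs)
    return -np.mean((lut_eval(edges, k, b, xs) - gelu(xs)) ** 2)

rng = np.random.default_rng(0)
xs = np.linspace(-4.0, 4.0, 2048)
population = [np.sort(rng.uniform(-3.9, 3.9, size=7)) for _ in range(32)]  # 7 breakpoints = 8 segments
for _ in range(50):
    parents = sorted(population, key=lambda bp: fitness(bp, xs), reverse=True)[:8]
    children = [np.sort(np.clip(p + rng.normal(0, 0.1, size=7), -3.9, 3.9))
                for p in parents for _ in range(3)]
    population = parents + children
best = max(population, key=lambda bp: fitness(bp, xs))
print("breakpoints:", np.round(best, 3), "MSE:", -fitness(best, xs))

The actual GQA-LUT targets integer-only arithmetic end to end (per the author tags below); the toy quantization-aware fitness above only gestures at that constraint.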


Cited By

  • (2024) Boosting Weakly-Supervised Image Segmentation via Representation, Transform, and Compensator. IEEE Transactions on Circuits and Systems for Video Technology, 34(11):11013-11025. DOI: 10.1109/TCSVT.2024.3413778. Online publication date: Nov-2024.
  • (2024) Hardware-oriented algorithms for softmax and layer normalization of large language models. Science China Information Sciences, 67(10). DOI: 10.1007/s11432-024-4137-4. Online publication date: 12-Sep-2024.



Published In

DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference
June 2024
2159 pages
ISBN:9798400706011
DOI:10.1145/3649329
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 November 2024

Author Tags

  1. non-linear function
  2. quantization-aware training
  3. integer-only arithmetic
  4. transformer
  5. look-up table
  6. genetic algorithm

Qualifiers

  • Research-article

Conference

DAC '24: 61st ACM/IEEE Design Automation Conference
June 23 - 27, 2024
San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

