Abstract
Post-training quantization (PTQ), the challenge of bringing the accuracy of a quantized neural network close to that of the original, has drawn much attention, driven by industry demand. Many existing methods emphasize optimization of a single per-layer degree of freedom (DoF), such as the grid step size, preconditioning factors, or nudges to weights and biases, and are often chained with others into multi-step solutions. Here we rethink quantized-network parameterization in a hardware-aware fashion, arriving at a unified analysis of all quantization DoFs and permitting, for the first time, their joint end-to-end finetuning. Our simple, single-step, and extendable method, dubbed quantization-aware finetuning (QFT), achieves 4-bit weight quantization results on par with the state of the art within the PTQ constraints of speed and resources.
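A minimal sketch of the idea behind jointly finetuning quantization degrees of freedom, assuming PyTorch; the layer, names, bit width, and loss below are illustrative assumptions, not the paper's QFT implementation. Both the grid step size and the underlying weights are trainable parameters, with a straight-through estimator (STE) carrying gradients through the rounding step so a single optimizer can update all DoFs end to end:

import torch
import torch.nn as nn

class FakeQuantLinear(nn.Module):
    """Linear layer with 4-bit weight fake-quantization; the grid step size
    (one DoF) and the underlying weights (another DoF) are learned jointly.
    Illustrative sketch only, not the authors' implementation."""
    def __init__(self, in_features, out_features, n_bits=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # Signed n-bit integer grid: levels in [-2^(n-1), 2^(n-1) - 1].
        self.qmin, self.qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
        # Learnable step size, initialized from the weight range.
        self.step = nn.Parameter(self.weight.detach().abs().max() / self.qmax)

    def forward(self, x):
        # Quantize-dequantize the weights. round() has zero gradient almost
        # everywhere, so the rounding residual is detached (simplified STE):
        # gradients flow to both self.weight and self.step as if no rounding.
        w_scaled = self.weight / self.step
        w_int = torch.round(w_scaled).clamp(self.qmin, self.qmax)
        w_q = (w_scaled + (w_int - w_scaled).detach()) * self.step
        return nn.functional.linear(x, w_q, self.bias)

# Joint end-to-end finetuning: one optimizer updates all DoFs in the same
# backward pass, here regressing toward stand-in float-model outputs.
layer = FakeQuantLinear(64, 32)
opt = torch.optim.Adam(layer.parameters(), lr=1e-4)
x = torch.randn(8, 64)
target = x @ torch.randn(64, 32)  # placeholder for the float model's outputs
loss = nn.functional.mse_loss(layer(x), target)
loss.backward()
opt.step()

A realistic PTQ pipeline would start from a pretrained float checkpoint, use per-channel step sizes and learnable activation clipping, and train for only a short schedule on a small calibration set, but the single-optimizer structure above is the point of the sketch: every quantization DoF receives gradients in one end-to-end pass.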
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Finkelstein, A., Fuchs, E., Tal, I., Grobman, M., Vosco, N., Meller, E. (2023). QFT: Post-training Quantization via Fast Joint Finetuning of All Degrees of Freedom. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13807. Springer, Cham. https://doi.org/10.1007/978-3-031-25082-8_8
Print ISBN: 978-3-031-25081-1
Online ISBN: 978-3-031-25082-8