QFT: Post-training Quantization via Fast Joint Finetuning of All Degrees of Freedom

  • Conference paper
Computer Vision – ECCV 2022 Workshops (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13807)


Abstract

The post-training quantization (PTQ) challenge of bringing quantized neural network accuracy close to that of the original model has drawn much attention, driven by industry demand. Many methods optimize a single per-layer degree of freedom (DoF), such as the grid step size, preconditioning factors, or nudges to weights and biases, often chaining several such steps into multi-stage solutions. Here we rethink quantized-network parameterization in a hardware-aware fashion, towards a unified analysis of all quantization DoF, permitting for the first time their joint end-to-end finetuning. Our single-step, simple, and extendable method, dubbed quantization-aware finetuning (QFT), achieves 4-bit-weight quantization results on par with the state of the art within PTQ constraints of speed and resources.
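The abstract's central idea — treating the quantization-grid parameters and the weights themselves as jointly trainable degrees of freedom — can be illustrated with a minimal NumPy sketch. This is an illustrative assumption, not the authors' implementation: it shows a signed n-bit fake-quantizer plus LSQ-style straight-through-estimator (STE) gradients for both the weights and the grid step size, so a single optimizer loop could finetune both DoF together.

```python
import numpy as np

def fake_quantize(w, step, n_bits=4):
    """Simulated (fake) quantization: snap w to the integer grid defined by
    `step`, clamp to the signed n-bit range, then map back to real values.
    The step size is one of the per-layer DoF that joint-finetuning methods
    optimize alongside the weights."""
    qmin, qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    q = np.clip(np.round(w / step), qmin, qmax)
    return q * step

def ste_grads(w, step, upstream, n_bits=4):
    """Straight-through-estimator gradients w.r.t. the weights and the step
    size (the LSQ-style step-size gradient). With both gradients available,
    one end-to-end loop can finetune all these DoF jointly rather than in
    separate per-DoF stages."""
    qmin, qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    v = w / step
    inside = (v >= qmin) & (v <= qmax)
    # STE for weights: gradient passes through where w lands inside the grid.
    dw = upstream * inside
    # LSQ step-size gradient: round(v) - v inside the range, clip value outside.
    dq = np.where(inside, np.round(v) - v, np.clip(v, qmin, qmax))
    dstep = np.sum(upstream * dq)
    return dw, dstep
```

In a QFT-style setting, the same loop would also carry the remaining per-layer DoF (e.g., bias corrections or preconditioning factors) as trainable parameters, which is what makes the optimization single-step rather than a chain of specialized fixes.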

Author information

Correspondence to Alex Finkelstein.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 464 KB)

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Finkelstein, A., Fuchs, E., Tal, I., Grobman, M., Vosco, N., Meller, E. (2023). QFT: Post-training Quantization via Fast Joint Finetuning of All Degrees of Freedom. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13807. Springer, Cham. https://doi.org/10.1007/978-3-031-25082-8_8

  • DOI: https://doi.org/10.1007/978-3-031-25082-8_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25081-1

  • Online ISBN: 978-3-031-25082-8

  • eBook Packages: Computer Science (R0)
