Abstract
Post-training quantization (PTQ), the challenge of bringing the accuracy of a quantized neural network close to that of the original, has drawn much attention, driven by industry demand. Many existing methods emphasize optimization of a single per-layer degree of freedom (DoF), such as the grid step size, preconditioning factors, or nudges to weights and biases, and are often chained with others into multi-step solutions. Here we rethink quantized-network parameterization in a hardware-aware fashion, arriving at a unified analysis of all quantization DoFs and permitting, for the first time, their joint end-to-end finetuning. Our simple, single-step, and extendable method, dubbed quantization-aware finetuning (QFT), achieves 4-bit weight quantization results on par with the state of the art within the PTQ constraints of speed and resources.
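A minimal sketch of the idea behind jointly finetuning quantization degrees of freedom, assuming PyTorch; the layer, names, bit width, and loss below are illustrative assumptions, not the paper's QFT implementation. Both the grid step size and the underlying weights are trainable parameters, with a straight-through estimator (STE) carrying gradients through the rounding step so a single optimizer can update all DoFs end to end:

import torch
import torch.nn as nn

class FakeQuantLinear(nn.Module):
    """Linear layer with 4-bit weight fake-quantization; the grid step size
    (one DoF) and the underlying weights (another DoF) are learned jointly.
    Illustrative sketch only, not the authors' implementation."""
    def __init__(self, in_features, out_features, n_bits=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # Signed n-bit integer grid: levels in [-2^(n-1), 2^(n-1) - 1].
        self.qmin, self.qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
        # Learnable step size, initialized from the weight range.
        self.step = nn.Parameter(self.weight.detach().abs().max() / self.qmax)

    def forward(self, x):
        # Quantize-dequantize the weights. round() has zero gradient almost
        # everywhere, so the rounding residual is detached (simplified STE):
        # gradients flow to both self.weight and self.step as if no rounding.
        w_scaled = self.weight / self.step
        w_int = torch.round(w_scaled).clamp(self.qmin, self.qmax)
        w_q = (w_scaled + (w_int - w_scaled).detach()) * self.step
        return nn.functional.linear(x, w_q, self.bias)

# Joint end-to-end finetuning: one optimizer updates all DoFs in the same
# backward pass, here regressing toward stand-in float-model outputs.
layer = FakeQuantLinear(64, 32)
opt = torch.optim.Adam(layer.parameters(), lr=1e-4)
x = torch.randn(8, 64)
target = x @ torch.randn(64, 32)  # placeholder for the float model's outputs
loss = nn.functional.mse_loss(layer(x), target)
loss.backward()
opt.step()

A realistic PTQ pipeline would start from a pretrained float checkpoint, use per-channel step sizes and learnable activation clipping, and train for only a short schedule on a small calibration set, but the single-optimizer structure above is the point of the sketch: every quantization DoF receives gradients in one end-to-end pass.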
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Finkelstein, A., Fuchs, E., Tal, I., Grobman, M., Vosco, N., Meller, E. (2023). QFT: Post-training Quantization via Fast Joint Finetuning of All Degrees of Freedom. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13807. Springer, Cham. https://doi.org/10.1007/978-3-031-25082-8_8
Print ISBN: 978-3-031-25081-1
Online ISBN: 978-3-031-25082-8