LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks

Published: 08 September 2018

Abstract

Although weight and activation quantization is an effective approach for Deep Neural Network (DNN) compression, and has significant potential to increase inference speed by leveraging bit operations, there is still a noticeable gap in prediction accuracy between the quantized model and the full-precision model. To address this gap, we propose to jointly train a quantized, bit-operation-compatible DNN and its associated quantizers, as opposed to using fixed, handcrafted quantization schemes such as uniform or logarithmic quantization. Our method for learning the quantizers applies to both network weights and activations with arbitrary bit precision, and our quantizers are easy to train. Comprehensive experiments on the CIFAR-10 and ImageNet datasets show that our method works consistently well for various network structures such as AlexNet, VGG-Net, GoogLeNet, ResNet, and DenseNet, surpassing previous quantization methods in accuracy by an appreciable margin. Code is available at https://github.com/Microsoft/LQ-Nets.
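
To make the abstract's key idea concrete, below is a minimal NumPy sketch of a learned, bit-operation-compatible quantizer: the quantization levels are inner products of a small learnable floating-point basis with binary codes, and the basis is refit to the data by alternating nearest-level assignment with a least-squares update. The initialization, iteration count, and fitting loop here are illustrative assumptions, not the authors' released TensorFlow implementation (see the linked repository for that).

```python
import numpy as np
from itertools import product

def learn_quantizer(x, num_bits=2, num_iters=10):
    """Fit a num_bits quantizer to the values in x (a sketch, not the
    official LQ-Nets code). Levels are <v, b> for a learnable basis v
    and binary codes b in {-1, +1}^K, which is what keeps the quantized
    values compatible with bit operations."""
    K = num_bits
    codes = np.array(list(product([-1.0, 1.0], repeat=K)))  # (2^K, K)
    # Illustrative initialization: powers of two scaled to the data range.
    v = (np.abs(x).mean() / (2 ** K - 1)) * (2.0 ** np.arange(K))
    for _ in range(num_iters):
        levels = codes @ v                                   # (2^K,)
        # Assign each value to its nearest quantization level ...
        idx = np.abs(x[:, None] - levels[None, :]).argmin(axis=1)
        B = codes[idx]                                       # (N, K)
        # ... then refit the basis by least squares with codes held fixed.
        v, *_ = np.linalg.lstsq(B, x, rcond=None)
    return v, codes @ v

# Toy usage: learn a 2-bit (4-level) quantizer for Gaussian "weights".
w = np.random.randn(10000)
v, levels = learn_quantizer(w, num_bits=2)
w_q = levels[np.abs(w[:, None] - levels[None, :]).argmin(axis=1)]
print("basis:", v, "mean squared error:", np.mean((w - w_q) ** 2))
```

In end-to-end training the quantizer sits inside the network, so gradients would typically be passed through it with a straight-through estimator; the sketch above only fits a quantizer to a fixed tensor.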

        Information

        Published In

        Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part VIII
        Sep 2018
        845 pages
        ISBN: 978-3-030-01236-6
        DOI: 10.1007/978-3-030-01237-3

        Publisher

        Springer-Verlag

        Berlin, Heidelberg

        Publication History

        Published: 08 September 2018

        Author Tags

        1. Deep neural networks
        2. Quantization
        3. Compression

        Qualifiers

        • Article

        Article Metrics

        • Downloads (Last 12 months): 0
        • Downloads (Last 6 weeks): 0
        Reflects downloads up to 27 Jan 2025

        Cited By
        • (2024) Sharpness-aware data generation for zero-shot quantization. In: Proceedings of the 41st International Conference on Machine Learning, pp. 12034-12045. DOI: 10.5555/3692070.3692549. Online publication date: 21-Jul-2024
        • (2024) QuickUpdate. In: Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, pp. 731-744. DOI: 10.5555/3691825.3691865. Online publication date: 16-Apr-2024
        • (2024) ISOAcc: In-situ Shift Operation-based Accelerator for Efficient in-SRAM Multiplication. ACM Transactions on Design Automation of Electronic Systems 30(2), pp. 1-24. DOI: 10.1145/3707205. Online publication date: 5-Dec-2024
        • (2024) Low-Precision Mixed-Computation Models for Inference on Edge. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 32(8), pp. 1414-1422. DOI: 10.1109/TVLSI.2024.3409640. Online publication date: 1-Aug-2024
        • (2024) M3XU: Achieving High-Precision and Complex Matrix Multiplication with Low-Precision MXUs. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pp. 1-16. DOI: 10.1109/SC41406.2024.00016. Online publication date: 17-Nov-2024
        • (2024) HTQ. Pattern Recognition 156(C). DOI: 10.1016/j.patcog.2024.110788. Online publication date: 18-Nov-2024
        • (2024) Trainable pruned ternary quantization for medical signal classification models. Neurocomputing 601(C). DOI: 10.1016/j.neucom.2024.128216. Online publication date: 7-Oct-2024
        • (2024) Model compression of deep neural network architectures for visual pattern recognition. Computers and Electrical Engineering 116(C). DOI: 10.1016/j.compeleceng.2024.109180. Online publication date: 1-May-2024
        • (2024) AutoQNN: An End-to-End Framework for Automatically Quantizing Neural Networks. Journal of Computer Science and Technology 39(2), pp. 401-420. DOI: 10.1007/s11390-022-1632-9. Online publication date: 1-Mar-2024
        • (2024) EAGLES: Efficient Accelerated 3D Gaussians with Lightweight Encodings. In: Computer Vision – ECCV 2024, pp. 54-71. DOI: 10.1007/978-3-031-73036-8_4. Online publication date: 29-Sep-2024
