
Lossless Compression of Deep Neural Networks

Published: 21 September 2020

Abstract

Deep neural networks have been successful in many predictive modeling tasks, such as image and language recognition, where large neural networks are often used to obtain good accuracy. Consequently, it is challenging to deploy these networks under limited computational resources, such as in mobile devices. In this work, we introduce an algorithm that removes units and layers of a neural network while not changing the output that is produced, which thus implies a lossless compression. This algorithm, which we denote as LEO (Lossless Expressiveness Optimization), relies on Mixed-Integer Linear Programming (MILP) to identify Rectified Linear Units (ReLUs) with linear behavior over the input domain. By using regularization to induce such behavior, we can benefit from training over a larger architecture than we would later use in the environment where the trained neural network is deployed.
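
The stability test at the heart of LEO can be illustrated with a small optimization sketch. The fragment below is a minimal, hypothetical example rather than the authors' implementation: it checks whether a single ReLU in the first hidden layer is stably inactive or stably active over a box-shaped input domain by maximizing and minimizing its pre-activation with Gurobi's Python API. The function name, the box domain, and the example weights are assumptions made for illustration.

```python
# Minimal sketch (not the authors' code): test whether one first-layer ReLU is
# "stable" -- i.e., linear over the whole input domain -- by bounding its
# pre-activation w'x + b over the box [lb, ub] with Gurobi.
import numpy as np
import gurobipy as gp
from gurobipy import GRB

def relu_stability(w, b, lb, ub):
    """Classify the unit with pre-activation w'x + b, x in [lb, ub] elementwise."""
    n = len(w)
    bounds = []
    for sense in (GRB.MAXIMIZE, GRB.MINIMIZE):
        m = gp.Model("relu_stability")
        m.Params.OutputFlag = 0
        x = m.addVars(n, lb=lb, ub=ub, name="x")
        m.setObjective(gp.quicksum(w[i] * x[i] for i in range(n)) + b, sense)
        m.optimize()
        bounds.append(m.ObjVal)
    upper, lower = bounds
    if upper <= 0:   # the unit never fires: it can be removed without changing outputs
        return "stably inactive"
    if lower >= 0:   # the unit always fires: it is linear and can be folded into the next layer
        return "stably active"
    return "unstable"

# Hypothetical usage with a 2-input unit and inputs scaled to [0, 1]:
print(relu_stability(np.array([-0.5, -1.2]), -0.1, lb=0.0, ub=1.0))  # -> "stably inactive"
```

For units in deeper layers, the check cannot be a single linear program: the preceding ReLUs must be encoded with binary activation variables (the standard big-M MILP formulation of a trained network), which is the setting the paper addresses. The regularization mentioned in the abstract serves to push more units into one of the two stable regimes during training, so that this test can remove or merge more of them afterwards.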




Information

Published In

Integration of Constraint Programming, Artificial Intelligence, and Operations Research: 17th International Conference, CPAIOR 2020, Vienna, Austria, September 21–24, 2020, Proceedings
Sep 2020
558 pages
ISBN:978-3-030-58941-7
DOI:10.1007/978-3-030-58942-4

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 21 September 2020

Author Tags

  1. Deep learning
  2. Mixed-Integer Linear Programming
  3. Neural network pruning
  4. Neuron stability
  5. Rectified Linear Unit

Qualifiers

  • Article


Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 0
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 20 Jan 2025

Citations

Cited By

  • (2024) Overcoming the optimizer's curse. Proceedings of the 41st International Conference on Machine Learning, 10.5555/3692070.3694059, pp. 48695-48712. Online publication date: 21-Jul-2024
  • (2024) Optimization over Trained Neural Networks: Taking a Relaxing Walk. Integration of Constraint Programming, Artificial Intelligence, and Operations Research, 10.1007/978-3-031-60599-4_14, pp. 221-233. Online publication date: 28-May-2024
  • (2023) Alternating Mixed-Integer Programming and Neural Network Training for Approximating Stochastic Two-Stage Problems. Machine Learning, Optimization, and Data Science, 10.1007/978-3-031-53966-4_10, pp. 124-139. Online publication date: 22-Sep-2023
  • (2023) The BeMi Stardust: A Structured Ensemble of Binarized Neural Networks. Learning and Intelligent Optimization, 10.1007/978-3-031-44505-7_30, pp. 443-458. Online publication date: 4-Jun-2023
  • (2023) Model-Based Feature Selection for Neural Networks: A Mixed-Integer Programming Approach. Learning and Intelligent Optimization, 10.1007/978-3-031-44505-7_16, pp. 223-238. Online publication date: 4-Jun-2023
  • (2023) Getting Away with More Network Pruning: From Sparsity to Geometry and Linear Regions. Integration of Constraint Programming, Artificial Intelligence, and Operations Research, 10.1007/978-3-031-33271-5_14, pp. 200-218. Online publication date: 29-May-2023
  • (2022) Pruning's effect on generalization through the lens of training and regularization. Proceedings of the 36th International Conference on Neural Information Processing Systems, 10.5555/3600270.3603020, pp. 37947-37961. Online publication date: 28-Nov-2022
  • (2022) Mixed Integer Linear Programming for Optimizing a Hopfield Network. Machine Learning and Knowledge Discovery in Databases, 10.1007/978-3-031-26419-1_21, pp. 344-360. Online publication date: 19-Sep-2022
  • (2022) Training Thinner and Deeper Neural Networks: Jumpstart Regularization. Integration of Constraint Programming, Artificial Intelligence, and Operations Research, 10.1007/978-3-031-08011-1_23, pp. 345-357. Online publication date: 20-Jun-2022
  • (2021) On the expected complexity of maxout networks. Proceedings of the 35th International Conference on Neural Information Processing Systems, 10.5555/3540261.3542482, pp. 28995-29008. Online publication date: 6-Dec-2021
