Research article · Open access

The Art of Getting Deep Neural Networks in Shape

Published: 08 January 2019
    Abstract

    Training a deep neural network (DNN) involves selecting a set of hyperparameters that define the network topology and influence the accuracy of the resulting network. Often, the goal is to maximize prediction accuracy on a given dataset. However, non-functional requirements of the trained network, such as inference speed, size, and energy consumption, can be just as important. In this article, we aim to automate the process of selecting an appropriate DNN topology that fulfills both the functional and the non-functional requirements of the application. Specifically, we focus on tuning two important hyperparameters, depth and width, which together define the shape of the resulting network and directly affect its accuracy, speed, size, and energy consumption. To reduce the time needed to search the design space, we train only a fraction of the candidate DNNs and build a model to predict the performance of the remaining ones. We are able to produce tuned ResNets that are up to 4.22 times faster than the original depth-scaled ResNets on a batch of 128 images while matching their accuracy.
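
    The search strategy described above lends itself to a short illustration. The following sketch is a minimal, hypothetical rendering of the idea, not the article's implementation: the synthetic train_and_measure() helper, the random-forest surrogate, and the 0.93 accuracy target are all assumptions made for this example. It trains only a sampled fraction of the (depth, width) design space and uses surrogate models to predict the metrics of the remaining configurations.

        # Toy illustration of sampled search over network "shape":
        # train a fraction of the (depth, width) candidates, fit surrogate
        # models on the measured results, and predict the rest of the space.
        import itertools
        import math
        import random

        from sklearn.ensemble import RandomForestRegressor

        depths = [8, 14, 20, 32, 44, 56]  # number of layers
        widths = [16, 32, 64, 128]        # channels in the first stage
        space = list(itertools.product(depths, widths))

        def train_and_measure(depth, width):
            # Synthetic stand-in returning (accuracy, latency); a real run
            # would train the network and measure it on the target device.
            accuracy = 0.90 + 0.05 * math.log(depth * width) / math.log(56 * 128)
            latency = 1e-4 * depth * width
            return accuracy, latency

        # Train only a quarter of the candidate shapes...
        sampled = random.sample(space, k=len(space) // 4)
        X = [[d, w] for d, w in sampled]
        acc, lat = zip(*(train_and_measure(d, w) for d, w in sampled))

        # ...and fit one surrogate model per metric of interest.
        acc_model = RandomForestRegressor(random_state=0).fit(X, acc)
        lat_model = RandomForestRegressor(random_state=0).fit(X, lat)

        # Predict the remaining configurations without training them.
        rest = [[d, w] for d, w in space if (d, w) not in set(sampled)]
        pred_acc = acc_model.predict(rest)
        pred_lat = lat_model.predict(rest)

        # Keep the predicted-fastest shape that meets an accuracy target.
        feasible = [(t, s) for s, a, t in zip(rest, pred_acc, pred_lat) if a >= 0.93]
        best_shape = min(feasible)[1] if feasible else None
        print("chosen (depth, width):", best_shape)

    The same pattern extends to the other non-functional metrics the abstract mentions (model size and energy consumption) by fitting one surrogate per metric and filtering the predicted design space accordingly.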




    Published In

    ACM Transactions on Architecture and Code Optimization, Volume 15, Issue 4
    December 2018
    706 pages
    ISSN:1544-3566
    EISSN:1544-3973
    DOI:10.1145/3284745

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 January 2019
    Accepted: 01 October 2018
    Revised: 01 September 2018
    Received: 01 May 2018
    Published in TACO Volume 15, Issue 4


    Author Tags

    1. Deep neural networks
    2. computer vision
    3. parallel processing

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Hessian LOEWE initiative within the Software-Factory 4.0 project
    • Graduate School of Excellence Computational Engineering

    Article Metrics

    • Downloads (last 12 months): 203
    • Downloads (last 6 weeks): 26
    Reflects downloads up to 26 Jul 2024.

    Cited By
    • (2023) Analysis and Interpretation of Deep Convolutional Features Using Self-organizing Maps. Innovations in Machine and Deep Learning, 213-229. https://doi.org/10.1007/978-3-031-40688-1_10. Online publication date: 29-Sep-2023.
    • (2021) Advancing Design and Runtime Management of AI Applications with AI-SPRINT (Position Paper). 2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC), 1455-1462. https://doi.org/10.1109/COMPSAC51774.2021.00216. Online publication date: Jul-2021.
    • (2021) A Vision Based Deep Reinforcement Learning Algorithm for UAV Obstacle Avoidance. Intelligent Systems and Applications, 115-128. https://doi.org/10.1007/978-3-030-82193-7_8. Online publication date: 4-Aug-2021.
    • (2020) Inference and Energy Efficient Design of Deep Neural Networks for Embedded Devices. 2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 36-41. https://doi.org/10.1109/ISVLSI49217.2020.00017. Online publication date: Jul-2020.
    • (2020) Deep CNN for Contrast-Enhanced Ultrasound Focal Liver Lesions Diagnosis. 2020 International Symposium on Electronics and Telecommunications (ISETC), 1-4. https://doi.org/10.1109/ISETC50328.2020.9301116. Online publication date: 5-Nov-2020.
    • (2020) Efficient Anomaly Detection in Surveillance Videos Based on Multi Layer Perception Recurrent Neural Network. Microprocessors and Microsystems, 103303. https://doi.org/10.1016/j.micpro.2020.103303. Online publication date: Oct-2020.
    • (2019) Intelligent video surveillance: a review through deep learning techniques for crowd analysis. Journal of Big Data 6:1. https://doi.org/10.1186/s40537-019-0212-5. Online publication date: 6-Jun-2019.
