Abstract
Since its release, the Tensorflow framework has been widely used in various fields due to its advantages for deep learning. However, it is still at an early stage: its native distributed implementation has difficulty scaling to large models because of low utilization of multiple GPUs and because distributed training can be slower than training on a single machine. Reducing training time through parallelization is therefore of great significance. In view of this, we first provide an in-depth analysis of the implementation principles of Tensorflow and identify the bottlenecks of its native distributed parallel modes. Two optimal algorithms are then designed and implemented, based on the data parallelism and model parallelism modes of Tensorflow. For data parallelism, the proposed algorithm replaces the native linear execution mode with a pipeline execution mode. For model parallelism, the native random partitioning mode is replaced by our proposed novel greedy algorithm. Finally, we build a homogeneous distributed cluster and a heterogeneous distributed cluster to verify the effectiveness of the proposed algorithms. Through a number of comparative experiments, we show that the proposed optimal parallel algorithms effectively reduce model training time by an average of 26.5% (an average 1.5x speedup over the native distributed algorithms) and improve the utilization of the cluster while keeping the same accuracy level as native Tensorflow.
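To make the data-parallel optimization concrete, the sketch below illustrates the general idea of pipeline execution: preparation of the next batch is overlapped with computation on the current batch, instead of running input loading and training linearly. This is only a minimal illustration built on standard TensorFlow 2.x APIs (tf.data prefetching and MirroredStrategy, assuming TensorFlow 2.4 or later); the model, dataset, and batch size are placeholders, and it is not the pipeline algorithm proposed in the paper.

```python
# Minimal sketch of pipelined data-parallel training (not the paper's
# implementation). Assumes TensorFlow >= 2.4; the model and data are toy
# placeholders chosen only to keep the example self-contained.
import tensorflow as tf

def make_pipelined_dataset(features, labels, batch_size=64):
    ds = tf.data.Dataset.from_tensor_slices((features, labels))
    ds = ds.shuffle(10_000).batch(batch_size)
    # prefetch() lets the host prepare the next batch while the devices
    # are still computing on the current one (pipeline execution),
    # instead of the linear load-then-train pattern.
    return ds.prefetch(tf.data.AUTOTUNE)

# Data parallelism across the locally visible GPUs (falls back to CPU).
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

x = tf.random.normal([1024, 32])
y = tf.random.uniform([1024], maxval=10, dtype=tf.int32)
model.fit(make_pipelined_dataset(x, y), epochs=1)
```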
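For the model-parallel side, the following sketch shows a generic greedy placement heuristic of the kind the abstract contrasts with random partitioning: operations are assigned, largest estimated cost first, to the device with the smallest accumulated load. The operation list, cost values, and device names are illustrative assumptions, and the greedy algorithm actually proposed in the paper may differ in its cost model and constraints.

```python
# Minimal sketch of a greedy load-balancing partitioner for model
# parallelism (illustrative only; not the paper's algorithm).
import heapq

def greedy_partition(op_costs, devices):
    """Assign each op to the currently least-loaded device.

    op_costs: list of (op_name, estimated_cost) pairs.
    devices:  list of device strings such as "/GPU:0".
    Returns a dict mapping op_name -> device.
    """
    heap = [(0.0, d) for d in devices]  # (accumulated_load, device)
    heapq.heapify(heap)
    placement = {}
    # Placing the most expensive ops first tends to balance load better.
    for name, cost in sorted(op_costs, key=lambda oc: -oc[1]):
        load, device = heapq.heappop(heap)
        placement[name] = device
        heapq.heappush(heap, (load + cost, device))
    return placement

ops = [("conv1", 4.0), ("conv2", 3.5), ("fc1", 1.2), ("fc2", 0.8)]
print(greedy_partition(ops, ["/GPU:0", "/GPU:1"]))
```

A placement computed this way could then be applied when building the graph, for example by wrapping each operation in a `tf.device(...)` context.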
Acknowledgements
This research is partially supported by the National Key Research and Development Program of China under Grant No. 2018AAA0103203.
Ethics declarations
Conflict of Interests
Author Yuanlun Xie declares that he has no conflict of interest. Author Majun He declares that he has no conflict of interest. Author Tingsong Ma declares that he has no conflict of interest. Author Wenhong Tian declares that he has no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Xie, Y., He, M., Ma, T. et al. Optimal distributed parallel algorithms for deep learning framework Tensorflow. Appl Intell 52, 3880–3900 (2022). https://doi.org/10.1007/s10489-021-02588-9