
A Survey on Distributed Machine Learning

Published: 20 March 2020

Abstract

The demand for artificial intelligence has grown significantly over the past decade, and this growth has been fueled by advances in machine learning techniques and the ability to leverage hardware acceleration. However, to increase the quality of predictions and render machine learning solutions feasible for more complex applications, a substantial amount of training data is required. Although small machine learning models can be trained with modest amounts of data, the input for training larger models such as neural networks grows exponentially with the number of parameters. Since the demand for processing training data has outpaced the growth in computational power of individual machines, the machine learning workload must be distributed across multiple machines, turning a centralized system into a distributed one. These distributed systems present new challenges, first and foremost the efficient parallelization of the training process and the creation of a coherent model. This article provides an extensive overview of the current state of the art in the field by outlining the challenges and opportunities of distributed machine learning over conventional (centralized) machine learning, discussing the techniques used for distributed machine learning, and providing an overview of the systems that are available.
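The core idea described above — distributing the training workload across machines while still producing a single coherent model — can be illustrated with a minimal sketch of synchronous data-parallel training. Each simulated worker computes a gradient on its own shard of the data, and a coordinator averages the gradients before applying the update. The least-squares objective, the function names, and the four-worker setup are illustrative assumptions, not details taken from the survey:

```python
import numpy as np

def local_gradient(w, X, y):
    """Gradient of 0.5 * ||Xw - y||^2 / n on one worker's data shard."""
    n = len(y)
    return X.T @ (X @ w - y) / n

def synchronous_step(w, shards, lr=0.1):
    """One synchronous data-parallel step: average the workers'
    gradients (as an all-reduce would), then update the shared model."""
    grads = [local_gradient(w, X, y) for X, y in shards]
    return w - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

# Partition the dataset across four simulated workers (round-robin shards).
shards = [(X[i::4], y[i::4]) for i in range(4)]

w = np.zeros(3)
for _ in range(200):
    w = synchronous_step(w, shards)
# After training, w should closely recover true_w: every worker contributed
# gradients from its shard, yet all updates were applied to one shared model.
```

In a real system the `np.mean` over worker gradients would be replaced by a collective communication primitive (e.g., an all-reduce) or a parameter server, which is exactly where the parallelization and model-coherence challenges named in the abstract arise.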

References

[1]
Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Retrieved from https://www.tensorflow.org/.
[2]
Martin Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI’16). 265--283.
[3]
Martin Abadi, Andy Chu, Ian Goodfellow, Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep learning with differential privacy. In Proceedings of the 23rd ACM Conference on Computer and Communications Security (ACM CCS’16). 308--318. Retrieved from https://arxiv.org/abs/1607.00133.
[4]
Adapteva, Inc. 2017. E64G401 Epiphany 64-core Microprocessor Datasheet. Retrieved from http://www.adapteva.com/docs/e64g401_datasheet.pdf.
[5]
Alekh Agarwal, Olivier Chapelle, Miroslav Dudík, and John Langford. 2014. A reliable effective terascale linear learning system. J. Mach. Learn. Res. 15, 1 (2014), 1111--1133.
[6]
Vinay Amatya, Abhinav Vishnu, Charles Siegel, and Jeff Daily. 2017. What does fault tolerant deep learning need from MPI? CoRR abs/1709.03316 (2017). arxiv:1709.03316 http://arxiv.org/abs/1709.03316.
[7]
Amazon Web Services. 2018. Amazon SageMaker. Retrieved from https://aws.amazon.com/sagemaker/developer-resources/.
[8]
Dario Amodei, Sundaram Ananthanarayanan, Rishita Anubhai, Jingliang Bai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Qiang Cheng, Guoliang Chen, Jie Chen, Jingdong Chen, Zhijie Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Ke Ding, Niandong Du, Erich Elsen, Jesse Engel, Weiwei Fang, Linxi Fan, Christopher Fougner, Liang Gao, Caixia Gong, Awni Hannun, Tony Han, Lappi Johannes, Bing Jiang, Cai Ju, Billy Jun, Patrick LeGresley, Libby Lin, Junjie Liu, Yang Liu, Weigao Li, Xiangang Li, Dongpeng Ma, Sharan Narang, Andrew Ng, Sherjil Ozair, Yiping Peng, Ryan Prenger, Sheng Qian, Zongfeng Quan, Jonathan Raiman, Vinay Rao, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Kavya Srinet, Anuroop Sriram, Haiyuan Tang, Liliang Tang, Chong Wang, Jidong Wang, Kaifu Wang, Yi Wang, Zhijian Wang, Zhiqian Wang, Shuang Wu, Likai Wei, Bo Xiao, Wen Xie, Yan Xie, Dani Yogatama, Bin Yuan, Jun Zhan, and Zhenyao Zhu. 2016. Deep speech 2 : End-to-end speech recognition in English and Mandarin. In Proceedings of the 33rd International Conference on Machine Learning, Maria Florina Balcan and Kilian Q. Weinberger (Eds.), Vol. 48. PMLR, New York, New York, 173--182. Retrieved from http://proceedings.mlr.press/v48/amodei16.html.
[9]
Edward Anderson, Zhaojun Bai, Jack Dongarra, Anne Greenbaum, Alan McKenney, Jeremy Du Croz, Sven Hammarling, James Demmel, C. Bischof, and Danny Sorensen. 1990. LAPACK: A portable linear algebra library for high-performance computers. In Proceedings of the ACM/IEEE Conference on Supercomputing. IEEE Computer Society Press, 2--11.
[10]
Apple. 2017. Core ML Model Format Specification. Retrieved from https://apple.github.io/coremltools/coremlspecification/.
[11]
Apple. 2018. A12 Bionic. Retrieved from https://www.apple.com/iphone-xs/a12-bionic/.
[12]
Eugene Bagdasaryan, Andreas Veit, Yiqing Hua, Deborah Estrin, and Vitaly Shmatikov. 2018. How to backdoor federated learning. arXiv preprint arXiv:1807.00459 (2018).
[13]
Drew Bagnell and Andrew Y. Ng. 2006. On local rewards and scaling distributed reinforcement learning. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 91--98.
[14]
Maria-Florina Balcan, Avrim Blum, Shai Fine, and Yishay Mansour. 2012. Distributed learning, communication complexity and privacy. In Proceedings of the Conference on Learning Theory. 26--1.
[15]
Paul Baran. 1962. On Distributed Communication Networks. Rand Corporation.
[16]
Luiz André Barroso, Jeffrey Dean, and Urs Hölzle. 2003. Web search for a planet: The Google cluster architecture. IEEE Micro 2 (2003), 22--28.
[17]
James Bergstra and Yoshua Bengio. 2012. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13 (Feb. 2012), 281--305.
[18]
James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, and Yoshua Bengio. 2010. Theano: A CPU and GPU math compiler in Python. In Proceedings of the 9th Python in Science Conference, Vol. 1.
[19]
Philip A. Bernstein and Eric Newcomer. 2009. Principles of Transaction Processing. Morgan Kaufmann.
[20]
L. Susan Blackford, Antoine Petitet, Roldan Pozo, Karin Remington, R. Clint Whaley, James Demmel, Jack Dongarra, Iain Duff, Sven Hammarling, Greg Henry, et al. 2002. An updated set of basic linear algebra subprograms (BLAS). ACM Trans. Math. Softw. 28, 2 (2002), 135--151.
[21]
David M. Blei. 2012. Probabilistic topic models. Commun. ACM 55, 4 (2012), 77--84.
[22]
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. J. Mach. Learn. Res. 3, Jan. (2003), 993--1022.
[23]
Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, Xin Zhang, Jake Zhao, and Karol Zieba. 2016. End to end learning for self-driving cars. CoRR abs/1604.07316 (2016). arxiv:1604.07316 http://arxiv.org/abs/1604.07316.
[24]
Léon Bottou. 2010. Large-scale machine learning with stochastic gradient descent. In Proceedings of the COMPSTAT’2010. Springer, 177--186.
[25]
Leo Breiman. 2001. Random forests. Mach. Learn. 45, 1 (1 Oct. 2001), 5--32.
[26]
Stephen Brooks. 1998. Markov chain Monte Carlo method and its application. J. Roy. Statist. Soc.: Series D (the Statist.) 47, 1 (1998), 69--100.
[27]
Rajkumar Buyya et al. 1999. High Performance Cluster Computing: Architectures and Systems. Prentice Hall, Upper SaddleRiver, NJ, 999.
[28]
Richard H. Byrd, Samantha L. Hansen, Jorge Nocedal, and Yoram Singer. 2016. A stochastic quasi-Newton method for large-scale optimization. SIAM J. Optim. 26, 2 (2016), 1008--1031.
[29]
K. Canini, T. Chandra, E. Ie, J. McFadden, K. Goldman, M. Gunter, J. Harmsen, K. LeFevre, D. Lepikhin, T. L. Llinares, et al. 2012. Sibyl: A system for large scale supervised machine learning. Tech. Talk 1 (2012), 113.
[30]
Jianmin Chen, Rajat Monga, Samy Bengio, and Rafal Józefowicz. 2016. Revisiting distributed synchronous SGD. CoRR abs/1604.00981 (2016). arxiv:1604.00981 http://arxiv.org/abs/1604.00981.
[31]
Kai Chen and Qiang Huo. 2016. Scalable training of deep learning machines by incremental block training with intra-block parallel optimization and blockwise model-update filtering. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’16). IEEE, 5880--5884.
[32]
Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu, Yunji Chen, and Olivier Temam. 2014. Diannao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. ACM SIGPLAN Not. 49, 4 (2014), 269--284.
[33]
Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. CoRR abs/1512.01274 (2015). arxiv:1512.01274 http://arxiv.org/abs/1512.01274.
[34]
Trishul Chilimbi, Yutaka Suzue, Johnson Apacible, and Karthik Kalyanaraman. 2014. Project Adam: Building an efficient and scalable deep learning training system. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI’14). USENIX Association, 571--582. Retrieved from https://www.usenix.org/conference/osdi14/technical-sessions/presentation/chilimbi.
[35]
François Chollet et al. 2015. Keras. Retrieved from https://keras.io/.
[36]
Cheng-Tao Chu, Sang K. Kim, Yi-An Lin, YuanYuan Yu, Gary Bradski, Kunle Olukotun, and Andrew Y. Ng. 2007. Map-reduce for machine learning on multicore. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 281--288.
[37]
Scott H. Clearwater, Tze-Pin Cheng, Haym Hirsh, and Bruce G. Buchanan. 1989. Incremental batch learning. In Proceedings of the 6th International Workshop on Machine Learning. Elsevier, 366--370.
[38]
Adam Coates, Brody Huval, Tao Wang, David Wu, Bryan Catanzaro, and Ng Andrew. 2013. Deep learning with COTS HPC systems. In Proceedings of the International Conference on Machine Learning. 1337--1345.
[39]
Elias De Coninck, Steven Bohez, Sam Leroux, Tim Verbelen, Bert Vankeirsbilck, Pieter Simoens, and Bart Dhoedt. 2018. DIANNE: A modular framework for designing, training and deploying deep neural networks on heterogeneous distributed infrastructure. J. Syst. Softw. 141 (2018), 52--65.
[40]
George F. Coulouris, Jean Dollimore, and Tim Kindberg. 2005. Distributed Systems: Concepts and Design. Pearson Education.
[41]
Henggang Cui, James Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Abhimanu Kumar, Jinliang Wei, Wei Dai, Gregory R. Ganger, Phillip B. Gibbons, et al. 2014. Exploiting bounded staleness to speed Up big data analytics. In Proceedings of the USENIX Annual Technical Conference. 37--48.
[42]
Yu-Hong Dai and Yaxiang Yuan. 1999. A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. 10, 1 (1999), 177--182.
[43]
Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc’Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, and Andrew Y. Ng. 2012. Large scale distributed deep networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems, Vol. 1 (NIPS’12). Curran Associates Inc., 1223--1231. Retrieved from http://dl.acm.org/citation.cfm?id=2999134.2999271.
[44]
Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Conference on Operating Systems Design 8 Implementation, Vol. 6 (OSDI’04). USENIX Association, 10--10.
[45]
John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12 (July 2011), 2121--2159. Retrieved from http://dl.acm.org/citation.cfm?id=1953048.2021068.
[46]
Elmootazbellah Nabil Elnozahy, Lorenzo Alvisi, Yi-Min Wang, and David B. Johnson. 2002. A survey of rollback-recovery protocols in message-passing systems. ACM Comput. Surv. 34, 3 (2002), 375--408.
[47]
Facebook. 2017. Gloo. Retrieved from https://github.com/facebookincubator/gloo.
[48]
Clément Farabet, Berin Martini, Benoit Corda, Polina Akselrod, Eugenio Culurciello, and Yann LeCun. 2011. Neuflow: A runtime reconfigurable dataflow processor for vision. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW’11). IEEE, 109--116.
[49]
Michael Ferdman, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki, and Babak Falsafi. 2012. Clearing the clouds: A study of emerging scale-out workloads on modern hardware. In ACM SIGPLAN Not., Vol. 47. ACM, 37--48.
[50]
Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, and Dinani Amorim. 2014. Do we need hundreds of classifiers to solve real world classification problems. J. Mach. Learn. Res 15, 1 (2014), 3133--3181.
[51]
Michael J. Flynn. 1972. Some computer organizations and their effectiveness. IEEE Trans. Comput. 100, 9 (1972), 948--960.
[52]
Ian Foster and Adriana Iamnitchi. 2003. On death, taxes, and the convergence of peer-to-peer and grid computing. In Proceedings of the International Workshop on Peer-to-Peer Systems. Springer, 118--128.
[53]
Ermias Gebremeskel. 2018. Analysis and comparison of distributed training techniques for deep neural networks in a dynamic environment. (2018).
[54]
Stuart Geman, Elie Bienenstock, and René Doursat. 1992. Neural networks and the bias/variance dilemma. Neural Comput. 4, 1 (1992), 1--58.
[55]
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. 2003. The Google file system. In Proceedings of the ACM Symposium on Operating Systems Principles.
[56]
Andrew Gibiansky. 2017. Bringing HPC Techniques to Deep Learning. Retrieved from http://research.baidu.com/bringing-hpc-techniques-deep-learning/.
[57]
Yue-Jiao Gong, Wei-Neng Chen, Zhi-Hui Zhan, Jun Zhang, Yun Li, Qingfu Zhang, and Jing-Jing Li. 2015. Distributed evolutionary algorithms and their models: A survey of the state-of-the-art. Appl. Soft Comput. 34 (2015), 286--300.
[58]
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Proceedings of the International Conference on Advances in Neural Information Processing Systems, Vol. 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 2672--2680.
[59]
Michael T. Goodrich, Nodari Sitchinava, and Qin Zhang. 2011. Sorting, searching, and simulation in the MapReduce framework. In Proceedings of the International Symposium on Algorithms and Computation. 374--383.
[60]
Google. 2017. Google Cloud TPU. Retrieved from https://cloud.google.com/tpu.
[61]
Priya Goyal, Piotr Dollár, Ross B. Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. 2017. Accurate, large minibatch SGD: Training ImageNet in 1 hour. CoRR abs/1706.02677 (2017). arxiv:1706.02677 http://arxiv.org/abs/1706.02677.
[62]
William D. Gropp, William Gropp, Ewing Lusk, and Anthony Skjellum. 1999. Using MPI: Portable Parallel Programming with the Message-passing Interface. Vol. 1. The MIT Press.
[63]
Mehmet Gönen. 2012. Predicting drug-target interactions from chemical and genomic kernels using Bayesian matrix factorization. Bioinformatics 28, 18 (2012), 2304--2310.
[64]
D. Hall, D. Ramage, et al. 2009. Breeze: Numerical Processing Library for Scala. Retrieved from https://github. com/scalanlp/breeze.
[65]
Minyang Han and Khuzaima Daudjee. 2015. Giraph unchained: Barrierless asynchronous parallel execution in pregel-like graph processing systems. Proc. VLDB Endow. 8, 9 (May 2015), 950--961.
[66]
Tianyi David Han and Tarek S. Abdelrahman. 2011. Reducing branch divergence in GPU programs. In Proceedings of the 4th Workshop on General Purpose Processing on Graphics Processing Units. 3 (Mar. 2011) 1--8.
[67]
Elmar Haußmann. 2018. Accelerating I/O Bound Deep Learning on Shared Storage. Retrieved from https://blog.riseml.com/accelerating-io-bound-deep-learning-e0e3f095fd0.
[68]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep residual learning for image recognition. CoRR abs/1512.03385 (2015). arxiv:1512.03385 http://arxiv.org/abs/1512.03385.
[69]
Magnus Rudolph Hestenes, Eduard Stiefel, and others. 1952. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards 49, 6 (1952), 409--436.
[70]
Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov. 2012. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580 (2012).
[71]
Briland Hitaj, Giuseppe Ateniese, and Fernando Perez-Cruz. 2017. Deep models under the GAN: Information leakage from collaborative deep learning. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security. ACM, 603--618.
[72]
Thomas Hofmann. 1999. Probabilistic latent semantic analysis. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., 289--296.
[73]
Kevin Hsieh, Aaron Harlap, Nandita Vijaykumar, Dimitris Konomis, Gregory R. Ganger, Phillip B. Gibbons, and Onur Mutlu. 2017. Gaia: Geo-distributed machine learning approaching LAN speeds. In Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI’17). USENIX Association, 629--647. Retrieved from https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/hsieh.
[74]
IBM Cloud. 2018. IBM Watson Machine Learning. Retrieved from https://www.ibm.com/cloud/machine-learning.
[75]
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. 2007. Dryad: Distributed data-parallel programs from sequential building blocks. In ACM SIGOPS Operating Systems Review, Vol. 41. ACM, 59--72.
[76]
Sylvain Jeaugey. 2017. NCCL 2.0. Retrieved from http://on-demand.gputechconf.com/gtc/2017/presentation/s7155-jeaugey-nccl.pdf.
[77]
Genlin Ji and Xiaohan Ling. 2007. Ensemble learning based distributed clustering. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 312--321.
[78]
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia. ACM, 675--678.
[79]
Michael I. Jordan and Tom M. Mitchell. 2015. Machine learning: Trends, perspectives, and prospects. Science 349, 6245 (2015), 255--260.
[80]
Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, et al. 2017. In-datacenter performance analysis of a tensor processing unit. In Proceedings of the ACM/IEEE 44th International Symposium on Computer Architecture (ISCA’17). IEEE, 1--12.
[81]
Leslie Pack Kaelbling, Michael L. Littman, and Andrew W. Moore. 1996. Reinforcement learning: A survey. J. Artific. Intell. Res. 4 (1996), 237--285.
[82]
Amir E. Khandani, Adlar J. Kim, and Andrew W. Lo. 2010. Consumer credit-risk models via machine-learning algorithms. J. Bank. Fin. 34, 11 (2010), 2767--2787.
[83]
Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (Aug. 2009), 30--37.
[84]
Thorsten Kurth, Jian Zhang, Nadathur Satish, Evan Racah, Ioannis Mitliagkas, Md Mostofa Ali Patwary, Tareq Malas, Narayanan Sundaram, Wahid Bhimji, Mikhail Smorkalov, et al. 2017. Deep learning at 15pf: Supervised and semi-supervised classification for scientific data. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. ACM, 7.
[85]
Donghwoon Kwon, Hyunjoo Kim, Jinoh Kim, Sang C. Suh, Ikkyun Kim, and Kuinam J. Kim. 2017. A survey of deep learning-based network anomaly detection. Cluster Comput. (27 Sep. 2017).
[86]
Ralf Lammel. 2008. Google’s MapReduce programming model—Revisited. Sci. Comput. Prog. 70, 1 (2008), 1.
[87]
Sara Landset, Taghi M. Khoshgoftaar, Aaron N. Richter, and Tawfiq Hasanin. 2015. A survey of open source tools for machine learning with big data in the Hadoop ecosystem. J. Big Data 2, 1 (2015), 24.
[88]
Hugo Larochelle, Dumitru Erhan, Aaron Courville, James Bergstra, and Y. Bengio. 2007. An empirical evaluation of deep architectures on problems with many factors of variation. In Proceedings of the International Conference on Machine Learning, Vol. 227, 473--480.
[89]
Chuck L. Lawson, Richard J. Hanson, David R. Kincaid, and Fred T. Krogh. 1979. Basic linear algebra subprograms for Fortran usage. ACM Trans. Math. Softw. 5, 3 (1979), 308--323.
[90]
Quoc V. Le. 2013. Building high-level features using large scale unsupervised learning. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’13). IEEE, 8595--8598.
[91]
Guanpeng Li, Siva Kumar Sastry Hari, Michael Sullivan, Timothy Tsai, Karthik Pattabiraman, Joel Emer, and Stephen W. Keckler. 2017. Understanding error propagation in deep learning neural network (DNN) accelerators and applications. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. ACM, 8.
[92]
Mu Li, David G. Andersen, Alexander Smola, and Kai Yu. 2014. Communication efficient distributed machine learning with the parameter server. In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS’14), Vol. 1. The MIT Press, 19--27.
[93]
Mu Li, David G. Anderson, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, and Bor-Yiing Su. 2014. Scaling distributed machine learning with the parameter server. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI’14). 583--598.
[94]
Mu Li, Li Zhou, Zichao Yang, Aaron Li, Fei Xia, David G. Andersen, and Alexander Smola. 2013. Parameter server for distributed machine learning. In Big Learning NIPS Workshop, Vol. 6. 2 pages.
[95]
Dong C. Liu and Jorge Nocedal. 1989. On the limited memory BFGS method for large scale optimization. Math. Prog. 45, 1--3 (1989), 503--528.
[96]
A. R. Mamidala, G. Kollias, C. Ward, and F. Artico. 2018. MXNET-MPI: Embedding MPI parallelism in parameter server task model for scaling deep learning. ArXiv e-prints (Jan. 2018). arxiv:cs.DC/1801.03855.
[97]
H. Brendan McMahan, Eider Moore, Daniel Ramage, and Blaise Agüera y Arcas. 2016. Federated learning of deep networks using model averaging. (2016).
[98]
Xiangrui Meng, Joseph Bradley, Burak Yavuz, Evan Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, D. B. Tsai, Manish Amde, Sean Owen, et al. 2016. MLlib: Machine learning in Apache Spark. J. Mach. Learn. Res. 17, 1 (2016), 1235--1241.
[99]
Cade Metz. 2018. Big bets on AI open a new frontier for chip start-ups, too. The New York Times 14 Jan. (2018). Retrieved from https://www.nytimes.com/2018/01/14/technology/artificial-intelligence-chip-start-ups.html.
[100]
Ryan J. Meuth. 2007. GPUs surpass computers at repetitive calculations. IEEE Potent. 26, 6 (2007), 12--23.
[101]
Microsoft. 2018. Microsoft Azure Machine Learning. Retrieved from https://azure.microsoft.com/en-us/overview/machine-learning/.
[102]
Microsoft Inc. 2015. Distributed Machine Learning Toolkit (DMTK). Retrieved from http://www.dmtk.io.
[103]
Jyoti Nandimath, Ekata Banerjee, Ankur Patil, Pratima Kakade, Saumitra Vaidya, and Divyansh Chaturvedi. 2013. Big data analysis using Apache Hadoop. In Proceedings of the IEEE 14th International Conference on Information Reuse 8 Integration (IRI’13). IEEE, 700--703.
[104]
Fairuz Amalina Narudin, Ali Feizollah, Nor Badrul Anuar, and Abdullah Gani. 2016. Evaluation of machine learning classifiers for mobile malware detection. Soft Comput. 20, 1 (2016), 343--357.
[105]
David Newman, Arthur Asuncion, Padhraic Smyth, and Max Welling. 2009. Distributed algorithms for topic models. J. Mach. Learn. Res. 10, Aug. (2009), 1801--1828.
[106]
NVIDIA Corporation. 2015. NVIDIA Collective Communications Library (NCCL). Retrieved from https://developer.nvidia.com/nccl.
[107]
NVIDIA Corporation. 2017. Nvidia Tesla V100. Retrieved from https://www.nvidia.com/en-us/data-center/tesla-v100/.
[108]
Kyoung-Su Oh and Keechul Jung. 2004. GPU implementation of neural networks. Pattern Recog. 37, 6 (2004), 1311--1314.
[109]
Andreas Olofsson. 2016. Epiphany-V: A 1024 processor 64-bit RISC system-on-chip. (2016).
[110]
Andreas Olofsson, Tomas Nordström, and Zain Ul-Abdin. 2014. Kickstarting high-performance energy-efficient manycore architectures with epiphany. arXiv preprint arXiv:1412.5538 (2014).
[111]
David W. Opitz and Richard Maclin. 1999. Popular ensemble methods: An empirical study. (1999).
[112]
Róbert Ormándi, István Hegedűs, and Márk Jelasity. 2013. Gossip learning with linear models on fully distributed data. Concurr. Comput.: Pract. Exper. 25, 4 (2013), 556--571.
[113]
A. Orriols-Puig, J. Casillas, and E. Bernado-Mansilla. 2009. Fuzzy-UCS: A Michigan-style learning fuzzy-classifier system for supervised learning. IEEE Trans. Evol. Comput. 13, 2 (Apr. 2009), 260--283.
[114]
Kalin Ovtcharov, Olatunji Ruwase, Joo-Young Kim, Jeremy Fowers, Karin Strauss, and Eric S. Chung. 2015. Accelerating deep convolutional neural networks using specialized hardware. Micros. Res. Whitep. 2, 11 (2015).
[115]
Matthew Felice Pace. 2012. BSP vs MapReduce. Proced. Comput. Sci. 9 (2012), 246--255.
[116]
Louis Papageorgiou, Picasi Eleni, Sofia Raftopoulou, Meropi Mantaiou, Vasileios Megalooikonomou, and Dimitrios Vlachakis. 2018. Genomic big data hitting the storage bottleneck. EMBnet. J. 24 (2018).
[117]
Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in PyTorch. (2017).
[118]
Pitch Patarasuk and Xin Yuan. 2009. Bandwidth optimal all-reduce algorithms for clusters of workstations. J. Parallel Distrib. Comput. 69, 2 (Feb. 2009), 117--124.
[119]
Diego Peteiro-Barral and Bertha Guijarro-Berdiñas. 2013. A survey of methods for distributed machine learning. Prog. Artific. Intell. 2, 1 (2013), 1--11.
[120]
Boris T. Polyak. 2007. Newton’s method and its use in optimization. Europ. J. Operat. Res. 181, 3 (2007), 1086--1096.
[121]
Daniel Pop. 2016. Machine learning and cloud computing: Survey of distributed and SaaS solutions. arXiv preprint arXiv:1603.08767 (2016).
[122]
Foster Provost and Venkateswarlu Kolluri. 1999. A survey of methods for scaling up inductive algorithms. Data Min. Knowl. Disc. 3, 2 (1999), 131--169.
[123]
Junfei Qiu, Qihui Wu, Guoru Ding, Yuhua Xu, and Shuo Feng. 2016. A survey of machine learning for big data processing. EURASIP J. Adv. Sig. Proc. 2016, 1 (2016), 67.
[124]
Ioan Raicu, Ian Foster, Alex Szalay, and Gabriela Turcu. 2006. Astroportal: A science gateway for large-scale astronomy data analysis. In Proceedings of the TeraGrid Conference. 12--15.
[125]
Rajat Raina, Anand Madhavan, and Andrew Y. Ng. 2009. Large-scale deep unsupervised learning using graphics processors. In Proceedings of the 26th International Conference on Machine Learning. ACM, 873--880.
[126]
James Reinders. 2013. AVX-512 instructions. Intel Corporation (2013).
[127]
Peter Richtárik and Martin Takáč. 2016. Distributed coordinate descent method for learning with big data. J. Mach. Learn. Res. 17, 1 (2016), 2657--2681.
[128]
Kaz Sato, Cliff Young, and David Patterson. 2017. An in-depth look at Google’s first Tensor Processing Unit (TPU). Google Cloud Big Data Mach. Learn. Blog 12 (2017).
[129]
Frank Seide and Amit Agarwal. 2016. CNTK: Microsoft’s open-source deep-learning toolkit. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2135--2135.
[130]
Frank Seide, Hao Fu, Jasha Droppo, Gang Li, and Dong Yu. 2014. 1-bit stochastic gradient descent and application to data-parallel distributed training of speech DNNs. In Proceedings of the Conference of the International Speech Communication Association (Interspeech’14). Retrieved from https://www.microsoft.com/en-us/research/publication/1-bit-stochastic-gradient-descent-and-application-to-data-parallel-distributed-training-of-speech-dnns/.
[131]
Alexander Sergeev and Mike Del Balso. 2018. Horovod: Fast and easy distributed deep learning in TensorFlow. Retrieved from https://arxiv.org/abs/1802.05799.
[132]
Pierre Sermanet, Soumith Chintala, and Yann LeCun. 2012. Convolutional neural networks applied to house numbers digit classification. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR’12). IEEE, 3288--3291.
[133]
Pierre Sermanet and Yann LeCun. 2011. Traffic sign recognition with multi-scale convolutional networks. In Proceedings of the International Joint Conference on Neural Networks (IJCNN’11). IEEE, 2809--2813.
[134]
Amazon Web Services. 2016. Introducing Amazon EC2 P2 Instances, the Largest GPU-Powered Virtual Machine in the Cloud. Retrieved from https://aws.amazon.com/about-aws/whats-new/2016/09/introducing-amazon-ec2-p2-instances-the-largest-gpu-powered-virtual-machine-in-the-cloud/.
[135]
Amazon Web Services. 2017. Amazon EC2 F1 Instances. Retrieved from https://aws.amazon.com/ec2/instance-types/f1/.
[136]
Shai Shalev-Shwartz and Tong Zhang. 2013. Stochastic dual coordinate ascent methods for regularized loss minimization. (2013).
[137]
James G. Shanahan and Laing Dai. 2015. Large scale distributed data science using Apache Spark. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2323--2324.
[138]
Shaohuai Shi and Xiaowen Chu. 2017. Performance modeling and evaluation of distributed deep learning frameworks on GPUs. CoRR abs/1711.05979 (2017). arxiv:1711.05979 Retrieved from http://arxiv.org/abs/1711.05979.
[139]
Shaohuai Shi, Qiang Wang, Pengfei Xu, and Xiaowen Chu. 2016. Benchmarking state-of-the-art deep learning software tools. In Proceedings of the 7th International Conference on Cloud Computing and Big Data (CCBD’16). IEEE, 99--104.
[140]
Reza Shokri and Vitaly Shmatikov. 2015. Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. ACM, 1310--1321.
[141]
Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler. 2010. The Hadoop distributed file system. In Proceedings of the IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST’10). IEEE, 1--10.
[142]
J. Simm, A. Arany, P. Zakeri, T. Haber, J. K. Wegner, V. Chupakhin, H. Ceulemans, and Y. Moreau. 2017. Macau: Scalable Bayesian factorization with high-dimensional side information using MCMC. In Proceedings of the IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP’17).
[143]
Dilpreet Singh and Chandan K. Reddy. 2015. A survey on platforms for big data analytics. J. Big Data 2, 1 (2015), 8.
[144]
Michael John Sebastian Smith. 1997. Application-specific Integrated Circuits. Vol. 7. Addison-Wesley, Reading, MA.
[145]
Alexander Smola and Shravan Narayanamurthy. 2010. An architecture for parallel topic models. Proc. VLDB Endow. 3, 1--2 (2010), 703--710.
[146]
Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. 2012. Practical Bayesian optimization of machine learning algorithms. In Proceedings of the International Conference on Advances in Neural Information Processing Systems, Vol. 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 2951--2959.
[147]
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott E. Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2014. Going deeper with convolutions. CoRR abs/1409.4842 (2014). arxiv:1409.4842 Retrieved from http://arxiv.org/abs/1409.4842.
[148]
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. 2015. Rethinking the inception architecture for computer vision. CoRR abs/1512.00567 (2015). arxiv:1512.00567 Retrieved from http://arxiv.org/abs/1512.00567.
[149]
Martin Takác, Avleen Singh Bijral, Peter Richtárik, and Nati Srebro. 2013. Mini-batch primal and dual methods for SVMs. In Proceedings of the International Conference on Machine Learning (ICML’13). 1022--1030.
[150]
The Khronos Group. 2018. Neural Network Exchange Format (NNEF). Retrieved from https://www.khronos.org/registry/NNEF/specs/1.0/nnef-1.0.pdf.
[151]
K. I. Tsianos, S. F. Lawlor, and M. G. Rabbat. 2012. Communication/computation tradeoffs in consensus-based distributed optimization. In Proceedings of the International Conference on Advances in Neural Information Processing Systems.
[152]
Sujatha R. Upadhyaya. 2013. Parallel approaches to machine learning—A comprehensive survey. J. Parallel Distrib. Comput. 73, 3 (2013), 284--292.
[153]
Praneeth Vepakomma, Tristan Swedish, Ramesh Raskar, Otkrist Gupta, and Abhimanyu Dubey. 2018. No peek: A survey of private distributed deep learning. arXiv preprint arXiv:1812.03288 (2018).
[154]
Tim Verbelen, Pieter Simoens, Filip De Turck, and Bart Dhoedt. 2012. AIOLOS: Middleware for improving mobile application performance through cyber foraging. J. Syst. Softw. 85, 11 (2012), 2629--2639.
[155]
Jinliang Wei, Wei Dai, Aurick Qiao, Qirong Ho, Henggang Cui, Gregory R. Ganger, Phillip B. Gibbons, Garth A. Gibson, and Eric P. Xing. 2015. Managed communication and consistency for fast data-parallel iterative analytics. In Proceedings of the 6th ACM Symposium on Cloud Computing (SoCC’15). ACM, 381--394.
[156]
Sholom M. Weiss and Nitin Indurkhya. 1995. Rule-based machine learning methods for functional prediction. J. Artific. Intell. Res. 3 (1995), 383--403.
[157]
Stewart W. Wilson. 1995. Classifier fitness based on accuracy. Evolut. Comput. 3, 2 (1995), 149--175.
[158]
Stephen J. Wright. 2015. Coordinate descent algorithms. Math. Prog. 151, 1 (2015), 3--34.
[159]
P. Xie, J. K. Kim, Y. Zhou, Q. Ho, A. Kumar, Y. Yu, and E. Xing. 2015. Distributed machine learning via sufficient factor broadcasting. arXiv preprint (2015).
[160]
E. P. Xing, Q. Ho, W. Dai, J. K. Kim, J. Wei, S. Lee, X. Zheng, P. Xie, A. Kumar, and Y. Yu. 2013. Petuum: A new platform for distributed machine learning on big data. ArXiv e-prints (Dec. 2013). arxiv:stat.ML/1312.7651.
[161]
Eric P. Xing, Qirong Ho, Pengtao Xie, and Dai Wei. 2016. Strategies and principles of distributed machine learning on big data. Engineering 2, 2 (2016), 179--195.
[162]
Feng Yan, Olatunji Ruwase, Yuxiong He, and Evgenia Smirni. 2016. SERF: Efficient scheduling for fast deep neural network serving via judicious parallelism. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16). IEEE, 300--311.
[163]
Ying Yan, Yanjie Gao, Yang Chen, Zhongxin Guo, Bole Chen, and Thomas Moscibroda. 2016. TR-Spark: Transient computing for big data analytics. In Proceedings of the 7th ACM Symposium on Cloud Computing (SoCC’16). ACM, New York, NY, 484--496.
[164]
Dani Yogatama, Phil Blunsom, Chris Dyer, Edward Grefenstette, and Wang Ling. 2016. Learning to compose words into sentences with reinforcement learning. CoRR abs/1611.09100 (2016). arxiv:1611.09100 Retrieved from http://arxiv.org/abs/1611.09100.
[165]
Yang You, Aydın Buluç, and James Demmel. 2017. Scaling deep learning on GPU and Knights Landing clusters. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. ACM.
[166]
Haifeng Yu and Amin Vahdat. 2002. Design and evaluation of a conit-based continuous consistency model for replicated services. ACM Trans. Comput. Syst. 20, 3 (2002), 239--282.
[167]
Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. USENIX Association, 2--2.
[168]
Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: Cluster computing with working sets. In Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing (HotCloud’10). USENIX Association, 10--10.
[169]
Matei Zaharia, Reynold S. Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, Xiangrui Meng, Josh Rosen, Shivaram Venkataraman, Michael J. Franklin, et al. 2016. Apache Spark: A unified engine for big data processing. Commun. ACM 59, 11 (2016), 56--65.
[170]
Li Zeng, Ling Li, Lian Duan, Kevin Lu, Zhongzhi Shi, Maoguang Wang, Wenjuan Wu, and Ping Luo. 2012. Distributed data mining: A survey. Inf. Technol. Manag. 13, 4 (2012), 403--409.
[171]
Hao Zhang, Zeyu Zheng, Shizhen Xu, Wei Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, Pengtao Xie, and Eric P. Xing. 2017. Poseidon: An efficient communication architecture for distributed deep learning on GPU clusters. arXiv preprint (2017).


Published In

ACM Computing Surveys  Volume 53, Issue 2
March 2021
848 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3388460

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 March 2020
Accepted: 01 December 2019
Revised: 01 November 2019
Received: 01 December 2018
Published in CSUR Volume 53, Issue 2


Author Tags

  1. Distributed machine learning
  2. distributed systems

Qualifiers

  • Survey
  • Refereed
