DOI: 10.1145/3373376.3378499
Research Article | Public Access

Prague: High-Performance Heterogeneity-Aware Asynchronous Decentralized Training

Published: 13 March 2020
    Abstract

    Distributed deep learning training usually adopts All-Reduce as the synchronization mechanism for data-parallel algorithms because of its high performance in homogeneous environments. However, an All-Reduce step cannot finish before the slowest worker contributes, so it becomes significantly slower in heterogeneous settings. AD-PSGD, a recently proposed synchronization method that provides fast numerical convergence and heterogeneity tolerance, suffers from deadlock issues and high synchronization overhead. Is it possible to get the best of both worlds: a distributed training method with the high performance of All-Reduce in homogeneous environments and the heterogeneity tolerance of AD-PSGD?
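    To make the straggler effect concrete, the toy calculation below illustrates why a globally synchronous All-Reduce step runs at the pace of the slowest worker (the per-worker times are made-up numbers for illustration, not measurements from the paper):

        # Hypothetical per-step compute times (seconds) for 8 workers; one straggler is 5x slower.
        step_times = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 5.0]

        # A global All-Reduce cannot complete until every worker has contributed,
        # so the synchronized step takes the maximum time, not the average.
        all_reduce_step = max(step_times)                      # 5.0 s
        mean_worker_time = sum(step_times) / len(step_times)   # 1.5 s

        print(f"All-Reduce step: {all_reduce_step:.1f}s vs. mean worker compute: {mean_worker_time:.1f}s")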
    In this paper, we propose Prague, a high-performance heterogeneity-aware asynchronous decentralized training approach. We achieve this goal through intensive synchronization optimization that exploits the interplay between algorithm and system implementation, i.e., between statistical and hardware efficiency. To reduce synchronization cost, we propose a novel communication primitive, Partial All-Reduce, which enables fast synchronization within a group of workers. To reduce serialization cost, we propose static group scheduling in homogeneous environments and two simple techniques, Group Buffer and Group Division, that largely eliminate conflicts at the cost of slightly reduced randomness. Our experiments show that in a homogeneous environment, Prague is 1.2x faster than the state-of-the-art implementation of All-Reduce, 5.3x faster than Parameter Server, and 3.7x faster than AD-PSGD. In a heterogeneous setting, Prague tolerates slowdowns well and achieves a 4.4x speedup over All-Reduce.
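    The key primitive above, Partial All-Reduce, averages the models of a randomly formed group of workers rather than of all workers, so stragglers outside the group cannot stall the step. The single-process Python simulation below is a minimal sketch of that semantics only; the worker count, group size, and toy local update are illustrative assumptions, not the paper's implementation:

        import numpy as np

        rng = np.random.default_rng(0)
        NUM_WORKERS, GROUP_SIZE, DIM = 8, 3, 4   # illustrative sizes, not from the paper

        # Each worker holds its own copy of the model parameters (decentralized training).
        models = [rng.normal(size=DIM) for _ in range(NUM_WORKERS)]

        def local_step(w):
            """Stand-in for one local SGD step on worker w (toy update, not a real gradient)."""
            models[w] -= 0.01 * rng.normal(size=DIM)

        def partial_all_reduce(group):
            """Average the models of only the workers in `group`.

            Unlike a global All-Reduce, workers outside `group` are untouched,
            so a slow worker elsewhere cannot block this synchronization.
            """
            avg = np.mean([models[w] for w in group], axis=0)
            for w in group:
                models[w] = avg.copy()

        for step in range(100):
            for w in range(NUM_WORKERS):
                local_step(w)
            # Randomly form a small group and synchronize only within it.
            group = rng.choice(NUM_WORKERS, size=GROUP_SIZE, replace=False)
            partial_all_reduce(group)

        print("max model divergence across workers:",
              float(max(np.linalg.norm(m - models[0]) for m in models)))

    In the real system, the group-wise average would be executed as a collective over only the group's processes, and scheduling mechanisms such as the paper's Group Buffer and Group Division keep concurrent groups that share a worker from serializing each other.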



    Information

    Published In

    ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems
    March 2020
    1412 pages
    ISBN:9781450371025
    DOI:10.1145/3373376
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 March 2020


    Author Tags

    1. decentralized training
    2. deep learning
    3. heterogeneity
    4. machine learning

    Qualifiers

    • Research-article

    Conference

    ASPLOS '20

    Acceptance Rates

    Overall Acceptance Rate 535 of 2,713 submissions, 20%

    Bibliometrics & Citations

    Article Metrics

    • Downloads (Last 12 months)399
    • Downloads (Last 6 weeks)50
    Reflects downloads up to 10 Aug 2024

    Citations

    Cited By

    • (2024) HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis. Proceedings of the Nineteenth European Conference on Computer Systems, pp. 524-541. DOI: 10.1145/3627703.3629580. Online publication date: 22-Apr-2024.
    • (2024) AdaPipe: Optimizing Pipeline Parallelism with Adaptive Recomputation and Partitioning. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, pp. 86-100. DOI: 10.1145/3620666.3651359. Online publication date: 27-Apr-2024.
    • (2024) Heet: Accelerating Elastic Training in Heterogeneous Deep Learning Clusters. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pp. 499-513. DOI: 10.1145/3620665.3640375. Online publication date: 27-Apr-2024.
    • (2024) YOGA: Adaptive Layer-Wise Model Aggregation for Decentralized Federated Learning. IEEE/ACM Transactions on Networking, 32(2):1768-1780. DOI: 10.1109/TNET.2023.3329005. Online publication date: Apr-2024.
    • (2024) Decentralized Federated Learning With Adaptive Configuration for Heterogeneous Participants. IEEE Transactions on Mobile Computing, 23(6):7453-7469. DOI: 10.1109/TMC.2023.3335403. Online publication date: Jun-2024.
    • (2024) Heter-Train: A Distributed Training Framework Based on Semi-Asynchronous Parallel Mechanism for Heterogeneous Intelligent Transportation Systems. IEEE Transactions on Intelligent Transportation Systems, 25(1):959-972. DOI: 10.1109/TITS.2023.3286400. Online publication date: Jan-2024.
    • (2023) Delay-agnostic asynchronous coordinate update algorithm. Proceedings of the 40th International Conference on Machine Learning, pp. 37582-37606. DOI: 10.5555/3618408.3619973. Online publication date: 23-Jul-2023.
    • (2023) Resource-Aware Federated Hybrid Profiling for Edge Node Selection in Federated Patient Similarity Network. Applied Sciences, 13(24):13114. DOI: 10.3390/app132413114. Online publication date: 8-Dec-2023.
    • (2023) Dynamic Worker Classification Scheme for Addressing Straggler Problem in Distributed Deep Learning Environments. The Journal of Korean Institute of Information Technology, 21(10):1-9. DOI: 10.14801/jkiit.2023.21.10.1. Online publication date: 31-Oct-2023.
    • (2023) FusionFlow: Accelerating Data Preprocessing for Machine Learning with CPU-GPU Cooperation. Proceedings of the VLDB Endowment, 17(4):863-876. DOI: 10.14778/3636218.3636238. Online publication date: 1-Dec-2023.

