DOI: 10.1145/3489517.3530417

Sign bit is enough: a learning synchronization framework for multi-hop all-reduce with ultimate compression

Published: 23 August 2022

Abstract

Traditional one-bit compressed stochastic gradient descent cannot be directly employed in multi-hop all-reduce, a widely adopted distributed training paradigm in network-intensive high-performance computing systems such as public clouds. Our theoretical findings show that, due to cascading compression, the training process suffers considerable deterioration in convergence performance. To overcome this limitation, we implement Marsit, a sign-bit compression-based learning synchronization framework. It prevents cascading compression via an elaborate bit-wise operation for unbiased sign aggregation, together with a global compensation mechanism that mitigates compression deviation. The proposed framework retains the same theoretical convergence rate as non-compression mechanisms. Experimental results demonstrate that Marsit reduces training time by up to 35% while preserving the same accuracy as training without compression.
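
The abstract points to two ingredients: a bit-wise operation that aggregates sign bits without re-compressing partial results at every hop, and a global compensation (error-feedback) term that absorbs the compression deviation. The following is a minimal sketch of these ideas, assuming a simple per-tensor scale and NumPy arrays; the names compress_sign and aggregate_signs are hypothetical illustrations and do not reflect the authors' actual Marsit implementation.

```python
import numpy as np

def compress_sign(grad, residual):
    """Compress a gradient to sign bits plus one scale, with error feedback
    (global compensation) so the quantization error carries to the next step.
    Illustrative sketch only, not the Marsit API."""
    corrected = grad + residual               # add accumulated compensation
    scale = np.mean(np.abs(corrected))        # one magnitude shared by all coordinates
    signs = np.sign(corrected)                # one bit per coordinate (+1 / -1)
    new_residual = corrected - scale * signs  # deviation fed back next iteration
    return signs, scale, new_residual

def aggregate_signs(signs_list, scales_list):
    """Average the decoded sign gradients once, instead of re-applying the sign
    operator to partial sums hop by hop (the source of cascading compression)."""
    decoded = [s * c for s, c in zip(signs_list, scales_list)]
    return np.mean(decoded, axis=0)

# Toy run with 4 simulated workers and an 8-dimensional gradient.
rng = np.random.default_rng(0)
dim, workers = 8, 4
grads = [rng.normal(size=dim) for _ in range(workers)]
residuals = [np.zeros(dim) for _ in range(workers)]

signs_list, scales_list = [], []
for i in range(workers):
    s, c, residuals[i] = compress_sign(grads[i], residuals[i])
    signs_list.append(s)
    scales_list.append(c)

global_update = aggregate_signs(signs_list, scales_list)  # shared by every worker
print(global_update)
```

Decoding each worker's contribution before averaging means the sign operator is applied only once per gradient; naively re-signing partial sums at every hop would compound the quantization error, which is the cascading-compression effect the abstract describes.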


Cited By

  • (2024) Distributed Analytics For Big Data. Neurocomputing, 574:C. DOI: 10.1016/j.neucom.2024.127258. Online publication date: 17-Apr-2024.
  • (2023) Birder. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 39529-39552. DOI: 10.5555/3666122.3667839. Online publication date: 10-Dec-2023.
  • (2023) Dynamic Pricing for Client Recruitment in Federated Learning. IEEE/ACM Transactions on Networking, 32(2), 1273-1286. DOI: 10.1109/TNET.2023.3312208. Online publication date: 11-Sep-2023.

Published In

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference
July 2022
1462 pages
ISBN:9781450391429
DOI:10.1145/3489517
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 August 2022

Author Tags

  1. distributed machine learning
  2. multi-hop all-reduce
  3. signSGD

Qualifiers

  • Research-article

Conference

DAC '22: 59th ACM/IEEE Design Automation Conference
July 10-14, 2022
San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

