DOI: 10.5555/3294771.3294783
Article
Free access

Machine learning with adversaries: Byzantine tolerant gradient descent

Published: 04 December 2017

Abstract

We study the resilience to Byzantine failures of distributed implementations of Stochastic Gradient Descent (SGD). So far, distributed machine learning frameworks have largely ignored the possibility of failures, especially arbitrary (i.e., Byzantine) ones. Causes of failures include software bugs, network asynchrony, biases in local datasets, as well as attackers trying to compromise the entire system. Assuming a set of n workers, up to f of them Byzantine, we ask how resilient SGD can be, without limiting the dimension or the size of the parameter space. We first show that no gradient aggregation rule based on a linear combination of the vectors proposed by the workers (i.e., current approaches) tolerates a single Byzantine failure. We then formulate a resilience property of the aggregation rule capturing the basic requirements to guarantee convergence despite f Byzantine workers. We propose Krum, an aggregation rule that satisfies our resilience property, and which we argue is the first provably Byzantine-resilient algorithm for distributed SGD. We also report on experimental evaluations of Krum.
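As a rough, hedged illustration of the selection rule the abstract refers to (the paper itself gives the precise definition and proofs), the sketch below follows the usual summary of Krum: each worker's proposed gradient is scored by the sum of squared distances to its n - f - 2 closest peers, and the server applies the single lowest-scoring vector. The function name, toy data, and parameter names are illustrative assumptions, not code from the paper; the toy example at the end also shows why a plain average fails, since one Byzantine worker can drag the mean arbitrarily far.

```python
import numpy as np

def krum(gradients, f):
    """Sketch of the Krum aggregation rule (after Blanchard et al., NIPS 2017).

    gradients: list of n 1-D numpy arrays, one proposed gradient per worker.
    f: assumed upper bound on the number of Byzantine workers.
    Requires n > 2f + 2.
    """
    n = len(gradients)
    assert n > 2 * f + 2, "Krum needs n > 2f + 2 workers"
    V = np.stack(gradients)                    # shape (n, d)

    # Pairwise squared Euclidean distances between all proposed gradients.
    diffs = V[:, None, :] - V[None, :, :]      # shape (n, n, d)
    dists = (diffs ** 2).sum(axis=-1)          # shape (n, n)

    # Score each worker by the distances to its n - f - 2 closest peers.
    scores = np.empty(n)
    for i in range(n):
        others = np.delete(dists[i], i)        # drop the zero self-distance
        scores[i] = np.sort(others)[: n - f - 2].sum()

    # The server applies the single gradient with the smallest score.
    return V[np.argmin(scores)]

# Toy illustration: one attacker ruins the mean but not Krum's choice.
rng = np.random.default_rng(0)
honest = [rng.normal(1.0, 0.1, size=10) for _ in range(8)]
grads = honest + [np.full(10, 1e6)]            # n = 9 workers, assume f = 1
print(np.mean(grads, axis=0)[:3])              # dominated by the attacker
print(krum(grads, f=1)[:3])                    # close to the honest gradients
```

This sketch only mirrors the selection step; the n > 2f + 2 requirement and the convergence guarantee are the subject of the resilience analysis in the paper.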




Published In

NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems
December 2017
7104 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States

Publication History

Published: 04 December 2017

Qualifiers

  • Article


Article Metrics

  • Downloads (last 12 months): 348
  • Downloads (last 6 weeks): 59
Reflects downloads up to 30 Aug 2024

Cited By

  • (2023) Label poisoning is all you need. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 71029-71052. DOI: 10.5555/3666122.3669233. Online publication date: 10-Dec-2023.
  • (2023) Robust distributed learning. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 45744-45776. DOI: 10.5555/3666122.3668104. Online publication date: 10-Dec-2023.
  • (2021) The More, the Better. Proceedings of the 3rd Workshop on Cyber-Security Arms Race, pp. 1-12. DOI: 10.1145/3474374.3486915. Online publication date: 19-Nov-2021.
  • (2021) FedMatch. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 181-190. DOI: 10.1145/3459637.3482345. Online publication date: 26-Oct-2021.
  • (2021) Towards Communication-Efficient and Attack-Resistant Federated Edge Learning for Industrial Internet of Things. ACM Transactions on Internet Technology, 22(3), pp. 1-22. DOI: 10.1145/3453169. Online publication date: 6-Dec-2021.
  • (2021) A GDPR-compliant Ecosystem for Speech Recognition with Transfer, Federated, and Evolutionary Learning. ACM Transactions on Intelligent Systems and Technology, 12(3), pp. 1-19. DOI: 10.1145/3447687. Online publication date: 5-May-2021.
  • (2021) Privacy-preserving Decentralized Learning Framework for Healthcare System. ACM Transactions on Multimedia Computing, Communications, and Applications, 17(2s), pp. 1-24. DOI: 10.1145/3426474. Online publication date: 14-Jun-2021.
  • (2021) On the Neural Backdoor of Federated Generative Models in Edge Computing. ACM Transactions on Internet Technology, 22(2), pp. 1-21. DOI: 10.1145/3425662. Online publication date: 22-Oct-2021.
  • (2020) Election coding for distributed learning. Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 14615-14625. DOI: 10.5555/3495724.3496949. Online publication date: 6-Dec-2020.
  • (2020) Fault-Tolerance in Distributed Optimization: The Case of Redundancy. Proceedings of the 39th Symposium on Principles of Distributed Computing, pp. 365-374. DOI: 10.1145/3382734.3405748. Online publication date: 31-Jul-2020.