DOI: 10.5555/3294771.3294783
Article
Free access

Machine learning with adversaries: Byzantine tolerant gradient descent

Published: 04 December 2017

Abstract

We study the resilience to Byzantine failures of distributed implementations of Stochastic Gradient Descent (SGD). So far, distributed machine learning frameworks have largely ignored the possibility of failures, especially arbitrary (i.e., Byzantine) ones. Causes of failures include software bugs, network asynchrony, biases in local datasets, as well as attackers trying to compromise the entire system. Assuming a set of n workers, up to f of them Byzantine, we ask how resilient SGD can be, without limiting the dimension or the size of the parameter space. We first show that no gradient aggregation rule based on a linear combination of the vectors proposed by the workers (i.e., current approaches) tolerates a single Byzantine failure. We then formulate a resilience property of the aggregation rule capturing the basic requirements to guarantee convergence despite f Byzantine workers. We propose Krum, an aggregation rule that satisfies our resilience property, and which we argue is the first provably Byzantine-resilient algorithm for distributed SGD. We also report on experimental evaluations of Krum.
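As a rough, hedged illustration of the selection rule the abstract refers to (the paper itself gives the precise definition and proofs), the sketch below follows the usual summary of Krum: each worker's proposed gradient is scored by the sum of squared distances to its n - f - 2 closest peers, and the server applies the single lowest-scoring vector. The function name, toy data, and parameter names are illustrative assumptions, not code from the paper; the toy example at the end also shows why a plain average fails, since one Byzantine worker can drag the mean arbitrarily far.

```python
import numpy as np

def krum(gradients, f):
    """Sketch of the Krum aggregation rule (after Blanchard et al., NIPS 2017).

    gradients: list of n 1-D numpy arrays, one proposed gradient per worker.
    f: assumed upper bound on the number of Byzantine workers.
    Requires n > 2f + 2.
    """
    n = len(gradients)
    assert n > 2 * f + 2, "Krum needs n > 2f + 2 workers"
    V = np.stack(gradients)                    # shape (n, d)

    # Pairwise squared Euclidean distances between all proposed gradients.
    diffs = V[:, None, :] - V[None, :, :]      # shape (n, n, d)
    dists = (diffs ** 2).sum(axis=-1)          # shape (n, n)

    # Score each worker by the distances to its n - f - 2 closest peers.
    scores = np.empty(n)
    for i in range(n):
        others = np.delete(dists[i], i)        # drop the zero self-distance
        scores[i] = np.sort(others)[: n - f - 2].sum()

    # The server applies the single gradient with the smallest score.
    return V[np.argmin(scores)]

# Toy illustration: one attacker ruins the mean but not Krum's choice.
rng = np.random.default_rng(0)
honest = [rng.normal(1.0, 0.1, size=10) for _ in range(8)]
grads = honest + [np.full(10, 1e6)]            # n = 9 workers, assume f = 1
print(np.mean(grads, axis=0)[:3])              # dominated by the attacker
print(krum(grads, f=1)[:3])                    # close to the honest gradients
```

This sketch only mirrors the selection step; the n > 2f + 2 requirement and the convergence guarantee are the subject of the resilience analysis in the paper.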




Published In

NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems
December 2017
7104 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States

Publication History

Published: 04 December 2017

Qualifiers

  • Article


Article Metrics

  • Downloads (last 12 months): 348
  • Downloads (last 6 weeks): 59
Reflects downloads up to 30 Aug 2024

Cited By

  • (2023) Label poisoning is all you need. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 71029-71052. DOI: 10.5555/3666122.3669233. Online publication date: 10-Dec-2023.
  • (2023) Robust distributed learning. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 45744-45776. DOI: 10.5555/3666122.3668104. Online publication date: 10-Dec-2023.
  • (2021) The More, the Better. Proceedings of the 3rd Workshop on Cyber-Security Arms Race, pp. 1-12. DOI: 10.1145/3474374.3486915. Online publication date: 19-Nov-2021.
  • (2021) FedMatch. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 181-190. DOI: 10.1145/3459637.3482345. Online publication date: 26-Oct-2021.
  • (2021) Towards Communication-Efficient and Attack-Resistant Federated Edge Learning for Industrial Internet of Things. ACM Transactions on Internet Technology, 22(3), pp. 1-22. DOI: 10.1145/3453169. Online publication date: 6-Dec-2021.
  • (2021) A GDPR-compliant Ecosystem for Speech Recognition with Transfer, Federated, and Evolutionary Learning. ACM Transactions on Intelligent Systems and Technology, 12(3), pp. 1-19. DOI: 10.1145/3447687. Online publication date: 5-May-2021.
  • (2021) Privacy-preserving Decentralized Learning Framework for Healthcare System. ACM Transactions on Multimedia Computing, Communications, and Applications, 17(2s), pp. 1-24. DOI: 10.1145/3426474. Online publication date: 14-Jun-2021.
  • (2021) On the Neural Backdoor of Federated Generative Models in Edge Computing. ACM Transactions on Internet Technology, 22(2), pp. 1-21. DOI: 10.1145/3425662. Online publication date: 22-Oct-2021.
  • (2020) Election coding for distributed learning. Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 14615-14625. DOI: 10.5555/3495724.3496949. Online publication date: 6-Dec-2020.
  • (2020) Fault-Tolerance in Distributed Optimization: The Case of Redundancy. Proceedings of the 39th Symposium on Principles of Distributed Computing, pp. 365-374. DOI: 10.1145/3382734.3405748. Online publication date: 31-Jul-2020.