Abstract
In distributed machine learning training based on gradient descent, workers periodically upload locally computed gradients or weights to the parameter server (PS). A Byzantine attack occurs when some workers upload wrong gradients or weights, i.e., the information received by the PS is not always the true value computed by a worker. Score-based, median-based, and distance-based defense algorithms have been proposed previously, but all of them rest on two assumptions: (1) the dataset on each worker is independent and identically distributed (i.i.d.), and (2) the majority of participating workers are honest. Neither assumption is realistic in federated learning, where each worker may hold a non-i.i.d. private dataset and malicious workers may form the majority in some iterations. In this paper, we propose a novel reference-dataset-based algorithm, along with a practical Two-Filter algorithm (ToFi), to defend against Byzantine attacks in federated learning. Our experiments highlight the effectiveness of our algorithm compared with previous algorithms in different settings.
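To make the setting concrete, below is a minimal sketch of reference-dataset-based aggregation at the PS. It is illustrative only: the scoring rule, learning rate, and keep ratio are our assumptions for the sketch, not ToFi's actual two-filter criteria, which are defined in the main paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def reference_score(weights, grad, loss_fn, ref_data, lr=0.1):
    """Score a worker's gradient by the loss reduction it yields on the
    PS-held reference dataset (an illustrative rule, not ToFi's exact
    two-filter criterion)."""
    return loss_fn(weights, ref_data) - loss_fn(weights - lr * grad, ref_data)

def robust_aggregate(weights, grads, loss_fn, ref_data, keep_ratio=0.5):
    """Keep the best-scoring fraction of gradients and average them.
    A trusted reference set avoids the i.i.d. and honest-majority
    assumptions that score/median/distance defenses rely on."""
    scores = [reference_score(weights, g, loss_fn, ref_data) for g in grads]
    k = max(1, int(keep_ratio * len(grads)))
    kept = np.argsort(scores)[-k:]  # indices of the top-k scores
    return np.mean([grads[i] for i in kept], axis=0)

# Toy usage: quadratic loss; three honest workers, one Gaussian attacker.
loss = lambda w, X: float(np.mean((X @ w) ** 2))
w = rng.normal(size=5)
X_ref = rng.normal(size=(32, 5))
honest = [2 * X_ref.T @ X_ref @ w / len(X_ref) for _ in range(3)]
byzantine = [rng.normal(size=5) * 100.0]
step = robust_aggregate(w, honest + byzantine, loss, X_ref)
```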
References
Alistarh, D., Allen-Zhu, Z., Li, J.: Byzantine stochastic gradient descent. CoRR abs/1803.08917 (2018). http://arxiv.org/abs/1803.08917
Blanchard, P., El Mhamdi, E.M., Guerraoui, R., Stainer, J.: Machine learning with adversaries: Byzantine-tolerant gradient descent. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol. 30, pp. 119–129. Curran Associates, Inc. (2017). http://papers.nips.cc/paper/6617-machine-learning-with-adversaries-byzantine-tolerant-gradient-descent.pdf
Chen, Y., Su, L., Xu, J.: Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. Proc. ACM Meas. Anal. Comput. Syst. 1(2), 1–25 (2017). https://doi.org/10.1145/3154503
Damaskinos, G., El Mhamdi, E.M., Guerraoui, R., Patra, R., Taziki, M.: Asynchronous Byzantine machine learning (the case of SGD). In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 1145–1154. PMLR, Stockholmsmässan, Stockholm Sweden (10–15 Jul 2018). http://proceedings.mlr.press/v80/damaskinos18a.html
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
Konecný, J., McMahan, H., Yu, F., Richtárik, P., Suresh, A., Bacon, D.: Federated learning: strategies for improving communication efficiency. CoRR abs/1610.05492 (2016)
Konstantinov, N., Lampert, C.: Robust learning from untrusted sources. CoRR abs/1901.10310 (2019). http://arxiv.org/abs/1901.10310
Krizhevsky, A.: Learning multiple layers of features from tiny images. Technical report, University of Toronto (2009)
Mao, Y., Hong, W., Wang, H., Li, Q., Zhong, S.: Privacy-preserving computation offloading for parallel deep neural networks training. IEEE Trans. Parallel Distrib. Syst. 32(7), 1777–1788 (2021). https://doi.org/10.1109/TPDS.2020.3040734
Mao, Y., Yi, S., Li, Q., Feng, J., Xu, F., Zhong, S.: Learning from differentially private neural activations with edge computing. In: 2018 IEEE/ACM Symposium on Edge Computing (SEC), pp. 90–102 (2018). https://doi.org/10.1109/SEC.2018.00014
McMahan, H.B., Moore, E., Ramage, D., Hampson, S., Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS) (2017). http://arxiv.org/abs/1602.05629
Paudice, A., Muñoz-González, L., Lupu, E.C.: Label sanitization against label flipping poisoning attacks. In: Alzate, C., et al. (eds.) ECML PKDD 2018 Workshops, pp. 5–15. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-13453-2_1
Tao, Z., Li, Q.: eSGD: Communication efficient distributed deep learning on the edge. In: USENIX Workshop on Hot Topics in Edge Computing (HotEdge 18). USENIX Association, Boston, MA, July 2018
Tao, Z., et al.: A survey of virtual machine management in edge computing. Proc. IEEE 107(8), 1482–1499 (2019). https://doi.org/10.1109/JPROC.2019.2927919
Wu, Y., He, K.: Group normalization. Int. J. Comput. Vis. 128(3), 742–755 (2020). https://doi.org/10.1007/s11263-019-01198-w
Xia, Q., Tao, Z., Hao, Z., Li, Q.: FABA: an algorithm for fast aggregation against Byzantine attacks in distributed neural networks. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pp. 4824–4830. International Joint Conferences on Artificial Intelligence Organization (July 2019). https://doi.org/10.24963/ijcai.2019/670
Xia, Q., Tao, Z., Li, Q.: Defenses against Byzantine attacks in distributed deep neural networks. IEEE Trans. Netw. Sci. Eng. (2020). https://doi.org/10.1109/TNSE.2020.3035112
Xia, Q., Tao, Z., Li, Q.: Privacy issues in edge computing. In: Chang, W., Wu, J. (eds.) Fog/Edge Computing For Security, Privacy, and Applications. AIS, vol. 83, pp. 147–169. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-57328-7_6
Xia, Q., Ye, W., Tao, Z., Wu, J., Li, Q.: A survey of federated learning for edge computing: Research problems and solutions. High-Confidence Computing (2021). https://doi.org/10.1016/j.hcc.2021.100008
Xie, C., Koyejo, O., Gupta, I.: Generalized Byzantine-tolerant SGD. CoRR abs/1802.10116 (2018). http://arxiv.org/abs/1802.10116
Xie, C., Koyejo, O., Gupta, I.: Zeno++: robust fully asynchronous SGD (2020). https://openreview.net/forum?id=rygHe64FDS
Xie, C., Koyejo, S., Gupta, I.: Zeno: Distributed stochastic gradient descent with suspicion-based fault-tolerance. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 97, pp. 6893–6901. PMLR, Long Beach, California, USA (09–15 Jun 2019). http://proceedings.mlr.press/v97/xie19b.html
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998). https://doi.org/10.1109/5.726791
Yin, D., Chen, Y., Kannan, R., Bartlett, P.: Byzantine-robust distributed learning: towards optimal statistical rates. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 5650–5659. PMLR, Stockholmsmässan, Stockholm Sweden (10–15 Jul 2018). http://proceedings.mlr.press/v80/yin18a.html
Zhang, M., Hu, L., Shi, C., Wang, X.: Adversarial label-flipping attack and defense for graph neural networks. In: 2020 IEEE International Conference on Data Mining (ICDM), pp. 791–800 (2020). https://doi.org/10.1109/ICDM50108.2020.00088
Acknowledgements
We thank all the reviewers for their constructive comments. This project was supported in part by US National Science Foundation grant CNS-1816399. This work was also supported in part by the Commonwealth Cyber Initiative, an investment in the advancement of cyber R&D, innovation and workforce development. For more information about CCI, visit cyberinitiative.org.
A More Experiments on MNIST
All experiments in this section use the same settings as in the main paper; we note any differences explicitly. Here we supplement our results with additional experiments on the MNIST dataset [23] and with more workers. The model we use is LeNet-5 [23].
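For reference, below is a standard LeNet-5 [23] for 28x28 MNIST inputs, sketched in PyTorch. The ReLU activations and max pooling follow the common modern rendering of the architecture; this is an assumption, since the paper does not spell out its exact variant.

```python
import torch.nn as nn

class LeNet5(nn.Module):
    """LeNet-5 for 28x28 MNIST inputs; padding=2 in the first
    convolution restores the original 32x32 input size of [23]."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),                  # -> 6 x 14 x 14
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),
            nn.MaxPool2d(2),                  # -> 16 x 5 x 5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```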
A.1 Federated Learning with All Node Participation
Naive Heterogeneous Environment. We compare ToFi with three classic methods and the ground truth (all Byzantine gradients filtered out, followed by average aggregation) in the three Byzantine environments described in the main paper (Gaussian, wrong label, and one bit) as well as a Byzantine-free environment. The distributed environment here is the naive heterogeneous environment, and we set the interval length to 10. The results are shown in Fig. 6. Krum does not perform as well as ToFi, GeoMedian, and FABA, while these three methods perform very similarly in the naive heterogeneous environment across the three attack types. In the Byzantine-free scenario, ToFi, GeoMedian, and FABA again perform similarly, while Krum achieves lower accuracy.
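For concreteness, the three attack types can be sketched as follows. The noise scale of the Gaussian attack and the exact form of the one bit attack (rendered here as a sign flip) are illustrative assumptions, as the paper's precise parameters appear in the main text; `local_grad_fn` is a hypothetical helper standing in for a worker's local gradient computation.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_attack(grad, sigma=100.0):
    """Replace the true gradient with large Gaussian noise
    (sigma is an illustrative choice)."""
    return rng.normal(0.0, sigma, size=grad.shape)

def one_bit_attack(grad):
    """Sign-flip rendering of the one bit attack: negate the gradient
    so it pushes the model away from the optimum (one common
    formulation; the paper's exact flip may differ)."""
    return -grad

def wrong_label_gradient(local_grad_fn, x, y, num_classes=10):
    """The wrong label attack acts at the data level: the worker
    computes an honest gradient, but on flipped labels."""
    y_flipped = (y + 1) % num_classes
    return local_grad_fn(x, y_flipped)
```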
Enhanced Heterogeneous Environment. To show the difference, we again compare ToFi with the three classic methods and the ground truth in the three Byzantine environments (Gaussian, wrong label, and one bit) and the Byzantine-free environment, this time in the enhanced heterogeneous environment, with an interval length of 10. The results are shown in Fig. 7. ToFi performs much better than Krum, FABA, and GeoMedian. GeoMedian is second best under the Gaussian and one bit attacks, while FABA is second best under the wrong label attack and in the Byzantine-free scenario; both, however, suffer a significant accuracy decline relative to our algorithm. Krum performs worst in the enhanced heterogeneous environment.
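The sketch below shows one way such an enhanced heterogeneous split could be constructed, assuming it restricts each worker to a small number of label classes; this is our assumption for illustration, as the precise construction is defined in the main paper. It makes visible why a few Byzantine workers can hide entire label classes, the effect discussed in the more-workers experiment below.

```python
import numpy as np

def enhanced_partition(labels, num_workers, classes_per_worker=2, seed=0):
    """Assumed construction of an 'enhanced heterogeneous' split:
    each worker holds data from only a few label classes, so a class
    can vanish from honest updates when its holders are Byzantine."""
    rng = np.random.default_rng(seed)
    classes = list(np.unique(labels))
    # Round-robin class assignment so every class is covered.
    picks = [[classes[(w * classes_per_worker + j) % len(classes)]
              for j in range(classes_per_worker)] for w in range(num_workers)]
    pools = {c: list(rng.permutation(np.flatnonzero(labels == c)))
             for c in classes}
    holders = {c: sum(c in p for p in picks) for c in classes}
    shares = {c: len(pools[c]) // holders[c] for c in classes}
    parts = [[] for _ in range(num_workers)]
    for w, picked in enumerate(picks):
        for c in picked:
            parts[w].extend(pools[c][:shares[c]])  # take this worker's shard
            del pools[c][:shares[c]]
    return [np.array(p) for p in parts]
```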
More Workers Experiment. Due to hardware limitations, we cannot run experiments with more than 8 workers on the CIFAR-10 dataset, so we examine the many-worker scenario only on MNIST. In this experiment we use 32 workers, 8 of which are Byzantine, in the enhanced heterogeneous environment. The results are shown in Fig. 8 and closely resemble the 8-worker scenario: ToFi still outperforms the other algorithms. Under the Gaussian attack, ToFi performs comparably to FABA and beats all other algorithms; under the wrong label and one bit attacks, ToFi performs much better than the others. The best accuracy achieved here is lower than in the Byzantine-free case because we fix which workers are Byzantine throughout the experiment: in the enhanced heterogeneous environment, the data for some labels may be held entirely by Byzantine workers, which lowers the achievable accuracy.
A.2 Federated Learning with Partial Node Participation
We compare Krum, GeoMedian, FABA, and Zeno with ToFi in the federated learning environment, using a setting similar to that of the CIFAR-10 experiments. The results are shown in Fig. 9. ToFi outperforms all other algorithms; the others were not designed for federated learning and therefore perform poorly.
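Operationally, partial node participation means the PS samples only a subset of workers each round, in the style of FedAvg (McMahan et al.), so a defense must cope with a different, possibly Byzantine-heavy, subset reporting every round. The sketch below shows where a defense plugs in; the sampling fraction and the `compute_update` interface are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_round(weights, workers, defense, frac=0.25):
    """One federated round with partial participation: sample a
    fraction of the workers, collect their updates (honest or
    Byzantine), then let the defense aggregate. Note the sampled
    subset can have a Byzantine majority even when the full
    worker population does not."""
    m = max(1, int(frac * len(workers)))
    chosen = rng.choice(len(workers), size=m, replace=False)
    updates = [workers[i].compute_update(weights) for i in chosen]
    return defense(weights, updates)
```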