
Auto-weighted Robust Federated Learning with Corrupted Data Sources

Published: 11 June 2022

Abstract

Federated learning provides a communication-efficient and privacy-preserving training process by enabling learning statistical models with massive participants without accessing their local data. Standard federated learning techniques that naively minimize an average loss function are vulnerable to data corruptions from outliers, systematic mislabeling, or even adversaries. In this article, we address this challenge by proposing Auto-weighted Robust Federated Learning (ARFL), a novel approach that jointly learns the global model and the weights of local updates to provide robustness against corrupted data sources. We prove a learning bound on the expected loss with respect to the predictor and the weights of clients, which guides the definition of the objective for robust federated learning. We present an objective that minimizes the weighted sum of empirical risk of clients with a regularization term, where the weights can be allocated by comparing the empirical risk of each client with the average empirical risk of the best \(p\) clients. This method can downweight the clients with significantly higher losses, thereby lowering their contributions to the global model. We show that this approach achieves robustness when the data of corrupted clients is distributed differently from the benign ones. To optimize the objective function, we propose a communication-efficient algorithm based on the blockwise minimization paradigm. We conduct extensive experiments on multiple benchmark datasets, including CIFAR-10, FEMNIST, and Shakespeare, considering different neural network models. The results show that our solution is robust against different scenarios, including label shuffling, label flipping, and noisy features, and outperforms the state-of-the-art methods in most scenarios.
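To make the weighting rule concrete, the following NumPy sketch illustrates the allocation and aggregation steps described above. It is a minimal illustration under stated assumptions, not the authors' released implementation: the function names are hypothetical, each client \(i\) is assumed to report its empirical loss \(\hat{\mathcal {L}}_i\) and sample count \(m_i\), the number of retained clients \(p\) is assumed to be chosen as in the paper (its selection rule is not reproduced here), and \(\lambda\) denotes the regularization coefficient.

import numpy as np

def allocate_weights(losses, sizes, lam, p):
    # Compare each client's empirical loss with the size-weighted average loss
    # of the best p clients (plus a lambda-dependent slack) and clip negative
    # weights to zero, so that high-loss clients stop contributing.
    losses = np.asarray(losses, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    best = np.argsort(losses)[:p]                      # the p lowest-loss clients
    eta = (sizes[best] @ losses[best] + lam) / sizes[best].sum()
    alpha = np.maximum(sizes * (eta - losses) / lam, 0.0)
    # When p satisfies the condition in the paper, the weights already sum to
    # one; the normalization below is only a safeguard for arbitrary p.
    return alpha / alpha.sum()

def aggregate(client_params, alpha):
    # Global model parameters as the alpha-weighted average of client parameters.
    return np.tensordot(alpha, np.stack(client_params), axes=1)

For instance, with \(m_i = 100\) for all clients, three benign clients at loss 0.3, one corrupted client at loss 2.0, \(p = 3\), and \(\lambda = 100\), the corrupted client receives weight zero and the benign clients share the weight equally.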
Appendices

A Proof of Theorem 1

Proof.
Write:
\begin{equation} \mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}}\left(h\right) \le \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}\left(h\right) + \sup _{f\in \mathcal {H}}\left(\mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}}\left(f\right)- \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}\left(f\right)\right) . \end{equation}
(12)
To link the second term to its expectation, we prove the following:
Lemma 1.
Define the function \(\phi :\left(\mathcal {X}\times \mathcal {Y}\right)^m \rightarrow \mathbb {R}\) by:
\[\phi \left(\lbrace x_{1,1}, y_{1,1}\rbrace , \ldots , \lbrace x_{N, m_N}, y_{N, m_N}\rbrace \right) = \sup _{f\in \mathcal {H}}\left(\mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}}\left(f\right)- \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}\left(f\right)\right).\]
Denote for brevity \(z_{i,j} = \lbrace x_{i,j}, y_{i,j}\rbrace\). Then, for any \(i \in \lbrace 1, 2, \ldots , N\rbrace , j \in \lbrace 1, 2, \ldots , m_i\rbrace\):
\begin{equation} \begin{split} \sup _{z_{1,1}, \ldots , z_{N, m_N}, z_{i,j}^{^{\prime }}} |\phi (z_{1,1},\ldots , z_{i,j}, \ldots , z_{N, m_N}) - \phi (z_{1,1}, \ldots , z_{i,j}^{^{\prime }}, \ldots , z_{N, m_N})| \le \frac{\alpha _i}{m_i}\mathcal {M} \end{split} . \end{equation}
(13)
Proof.
Fix any \(i, j\) and any \(z_{1,1}, \ldots , z_{N, m_N}, z_{i,j}^{^{\prime }}\). Denote the \(\alpha\)-weighted empirical average of the loss with respect to the sample \(z_{1,1}, \ldots , z_{i,j}^{^{\prime }}, \ldots , z_{N, m_N}\) by \(\mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}}^{^{\prime }}\). Then, we have that:
\begin{align*} |\phi (\ldots , z_{i,j}, \ldots) - \phi (\ldots , z_{i,j}^{^{\prime }}, \ldots)| & = |\sup _{f\in \mathcal {H}}\left(\mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}}\left(f\right) - \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}\left(f\right)\right) - \sup _{f\in \mathcal {H}} \left(\mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}}\left(f\right) - \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}^{^{\prime }}\left(f\right)\right)| \\ & \le |\sup _{f\in \mathcal {H}}\left(\hat{\mathcal {L}}^{^{\prime }}_{\mathcal {D}_{\mathbf {\alpha }}}\left(f\right) - \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}\left(f\right)\right)| \\ & = \frac{\alpha _i}{m_i}|\sup _{f\in \mathcal {H}}\left(\ell _f(z^{\prime }_{i,j}) - \ell _f(z_{i,j})\right)| \\ & \le \frac{\alpha _i}{m_i}\mathcal {M} . \end{align*}
Note that the inequality used above holds because the functions inside the supremum are bounded.□
Let \(S\) denote a random sample of size \(m\) drawn from the same distributions as those generating our data (i.e., \(m_i\) samples from \(\mathcal {D}_i\) for each \(i\)). Now, using Lemma 1, McDiarmid’s inequality gives:
\begin{equation*} \begin{split} \mathbb {P}\left(\phi (S) - \mathbb {E}(\phi (S)) \ge t\right) & \le \exp \left(-\frac{2t^2}{\sum _{i=1}^N\sum _{j=1}^{m_i}\frac{\alpha _i^2}{m_i^2}\mathcal {M}^2} \right) \\ & = \exp \left(-\frac{2t^2}{\mathcal {M}^2\sum _{i=1}^N \frac{\alpha _i^2}{m_i}}\right) . \end{split} \end{equation*}
For any \(\delta \gt 0\), setting the right-hand side above to be \(\delta /4\) and using Equation (12), we obtain that with probability at least \(1-\delta /4\):
\begin{equation} \begin{split} \mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}}\left(h\right) \le \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}\left(h\right) & + \mathbb {E}_S\left(\sup _{f\in \mathcal {H}}\left(\mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}} (f) - \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}(f)\right)\right) + \sqrt {\frac{\log \left(\frac{4}{\delta }\right)\mathcal {M}^2}{2}}\sqrt {\sum _{i=1}^N\frac{\alpha _i^2}{m_i}} . \end{split} \end{equation}
(14)
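Concretely, setting the right-hand side of the McDiarmid bound equal to \(\delta /4\) and solving for \(t\) gives the deviation term in Equation (14):
\begin{equation*} \exp \left(-\frac{2t^2}{\mathcal {M}^2\sum _{i=1}^N \frac{\alpha _i^2}{m_i}}\right) = \frac{\delta }{4} \quad \Longleftrightarrow \quad t = \sqrt {\frac{\log \left(\frac{4}{\delta }\right)\mathcal {M}^2}{2}}\sqrt {\sum _{i=1}^N\frac{\alpha _i^2}{m_i}} . \end{equation*}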
To deal with the expected loss inside the second term, introduce a ghost sample (denoted by \(S^{\prime }\)), drawn from the same distributions as our original sample (denoted by \(S\)). Denoting the weighted empirical loss with respect to the ghost sample by \(\hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}^{\prime }\), setting \(\beta _i = m_i/m\) for all \(i\), and using the convexity of the supremum, we obtain:
\begin{equation*} \begin{split} \mathbb {E}_S \left(\sup _{f\in \mathcal {H}}\left(\mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}} (f) - \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}(f)\right)\right) & = \mathbb {E}_{S}\left(\sup _{f\in \mathcal {H}}\left(\mathbb {E}_{S^{\prime }}\left(\hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}^{\prime }(f)\right) - \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}(f)\right)\right) \\ & \le \mathbb {E}_{S, S^{\prime }} \left(\sup _{f\in \mathcal {H}}\left(\hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}^{\prime }(f) - \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}(f) \right)\right) \\ & = \mathbb {E}_{S, S^{\prime }}\left(\sup _{f\in \mathcal {H}}\left(\frac{1}{m}\sum _{i=1}^N\sum _{j=1}^{m_i}\frac{\alpha _i}{\beta _i}\left(\ell _f(z^{\prime }_{i,j}) - \ell _f(z_{i,j})\right)\right)\right) . \end{split} \end{equation*}
Introducing \(m\) independent Rademacher random variables \(\sigma _{i,j}\) and noting that \(\ell _f(z^{\prime }) - \ell _f(z)\) and \(\sigma (\ell _f(z^{\prime }) - \ell _f(z))\) have the same distribution whenever \(z\) and \(z^{\prime }\) have the same distribution, we obtain:
\begin{equation*} \begin{split} \mathbb {E}_S \left(\sup _{f\in \mathcal {H}}\left(\mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}} (f) - \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}(f)\right)\right) & \le \mathbb {E}_{S, S^{\prime }, \sigma }\left(\sup _{f\in \mathcal {H}}\left(\frac{1}{m}\sum _{i=1}^N\sum _{j=1}^{m_i}\frac{\alpha _i}{\beta _i}\sigma _{i,j}\left(\ell _f(z^{\prime }_{i,j}) - \ell _f(z_{i,j})\right)\right)\right) \\ & \le \mathbb {E}_{S^{\prime }, \sigma }\left(\sup _{f\in \mathcal {H}}\left(\frac{1}{m}\sum _{i=1}^N\sum _{j=1}^{m_i}\frac{\alpha _{i}}{\beta _{i}}\sigma _{i,j}\ell _f(z^{\prime }_{i,j})\right)\right) + \mathbb {E}_{S, \sigma }\left(\sup _{f\in \mathcal {H}}\left(\frac{1}{m}\sum _{i=1}^N\sum _{j=1}^{m_i}\frac{\alpha _{i}}{\beta _{i}}(-\sigma _{i,j})\ell _f(z_{i,j})\right)\right) \\ & = 2\,\mathbb {E}_{S, \sigma }\left(\sup _{f\in \mathcal {H}}\left(\frac{1}{m}\sum _{i=1}^N\sum _{j=1}^{m_i}\frac{\alpha _{i}}{\beta _{i}}\sigma _{i,j}\ell _f(z_{i,j})\right)\right). \end{split} \end{equation*}
We can now link the last term to the empirical analog of the Rademacher complexity by using McDiarmid’s inequality (with an observation similar to Lemma 1). Putting this together, we obtain that for any \(\delta \gt 0\), with probability at least \(1 - \delta /2\):
\begin{equation} \begin{split} \mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}}\left(h\right) & \le \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}} \left(h\right) + 2\mathbb {E}_{\sigma }\left(\sup _{f\in \mathcal {H}}\left(\frac{1}{m}\sum _{i=1}^N\sum _{j=1}^{m_i}\frac{\alpha _{i}}{\beta _{i}}\sigma _{i,j}\ell _f(z_{i,j})\right)\right) + 3 \sqrt {\frac{\log \left(\frac{4}{\delta }\right)\mathcal {M}^2}{2}}\sqrt {\sum _{i=1}^N\frac{\alpha _i^2}{m_i}} . \end{split} \end{equation}
(15)
Finally, note that:
\begin{align*} \mathbb {E}_{\sigma } \left(\sup _{f\in \mathcal {H}}\left(\frac{1}{m}\sum _{i=1}^N\sum _{j=1}^{m_i}\frac{\alpha _{i}}{\beta _{i}}\sigma _{i,j}\ell _f(z_{i,j})\right)\right) & \le \mathbb {E}_{\sigma }\left(\sum _{i=1}^{N}\alpha _i\sup _{f\in \mathcal {H}}\left(\frac{1}{m_i}\sum _{j=1}^{m_i}\sigma _{i,j}\ell _f(z_{i,j})\right)\right) \\ & = \sum _{i=1}^N \alpha _i \mathbb {E}_{\sigma }\left(\sup _{f\in \mathcal {H}}\left(\frac{1}{m_i}\sum _{j=1}^{m_i}\sigma _{i,j}\ell _f(z_{i,j})\right)\right) \\ & = \sum _{i=1}^N \alpha _i \mathcal {R}_i \left(\mathcal {H}\right) . \end{align*}
A bound of the same form on \(\hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}(h) - \mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}}(h)\), holding with probability at least \(1 - \delta /2\), follows by a similar argument. The result then follows by applying the union bound.□
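Explicitly, combining Equation (15) with the bound above gives, with probability at least \(1-\delta /2\),
\begin{equation*} \mathcal {L}_{\mathcal {D}_{\mathbf {\alpha }}}\left(h\right) \le \hat{\mathcal {L}}_{\mathcal {D}_{\mathbf {\alpha }}}\left(h\right) + 2\sum _{i=1}^N \alpha _i \mathcal {R}_i\left(\mathcal {H}\right) + 3\sqrt {\frac{\log \left(\frac{4}{\delta }\right)\mathcal {M}^2}{2}}\sqrt {\sum _{i=1}^N\frac{\alpha _i^2}{m_i}} , \end{equation*}
and the two-sided statement holds with probability at least \(1-\delta \) after the union bound.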

B Proof of Theorem 2

Proof.
The Lagrangian function of Equation (6) is
\begin{equation} \mathbb {L} = \mathbf {\alpha }^\top {\hat{\mathcal {L}}}(\mathbf {w}) + \frac{\lambda }{2} || \mathbf {\alpha } \circ \mathbf {m}^{\circ - \frac{1}{2}} ||^2_2 - \mathbf {\alpha }^{\top } \mathbf {\beta } - \eta (\mathbf {\alpha }^{\top } \mathbf {1} - 1), \end{equation}
(16)
where \(\hat{\mathcal {L}}(\mathbf {w}) = [\hat{\mathcal {L}}_1(\mathbf {w}),\hat{\mathcal {L}}_2(\mathbf {w}),\ldots , \hat{\mathcal {L}}_N(\mathbf {w})]^\intercal\), \(\circ\) denotes the elementwise (Hadamard) power, so that \(\mathbf {m}^{\circ -\frac{1}{2}}\) has entries \(1/\sqrt{m_i}\), and \(\mathbf {\beta }\) and \(\eta\) are the Lagrange multipliers. Then, the following Karush-Kuhn-Tucker (KKT) conditions hold:
\begin{align} \partial _{\mathbf {\alpha }} \mathbb {L}(\mathbf {\alpha }, \mathbf {\beta }, \eta) &= 0 , \end{align}
(17)
\begin{align} \mathbf {\alpha }^\intercal \mathbf {1} - 1 &= 0, \end{align}
(18)
\begin{align} \mathbf {\alpha } &\ge 0, \end{align}
(19)
\begin{align} \mathbf {\beta } &\ge 0, \end{align}
(20)
\begin{align} \alpha _i \beta _i &= 0, \quad \forall i = 1, 2, \ldots , N. \end{align}
(21)
According to Equation (17), we have:
\begin{equation} \alpha _i = \frac{m_i(\beta _i + \eta - \hat{\mathcal {L}}_i(\mathbf {w}))}{\lambda }. \end{equation}
(22)
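In detail, since the regularizer equals \(\frac{\lambda }{2}\sum _{i=1}^N \frac{\alpha _i^2}{m_i}\), the stationarity condition (17) reads, for each coordinate \(i\),
\begin{equation*} \frac{\partial \mathbb {L}}{\partial \alpha _i} = \hat{\mathcal {L}}_i(\mathbf {w}) + \frac{\lambda \alpha _i}{m_i} - \beta _i - \eta = 0 , \end{equation*}
which rearranges to Equation (22).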
Since \(\beta _i \ge 0\), we discuss the following cases:
(1)
When \(\beta _i = 0\), we have \(\alpha _i = \frac{m_i(\eta - \hat{\mathcal {L}}_i(\mathbf {w}))}{\lambda } \ge 0\); since \(m_i, \lambda \gt 0\), this further implies \(\eta - \hat{\mathcal {L}}_i(\mathbf {w}) \ge 0\).
(2)
When \(\beta _i \gt 0\), from the condition \(\alpha _i \beta _i = 0\), we have \(\alpha _i = 0\).
Therefore, the optimal solution to Equation (6) is given by:
\begin{equation} \alpha _i(\mathbf {w}) = \left[\frac{m_i (\eta - \hat{\mathcal {L}}_i(\mathbf {w}))}{\lambda }\right]_{+}, \end{equation}
(23)
where \([\cdot ]_+ = \max (0, \cdot)\).
Since only clients with \(\alpha _i \gt 0\) contribute to the constraint \(\mathbf {\alpha }^\intercal \mathbf {1} = 1\), and assuming the clients are ordered so that the first \(p\) clients are those with nonzero weights, we have \(\sum _{i=1}^p \alpha _i = 1\). Summing Equation (23) over these clients yields:
\begin{equation} \eta = \frac{\sum _{i=1}^{p} m_i \hat{\mathcal {L}}_i(\mathbf {w}) + \lambda }{\sum _{i=1}^{p} m_i}. \end{equation}
(24)
From the condition \(\eta - \hat{\mathcal {L}}_i(\mathbf {w}) \ge 0\) we obtain Equations (7) and (8). Finally, plugging Equation (24) into Equation (23) yields Equation (9).□
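For reference, carrying out the final substitution of Equation (24) into Equation (23) gives the explicit weight of each client:
\begin{equation*} \alpha _i(\mathbf {w}) = \left[\frac{m_i}{\lambda }\left(\frac{\sum _{j=1}^{p} m_j \hat{\mathcal {L}}_j(\mathbf {w}) + \lambda }{\sum _{j=1}^{p} m_j} - \hat{\mathcal {L}}_i(\mathbf {w})\right)\right]_{+} , \end{equation*}
so a client whose empirical loss exceeds the regularized average loss of the best \(p\) clients receives weight zero.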

