
VADAF: Visualization for Abnormal Client Detection and Analysis in Federated Learning

Published: 03 September 2021

Abstract

Federated Learning (FL) provides a powerful solution for distributed machine learning on a large corpus of decentralized data. It protects privacy and security by performing computation on devices (which we refer to as clients) using their local data to improve a shared global model. However, the inaccessibility of the data and the invisibility of the computation make it challenging to interpret and analyze the training process, and especially to identify potential client anomalies. Identifying these anomalies can help experts diagnose and improve FL models. For this reason, we propose a visual analytics system, VADAF, that depicts the training dynamics and facilitates the analysis of potential client anomalies. Specifically, we design a visualization scheme that supports massive training dynamics in the FL environment. Moreover, we introduce an anomaly detection method to detect potential client anomalies, which are further analyzed using both visual and objective estimation of the client models. Three case studies demonstrate the effectiveness of our system in understanding the FL training process and supporting abnormal client detection and analysis.
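
The abstract does not specify how client anomalies are detected, so the following is only an illustrative sketch, not the method used in VADAF. It assumes each client sends a flat update vector to the server, and it swaps in a generic outlier rule: a modified z-score based on the median absolute deviation (MAD) of each client's distance to the coordinate-wise median update. The function names, the `threshold` of 3.5, and the sample values are all hypothetical.

```python
import math
import statistics


def federated_average(updates, weights):
    """Weighted average of client update vectors (FedAvg-style aggregation)."""
    total = sum(weights)
    dim = len(updates[0])
    return [
        sum(w * u[i] for u, w in zip(updates, weights)) / total
        for i in range(dim)
    ]


def flag_abnormal_clients(updates, threshold=3.5):
    """Return indices of clients whose update lies unusually far from the rest.

    Hypothetical rule (an assumption, not taken from the paper): score each
    client by its Euclidean distance to the coordinate-wise median update,
    then flag scores whose MAD-based modified z-score exceeds `threshold`.
    """
    dim = len(updates[0])
    center = [statistics.median(u[i] for u in updates) for i in range(dim)]
    dists = [math.dist(u, center) for u in updates]
    med = statistics.median(dists)
    mad = statistics.median(abs(d - med) for d in dists)
    if mad == 0:
        return []  # degenerate spread: this sketch flags nothing
    return [
        k for k, d in enumerate(dists)
        if 0.6745 * (d - med) / mad > threshold
    ]


# Made-up updates from five clients; the last one deviates strongly.
updates = [
    [0.10, -0.20],
    [0.12, -0.18],
    [0.09, -0.21],
    [0.11, -0.19],
    [5.00, 4.00],
]
abnormal = flag_abnormal_clients(updates)
kept = [u for k, u in enumerate(updates) if k not in abnormal]
global_update = federated_average(kept, [1] * len(kept))
```

In this sketch the server aggregates only the clients that were not flagged. A system such as the one described would more plausibly surface flagged clients to an expert for inspection rather than drop them automatically.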


Information & Contributors

Information

Published In

ACM Transactions on Interactive Intelligent Systems  Volume 11, Issue 3-4
December 2021
483 pages
ISSN:2160-6455
EISSN:2160-6463
DOI:10.1145/3481699
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 September 2021
Accepted: 01 September 2020
Revised: 01 September 2020
Received: 01 November 2019
Published in TIIS Volume 11, Issue 3-4

Author Tags

  1. Federated learning
  2. visual analytics
  3. anomaly detection

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Natural Science Foundation of China
  • Alibaba-Zhejiang University Joint Institute of Frontier Technologies (AZFT)

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 141
  • Downloads (Last 6 weeks): 25

Reflects downloads up to 01 Nov 2024

Citations

Cited By

  • (2024) FedMon: A Federated Learning Monitoring Toolkit. IoT 5:2 (227-249). DOI: 10.3390/iot5020012. Online publication date: 11-Apr-2024
  • (2023) Visually Analysing the Fairness of Clustered Federated Learning with Non-IID Data. 2023 International Joint Conference on Neural Networks (IJCNN) (01-10). DOI: 10.1109/IJCNN54540.2023.10191762. Online publication date: 18-Jun-2023
  • (2023) A Comparative Analysis of Federated Learning Towards Big data IoT with Future Perspectives. 2023 3rd International Conference on Computing and Information Technology (ICCIT) (518-525). DOI: 10.1109/ICCIT58132.2023.10273901. Online publication date: 13-Sep-2023
  • (2022) Perspectives on cross-domain visual analysis of cyber-physical-social big data. Frontiers of Information Technology & Electronic Engineering 22:12 (1559-1564). DOI: 10.1631/FITEE.2100553. Online publication date: 5-Jan-2022
