Cross-Silo Federated Learning for Multi-Tier Networks with Vertical and Horizontal Data Partitioning

Published: 22 September 2022
Abstract

We consider federated learning in tiered communication networks. Our network model consists of a set of silos, each holding a vertical partition of the data. Each silo contains a hub and a set of clients, with the silo’s vertical data shard partitioned horizontally across its clients. We propose Tiered Decentralized Coordinate Descent (TDCD), a communication-efficient decentralized training algorithm for such two-tiered networks. To reduce communication overhead, the clients in each silo perform multiple local gradient steps before sharing updates with their hub. Each hub updates its block of coordinates by averaging its clients’ updates, and the hubs then exchange intermediate updates with one another. We present a theoretical analysis of our algorithm and show how the convergence rate depends on the number of vertical partitions and the number of local updates. We further validate our approach empirically via simulation-based experiments on a variety of datasets and objectives.
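    The abstract describes TDCD only at a high level. As a rough illustration of the two-tier update pattern, the following is a minimal NumPy sketch on a toy least-squares problem. It is not the authors' implementation: the parameter names (K silos, M clients per silo, Q local steps), the use of full-shard rather than stochastic gradients, and the simple Jacobi-style exchange of partial predictions between hubs are all simplifying assumptions made here for illustration.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy problem: least squares with n samples and d features.
    n, d = 200, 20
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = X @ w_true + 0.1 * rng.normal(size=n)

    K = 2     # number of silos (vertical partitions of the feature space)
    M = 4     # clients per silo (horizontal shards of the samples)
    Q = 5     # local gradient steps between hub communications
    T = 200   # communication rounds
    lr = 0.05

    feat_blocks = np.array_split(np.arange(d), K)  # silo k owns these columns
    row_shards = np.array_split(np.arange(n), M)   # client m owns these rows

    w_blocks = [np.zeros(len(b)) for b in feat_blocks]

    for t in range(T):
        # Hubs exchange partial predictions X_k w_k once per round; these
        # stay fixed (stale) while the clients run their Q local steps.
        partial = [X[:, b] @ w for b, w in zip(feat_blocks, w_blocks)]
        for k, b in enumerate(feat_blocks):
            # Contribution of all *other* silos, frozen for this round.
            others = sum(partial[j] for j in range(K) if j != k)
            client_ws = [w_blocks[k].copy() for _ in range(M)]
            for m, rows in enumerate(row_shards):
                Xb = X[np.ix_(rows, b)]  # this client's rows of silo k's columns
                for _ in range(Q):
                    pred = Xb @ client_ws[m] + others[rows]
                    grad = Xb.T @ (pred - y[rows]) / len(rows)
                    client_ws[m] -= lr * grad
            # Hub k averages its clients' copies of its coordinate block.
            w_blocks[k] = np.mean(client_ws, axis=0)

    w_hat = np.concatenate(w_blocks)  # blocks are contiguous, so order matches
    print("relative error:", np.linalg.norm(w_hat - w_true) / np.linalg.norm(w_true))
    ```

    The sketch only mirrors the communication structure: one hub-to-hub exchange per round, Q client-local steps on a fixed view of the other silos' coordinates, and hub-side averaging within each silo. The paper's algorithm uses stochastic gradients and its analysis covers general smooth objectives.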




    Published In

ACM Transactions on Intelligent Systems and Technology, Volume 13, Issue 6
    December 2022
    468 pages
    ISSN:2157-6904
    EISSN:2157-6912
    DOI:10.1145/3560231
Editor: Huan Liu

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 September 2022
    Online AM: 06 July 2022
    Accepted: 09 May 2022
    Revised: 17 April 2022
    Received: 18 December 2021
    Published in TIST Volume 13, Issue 6

    Permissions

    Request permissions for this article.


    Author Tags

    1. Coordinate descent
    2. federated learning
    3. machine learning
    4. stochastic gradient descent

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • Rensselaer-IBM AI Research Collaboration
    • IBM AI Horizons Network
    • National Science Foundation


    Article Metrics

    • Downloads (last 12 months): 261
    • Downloads (last 6 weeks): 19
    Reflects downloads up to 26 Jul 2024

    Cited By

    • (2024) Alliance Makes Difference? Maximizing Social Welfare in Cross-Silo Federated Learning. IEEE Transactions on Vehicular Technology 73, 2 (Mar. 2024), 2786–2798. https://doi.org/10.1109/TVT.2023.3320550
    • (2024) A Secure Framework in Vertical and Horizontal Federated Learning Utilizing Homomorphic Encryption. NOMS 2024, IEEE Network Operations and Management Symposium (6 May 2024), 1–5. https://doi.org/10.1109/NOMS59830.2024.10575488
    • (2024) Cluster computing-based EEG sub-band signal extraction with channel-wise and time-slice-wise data partitioning technique. International Journal of Information Technology 16, 5 (11 May 2024), 2763–2773. https://doi.org/10.1007/s41870-024-01924-9
    • (2023) How Green Credit Policy Affects Commercial Banks' Credit Risk? Journal of Cases on Information Technology 26, 1 (17 Nov 2023), 1–21. https://doi.org/10.4018/JCIT.333858
