Decentralized Learning Made Easy with DecentralizePy

Published: 08 May 2023
    Abstract

    Decentralized learning (DL) has gained prominence for its potential benefits in terms of scalability, privacy, and fault tolerance. It consists of many nodes that coordinate without a central server and exchange millions of parameters in the inherently iterative process of machine learning (ML) training. In addition, these nodes are connected in complex and potentially dynamic topologies. Assessing the intricate dynamics of such networks is no easy task. In the literature, researchers often resort to simulated environments that do not scale and fail to capture practical and crucial behaviors, including those associated with parallelism, data transfer, network delays, and wall-clock time. In this paper, we propose decentralizepy, a distributed framework for decentralized ML, which allows for the emulation of large-scale learning networks in arbitrary topologies. We demonstrate the capabilities of decentralizepy by deploying techniques such as sparsification and secure aggregation on top of several topologies, including dynamic networks with more than one thousand nodes.
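
    To make the setting concrete, below is a minimal, self-contained sketch of one round of gossip averaging with top-k sparsification, two of the ingredients the abstract mentions. It is not the decentralizepy API: every name in it (topk, gossip_round, the hard-coded ring topology) is hypothetical and for illustration only.

        # Illustrative sketch only -- NOT the decentralizepy API.
        import numpy as np

        def topk(vec, k):
            # Top-k sparsification: keep only the k largest-magnitude entries.
            idx = np.argpartition(np.abs(vec), -k)[-k:]
            sparse = np.zeros_like(vec)
            sparse[idx] = vec[idx]
            return sparse

        def gossip_round(models, neighbors, k):
            # One synchronous round: every node sends a sparsified copy of its
            # parameters to its neighbors, then averages what it received
            # together with its own (full) parameters.
            shared = {i: topk(m, k) for i, m in models.items()}
            averaged = {}
            for i, m in models.items():
                msgs = [shared[j] for j in neighbors[i]]
                averaged[i] = (m + sum(msgs)) / (1 + len(msgs))
            return averaged

        # Toy run: 4 nodes on a ring, each holding a 10-parameter "model".
        rng = np.random.default_rng(0)
        models = {i: rng.standard_normal(10) for i in range(4)}
        neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
        for _ in range(5):
            models = gossip_round(models, neighbors, k=3)
        print(models[0])  # node 0's parameters, drifting toward the network mean

    In an actual DL deployment, each node would also take local training steps between rounds, and secure aggregation would mask individual contributions inside the neighborhood averages; the sketch only shows the communicate-and-average skeleton.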

    Cited By

    • (2023) Get More for Less in Decentralized Learning Systems. 2023 IEEE 43rd International Conference on Distributed Computing Systems (ICDCS), pages 463-474, July 2023. DOI: 10.1109/ICDCS57875.2023.00067

    Published In

    EuroMLSys '23: Proceedings of the 3rd Workshop on Machine Learning and Systems
    May 2023, 176 pages
    ISBN: 9798400700842
    DOI: 10.1145/3578356
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. decentralized learning
    2. middleware
    3. machine learning
    4. distributed systems
    5. peer-to-peer
    6. network topology

    Qualifiers

    • Research-article

    Conference

    EuroMLSys '23

    Acceptance Rates

    Overall acceptance rate: 18 of 26 submissions (69%)
