
Peer-to-peer deep learning with non-IID data

Published: 15 March 2023
  Abstract

    Collaborative training of deep neural networks using edge devices has attracted substantial research interest recently. The two main architectural approaches to the training process are centrally orchestrated Federated Learning and fully decentralized peer-to-peer learning. In decentralized systems, edge devices, known as agents, collaborate in a peer-to-peer architecture, avoiding the need for a central system to orchestrate the process. Decentralized peer-to-peer (P2P) learning techniques are well researched under the assumption of independent and identically distributed (IID) data across agents. However, IID data is seldom observed in real-world distributed systems, and training performance varies significantly with non-IID data. This paper proposes a decentralized learning variant of the P2P gossip averaging method with Batch Normalization (BN) adaptation for P2P architectures. BN layers are well known to accelerate the convergence of non-distributed deep learning models, and recent research confirms that Federated Learning methods benefit from BN when the aggregation is suitably altered. Our work demonstrates the effectiveness of BN in P2P architectures, mitigating the impact of non-IID data characteristics across decentralized agents. We also introduce a variant of the early stopping technique that, combined with the BN layers, acts as a fine-tuning mechanism for agent models. We validated our approach through numerous simulations of different model, topology, and communication combinations, comparing them to other decentralized baseline approaches. The evaluations were conducted on the next-word prediction task using user comments from the Reddit and StackOverflow datasets, representing two different domains. Simulations showed that our approach, on average, achieves a mean relative top accuracy increase of 16.9% in ring (19.9% for Reddit, 13.9% for StackOverflow) and 29.8% in sparse (32.9% for Reddit, 26.6% for StackOverflow) communication topologies compared to the best baseline approach. Our code is available at https://github.com/fipu-lab/p2p_bn.
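
    To make the mechanism concrete, below is a minimal, illustrative sketch of pairwise gossip averaging in which Batch Normalization parameters and statistics are kept local to each agent, in the spirit of the BN adaptation described above. It is not the authors' exact aggregation rule; the layer-naming convention and the is_bn_param helper are assumptions made for the example.

    import numpy as np

    def is_bn_param(name):
        # Assumption: BN variables are identifiable by name (e.g. "bn1/gamma").
        return "bn" in name

    def gossip_average(local_weights, peer_weights):
        """One pairwise gossip step: average non-BN weights with a peer,
        keep BN weights (gamma, beta, moving mean/variance) local."""
        merged = {}
        for name, w in local_weights.items():
            if is_bn_param(name):
                merged[name] = w                                 # personalized BN layer, not averaged
            else:
                merged[name] = 0.5 * (w + peer_weights[name])    # standard gossip averaging
        return merged

    # Toy usage with hypothetical layer names.
    agent_a = {"dense/kernel": np.ones((4, 4)), "bn1/gamma": np.full(4, 0.9)}
    agent_b = {"dense/kernel": np.zeros((4, 4)), "bn1/gamma": np.full(4, 1.1)}
    merged = gossip_average(agent_a, agent_b)
    print(merged["dense/kernel"][0, 0])   # 0.5  (averaged with the peer)
    print(merged["bn1/gamma"][0])         # 0.9  (kept local)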

    Highlights

    Batch Normalization layers improve decentralized learning on non-IID data.
    P2P-BN improves decentralized learning by fine-tuning the Batch Normalization layers.
    P2P-BN produces stable models even in sparse communication topologies.
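
    The fine-tuning idea from the highlights can be illustrated with a short, hedged sketch: after collaborative training stops, only the Batch Normalization layers remain trainable and the agent continues briefly on its local data so the BN parameters and running statistics adapt to the local distribution. This is one plausible reading of the fine-tuning step, not the paper's exact procedure; the Keras model, optimizer, and toy data below are placeholders (the paper's models are recurrent next-word-prediction networks).

    import tensorflow as tf

    def fine_tune_bn(model, local_dataset, epochs=1):
        # Freeze all layers except BatchNormalization, then briefly continue
        # training on the agent's local data.
        for layer in model.layers:
            layer.trainable = isinstance(layer, tf.keras.layers.BatchNormalization)
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
        model.fit(local_dataset, epochs=epochs, verbose=0)
        return model

    # Toy usage with a hypothetical model and random data.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(4, activation="softmax"),
    ])
    x = tf.random.normal((32, 8))
    y = tf.random.uniform((32,), maxval=4, dtype=tf.int32)
    local_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(8)
    fine_tune_bn(model, local_dataset)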


    Cited By

    • (2024) Multi-task peer-to-peer learning using an encoder-only transformer model. Future Generation Computer Systems, 152(C), 170–178. https://doi.org/10.1016/j.future.2023.11.006. Online publication date: 1 March 2024.


    Published In

    Expert Systems with Applications: An International Journal, Volume 214, Issue C
    Mar 2023
    1471 pages

    Publisher

    Pergamon Press, Inc.

    United States

    Publication History

    Published: 15 March 2023

    Author Tags

    1. Peer-to-peer
    2. Gossip averaging
    3. Decentralized learning
    4. Batch normalization
    5. Neural network
    6. Machine learning

    Qualifiers

    • Research-article
