DOI: 10.1145/3578356.3592590
Research article | Open access

A First Look at the Impact of Distillation Hyper-Parameters in Federated Knowledge Distillation

Published: 08 May 2023

Abstract

    Knowledge distillation is a well-known technique for model compression. It has recently been adopted in distributed training settings, such as federated learning, as a way to transfer knowledge between already pre-trained models. Knowledge distillation in distributed settings promises several advantages, including significantly reduced communication overhead and support for heterogeneous model architectures. However, distillation is still not well studied or understood in such settings, which limits the achievable gains. We bridge this gap with an experimental analysis of the distillation process in the distributed training setting, focusing on non-IID data. We highlight elements that require special consideration when transferring knowledge between already pre-trained models: the transfer set, the temperature, the weight, and the positioning. Appropriately tuning these hyper-parameters can remarkably boost learning outcomes: in our experiments, around two-thirds of the participants require settings other than the defaults commonly used in the literature, and appropriate tuning yields more than a five-fold improvement on average.
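
    To make these hyper-parameters concrete, below is a minimal sketch of a generic distillation objective in PyTorch. This is an illustrative sketch, not the authors' implementation: the temperature softens the teacher and student logits before they are compared, and the weight balances the distillation term against the hard-label loss. The transfer set is simply whatever data the loss is evaluated on, and "positioning" plausibly refers to where this distillation step sits in local training. The function name, default values, and the random tensors in the usage lines are purely hypothetical.

    # Minimal sketch (assumed PyTorch setting) of a weighted distillation loss.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=1.0, alpha=0.5):
        """Weighted sum of soft-label KL divergence and hard-label cross-entropy."""
        # Soften both distributions with the temperature before comparing them.
        soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
        log_probs = F.log_softmax(student_logits / temperature, dim=-1)
        # The T^2 factor keeps gradient magnitudes comparable across temperatures
        # (as in Hinton et al., 2015).
        kd_term = F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature ** 2
        # Standard cross-entropy on the hard labels of the transfer-set batch.
        ce_term = F.cross_entropy(student_logits, labels)
        return alpha * kd_term + (1.0 - alpha) * ce_term

    # Illustrative usage on random data only:
    student_logits = torch.randn(8, 10)
    teacher_logits = torch.randn(8, 10)
    labels = torch.randint(0, 10, (8,))
    loss = distillation_loss(student_logits, teacher_logits, labels,
                             temperature=4.0, alpha=0.7)

    In this framing, the temperature and weight are the scalar knobs shown above, while the transfer set and positioning are choices about which data the loss is computed on and when it is applied during training.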



        Published In

        EuroMLSys '23: Proceedings of the 3rd Workshop on Machine Learning and Systems
        May 2023, 176 pages
        ISBN: 9798400700842
        DOI: 10.1145/3578356
        This work is licensed under a Creative Commons Attribution 4.0 International License.


        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 08 May 2023


        Author Tags

        1. knowledge distillation
        2. joint distillation
        3. decentralized learning

        Qualifiers

        • Research-article

        Funding Sources

        • King Abdullah University of Science and Technology (KAUST)

        Conference

        EuroMLSys '23

        Acceptance Rates

        Overall Acceptance Rate 18 of 26 submissions, 69%


