DOI: 10.1145/3578356.3592590
Research article | Open access

A First Look at the Impact of Distillation Hyper-Parameters in Federated Knowledge Distillation

Published: 08 May 2023

Abstract

    Knowledge distillation is a well-known technique for model compression. It has recently been adopted in distributed training settings, such as federated learning, as a way to transfer knowledge between already pre-trained models. Knowledge distillation in distributed settings promises several advantages, including significantly reduced communication overhead and support for heterogeneous model architectures. However, distillation is still not well studied or understood in such settings, which limits the achievable gains. We bridge this gap with an experimental analysis of the distillation process in the distributed training setting, focusing on non-IID data. We highlight elements that require special consideration when transferring knowledge between already pre-trained models: the transfer set, the temperature, the weight, and the positioning. Appropriately tuning these hyper-parameters can remarkably boost learning outcomes: in our experiments, around two-thirds of the participants require settings other than the defaults commonly used in the literature, and appropriate tuning yields more than a five-fold improvement on average.
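
    To make these hyper-parameters concrete, below is a minimal sketch of a generic distillation objective in PyTorch. This is an illustrative sketch, not the authors' implementation: the temperature softens the teacher and student logits before they are compared, and the weight balances the distillation term against the hard-label loss. The transfer set is simply whatever data the loss is evaluated on, and "positioning" plausibly refers to where this distillation step sits in local training. The function name, default values, and the random tensors in the usage lines are purely hypothetical.

    # Minimal sketch (assumed PyTorch setting) of a weighted distillation loss.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=1.0, alpha=0.5):
        """Weighted sum of soft-label KL divergence and hard-label cross-entropy."""
        # Soften both distributions with the temperature before comparing them.
        soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
        log_probs = F.log_softmax(student_logits / temperature, dim=-1)
        # The T^2 factor keeps gradient magnitudes comparable across temperatures
        # (as in Hinton et al., 2015).
        kd_term = F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature ** 2
        # Standard cross-entropy on the hard labels of the transfer-set batch.
        ce_term = F.cross_entropy(student_logits, labels)
        return alpha * kd_term + (1.0 - alpha) * ce_term

    # Illustrative usage on random data only:
    student_logits = torch.randn(8, 10)
    teacher_logits = torch.randn(8, 10)
    labels = torch.randint(0, 10, (8,))
    loss = distillation_loss(student_logits, teacher_logits, labels,
                             temperature=4.0, alpha=0.7)

    In this framing, the temperature and weight are the scalar knobs shown above, while the transfer set and positioning are choices about which data the loss is computed on and when it is applied during training.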



        Published In

        EuroMLSys '23: Proceedings of the 3rd Workshop on Machine Learning and Systems
        May 2023, 176 pages
        ISBN: 9798400700842
        DOI: 10.1145/3578356
        This work is licensed under a Creative Commons Attribution 4.0 International License.


        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 08 May 2023


        Author Tags

        1. knowledge distillation
        2. joint distillation
        3. decentralized learning

        Qualifiers

        • Research-article

        Funding Sources

        • King Abdullah University of Science and Technology (KAUST)

        Conference

        EuroMLSys '23

        Acceptance Rates

        Overall Acceptance Rate 18 of 26 submissions, 69%


