
QFL: Federated Learning Acceleration Based on QAT Hardware Accelerator

Published: 20 June 2024

Abstract

Federated Learning (FL) enables geographically dispersed organizations to collaboratively train a machine learning model. In this process, a parameter server performs global updating and synchronization of the model by receiving and aggregating model data from multiple clients. To secure this process, clients use homomorphic encryption (HE) algorithms to preserve data privacy. However, HE incurs huge computational overhead (the cost of encrypting and decrypting data) and communication overhead (FL requires multiple communication rounds, with more than 150× ciphertext expansion in each round), and ultimately becomes the performance bottleneck of the entire FL system. In this paper, we present QFL, a system solution for FL based on the Intel QAT (QuickAssist Technology) hardware accelerator that substantially reduces the computation and communication overhead caused by HE. Building on an optimized HE algorithm, we leverage coroutines to concurrently and asynchronously offload HE modular exponentiation operations to the QAT, and use an event-driven mechanism to retrieve QAT results promptly, reducing computational overhead. By combining an error-feedback gradient compression algorithm with QAT-accelerated Huffman coding, we greatly reduce the communication overhead, accelerate server-side gradient aggregation, and reduce system complexity. Our solution improves encryption throughput by 16× compared with the open-source Python encryption library python-paillier [1]. Compared with the state-of-the-art FL framework with HE [32], our solution shrinks training time by 3× when reaching the same test accuracy.
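The additive homomorphism the abstract relies on is that of the Paillier cryptosystem [23], the scheme implemented by python-paillier [1]: a product of ciphertexts decrypts to the sum of the plaintexts, which lets the server aggregate encrypted gradients without seeing them. A minimal textbook sketch follows (toy primes, not secure, and not the paper's implementation); note that the two `pow(..., n_sq)` calls per encryption are exactly the modular exponentiations QFL offloads to QAT.

```python
# Minimal textbook Paillier [23] sketch with toy primes (NOT secure; real
# deployments use 2048-bit+ keys). Illustrates the additive homomorphism used
# for encrypted gradient aggregation, and the modular exponentiations that
# dominate encryption cost.
import math
import secrets

def keygen(p, q):
    n = p * q
    lam = math.lcm(p - 1, q - 1)               # Carmichael function of n
    g = n + 1                                  # standard generator choice
    # L(x) = (x - 1) // n;  mu = L(g^lam mod n^2)^-1 mod n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    n_sq = n * n
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:                 # r must be a unit mod n
        r = secrets.randbelow(n - 1) + 1
    # Two modular exponentiations: the hot path a QAT accelerator can absorb.
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n) * mu % n

pub, priv = keygen(104723, 104729)             # toy primes for illustration
a, b = 1234, 5678
c_sum = (encrypt(pub, a) * encrypt(pub, b)) % (pub[0] ** 2)
assert decrypt(pub, priv, c_sum) == a + b      # Enc(a)*Enc(b) decrypts to a+b
```

The sketch also makes the ciphertext-expansion point concrete: a plaintext smaller than n is encrypted into a ciphertext modulo n², roughly doubling the bit length even at toy key sizes, and far worse when small gradient values are packed into 2048-bit moduli.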
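The communication side can be illustrated with a generic error-feedback sign-compression sketch in the spirit of [16, 24]: each client keeps a residual of what quantization discarded and folds it back in before the next round, so the compression error does not accumulate. The function name and the mean-magnitude scaling below are illustrative assumptions, not QFL's exact algorithm.

```python
# Hedged sketch of error-feedback 1-bit gradient compression (cf. [16, 24]).
# The client transmits only signs plus one scale per tensor; the residual
# carries the quantization error into the next round.
def ef_sign_compress(grad, residual):
    # fold the error carried over from the previous round back in
    corrected = [g + r for g, r in zip(grad, residual)]
    # 1-bit quantization: send only signs, scaled by the mean magnitude
    scale = sum(abs(c) for c in corrected) / len(corrected)
    compressed = [scale if c >= 0 else -scale for c in corrected]
    # remember what the quantizer threw away for the next round
    new_residual = [c - q for c, q in zip(corrected, compressed)]
    return compressed, new_residual

grad = [0.5, -1.2, 0.1, -0.05]
q, residual = ef_sign_compress(grad, [0.0] * len(grad))
# compressed + residual reconstructs the corrected gradient by construction
assert all(abs((qi + ri) - gi) < 1e-12 for qi, ri, gi in zip(q, residual, grad))
```

The heavily repeated sign values produced by such a quantizer are exactly the kind of low-entropy stream that Huffman coding (which QAT also accelerates in hardware) compresses well.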

References

[1]
2013. Python Paillier Library. https://github.com/data61/python-paillier
[2]
2021. Pyfhel: Python For Homomorphic Encryption Libraries. https://github.com/ibarrond/Pyfhel
[3]
2021. SEAL. https://github.com/Microsoft/SEAL
[4]
Nitesh Aggarwal, CP Gupta, and Iti Sharma. 2014. Fully Homomorphic symmetric scheme without bootstrapping. In Proceedings of 2014 International Conference on Cloud Computing and Internet of Things. 14–17. https://doi.org/10.1109/CCIOT.2014.7062497
[5]
Dan Alistarh, Demjan Grubic, Jerry Z. Li, Ryota Tomioka, and Milan Vojnovic. 2017. QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA) (NIPS’17). Curran Associates Inc., Red Hook, NY, USA, 1707–1718.
[6]
Ayoub Benaissa, Bilal Retiat, Bogdan Cebere, and Alaa Eddine Belfedhal. 2021. TenSEAL: A library for encrypted tensor operations using homomorphic encryption. arXiv preprint arXiv:2104.03152 (2021).
[7]
Xuanyu Cao, Tamer Başar, Suhas Diggavi, Yonina C. Eldar, Khaled B. Letaief, H. Vincent Poor, and Junshan Zhang. 2023. Communication-Efficient Distributed Learning: An Overview. IEEE Journal on Selected Areas in Communications 41, 4 (2023), 851–873. https://doi.org/10.1109/JSAC.2023.3242710
[8]
Xiaodian Cheng, Wanhang Lu, Xinyang Huang, Shuihai Hu, and Kai Chen. 2021. HAFLO: GPU-Based Acceleration for Federated Logistic Regression. arxiv:2107.13797 [cs.LG]
[9]
Jung Hee Cheon, Andrey Kim, Miran Kim, and Yongsoo Song. 2017. Homomorphic Encryption for Arithmetic of Approximate Numbers. In Advances in Cryptology – ASIACRYPT 2017, Tsuyoshi Takagi and Thomas Peyrin (Eds.). Springer International Publishing, Cham, 409–437.
[10]
Xiaojie Feng and Haizhou Du. 2021. FLZip: An Efficient and Privacy-Preserving Framework for Cross-Silo Federated Learning. In 2021 IEEE International Conferences on Internet of Things (iThings) and IEEE Green Computing & Communications (GreenCom) and IEEE Cyber, Physical & Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on Cybermatics (Cybermatics). 209–216. https://doi.org/10.1109/iThings-GreenCom-CPSCom-SmartData-Cybermatics53846.2021.00044
[11]
Craig Gentry. 2009. Fully Homomorphic Encryption Using Ideal Lattices. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing (Bethesda, MD, USA) (STOC ’09). Association for Computing Machinery, New York, NY, USA, 169–178. https://doi.org/10.1145/1536414.1536440
[12]
Song Han, Huizi Mao, and William J Dally. 2015. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149 (2015).
[13]
Intel. 2019. Intel quickassist technology (intel QAT). Retrieved August, 2019 from https://www.intel.com/content/www/us/en/architecture-and-technology/intel-quick-assist-technology-overview.html
[14]
Zhifeng Jiang, Wei Wang, and Yang Liu. 2021. FLASHE: Additively Symmetric Homomorphic Encryption for Cross-Silo Federated Learning. arxiv:2109.00675 [cs.CR]
[15]
Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, et al. 2021. Advances and Open Problems in Federated Learning. Foundations and Trends in Machine Learning 14, 1–2 (2021), 1–210.
[16]
Sai Praneeth Karimireddy, Quentin Rebjock, Sebastian Stich, and Martin Jaggi. 2019. Error Feedback Fixes SignSGD and other Gradient Compression Schemes. In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 97), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.). PMLR, 3252–3261. https://proceedings.mlr.press/v97/karimireddy19a.html
[17]
Alex Krizhevsky, Geoffrey Hinton, et al. 2009. Learning multiple layers of features from tiny images. Technical report, University of Toronto (2009).
[18]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25 (2012).
[19]
Yujun Lin, Song Han, Huizi Mao, Yu Wang, and William J Dally. 2018. Deep Gradient Compression: Reducing the communication bandwidth for distributed training. In The International Conference on Learning Representations.
[20]
Changchang Liu, Supriyo Chakraborty, and Dinesh Verma. 2019. Secure Model Fusion for Distributed Learning Using Partial Homomorphic Encryption. Springer International Publishing, Cham, 154–179. https://doi.org/10.1007/978-3-030-17277-0_9
[21]
Yang Liu, Tao Fan, Tianjian Chen, Qian Xu, and Qiang Yang. 2021. FATE: An Industrial Grade Platform for Collaborative Learning with Data Protection. J. Mach. Learn. Res. 22, 1, Article 226 (jan 2021), 6 pages.
[22]
Payman Mohassel and Peter Rindal. 2018. ABY3: A Mixed Protocol Framework for Machine Learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (Toronto, Canada) (CCS ’18). Association for Computing Machinery, New York, NY, USA, 35–52. https://doi.org/10.1145/3243734.3243760
[23]
Pascal Paillier. 1999. Public-Key Cryptosystems Based on Composite Degree Residuosity Classes. In Proceedings of the 17th International Conference on Theory and Application of Cryptographic Techniques (Prague, Czech Republic) (EUROCRYPT’99). Springer-Verlag, Berlin, Heidelberg, 223–238.
[24]
Frank Seide, Hao Fu, Jasha Droppo, Gang Li, and Dong Yu. 2014. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs. In Proc. Interspeech 2014. 1058–1062. https://doi.org/10.21437/Interspeech.2014-274
[25]
Shaohuai Shi, Xianhao Zhou, Shutao Song, Xingyao Wang, Zilin Zhu, Xue Huang, Xinan Jiang, Feihu Zhou, Zhenyu Guo, Liqiang Xie, et al. 2021. Towards scalable distributed training of deep learning on public cloud clusters. Proceedings of Machine Learning and Systems 3 (2021), 401–412.
[26]
Reza Shokri and Vitaly Shmatikov. 2015. Privacy-Preserving Deep Learning. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (Denver, Colorado, USA) (CCS ’15). Association for Computing Machinery, New York, NY, USA, 1310–1321. https://doi.org/10.1145/2810103.2813687
[27]
N. P. Smart and F. Vercauteren. 2010. Fully Homomorphic Encryption with Relatively Small Key and Ciphertext Sizes. In Public Key Cryptography – PKC 2010, Phong Q. Nguyen and David Pointcheval (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 420–443.
[28]
N. P. Smart and F. Vercauteren. 2014. Fully Homomorphic SIMD Operations. Des. Codes Cryptography 71, 1 (apr 2014), 57–81. https://doi.org/10.1007/s10623-012-9720-4
[29]
Marten van Dijk, Craig Gentry, Shai Halevi, and Vinod Vaikuntanathan. 2010. Fully Homomorphic Encryption over the Integers. In Proceedings of the 29th Annual International Conference on Theory and Applications of Cryptographic Techniques (French Riviera, France) (EUROCRYPT’10). Springer-Verlag, Berlin, Heidelberg, 24–43. https://doi.org/10.1007/978-3-642-13190-5_2
[30]
Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. 2019. Federated Machine Learning: Concept and Applications. ACM Trans. Intell. Syst. Technol. 10, 2, Article 12 (jan 2019), 19 pages. https://doi.org/10.1145/3298981
[31]
Zhaoxiong Yang, Shuihai Hu, and Kai Chen. 2020. FPGA-Based Hardware Accelerator of Homomorphic Encryption for Efficient Federated Learning. arxiv:2007.10560 [cs.CR]
[32]
Chengliang Zhang, Suyi Li, Junzhe Xia, Wei Wang, Feng Yan, and Yang Liu. 2020. BatchCrypt: Efficient Homomorphic Encryption for Cross-Silo Federated Learning. In Proceedings of the 2020 USENIX Conference on Usenix Annual Technical Conference (USENIX ATC ’20). USENIX Association, USA, Article 33, 14 pages.

Published In

CMLDS '24: Proceedings of the International Conference on Computing, Machine Learning and Data Science
April 2024
381 pages
ISBN: 9798400716393
DOI: 10.1145/3661725

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Federated Learning
  2. Gradient Compression
  3. Hardware Acceleration
  4. Homomorphic Encryption

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CMLDS 2024
