Federated matrix factorization with privacy guarantee

Authors:

Jingren Zhou

Proceedings of the VLDB Endowment, Volume 15, Issue 4

Pages 900 - 913

https://doi.org/10.14778/3503585.3503598

Published: 01 December 2021

Abstract

Matrix factorization (MF) approximates the unobserved ratings in a rating matrix, whose rows correspond to users and whose columns correspond to items to be rated, and has served as a fundamental building block in recommendation systems. This paper comprehensively studies the problem of matrix factorization in different federated learning (FL) settings, where a set of parties want to cooperate in training but refuse to share data directly. We first propose a generic algorithmic framework for various settings of federated matrix factorization (FMF) and provide a theoretical convergence guarantee. We then systematically characterize privacy-leakage risks in the data collection, training, and publishing stages for three different settings and introduce privacy notions to provide end-to-end privacy protection. The first setting is vertical federated learning (VFL), where multiple parties hold ratings from the same set of users but on disjoint sets of items. The second is horizontal federated learning (HFL), where parties hold ratings from different sets of users but on the same set of items. The third is local federated learning (LFL), where users' ratings are stored only on their local devices. We introduce adapted versions of FMF that guarantee these privacy notions in the three settings. In particular, a new private learning technique called embedding clipping is introduced and used in all three settings to ensure differential privacy. For the LFL setting, we combine differential privacy with secure aggregation to protect the communication between user devices and the server with a strength similar to that of the local differential privacy model, but with much better accuracy. We perform experiments to demonstrate the effectiveness of our approaches.
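To make the two central ideas of the abstract concrete, the following is a minimal single-machine sketch of matrix factorization trained by stochastic gradient descent, with every embedding projected back onto an L2 ball after each update. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not specify how embedding clipping is defined, so the projection used here is one plausible reading, and the names `clip_to_ball`, `factorize`, `predict`, and the parameters `radius`, `lr`, `dim`, and `epochs` are hypothetical. The sketch adds no noise and involves no federation, so it provides no privacy guarantee on its own; it only shows how bounding embedding norms bounds the magnitude of predictions.

```python
import math
import random


def clip_to_ball(vec, radius):
    """Rescale vec so that its L2 norm is at most radius (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= radius or norm == 0.0:
        return list(vec)
    return [x * radius / norm for x in vec]


def predict(U, V, user, item):
    """Predicted rating: inner product of the user and item embeddings."""
    return sum(a * b for a, b in zip(U[user], V[item]))


def factorize(ratings, n_users, n_items, dim=4, radius=1.0,
              lr=0.05, epochs=200, seed=0):
    """SGD matrix factorization with embeddings clipped after every update.

    ratings: list of (user, item, rating) triples for the observed entries.
    Assumption: ratings are pre-scaled to [-radius**2, radius**2], since
    clipped embeddings cannot produce predictions outside that range.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_items)]
    for _ in range(epochs):
        for user, item, rating in ratings:
            err = predict(U, V, user, item) - rating
            # Gradient step on the squared error, using the old values of both vectors.
            new_u = [a - lr * err * b for a, b in zip(U[user], V[item])]
            new_v = [b - lr * err * a for a, b in zip(U[user], V[item])]
            # Project back onto the ball so every embedding norm stays bounded.
            U[user] = clip_to_ball(new_u, radius)
            V[item] = clip_to_ball(new_v, radius)
    return U, V


# Toy data: 3 users, 2 items, ratings scaled to [0, 1].
observed = [(0, 0, 1.0), (0, 1, 0.2), (1, 0, 0.8), (1, 1, 0.0), (2, 1, 0.5)]
U, V = factorize(observed, n_users=3, n_items=2)
print(predict(U, V, 2, 0))  # estimate for an unobserved (user, item) pair
```

Because every embedding has norm at most `radius`, the Cauchy-Schwarz inequality bounds each prediction by `radius**2` in absolute value. A bound of this kind is what lets a differentially private mechanism calibrate its noise, although the calibration itself is outside the scope of this sketch.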

Published In

Proceedings of the VLDB Endowment Volume 15, Issue 4

December 2021

246 pages

ISSN:2150-8097

Editors: Juliana Freire (New York University), Xuemin Lin (University of New South Wales)

Publisher

VLDB Endowment

Qualifiers

Research-article
