Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article

Federated matrix factorization with privacy guarantee

Published: 01 December 2021 Publication History

Abstract

Matrix factorization (MF) approximates unobserved ratings in a rating matrix, whose rows correspond to users and columns correspond to items to be rated, and has been serving as a fundamental building block in recommendation systems. This paper comprehensively studies the problem of matrix factorization in different federated learning (FL) settings, where a set of parties want to cooperate in training but refuse to share data directly. We first propose a generic algorithmic framework for various settings of <u>f</u>ederated <u>m</u>atrix <u>f</u>actorization (FMF) and provide a theoretical convergence guarantee. We then systematically characterize privacy-leakage risks in data collection, training, and publishing stages for three different settings and introduce privacy notions to provide end-to-end privacy protections. The first one is vertical federated learning (VFL), where multiple parties have the ratings from the same set of users but on disjoint sets of items. The second one is horizontal federated learning (HFL), where parties have ratings from different sets of users but on the same set of items. The third setting is local federated learning (LFL), where the ratings of the users are only stored on their local devices. We introduce adapted versions of FMF with the privacy notions guaranteed in the three settings. In particular, a new private learning technique called embedding clipping is introduced and used in all the three settings to ensure differential privacy. For the LFL setting, we combine differential privacy with secure aggregation to protect the communication between user devices and the server with a strength similar to the local differential privacy model, but much better accuracy. We perform experiments to demonstrate the effectiveness of our approaches.

References

[1]
Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep Learning with Differential Privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (Vienna, Austria) (CCS '16). Association for Computing Machinery, New York, NY, USA, 308--318.
[2]
Charu C Aggarwal et al. 2016. Recommender systems. Vol. 1. Springer, New York, USA.
[3]
Eugene Bagdasaryan, Omid Poursaeed, and Vitaly Shmatikov. 2019. Differential privacy has disparate impact on model accuracy. Advances in Neural Information Processing Systems 32 (2019), 15479--15488.
[4]
Oren Barkan and Noam Koenigstein. 2016. Item2vec: neural item embedding for collaborative filtering. In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE, IEEE, New York, NY, USA, 1--6.
[5]
James Henry Bell, Kallista A. Bonawitz, Adrià Gascón, Tancrède Lepoint, and Mariana Raykova. 2020. Secure Single-Server Aggregation with (Poly)Logarithmic Overhead. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (Virtual Event, USA) (CCS '20). Association for Computing Machinery, New York, NY, USA, 1253--1269.
[6]
Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. 2017. Practical Secure Aggregation for Privacy-Preserving Machine Learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (Dallas, Texas, USA) (CCS '17). Association for Computing Machinery, New York, NY, USA, 1175--1191.
[7]
Lukas Brozovsky and Vaclav Petricek. 2007. Recommender System for Online Dating Service.
[8]
Di Chai, Leye Wang, Kai Chen, and Qiang Yang. 2021. Secure Federated Matrix Factorization. IEEE Intelligent Systems 36, 5 (2021), 11--20.
[9]
Tianyi Chen, Xiao Jin, Yuejiao Sun, and Wotao Yin. 2020. VAFL: a Method of Vertical Asynchronous Federated Learning. arXiv:2007.06081 [cs.LG]
[10]
Xiangyi Chen, Steven Z. Wu, and Mingyi Hong. 2020. Understanding Gradient Clipping in Private SGD: A Geometric Perspective. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., Red Hook, NY, USA, 13773--13782. https://proceedings.neurips.cc/paper/2020/file/9ecff5455677b38d19f49ce658ef0608-Paper.pdf
[11]
Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. 2017. Collecting Telemetry Data Privately. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA) (NIPS'17). Curran Associates Inc., Red Hook, NY, USA, 3574--3583.
[12]
Irit Dinur and Kobbi Nissim. 2003. Revealing Information While Preserving Privacy. In Proceedings of the Twenty-Second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (San Diego, California) (PODS '03). Association for Computing Machinery, New York, NY, USA, 202--210.
[13]
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In Theory of Cryptography, Shai Halevi and Tal Rabin (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 265--284.
[14]
Cynthia Dwork, Adam Smith, Thomas Steinke, and Jonathan Ullman. 2017. Exposed! a survey of attacks on private data. Annual Review of Statistics and Its Application 4 (2017), 61--84.
[15]
Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. 2014. RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security (Scottsdale, Arizona, USA) (CCS '14). Association for Computing Machinery, New York, NY, USA, 1054--1067.
[16]
Vitaly Feldman, Audra McMillan, and Kunal Talwar. 2021. Hiding Among the Clones: A Simple and Nearly Optimal Analysis of Privacy Amplification by Shuffling. arXiv:2012.12803 [cs.LG]
[17]
Adrian Flanagan, Were Oyomno, Alexander Grigorievskiy, Kuan E. Tan, Suleiman A. Khan, and Muhammad Ammad-Ud-Din. 2021. Federated Multi-view Matrix Factorization for Personalized Recommendations. In Machine Learning and Knowledge Discovery in Databases. Springer International Publishing, Cham, 324--347.
[18]
Rainer Gemulla, Erik Nijkamp, Peter J. Haas, and Yannis Sismanis. 2011. Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Diego, California, USA) (KDD '11). Association for Computing Machinery, New York, NY, USA, 69--77.
[19]
Antonious Girgis, Deepesh Data, Suhas Diggavi, Peter Kairouz, and Ananda Theertha Suresh. 2021. Shuffled Model of Differential Privacy in Federated Learning. In Proceedings of The 24th International Conference on Artificial Intelligence and Statistics (Proceedings of Machine Learning Research), Arindam Banerjee and Kenji Fukumizu (Eds.), Vol. 130. PMLR, USA, 2521--2529. https://proceedings.mlr.press/v130/girgis21a.html
[20]
Anupam Gupta, Katrina Ligett, Frank McSherry, Aaron Roth, and Kunal Talwar. 2010. Differentially private combinatorial optimization. SIAM, Philadelphia, PA, USA, 1106--1125. arXiv:https://epubs.siam.org/doi/pdf/10.1137/1.9781611973075.90
[21]
F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Trans. Interact. Intell. Syst. 5, 4, Article 19 (dec 2015), 19 pages.
[22]
Yaochen Hu, Peng Liu, Linglong Kong, and Di Niu. 2019. Learning Privately over Distributed Features: An ADMM Sharing Approach. arXiv:1907.07735 [cs.LG]
[23]
Yaochen Hu, Di Niu, Jianming Yang, and Shengping Zhou. 2019. FDML: A Collaborative Machine Learning Framework for Distributed Features. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (Anchorage, AK, USA) (KDD '19). Association for Computing Machinery, New York, NY, USA, 2232--2240.
[24]
Jingyu Hua, Chang Xia, and Sheng Zhong. 2015. Differentially private matrix factorization. In Proceedings of the 24th International Conference on Artificial Intelligence (Buenos Aires, Argentina) (IJCAI'15). AAAI Press, USA, 1763--1770.
[25]
Prateek Jain, Om Dipakbhai Thakkar, and Abhradeep Thakurta. 2018. Differentially Private Matrix Completion Revisited. In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research), Jennifer Dy and Andreas Krause (Eds.), Vol. 80. PMLR, USA, 2215--2224. https://proceedings.mlr.press/v80/jain18b.html
[26]
Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D'Oliveira, Hubert Eichner, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson, Justin Hsu, Martin Jaggi, Tara Javidi, Gauri Joshi, Mikhail Khodak, Jakub Konečný, Aleksandra Korolova, Farinaz Koushanfar, Sanmi Koyejo, Tancrède Lepoint, Yang Liu, Prateek Mittal, Mehryar Mohri, Richard Nock, Ayfer Özgür, Rasmus Pagh, Mariana Raykova, Hang Qi, Daniel Ramage, Ramesh Raskar, Dawn Song, Weikang Song, Sebastian U. Stich, Ziteng Sun, Ananda Theertha Suresh, Florian Tramèr, Praneeth Vepakomma, Jianyu Wang, Li Xiong, Zheng Xu, Qiang Yang, Felix X. Yu, Han Yu, and Sen Zhao. 2021. Advances and Open Problems in Federated Learning. arXiv:1912.04977 [cs.LG]
[27]
Yejin Kim, Jimeng Sun, Hwanjo Yu, and Xiaoqian Jiang. 2017. Federated tensor factorization for computational phenotyping. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, New York, NY, USA, 887--895.
[28]
Jakub Konečný, H. Brendan McMahan, Daniel Ramage, and Peter Richtárik. 2016. Federated Optimization: Distributed Machine Learning for On-Device Intelligence. arXiv:1610.02527 [cs.LG]
[29]
Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, and Dave Bacon. 2017. Federated Learning: Strategies for Improving Communication Efficiency. arXiv:1610.05492 [cs.LG]
[30]
Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30--37.
[31]
Ninghui Li, Wahbeh Qardaji, and Dong Su. 2012. On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy. In Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security. Association for Computing Machinery, New York, NY, USA, 32--33.
[32]
Yang Liu, Yingting Liu, Zhijie Liu, Yuxuan Liang, Chuishi Meng, Junbo Zhang, and Yu Zheng. 2020. Federated forest. IEEE Transactions on Big Data 01 (2020), 1--1.
[33]
Ziqi Liu, Yu-Xiang Wang, and Alexander Smola. 2015. Fast Differentially Private Matrix Factorization. In Proceedings of the 9th ACM Conference on Recommender Systems (Vienna, Austria) (RecSys '15). Association for Computing Machinery, New York, NY, USA, 171--178.
[34]
Jing Ma, Qiuchen Zhang, Jian Lou, Joyce C. Ho, Li Xiong, and Xiaoqian Jiang. 2019. Privacy-Preserving Tensor Factorization for Collaborative Health Data Analysis. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (Beijing, China) (CIKM '19). Association for Computing Machinery, New York, NY, USA, 1291--1300.
[35]
Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. 2017. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (Proceedings of Machine Learning Research), Aarti Singh and Jerry Zhu (Eds.), Vol. 54. PMLR, USA, 1273--1282. https://proceedings.mlr.press/v54/mcmahan17a.html
[36]
H Brendan McMahan, Daniel Ramage, Kunal Talwar, and Li Zhang. 2018. Learning Differentially Private Recurrent Language Models. In International Conference on Learning Representations. OpenReview.net. https://openreview.net/forum?id=BJ0hF1Z0b
[37]
Frank D McSherry. 2009. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data. Association for Computing Machinery, New York, NY, USA, 19--30.
[38]
Luca Melis, Congzheng Song, Emiliano De Cristofaro, and Vitaly Shmatikov. 2019. Exploiting unintended feature leakage in collaborative learning. In 2019 IEEE Symposium on Security and Privacy (SP). IEEE, New York, NY, USA, 691--706.
[39]
Arvind Narayanan and Vitaly Shmatikov. 2006. How to break anonymity of the netflix prize dataset.
[40]
Yilin Shen and Hongxia Jin. 2014. Privacy-preserving personalized recommendation: An instance-based approach via differential privacy. In 2014 IEEE International Conference on Data Mining. IEEE, New York, NY, USA, 540--549.
[41]
Hyejin Shin, Sungwook Kim, Junbum Shin, and Xiaokui Xiao. 2018. Privacy enhanced matrix factorization for recommendation with local differential privacy. IEEE Transactions on Knowledge and Data Engineering 30, 9 (2018), 1770--1782.
[42]
Shuang Song, Kamalika Chaudhuri, and Anand D Sarwate. 2013. Stochastic gradient descent with differentially private updates. In 2013 IEEE Global Conference on Signal and Information Processing. IEEE, New York, NY, USA, 245--248.
[43]
Apple Differential Privacy Team. 2021. Differential Privacy Overview - Apple. Apple. inc. https://www.apple.com/privacy/docs/Differential_Privacy_Overview.pdf
[44]
TensorFlow-Privacy. 2021. TensorFlow Privacy. https://github.com/tensorflow/privacy. Accessed: 2021-12-01.
[45]
Yu-Xiong Wang and Yu-Jin Zhang. 2012. Nonnegative matrix factorization: A comprehensive review. IEEE Transactions on Knowledge and Data Engineering 25, 6 (2012), 1336--1353.
[46]
Yu-Xiong Wang and Yu-Jin Zhang. 2012. Nonnegative matrix factorization: A comprehensive review. IEEE Transactions on Knowledge and Data Engineering 25, 6 (2012), 1336--1353.
[47]
Kang Wei, Jun Li, Ming Ding, Chuan Ma, Howard H Yang, Farhad Farokhi, Shi Jin, Tony QS Quek, and H Vincent Poor. 2020. Federated learning with differential privacy: Algorithms and performance analysis. IEEE Transactions on Information Forensics and Security 15 (2020), 3454--3469.
[48]
Nan Wu, Farhad Farokhi, David Smith, and Mohamed Ali Kaafar. 2020. The value of collaboration in convex machine learning with differential privacy. In 2020 IEEE Symposium on Security and Privacy (SP). IEEE, New York, NY, USA, 304--317.
[49]
Yuncheng Wu, Shaofeng Cai, Xiaokui Xiao, Gang Chen, and Beng Chin Ooi. 2020. Privacy Preserving Vertical Federated Learning for Tree-based Models. Proceedings of the VLDB Endowment 13, 11 (2020), 2090--2103.
[50]
Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. 2019. Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST) 10, 2 (2019), 1--19.
[51]
Sheng Zhang, Weihong Wang, James Ford, and Fillia Makedon. 2006. Learning from incomplete ratings using non-negative matrix factorization. Proceedings of the 2006 SIAM international conference on data mining 1 (2006), 549--553.
[52]
Handong Zhao, Zhengming Ding, and Yun Fu. 2017. Multi-view clustering via deep matrix factorization. Proceedings of the AAAI Conference on Artificial Intelligence 31, 1 (2017), -. https://ojs.aaai.org/index.php/AAAI/article/view/10867
[53]
Ligeng Zhu and Song Han. 2020. Deep leakage from gradients. Federated Learning 12500 (2020), 17--31.

Cited By

View all
  • (2024)A Profit-Maximizing Data Marketplace with Differentially Private Federated Learning under Price CompetitionProceedings of the ACM on Management of Data10.1145/36771272:4(1-27)Online publication date: 30-Sep-2024
  • (2024)VertiMRF: Differentially Private Vertical Federated Data SynthesisProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671771(4431-4442)Online publication date: 25-Aug-2024
  • (2024)A comprehensive survey of federated transfer learning: challenges, methods and applicationsFrontiers of Computer Science: Selected Publications from Chinese Universities10.1007/s11704-024-40065-x18:6Online publication date: 1-Dec-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 15, Issue 4
December 2021
246 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 December 2021
Published in PVLDB Volume 15, Issue 4

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)32
  • Downloads (Last 6 weeks)2
Reflects downloads up to 03 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)A Profit-Maximizing Data Marketplace with Differentially Private Federated Learning under Price CompetitionProceedings of the ACM on Management of Data10.1145/36771272:4(1-27)Online publication date: 30-Sep-2024
  • (2024)VertiMRF: Differentially Private Vertical Federated Data SynthesisProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671771(4431-4442)Online publication date: 25-Aug-2024
  • (2024)A comprehensive survey of federated transfer learning: challenges, methods and applicationsFrontiers of Computer Science: Selected Publications from Chinese Universities10.1007/s11704-024-40065-x18:6Online publication date: 1-Dec-2024
  • (2024)FedST: secure federated shapelet transformation for time series classificationThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-024-00865-w33:5(1617-1641)Online publication date: 1-Sep-2024
  • (2023)(Amplified) banded matrix factorizationProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3669393(74856-74889)Online publication date: 10-Dec-2023
  • (2023)Federated spectral clustering via secure similarity reconstructionProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3668673(58520-58555)Online publication date: 10-Dec-2023
  • (2023)Falcon: A Privacy-Preserving and Interpretable Vertical Federated Learning SystemProceedings of the VLDB Endowment10.14778/3603581.360358816:10(2471-2484)Online publication date: 1-Jun-2023
  • (2023)Differentially Private Vertical Federated ClusteringProceedings of the VLDB Endowment10.14778/3583140.358314616:6(1277-1290)Online publication date: 1-Feb-2023
  • (2023)FederatedScope: A Flexible Federated Learning Platform for HeterogeneityProceedings of the VLDB Endowment10.14778/3579075.357908116:5(1059-1072)Online publication date: 1-Jan-2023
  • (2023)FEAST: A Communication-efficient Federated Feature Selection Framework for Relational DataProceedings of the ACM on Management of Data10.1145/35889611:1(1-28)Online publication date: 30-May-2023
  • Show More Cited By

View Options

Get Access

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media