
Data Sharing via Differentially Private Coupled Matrix Factorization

Published: 13 May 2020

Abstract

We address the problem of privacy-preserving data sharing in a distributed multiparty setting. Each data site owns a distinct part of a dataset, and the aim is to estimate the parameters of a statistical model conditioned on the complete data without any site revealing information about the individuals in its own part. The sites want to maximize the utility of the collective data analysis while providing privacy guarantees both for their own portion of the data and for each participating individual. Our first contribution is to classify these privacy requirements as (i) site-level and (ii) user-level differential privacy and to present formal privacy guarantees for both cases. To satisfy a stronger notion of privacy, we also consider local differential privacy, a variant in which the sensitive data is perturbed with a randomized response mechanism prior to estimation. In this study, we assume that the data instances partitioned among the parties are arranged as matrices; a natural statistical model for this distributed scenario is coupled matrix factorization. We present two generic frameworks for privatizing Bayesian inference in coupled matrix factorization models that guarantee the proposed differential privacy notions according to the privacy requirements of the model. To privatize Bayesian inference, we first exploit the connection between differential privacy and sampling from a Bayesian posterior via stochastic gradient Langevin dynamics and then derive an efficient coupled matrix factorization method. In the local privacy context, we propose two models that include an additional privatization mechanism to achieve a stronger measure of privacy and introduce a Gibbs sampling based algorithm. We demonstrate that the proposed methods provide good prediction accuracy on synthetic and real datasets while adhering to the introduced privacy constraints.
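
As a rough illustration of the local privacy idea mentioned in the abstract, the sketch below applies a classic binary randomized response mechanism to each data point before any estimation takes place. It is not the mechanism used in the article, whose local models are not specified here. The function names (`randomized_response`, `debiased_mean`), the binary data, and the choice of epsilon are all assumptions made for this example.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it.

    The ratio between the two reporting probabilities is e^eps, which is
    the condition for eps-local differential privacy on a single bit.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def debiased_mean(reports, epsilon):
    """Unbiased estimate of the true fraction of ones from noisy reports."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = (2p - 1) * true_mean + (1 - p); invert that affine map.
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


# Hypothetical data: 30% of 10,000 individuals hold the sensitive value 1.
rng = random.Random(0)
true_bits = [1] * 3000 + [0] * 7000
epsilon = 1.0

# Each individual perturbs their own bit; only the reports are shared.
reports = [randomized_response(b, epsilon, rng) for b in true_bits]
estimate = debiased_mean(reports, epsilon)
```

The analyst sees only `reports`, yet `estimate` should land near the true fraction of 0.3, with noise that grows as `epsilon` shrinks. This shows the utility and privacy trade-off in its simplest form.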

    Published In

    ACM Transactions on Knowledge Discovery from Data, Volume 14, Issue 3
    June 2020
    381 pages
    ISSN: 1556-4681
    EISSN: 1556-472X
    DOI: 10.1145/3388473
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 May 2020
    Online AM: 07 May 2020
    Accepted: 01 November 2019
    Revised: 01 September 2019
    Received: 01 March 2019
    Published in TKDD Volume 14, Issue 3

    Author Tags

    1. Differential privacy
    2. Markov Chain Monte Carlo (MCMC)
    3. collective matrix factorization
    4. distributed data
    5. local differential privacy
    6. stochastic gradient Langevin dynamics (SGLD)

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Article Metrics

    • Downloads (last 12 months): 28
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 11 Jan 2025

    Cited By

    • (2025) Differentially private recommender framework with Dual semi-Autoencoder. Expert Systems with Applications 260 (125447). https://doi.org/10.1016/j.eswa.2024.125447. Online publication date: Jan-2025
    • (2024) Privacy-Preserving Non-Negative Matrix Factorization with Outliers. ACM Transactions on Knowledge Discovery from Data 18:3 (1-26). https://doi.org/10.1145/3632961. Online publication date: 12-Jan-2024
    • (2024) FedCORE: Federated Learning for Cross-Organization Recommendation Ecosystem. IEEE Transactions on Knowledge and Data Engineering 36:8 (3817-3831). https://doi.org/10.1109/TKDE.2024.3363505. Online publication date: 1-Aug-2024
    • (2023) A Matrix Factorization Recommendation System-Based Local Differential Privacy for Protecting Users’ Sensitive Data. IEEE Transactions on Computational Social Systems 10:3 (1189-1198). https://doi.org/10.1109/TCSS.2022.3170691. Online publication date: Jun-2023
    • (2023) Cross-platform sequential recommendation with sharing item-level relevance data. Information Sciences 621 (265-286). https://doi.org/10.1016/j.ins.2022.11.112. Online publication date: Apr-2023
    • (2022) Multiple Strategies Differential Privacy on Sparse Tensor Factorization for Network Traffic Analysis in 5G. IEEE Transactions on Industrial Informatics 18:3 (1939-1948). https://doi.org/10.1109/TII.2021.3082576. Online publication date: Mar-2022
    • (2022) A differentially private nonnegative matrix factorization for recommender system. Information Sciences 592 (21-35). https://doi.org/10.1016/j.ins.2022.01.050. Online publication date: May-2022
    • (2021) Cross-domain Recommendation with Bridge-Item Embeddings. ACM Transactions on Knowledge Discovery from Data 16:1 (1-23). https://doi.org/10.1145/3447683. Online publication date: 20-Jul-2021
    • (2021) FCMF: Federated collective matrix factorization for heterogeneous collaborative filtering. Knowledge-Based Systems 220 (106946). https://doi.org/10.1016/j.knosys.2021.106946. Online publication date: May-2021
    • (2021) A privacy-preserving framework for cross-domain recommender systems. Computers & Electrical Engineering 93 (107213). https://doi.org/10.1016/j.compeleceng.2021.107213. Online publication date: Jul-2021
