Optimizing Batch Linear Queries under Exact and Approximate Differential Privacy

Published: 30 June 2015

Abstract

Differential privacy is a promising privacy-preserving paradigm for statistical query processing over sensitive data. It works by injecting random noise into each query result such that it is provably hard for the adversary to infer the presence or absence of any individual record from the published noisy results. The main objective in differentially private query processing is to maximize the accuracy of the query results while satisfying the privacy guarantees. Previous work, notably Li et al. [2010], has suggested that, with an appropriate strategy, processing a batch of correlated queries as a whole achieves considerably higher accuracy than answering them individually. However, to our knowledge there is currently no practical solution to find such a strategy for an arbitrary query batch; existing methods either return strategies of poor quality (often worse than naive methods) or require prohibitively expensive computations for even moderately large domains. Motivated by this, we propose a low-rank mechanism (LRM), the first practical differentially private technique for answering batch linear queries with high accuracy. LRM works for both exact (i.e., ϵ-) and approximate (i.e., (ϵ, δ)-) differential privacy definitions. We derive the utility guarantees of LRM and provide guidance on how to set the privacy parameters, given the user's utility expectation. Extensive experiments using real data demonstrate that our proposed method consistently outperforms state-of-the-art query processing solutions under differential privacy, by large margins.
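The gap between answering correlated queries one by one and answering them as a batch can be illustrated with a small sketch. It is not the optimization procedure of the paper: the abstract does not give the formulation, so the sketch assumes a workload matrix `W` (one row per linear query over a vector of counts `x`), a hand-picked factorization `W = B @ L`, and unit-change neighboring databases; all function names are hypothetical. Laplace noise calibrated to the L1 sensitivity of `L` is added to `L @ x`, and the result is mapped back through `B`, which is post-processing and does not affect the ϵ-differential privacy guarantee.

```python
import numpy as np


def l1_sensitivity(M):
    """Largest column L1 norm: the most M @ x can move (in L1) when one count in x changes by 1."""
    return np.abs(M).sum(axis=0).max()


def expected_error_naive(W, eps):
    """Expected total squared error when Laplace noise is added directly to each entry of W @ x."""
    scale = l1_sensitivity(W) / eps
    # A Laplace variable with scale b has variance 2 * b**2; there is one draw per query.
    return W.shape[0] * 2.0 * scale ** 2


def expected_error_factored(B, L, eps):
    """Expected total squared error when noise is added to L @ x and then mapped through B."""
    scale = l1_sensitivity(L) / eps
    # E||B @ eta||^2 = 2 * scale**2 * ||B||_F^2 for i.i.d. Laplace noise eta.
    return 2.0 * scale ** 2 * np.sum(B ** 2)


def answer_factored(B, L, x, eps, rng):
    """Answer the workload B @ L on data x under eps-differential privacy (assumed setting)."""
    scale = l1_sensitivity(L) / eps
    noisy_intermediate = L @ x + rng.laplace(0.0, scale, size=L.shape[0])
    return B @ noisy_intermediate  # post-processing of an eps-DP release


if __name__ == "__main__":
    # Toy workload (an assumption for illustration): four copies of the same sum query.
    W = np.ones((4, 3))
    B = np.ones((4, 1))   # hand-picked rank-1 factorization, W = B @ L
    L = np.ones((1, 3))
    x = np.array([1.0, 2.0, 3.0])
    rng = np.random.default_rng(0)

    print(expected_error_naive(W, eps=1.0))
    print(expected_error_factored(B, L, eps=1.0))
    print(answer_factored(B, L, x, eps=1.0, rng=rng))
```

For this toy workload the expected total squared error falls from 128 (noise added per query) to 8 (noise added once to the shared sum), because the four queries are perfectly correlated. Finding a good `B` and `L` for an arbitrary workload is the hard part, and that is the problem the abstract says LRM addresses.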

References

[1]
K. Ball. 1997. An elementary introduction to modern convex geometry. Flavors Geom. 31, 1--58.
[2]
A. Beck and M. Teboulle. 2009. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2, 1, 183--202.
[3]
C. Beltran. 2011. Estimates on the condition number of random rank-deficient matrices. IMA J. Numer. Anal. 31, 1, 25--39.
[4]
D. P. Bertsekas. 1999. Nonlinear Programming. Athena Scientific.
[5]
R. Bhaskar, S. Laxman, A. Smith, and A. Thakurta. 2010. Discovering frequent patterns in sensitive data. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'10). 503--512.
[6]
P. Billingsley. 2012. Probability and Measure, vol. 939. John Wiley and Sons, New York.
[7]
E. G. Birgin, J. M. Martinez, and M. Raydan. 2000. Nonmonotone spectral projected gradient methods on convex sets. SIAM J. Optim. 10, 4, 1196--1211.
[8]
A. Blum, K. Ligett, and A. Roth. 2008. A learning theory approach to non-interactive database privacy. In Proceedings of the ACM Symposium on Theory of Computing (STOC'08). 609--618.
[9]
E. J. Candes and B. Recht. 2009. Exact matrix completion via convex optimization. Foundat. Comput. Math. 9, 6, 717--772.
[10]
K. Chaudhuri, C. Monteleoni, and A. D. Sarwate. 2011. Differentially private empirical risk minimization. J. Mach. Learn. Res. 12, 1069--1109.
[11]
Z. Chen and J. J. Dongarra. 2005. Condition numbers of gaussian random matrices. SIAM J. Matrix Anal. Appl. 27, 3, 603--620.
[12]
A. R. Conn, N. Gould, and Ph. L. Toint. 1997. A globally convergent lagrangian barrier algorithm for optimization with general inequality constraints and simple bounds. Math. Comput. 66, 217, 261--288.
[13]
G. Cormode, C. M. Procopiuc, E. Shen, D. Srivastava, and T. Yu. 2012. Differentially private spatial decompositions. In Proceedings of the IEEE International Conference on Data Engineering (ICDE'12). 20--31.
[14]
A. D'Aspremont, L. E. Ghaoui, M. I. Jordan, and G. R. G. Lanckriet. 2007. A direct formulation for sparse pca using semidefinite programming. SIAM Rev. 49, 3, 434--448.
[15]
A. De. 2012. Lower bounds in differential privacy. In Theory of Cryptography, Springer, 321--338.
[16]
B. Ding, M. Winslett, J. Han, and Z. Li. 2011. Differentially private data cubes: Optimizing noise sources and consistency. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'11). 217--228.
[17]
I. Dinur and K. Nissim. 2003. Revealing information while preserving privacy. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS'03). 202--210.
[18]
J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. 2008. Efficient projections onto the l1-ball for learning in high dimensions. In Proceedings of the International Conference on Machine Learning (ICML'08). 272--279.
[19]
C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. 2006a. Our data, ourselves: Privacy via distributed noise generation. In Proceedings of the Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT'06). Lecture Notes in Computer Science Series, vol. 4004, Springer, 486--503.
[20]
C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. 2006b. Our data, ourselves: Privacy via distributed noise generation. In Proceedings of the Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT'06). 486--503.
[21]
C. Dwork, F. McSherry, K. Nissim, and A. Smith. 2006c. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Theory of Cryptography Conference (TCC'06). 265--284.
[22]
C. Dwork, G. N. Rothblum, and S. P. Vadhan. 2010. Boosting and differential privacy. In Proceedings of the Symposium on Foundations of Computer Science (FOCS'10). 51--60.
[23]
M. E. Dyer, A. M. Frieze, and R. Kannan. 1991. A random polynomial time algorithm for approximating the volume of convex bodies. J. Assoc. Comput. Mach. 38, 1, 1--17.
[24]
A. V. Fiacco and G. P. McCormick. 1968. Nonlinear Programming: Sequential Unconstrained Minimization Techniques. John Wiley and Sons, New York.
[25]
A. Friedman and A. Schuster. 2010. Data mining with differential privacy. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'10). 493--502.
[26]
A. Ghosh, T. Roughgarden, and M. Sundararajan. 2012. Universally utility-maximizing privacy mechanisms. SIAM J. Comput. 41, 6, 1673--1693.
[27]
L. Grippo and M. Sciandrone. 2000. On the convergence of the block nonlinear gauss-seidel method under convex constraints. Oper. Res. Lett. 26, 3, 127--136.
[28]
M. Hardt, K. Ligett, and F. McSherry. 2012. A simple and practical algorithm for differentially private data release. In Proceedings of the Conference on Neural Information Processing Systems (NIPS'12). 2348--2356.
[29]
M. Hardt and A. Roth. 2012. Beating randomized response on incoherent matrices. In Proceedings of the ACM Symposium on Theory of Computing (STOC'12). ACM Press, New York, 1255--1268.
[30]
M. Hardt and G. N. Rothblum. 2010. A multiplicative weights mechanism for privacy-preserving data analysis. In Proceedings of the Symposium on Foundations of Computer Science (FOCS'10). IEEE, 61--70.
[31]
M. Hardt and K. Talwar. 2010. On the geometry of differential privacy. In Proceedings of the ACM Symposium on Theory of Computing (STOC'10). 705--714.
[32]
M. Hay, C. Li, G. Miklau, and D. Jensen. 2009. Accurate estimation of the degree distribution of private networks. In Proceedings of the IEEE International Conference on Data Mining (ICDM'09). 169--178.
[33]
M. Hay, V. Rastogi, G. Miklau, and D. Suciu. 2010. Boosting the accuracy of differentially private histograms through consistency. Proc. VLDB Endow. 3, 1, 1021--1032.
[34]
C. Li, M. Hay, V. Rastogi, G. Miklau, and A. McGregor. 2010. Optimizing linear counting queries under differential privacy. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS'10). 123--134.
[35]
C. Li and G. Miklau. 2012. An adaptive mechanism for accurate query answering under differential privacy. Proc. VLDB Endow. 5, 6, 514--525.
[36]
C. Li and G. Miklau. 2013. Optimal error of query sets under the differentially-private matrix mechanism. In Proceedings of the International Conference on Database Theory (ICDT'13). 272--283.
[37]
N. Li, W. H. Qardaji, D. Su, and J. Cao. 2012. Privbasis: Frequent itemset mining with differential privacy. Proc. VLDB Endow. 5, 11, 1340--1351.
[38]
Y. D. Li, Z. Zhang, M. Winslett, and Y. Yang. 2011. Compressive mechanism: Utilizing sparse representation in differential privacy. In Proceedings of the ACM Workshop on Privacy in the Electronic Society (WPES'11). 177--182.
[39]
Z. Lin, M. Chen, L. Wu, and Y. Ma. 2010. The augmented lagrange multiplier method for exact recovery of corrupted low-rank matrices. http://yima.csl.illinois.edu/psfile/Lin09-MP.pdf.
[40]
F. McSherry and R. Mahajan. 2010. Differentially-private network trace analysis. In Proceedings of the ACM SIGCOMM International Conference on Data Communication (SIGCOMM'10). 123--134.
[41]
F. McSherry and I. Mironov. 2009. Differentially private recommender systems: Building privacy into the netflix prize contenders. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'09). 627--636.
[42]
F. McSherry and K. Talwar. 2007. Mechanism design via differential privacy. In Proceedings of the Symposium on Foundations of Computer Science (FOCS'07). 94--103.
[43]
N. Mohammed, R. Chen, B. C. M. Fung, and P. S. Yu. 2011. Differentially private data release for data mining. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'11). 493--501.
[44]
Y. E. Nesterov. 2003. Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization Series, vol. 87. Kluwer Academic Publishers.
[45]
A. Nikolov, K. Talwar, and L. Zhang. 2013. The geometry of differential privacy: The sparse and approximate cases. In Proceedings of the ACM Symposium on Theory of Computing (STOC'13). ACM Press, New York, 351--360.
[46]
S. Peng, Y. Yang, Z. Zhang, M. Winslett, and Y. Yu. 2012. Dp-tree: indexing multi-dimensional data under differential privacy (abstract only). In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'12). 864.
[47]
S. Peng, Y. Yang, Z. Zhang, M. Winslett, and Y. Yu. 2013. Query optimization for differentially private data management systems. In Proceedings of the IEEE International Conference on Data Engineering (ICDE'13). 1093--1104.
[48]
V. Rastogi, M. Hay, G. Miklau, and D. Suciu. 2009. Relationship privacy: Output perturbation for queries with joins. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS'09). 107--116.
[49]
V. Rastogi and S. Nath. 2010. Differentially private aggregation of distributed time-series with transformation and encryption. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'10). 735--746.
[50]
A. Roth and T. Roughgarden. 2010. Interactive privacy via the median mechanism. In Proceedings of the ACM Symposium on Theory of Computing (STOC'10). 765--774.
[51]
B. I. Rubinstein, P. L. Bartlett, L. Huang, and N. Taft. 2012. Learning in a large function space: Privacy-preserving mechanisms for SVM learning. J. Privacy Confident. 4, 1, 65--100.
[52]
A. Sala, X. Zhao, C. Wilson, H. Zheng, and B. Y. Zhao. 2011. Sharing graphs using differentially private graph models. In Proceedings of the ACM Internet Measurement Conference (IMC'11). 81--98.
[53]
N. Srebro, J. D. Rennie, and T. Jaakkola. 2004. Maximum-margin matrix factorization. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS'04). Vol. 17. 1329--1336.
[54]
M. J. Todd and E. A. Yildirim. 2007. On Khachiyan's algorithm for the computation of minimum-volume enclosing ellipsoids. Discr. Appl. Math. 155, 13, 1731--1744.
[55]
Z. Wen, C. Yang, X. Liu, and S. Marchesini. 2012a. Alternating direction methods for classical and ptychographic phase retrieval. Inverse Problems 28, 11, 115010.
[56]
Z. Wen, W. Yin, and Y. Zhang. 2012b. Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm. Math. Program. Comput. 4, 4, 333--361.
[57]
X. Xiao, G. Bender, M. Hay, and J. Gehrke. 2011. iReduct: Differential privacy with reduced relative errors. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'11). 229--240.
[58]
X. Xiao, G. Wang, and J. Gehrke. 2010. Differential privacy via wavelet transforms. In Proceedings of the IEEE International Conference on Data Engineering (ICDE'10). 225--236.
[59]
J. Xu, Z. Zhang, X. Xiao, Y. Yang, and G. Yu. 2012. Differentially private histogram publication. In Proceedings of the International Conference on Data Engineering (ICDE'12). 32--43.
[60]
J. Xu, Z. Zhang, X. Xiao, Y. Yang, G. Yu, and M. Winslett. 2013. Differentially private histogram publication. VLDB J. 22, 6, 797--822.
[61]
Y. Yang, Z. Zhang, G. Miklau, M. Winslett, and X. Xiao. 2012. Differential privacy in data publication and analysis. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'12). 601--606.
[62]
G. Yuan, Z. Zhang, M. Winslett, X. Xiao, Y. Yang, and Z. Hao. 2012. Low-rank mechanism: Optimizing batch queries under differential privacy. Proc. VLDB Endow. 5, 11, 1352--1363.
[63]
J. Zhang, X. Xiao, Y. Yang, Z. Zhang, and M. Winslett. 2013. Privgene: Differentially private model fitting using genetic algorithms. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'13). ACM Press, New York, 665--676.
[64]
J. Zhang, Z. Zhang, X. Xiao, Y. Yang, and M. Winslett. 2012. Functional mechanism: regression analysis under differential privacy. Proc. VLDB Endow. 5, 11, 1364--1375.

Published In

ACM Transactions on Database Systems, Volume 40, Issue 2
June 2015
283 pages
ISSN: 0362-5915
EISSN: 1557-4644
DOI: 10.1145/2799368
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 June 2015
Accepted: 01 December 2014
Revised: 01 October 2014
Received: 01 November 2013
Published in TODS Volume 40, Issue 2

Author Tags

  1. Linear counting query
  2. augmented Lagrangian multiplier algorithm
  3. differential privacy
  4. low rank
  5. matrix approximation

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • SUG Grant M58020016 from Nanyang Technological University
  • NSF (61100148, 61202269, 61472089)
  • AcRF Tier 2 grant ARC19/14 from Ministry of Education, Singapore
  • NSF-61402182
  • SERC 102-158-0074 from Singapore's A*STAR

Article Metrics

  • Downloads (Last 12 months): 15
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 05 Mar 2025

Cited By

  • (2025) Alternating minimization differential privacy protection algorithm for the novel dual-mode learning tasks model. Expert Systems with Applications 259 (125279). DOI: 10.1016/j.eswa.2024.125279. Online publication date: Jan-2025
  • (2023) Answering Private Linear Queries Adaptively Using the Common Mechanism. Proceedings of the VLDB Endowment 16:8 (1883-1896). DOI: 10.14778/3594512.3594519. Online publication date: 1-Apr-2023
  • (2023) DP-starJ: A Differential Private Scheme towards Analytical Star-Join Queries. Proceedings of the ACM on Management of Data 1:4 (1-24). DOI: 10.1145/3626725. Online publication date: 12-Dec-2023
  • (2023) Better than Composition: How to Answer Multiple Relational Queries under Differential Privacy. Proceedings of the ACM on Management of Data 1:2 (1-26). DOI: 10.1145/3589268. Online publication date: 20-Jun-2023
  • (2021) Optimizing fitness-for-use of differentially private linear queries. Proceedings of the VLDB Endowment 14:10 (1730-1742). DOI: 10.14778/3467861.3467864. Online publication date: 26-Oct-2021
  • (2021) Reinforcement-Learning-Based Query Optimization in Differentially Private IoT Data Publishing. IEEE Internet of Things Journal 8:14 (11163-11176). DOI: 10.1109/JIOT.2021.3052978. Online publication date: 15-Jul-2021
  • (2021) Personal Big Data Pricing Method Based on Differential Privacy. Computers & Security (102529). DOI: 10.1016/j.cose.2021.102529. Online publication date: Nov-2021
  • (2020) R2DP: A Universal and Automated Approach to Optimizing the Randomization Mechanisms of Differential Privacy for Utility Metrics with No Known Optimal Distributions. Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (677-696). DOI: 10.1145/3372297.3417259. Online publication date: 30-Oct-2020
  • (2020) Cryptϵ. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (603-619). DOI: 10.1145/3318464.3380596. Online publication date: 11-Jun-2020
  • (2019) IHP: improving the utility in differential private histogram publication. Distributed and Parallel Databases 37:4 (721-750). DOI: 10.1007/s10619-018-07255-6. Online publication date: 2-Jan-2019
