DOI: 10.1145/2857705.2857708
research-article
Public Access

Differentially Private K-Means Clustering

Published: 09 March 2016

Abstract

There are two broad approaches to differentially private data analysis. The interactive approach develops customized differentially private algorithms for individual data mining tasks. The non-interactive approach develops differentially private algorithms that output a synopsis of the input dataset, which can then support a variety of data mining tasks. In this paper we study the effectiveness of the two approaches for differentially private k-means clustering. We develop techniques to analyze the empirical error behavior of existing interactive and non-interactive approaches. Based on this analysis, we propose an improvement to DPLloyd, a differentially private version of Lloyd's algorithm. We also propose EUGkM, a non-interactive approach that publishes a differentially private synopsis for k-means clustering. Results from extensive and systematic experiments support our analysis and demonstrate the effectiveness of both the improved DPLloyd and the proposed EUGkM algorithm.
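To make the interactive approach concrete, the following is a minimal sketch of a Lloyd-style loop that perturbs each cluster's coordinate sums and point count with Laplace noise. It is not the paper's DPLloyd or its proposed improvement: the function name `noisy_lloyd`, the even split of the privacy budget across iterations, the assumption that every coordinate lies in [-1, 1], and the random data-independent initialization are all illustrative choices made here.

```python
import numpy as np

def noisy_lloyd(points, k, epsilon, iterations=5, seed=0):
    """Illustrative Laplace-noised Lloyd loop (hypothetical helper).

    Assumes every coordinate of `points` lies in [-1, 1]. Under that
    assumption, adding or removing one point changes one cluster's
    coordinate sums by at most d in L1 norm and its count by 1.
    """
    rng = np.random.default_rng(seed)
    n, d = points.shape
    # Assumption: split the total budget evenly across iterations.
    eps_iter = epsilon / iterations
    # L1 sensitivity of (sums, count) in one iteration is d + 1.
    scale = (d + 1) / eps_iter
    # Data-independent initialization, so it consumes no privacy budget.
    centroids = rng.uniform(-1.0, 1.0, size=(k, d))
    for _ in range(iterations):
        # Assign each point to its nearest current centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            noisy_sum = members.sum(axis=0) + rng.laplace(0.0, scale, size=d)
            noisy_count = len(members) + rng.laplace(0.0, scale)
            # Update only when the noisy count is usable; otherwise keep
            # the previous centroid.
            if noisy_count >= 1.0:
                centroids[j] = np.clip(noisy_sum / noisy_count, -1.0, 1.0)
    return centroids
```

A call such as `noisy_lloyd(points, k=3, epsilon=1.0)` returns a `(3, d)` array of centroids. Because the budget is divided among iterations, running more iterations means more noise per iteration, which is one reason the error behavior of such algorithms is worth analyzing. A production implementation would also need a noise sampler that is safe against floating-point attacks, which this sketch does not attempt.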


Published In

CODASPY '16: Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy
March 2016
340 pages
ISBN:9781450339353
DOI:10.1145/2857705
Publisher

Association for Computing Machinery

New York, NY, United States

Badges

  • Best Paper

Author Tags

  1. differential privacy
  2. k-means clustering
  3. private data publishing

Qualifiers

  • Research-article

Conference

CODASPY'16
Acceptance Rates

CODASPY '16 Paper Acceptance Rate 22 of 115 submissions, 19%;
Overall Acceptance Rate 149 of 789 submissions, 19%
