DOI: 10.1145/3105831.3105861
short-paper

DiPCoDing: A Differentially Private Approach for Correlated Data with Clustering

Published: 12 July 2017

Abstract

Differential privacy is a model that gives strong privacy guarantees. It was designed to make it difficult to distinguish individuals' records in statistical databases while maximizing data utility. Differential privacy approaches usually assume that database records are sampled independently, i.e., each record in the database is independent of the rest. However, this assumption does not always hold in real-world applications. In this paper we propose DiPCoDing, a novel approach that uses clustering to calculate the correlation between records in statistical databases. To this end, we consider Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and the Gaussian Mixture Model (GMM). Our method groups similar records, which are more likely to be correlated, in order to reduce the sensitivity of differential privacy and, consequently, the amount of noise added to the query answer, increasing data utility while providing privacy for correlated data. The experimental results show that our approach yields significantly lower relative errors and less noisy answers than existing works.
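The link the abstract draws between sensitivity and noise can be illustrated with a minimal sketch. It assumes the standard Laplace mechanism, which the abstract does not name, and it does not reproduce DiPCoDing's clustering step or its correlation estimate. The function names, the query answer of 1000, and the two sensitivity values (10 and 2) are hypothetical, chosen only to show that a smaller sensitivity gives a smaller relative error at the same privacy budget epsilon.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    # Standard Laplace mechanism (assumed here): noise scale = sensitivity / epsilon.
    return true_answer + laplace_noise(sensitivity / epsilon, rng)


def mean_relative_error(true_answer, sensitivity, epsilon, trials=20000, seed=0):
    # Average |noisy - true| / |true| over repeated noisy answers.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        noisy = laplace_mechanism(true_answer, sensitivity, epsilon, rng)
        total += abs(noisy - true_answer) / abs(true_answer)
    return total / trials


if __name__ == "__main__":
    true_count = 1000.0  # hypothetical query answer
    epsilon = 1.0        # hypothetical privacy budget
    # Hypothetical sensitivities: a larger one for correlated records
    # treated as a whole, a smaller one after grouping similar records.
    for label, sensitivity in [("larger sensitivity", 10.0), ("smaller sensitivity", 2.0)]:
        err = mean_relative_error(true_count, sensitivity, epsilon)
        print(f"{label}: mean relative error = {err:.4f}")
```

Because the expected absolute value of Laplace noise equals its scale, the mean relative error in this sketch is close to sensitivity / (epsilon × |answer|), so lowering the sensitivity lowers the error proportionally.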

Cited By

  • (2021) A New Categories Identification Method based on Reliability Test in Radar Signal Recognition System. 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, 5791-5794. DOI: 10.1109/IGARSS47720.2021.9553762. Online publication date: 11-Jul-2021
  • (2019) A Quadtree Density Clustering Algorithm Under Differential Privacy. 2019 IEEE 13th International Conference on Anti-counterfeiting, Security, and Identification (ASID), 65-69. DOI: 10.1109/ICASID.2019.8925230. Online publication date: Oct-2019

Published In

IDEAS '17: Proceedings of the 21st International Database Engineering & Applications Symposium
July 2017
338 pages
ISBN:9781450352208
DOI:10.1145/3105831

In-Cooperation

  • University of the West of England
  • BytePress
  • Concordia University

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Clustering
  2. Correlated Data
  3. DBSCAN
  4. Differential Privacy
  5. Gaussian Mixture Model

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

IDEAS 2017

Acceptance Rates

IDEAS '17 Paper Acceptance Rate 38 of 102 submissions, 37%;
Overall Acceptance Rate 74 of 210 submissions, 35%
