DOI: 10.1145/3098954.3098962
research-article

A Non-Parametric Model for Accurate and Provably Private Synthetic Data Sets

Published: 29 August 2017

Abstract

Generating synthetic data is a well-known option for limiting disclosure risk in sensitive data releases. The usual approach is to build a model of the population and then generate a synthetic data set solely from that model. We argue that building an accurate population model is difficult, and we propose instead to approximate the original data as closely as privacy constraints permit. To enforce an ex ante privacy level when generating synthetic data, we introduce a new privacy model called ϵ-synthetic privacy. We then describe a synthetic data generation method that satisfies ϵ-synthetic privacy. Finally, we evaluate the utility of the synthetic data generated with our method.
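The abstract contrasts the proposed method with the usual, model-based approach: fit a population model, then draw the synthetic records from that model alone. As a minimal sketch of that conventional approach only (it is not the method proposed in this paper, and it carries no privacy guarantee), the Python below assumes the population is multivariate normal, estimates the mean vector and covariance matrix from the original records, and samples synthetic records from the fitted model. The Gaussian assumption and the function names `fit_gaussian`, `cholesky` and `synthesize_from_model` are hypothetical choices made for illustration.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def fit_gaussian(data: Sequence[Sequence[float]]) -> Tuple[List[float], Matrix]:
    """Estimate the mean vector and sample covariance matrix (n - 1 denominator)."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    cov = [
        [sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in data) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]
    return mean, cov


def cholesky(a: Matrix) -> Matrix:
    """Return lower-triangular L with L L^T == a (a must be positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                # math.sqrt raises ValueError if a is not positive definite
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def synthesize_from_model(data: Sequence[Sequence[float]], n_synthetic: int,
                          seed: Optional[int] = None) -> Matrix:
    """Assumed Gaussian population model: fit it, then sample only from the model."""
    mean, cov = fit_gaussian(data)
    low = cholesky(cov)
    rng = random.Random(seed)
    d = len(mean)
    synthetic = []
    for _ in range(n_synthetic):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]  # independent standard normals
        # x = mean + L z has covariance L L^T = cov
        synthetic.append([mean[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                          for i in range(d)])
    return synthetic


if __name__ == "__main__":
    original = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]]
    for record in synthesize_from_model(original, 3, seed=42):
        print(record)
```

Only the fitted mean and covariance reach the sampler, so any structure the assumed model misses (skew, multimodality, nonlinear dependence between attributes) is absent from the synthetic data. This is one way to see the difficulty the abstract points to in building an accurate population model.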


Cited By

  • (2018) On the Privacy Guarantees of Synthetic Data: A Reassessment from the Maximum-Knowledge Attacker Perspective. Privacy in Statistical Databases, pp. 59-74. DOI: 10.1007/978-3-319-99771-1_5. Online publication date: 25 August 2018.

Information

Published In

ACM Other conferences
ARES '17: Proceedings of the 12th International Conference on Availability, Reliability and Security
August 2017
853 pages
ISBN:9781450352574
DOI:10.1145/3098954

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. ϵ-synthetic privacy
  2. Synthetic data
  3. formal privacy
  4. non-parametric methods

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ARES '17
ARES '17: International Conference on Availability, Reliability and Security
August 29 - September 1, 2017
Reggio Calabria, Italy

Acceptance Rates

ARES '17 Paper Acceptance Rate 100 of 191 submissions, 52%;
Overall Acceptance Rate 228 of 451 submissions, 51%

Article Metrics

  • Downloads (last 12 months): 8
  • Downloads (last 6 weeks): 0
Reflects downloads up to 10 November 2024.
