A Non-Parametric Model for Accurate and Provably Private Synthetic Data Sets

Authors:

Jordi Soria-Comas,

Josep Domingo-Ferrer

ARES '17: Proceedings of the 12th International Conference on Availability, Reliability and Security

Article No.: 3, Pages 1 - 10

https://doi.org/10.1145/3098954.3098962

Published: 29 August 2017

Abstract

Generating synthetic data is a well-known option to limit disclosure risk in sensitive data releases. The usual approach is to build a model for the population and then generate a synthetic data set based solely on the model. We argue that building an accurate population model is difficult, and we propose instead to approximate the original data as closely as privacy constraints permit. To enforce an ex ante privacy level when generating synthetic data, we introduce a new privacy model called ϵ-synthetic privacy. Then, we describe a synthetic data generation method that satisfies ϵ-synthetic privacy. Finally, we evaluate the utility of the synthetic data generated with our method.

Cited By

N. Ruiz, K. Muralidhar, and J. Domingo-Ferrer. On the Privacy Guarantees of Synthetic Data: A Reassessment from the Maximum-Knowledge Attacker Perspective. In Privacy in Statistical Databases, pp. 59--74, 2018. Online publication date: 25 August 2018. https://doi.org/10.1007/978-3-319-99771-1_5

Published In

ARES '17: Proceedings of the 12th International Conference on Availability, Reliability and Security

August 2017

853 pages

ISBN: 9781450352574

DOI: 10.1145/3098954

Copyright © 2017 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ARES '17

ARES '17: International Conference on Availability, Reliability and Security

August 29 - September 1, 2017

Reggio Calabria, Italy

Acceptance Rates

ARES '17 Paper Acceptance Rate 100 of 191 submissions, 52%;

Overall Acceptance Rate 228 of 451 submissions, 51%
