research-article

A practice-oriented framework for measuring privacy and utility in data sanitization systems

Authors:

Reihaneh Safavi-Naini,

Jörg Denzinger,

Mina Askari

EDBT '10: Proceedings of the 2010 EDBT/ICDT Workshops

Article No.: 27, Pages 1 - 10

https://doi.org/10.1145/1754239.1754270

Published: 22 March 2010

Abstract

Published data is prone to privacy attacks. Sanitization methods aim to prevent these attacks while maintaining the usefulness of the data for legitimate users. Quantifying the trade-off between usefulness and privacy of published data has been the subject of much research in recent years. We propose a pragmatic framework for evaluating sanitization systems in real-life settings and use data mining utility as a universal measure of usefulness and privacy. We propose a definition for data mining utility that can be tuned to capture the needs of data users and the adversaries' intentions in a setting that is specified by a database, a candidate sanitization method, and the privacy and utility concerns of the data owner. We use this framework to evaluate and compare the privacy and utility offered by two well-known sanitization methods, namely k-anonymity and ε-differential privacy, when UCI's "Adult" dataset and the Weka data mining package are used, and utility and privacy measures are defined for users and adversaries. In the case of k-anonymity, we compare our results with the recent work of Brickell and Shmatikov (KDD 2008), and show that using data mining algorithms increases their proposed adversarial gains.
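
To make the two sanitization notions named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the usual textbook definitions: a table is k-anonymous when every combination of quasi-identifier values is shared by at least k rows, and an ε-differentially private count can be obtained by adding Laplace noise of scale 1/ε to a counting query of sensitivity 1. The function names, the toy rows, and the attribute names are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs in at least k rows."""
    # Group rows by their tuple of quasi-identifier values and count group sizes.
    groups = Counter(tuple(row[attr] for attr in quasi_identifiers) for row in rows)
    return all(size >= k for size in groups.values())


def laplace_count(true_count, epsilon, rng=random):
    """Return a noisy count using the Laplace mechanism (assumed sensitivity 1)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # The difference of two i.i.d. exponentials with rate epsilon is
    # Laplace-distributed with location 0 and scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Hypothetical, already-generalized toy table (illustrative values only).
rows = [
    {"age": "30-39", "zip": "537**", "income": ">50K"},
    {"age": "30-39", "zip": "537**", "income": "<=50K"},
    {"age": "40-49", "zip": "537**", "income": ">50K"},
]

print(is_k_anonymous(rows, ["age", "zip"], 2))  # False: the 40-49 group has one row
print(is_k_anonymous(rows, ["zip"], 3))         # True: all three rows share the zip value

# Noisy answer to "how many rows have income > 50K?" with epsilon = 0.5.
true_count = sum(1 for row in rows if row["income"] == ">50K")
print(laplace_count(true_count, epsilon=0.5))
```

A smaller ε gives a larger noise scale and hence stronger privacy at the cost of accuracy, which is the kind of trade-off the framework in the abstract sets out to measure.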

References

[1]

N. A. Adam and J. C. Wortman. Security-control methods for statistical databases. ACM Comput Surv, 21(4):515--556, 1989.

[2]

R. Agrawal and R. Srikant. Privacy-Preserving Data Mining. In SIGMOD, pages 439--450, 2000.

[3]

A. Asuncion and D. Newman. UCI Machine Learning Repository, 2007.

[4]

J. Brickell and V. Shmatikov. The Cost of Privacy: Destruction of Data-Mining Utility in Anonymized Data Publishing. In KDD, pages 70--78, 2008.

[5]

J.-W. Byun, A. Kamra, E. Bertino, and N. Li. Efficient k-Anonymization Using Clustering Techniques. In DASFAA, pages 188--200, 2007.

[6]

V. Ciriani, S. D. C. di Vimercati, S. Foresti, and P. Samarati. k-Anonymity. In Secure Data Management in Decentralized Systems. Springer, 2007.

[7]

C. Dwork. Differential Privacy. In ICALP, pages 1--12, 2006.

[8]

C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating Noise to Sensitivity in Private Data Analysis. In TCC, pages 265--284, 2006.

[9]

A. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In PODS, pages 211--222, 2003.

[10]

V. S. Iyengar. Transforming data to satisfy privacy constraints. In KDD, pages 279--288, 2002.

[11]

M. Kantarcioglu and C. Clifton. Privacy-preserving Distributed Mining of Association Rules on Horizontally Partitioned Data. IEEE T Knowl Data En, 16(9):1026--1037, 2004.

[12]

D. Kifer and J. Gehrke. Injecting utility into anonymized datasets. In SIGMOD, pages 217--228, 2006.

[13]

K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Workload-aware anonymization. In KDD, pages 277--286, 2006.

[14]

N. Li, T. Li, and S. Venkatasubramanian. t-Closeness: Privacy Beyond k-Anonymity and l-Diversity. In ICDE, pages 106--115, 2007.

[15]

A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. In ICDE, pages 24--35, 2006.

[16]

D. J. Martin, D. Kifer, A. Machanavajjhala, J. Gehrke, and J. Y. Halpern. Worst-Case Background Knowledge for Privacy-Preserving Data Publishing. In ICDE, pages 126--135, 2007.

[17]

G. Miklau and D. Suciu. A Formal Analysis of Information Disclosure in Data Exchange. In SIGMOD, pages 575--586, 2004.

[18]

M. E. Nergiz and C. Clifton. Thoughts on k-Anonymization. In PDM, page 96, 2006.

[19]

M. E. Nergiz, C. Clifton, and A. E. Nergiz. MultiRelational k-Anonymity. In ICDE, pages 1417--1421, 2007.

[20]

H. Park and K. Shim. Approximate algorithms for K-anonymity. In SIGMOD, pages 67--78, 2007.

[21]

V. Rastogi, S. Hong, and D. Suciu. The Boundary Between Privacy and Utility in Data Publishing. In VLDB, pages 531--542, 2007.

[22]

P. Samarati. Protecting Respondents' Identities in Microdata Release. IEEE T Knowl Data En, 13(6):1010--1027, 2001.

[23]

P. Samarati and L. Sweeney. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical Report SRI-CSL-98-04, SRI Computer Science Laboratory, 1998.

[24]

M. Sramka, R. Safavi-Naini, and J. Denzinger. An Attack on the Privacy of Sanitized Data That Fuses the Outputs of Multiple Data Miners. In PADM, pages 130--137, 2009.

[25]

M. Sramka, R. Safavi-Naini, J. Denzinger, M. Askari, and J. Gao. Utility of Knowledge Discovered from Sanitized Data. Technical Report 2008-910-23, University of Calgary, 2008.

[26]

M. Sramka, R. Safavi-Naini, J. Denzinger, M. Askari, and J. Gao. Utility of Knowledge Extracted from Unsanitized Data when Applied to Sanitized Data. In PST, pages 227--231, 2008.

[27]

L. Sweeney. k-anonymity: a model for protecting privacy. Int J Uncertainty, Fuzziness and Knowl-based Syst, 10(5):557--570, 2002.

[28]

T. M. Truta and B. Vinay. Privacy Protection: p-Sensitive k-Anonymity Property. In PDM, pages 94--103, 2006.

[29]

J. S. Vaidya and C. Clifton. Privacy preserving association rule mining in vertically partitioned data. In KDD, pages 639--644, 2002.

[30]

V. S. Verykios, E. Bertino, I. N. Fovino, L. P. Provenza, Y. Saygin, and Y. Theodoridis. State-of-the-art in privacy preserving data mining. SIGMOD Record, 33(1), 2004.

[31]

I. H. Witten and E. Frank. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, 2005.

[32]

R. C.-W. Wong, J. Li, A. W.-C. Fu, and K. Wang. (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing. In KDD, pages 754--759, 2006.

Published In

EDBT '10: Proceedings of the 2010 EDBT/ICDT Workshops

March 2010

290 pages

ISBN: 9781605589909

DOI: 10.1145/1754239

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

EDBT/ICDT '10

EDBT/ICDT '10: EDBT/ICDT '10 joint conference

March 22 - 26, 2010

Lausanne, Switzerland

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%

Bibliometrics

Total Citations: 12
Total Downloads: 316
Downloads (Last 12 months): 5
Downloads (Last 6 weeks): 0

Reflects downloads up to 22 Dec 2024

Cited By

Y. Tsai, S. Wang, I. Ting, and T. Hong. Flexible Anonymization of Transactions with Sensitive Items. In 2018 5th International Conference on Behavioral, Economic, and Socio-Cultural Computing (BESC), pages 201--206. Online publication date: Nov-2018.
https://doi.org/10.1109/BESC.2018.8697320
Y. Tsai, S. Wang, C. Song, and I. Ting. Privacy and Utility Effects of k-anonymity on Association Rule Hiding. In Proceedings of the 3rd Multidisciplinary International Social Networks Conference on SocialInformatics 2016, Data Science 2016, pages 1--6. Online publication date: 15-Aug-2016.
https://dl.acm.org/doi/10.1145/2955129.2955169
J. Wang, Z. Wu, Y. Liu, W. Deng, and H. Oh. Computational data privacy in wireless networks. Peer-to-Peer Networking and Applications, 10(4):865--873. Online publication date: 25-Jan-2016.
https://doi.org/10.1007/s12083-016-0435-6
A. Basu, T. Nakamura, S. Hidano, and S. Kiyomoto. k-anonymity. In Proceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA - Volume 01, pages 983--989. Online publication date: 20-Aug-2015.
https://dl.acm.org/doi/10.1109/Trustcom.2015.473
B. Okkalioglu, M. Okkalioglu, M. Koc, and H. Polat. A survey. Artificial Intelligence Review, 44(4):547--569. Online publication date: 1-Dec-2015.
https://dl.acm.org/doi/10.1007/s10462-015-9439-5
M. Alfalayleh and L. Brankovic. Quantifying Privacy: A Novel Entropy-Based Measure of Disclosure Risk. In Combinatorial Algorithms, pages 24--36. Online publication date: 7-Jun-2015.
https://doi.org/10.1007/978-3-319-19315-1_3
S. Chakravarthy and V. Kumari. Privacy preserving data publishing. International Journal of Computational Intelligence Studies, 3(2/3):196--220. Online publication date: 1-Jun-2014.
https://dl.acm.org/doi/10.1504/IJCISTUDIES.2014.062733
A. Basu, A. Monreale, J. Corena, F. Giannotti, D. Pedreschi, S. Kiyomoto, Y. Miyake, T. Yanagihara, and R. Trasarti. A Privacy Risk Model for Trajectory Data. In Trust Management VIII, pages 125--140. Online publication date: 2014.
https://doi.org/10.1007/978-3-662-43813-8_9
M. Askari, R. Safavi-Naini, K. Barker, E. Bertino, and R. Sandhu. An information theoretic privacy and utility measure for data sanitization mechanisms. In Proceedings of the second ACM conference on Data and Application Security and Privacy, pages 283--294. Online publication date: 7-Feb-2012.
https://dl.acm.org/doi/10.1145/2133601.2133637
K. Mivule, C. Turner, and S. Ji. Towards A Differential Privacy and Utility Preserving Machine Learning Classifier. Procedia Computer Science, 12:176--181. Online publication date: 2012.
https://doi.org/10.1016/j.procs.2012.09.050
