research-article

An information theoretic privacy and utility measure for data sanitization mechanisms

Authors:

Reihaneh Safavi-Naini,

Ken Barker

CODASPY '12: Proceedings of the second ACM conference on Data and Application Security and Privacy

Pages 283 - 294

https://doi.org/10.1145/2133601.2133637

Published: 07 February 2012

Abstract

Data collection agencies publish sensitive data for legitimate purposes such as research and marketing. Data publishing has attracted much interest in the research community because of concerns over protecting individuals' privacy. As a result, several sanitization mechanisms with different notions of privacy have been proposed. To measure, set, and compare the level of privacy protection, these different mechanisms need to be translated into a unified system. In this paper, we propose a novel information-theoretic framework that represents a formal model of a mechanism as a noisy channel and evaluates its privacy and utility. We show that the deterministic publishing property used in most of these mechanisms reduces the privacy guarantees and causes information to leak. We also conclude that the adversary's background knowledge has a strong effect on this metric. We further show that the framework can be used to compute the utility a sanitization mechanism preserves from the point of view of a data user. Using the specification of a popular sanitization mechanism, k-anonymity, we analytically derive a representation of this mechanism that can be used for its evaluation.
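The noisy-channel view in the abstract can be illustrated with a small sketch. The abstract does not state which measures the paper defines, so the code below is an assumption for illustration only: it uses standard Shannon quantities, with the equivocation H(X|Y) standing in for the adversary's remaining uncertainty and the mutual information I(X;Y) standing in for what the published data conveys. The function names, the uniform prior, and the two toy channel matrices are hypothetical and are not taken from the paper.

```python
import math

def entropy(dist):
    """Shannon entropy, in bits, of a probability vector."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def equivocation(prior, channel):
    """H(X|Y): uncertainty about the original value X after observing output Y.

    prior[i] is p(x_i); channel[i][j] is p(y_j | x_i).
    """
    joint = [[px * pyx for pyx in row] for px, row in zip(prior, channel)]
    p_y = [sum(col) for col in zip(*joint)]
    return -sum(
        p * math.log2(p / p_y[j])
        for row in joint
        for j, p in enumerate(row)
        if p > 0
    )

def mutual_information(prior, channel):
    """I(X;Y) = H(X) - H(X|Y), in bits."""
    return entropy(prior) - equivocation(prior, channel)

# Hypothetical setting: four original values, uniform adversarial prior.
prior = [0.25, 0.25, 0.25, 0.25]

# Toy deterministic mechanism: values 0,1 always map to group A; 2,3 to group B.
deterministic = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

# Toy randomized mechanism: same groups, but the published group is
# flipped with probability 0.25.
randomized = [[0.75, 0.25], [0.75, 0.25], [0.25, 0.75], [0.25, 0.75]]

for name, channel in [("deterministic", deterministic), ("randomized", randomized)]:
    print(name,
          "H(X|Y) =", round(equivocation(prior, channel), 4),
          "I(X;Y) =", round(mutual_information(prior, channel), 4))
```

In this toy setting the deterministic mapping conveys exactly 1 bit about the original value, while the randomized mapping conveys less and leaves the adversary more uncertain. Changing `prior` to a skewed distribution gives a rough feel for how assumed background knowledge shifts both quantities.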

Index Terms

An information theoretic privacy and utility measure for data sanitization mechanisms
1. Security and privacy
  1. Human and societal aspects of security and privacy
2. Social and professional topics
  1. Computing / technology policy
    1. Privacy policies

Published In

CODASPY '12: Proceedings of the second ACM conference on Data and Application Security and Privacy

February 2012

338 pages

ISBN:9781450310918

DOI:10.1145/2133601

General Chair: Elisa Bertino, Purdue University, USA

Program Chair: Ravi Sandhu, University of Texas at San Antonio, USA

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CODASPY'12: Second ACM Conference on Data and Application Security and Privacy

Sponsor: SIGSAC

February 7 - 9, 2012

San Antonio, Texas, USA

Acceptance Rates

CODASPY '12 Paper Acceptance Rate 21 of 113 submissions, 19%;

Overall Acceptance Rate 149 of 789 submissions, 19%

Bibliometrics

Total Citations: 12
Total Downloads: 541
Downloads (Last 12 months): 8
Downloads (Last 6 weeks): 0

Reflects downloads up to 01 Sep 2024

Cited By

Gkoulalas-Divanis A, Kagklis V, Stavropoulos E (2019). A Frequent Itemset Hiding Toolbox. Molecular Logic and Computational Synthetic Biology, pp. 169-182. Online publication date: 28-Apr-2019.
https://doi.org/10.1007/978-3-030-19759-9_11
Makris C, Markovits P (2018). Evaluation of Sensitive Data Hiding Techniques for Transaction Databases. Proceedings of the 10th Hellenic Conference on Artificial Intelligence, pp. 1-8. Online publication date: 9-Jul-2018.
https://dl.acm.org/doi/10.1145/3200947.3201031
Janakiramaiah B, Kalyani G, Chittineni S, Narendra Kumar Rao B (2018). An Unbiased Privacy Sustaining Approach Based on SGO for Distortion of Data Sets to Shield the Sensitive Patterns in Trading Alliances. Smart Intelligent Computing and Applications, pp. 165-177. Online publication date: 5-Nov-2018.
https://doi.org/10.1007/978-981-13-1927-3_17
Tarameshloo E, Loorak M, Fong P, Carpendale S (2016). Using Visualization to Explore Original and Anonymized LBSN Data. Computer Graphics Forum 35:3, pp. 291-300. Online publication date: 1-Jun-2016.
https://dl.acm.org/doi/10.5555/3071534.3071566
Ahmadinejad S, Fong P, Safavi-Naini R, Chen X, Wang X, Huang X (2016). Privacy and Utility of Inference Control Mechanisms for Social Computing Applications. Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 829-840. Online publication date: 30-May-2016.
https://dl.acm.org/doi/10.1145/2897845.2897878
Mortazavi R, Jalili S (2016). Enhancing aggregation phase of microaggregation methods for interval disclosure risk minimization. Data Mining and Knowledge Discovery 30:3, pp. 605-639. Online publication date: 1-May-2016.
https://dl.acm.org/doi/10.1007/s10618-015-0432-z
Lin B, Kifer D (2015). Information Measures in Statistical Privacy and Data Processing Applications. ACM Transactions on Knowledge Discovery from Data 9:4, pp. 1-29. Online publication date: 1-Jun-2015.
https://dl.acm.org/doi/10.1145/2700407
Mortazavi R, Jalili S (2015). Preference-based anonymization of numerical datasets by multi-objective microaggregation. Information Fusion 25:C, pp. 85-104. Online publication date: 1-Sep-2015.
https://dl.acm.org/doi/10.1016/j.inffus.2014.10.003
Akoka J, Comyn-Wattiau I, Du Mouza C, Fadili H, Lammari N, Metais E, Cherfi S (2014). A Semantic Approach for Semi-Automatic Detection of Sensitive Data. Information Resources Management Journal 27:4, pp. 23-44. Online publication date: 1-Oct-2014.
https://dl.acm.org/doi/10.4018/irmj.2014100102
Kagklis V, Verykios V, Tzimas G, Tsakalidis A, Akerkar R, Bassiliades N, Davies J, Ermolayev V (2014). Knowledge Sanitization on the Web. Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS14), pp. 1-11. Online publication date: 2-Jun-2014.
https://dl.acm.org/doi/10.1145/2611040.2611044