DOI:10.1145/2133601.2133637

An information theoretic privacy and utility measure for data sanitization mechanisms

Published: 07 February 2012

Abstract

Data collection agencies publish sensitive data for legitimate purposes such as research and marketing. Data publishing has attracted much interest in the research community because of concerns over protecting individuals' privacy. As a result, several sanitization mechanisms with different notions of privacy have been proposed. To measure, set, and compare the level of privacy protection, these different mechanisms need to be translated into a unified system. In this paper, we propose a novel information-theoretic framework that represents a formal model of a mechanism as a noisy channel and evaluates its privacy and utility. We show that the deterministic publishing property used in most of these mechanisms reduces the privacy guarantees and causes information to leak. We also conclude that the adversary's background knowledge has a great effect on this metric. We further show that this framework can be used to compute the utility a sanitization mechanism preserves from the point of view of a data user. Using the specifications of a popular sanitization mechanism, k-anonymity, we analytically derive a representation of this mechanism that can be used for its evaluation.
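
The noisy-channel view described in the abstract can be illustrated with a small sketch. It does not reproduce the paper's definitions: it assumes, for illustration only, that leakage is measured as the Shannon mutual information I(X;Y) between a secret value X and the published value Y. The function names, the uniform prior, and both toy channels are hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def mutual_information(prior, channel):
    """I(X;Y) in bits, where prior[x] = p(x) and channel[x][y] = p(y|x)."""
    n_in, n_out = len(prior), len(channel[0])
    # Marginal distribution of the published value Y.
    p_y = [sum(prior[x] * channel[x][y] for x in range(n_in))
           for y in range(n_out)]
    # H(Y|X) = sum over x of p(x) * H(Y | X = x).
    h_y_given_x = sum(prior[x] * entropy(channel[x]) for x in range(n_in))
    return entropy(p_y) - h_y_given_x

# Assumption: four equally likely secret values.
prior = [0.25, 0.25, 0.25, 0.25]

# Toy deterministic generalization: values 0 and 1 are always published
# as group 0, values 2 and 3 as group 1 (groups of size k = 2).
deterministic = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

# Toy randomized variant: the true group is published with probability 0.75.
randomized = [[0.75, 0.25], [0.75, 0.25], [0.25, 0.75], [0.25, 0.75]]

leak_det = mutual_information(prior, deterministic)
leak_rand = mutual_information(prior, randomized)

# Uncertainty left to an adversary with no background knowledge: H(X|Y).
remaining_det = entropy(prior) - leak_det
remaining_rand = entropy(prior) - leak_rand
```

In this toy setting the deterministic channel leaks exactly one bit about X, and the randomized channel leaks less. A different prior, standing in for the adversary's background knowledge, would change both values.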

      Published In

      CODASPY '12: Proceedings of the second ACM conference on Data and Application Security and Privacy
      February 2012
      338 pages
      ISBN:9781450310918
      DOI:10.1145/2133601

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. information theory
      2. privacy
      3. sanitization mechanism
      4. utility

      Qualifiers

      • Research-article

      Conference

      CODASPY'12

      Acceptance Rates

      CODASPY '12 Paper Acceptance Rate 21 of 113 submissions, 19%;
      Overall Acceptance Rate 149 of 789 submissions, 19%
