Article

Injecting utility into anonymized datasets

Authors:

Johannes Gehrke

SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data

Pages 217 - 228

https://doi.org/10.1145/1142473.1142499

Published: 27 June 2006

Abstract

Limiting disclosure in data publishing requires a careful balance between privacy and utility. Information about individuals must not be revealed, but a dataset should still be useful for studying the characteristics of a population. Privacy requirements such as k-anonymity and l-diversity are designed to thwart attacks that attempt to identify individuals in the data and to discover their sensitive information. On the other hand, the utility of such data has not been well studied.

In this paper we will discuss the shortcomings of current heuristic approaches to measuring utility and we will introduce a formal approach to measuring utility. Armed with this utility metric, we will show how to inject additional information into k-anonymous and l-diverse tables. This information has an intuitive semantic meaning, it increases the utility beyond what is possible in the original k-anonymity and l-diversity frameworks, and it maintains the privacy guarantees of k-anonymity and l-diversity.
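As a rough illustration of the two privacy requirements named in the abstract, the sketch below checks a toy generalized table for k-anonymity and for the simplest ("distinct") variant of l-diversity. It is not the paper's utility metric or injection method. The column names, the toy rows, and the helper functions are all hypothetical, and the definitions used are the common textbook ones: every group of rows sharing the same quasi-identifier values has at least k rows, and at least l distinct sensitive values.

```python
from collections import defaultdict


def group_by_quasi_identifiers(rows, quasi_identifiers):
    """Group rows that agree on every quasi-identifier column."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in quasi_identifiers)
        groups[key].append(row)
    return groups


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier group contains at least k rows."""
    groups = group_by_quasi_identifiers(rows, quasi_identifiers)
    return all(len(group) >= k for group in groups.values())


def is_distinct_l_diverse(rows, quasi_identifiers, sensitive, l):
    """True if every group has at least l distinct sensitive values.

    This is the simplest reading of l-diversity, assumed here for
    illustration; stricter variants (for example entropy-based) exist.
    """
    groups = group_by_quasi_identifiers(rows, quasi_identifiers)
    return all(
        len({row[sensitive] for row in group}) >= l
        for group in groups.values()
    )


# Hypothetical, already-generalized toy table.
table = [
    {"zip": "148**", "age": "20-29", "disease": "flu"},
    {"zip": "148**", "age": "20-29", "disease": "cancer"},
    {"zip": "130**", "age": "30-39", "disease": "flu"},
    {"zip": "130**", "age": "30-39", "disease": "flu"},
]
qi = ["zip", "age"]

print(is_k_anonymous(table, qi, 2))                    # both groups have 2 rows
print(is_distinct_l_diverse(table, qi, "disease", 2))  # second group holds only "flu"
```

In this toy table each group has two rows, so the table is 2-anonymous, yet the second group reveals that everyone in it has the flu. That is the kind of sensitive-attribute disclosure that l-diversity is meant to rule out.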

Index Terms

Injecting utility into anonymized datasets
1. Computing methodologies
  1. Modeling and simulation
    1. Simulation theory
      1. Systems theory
2. Mathematics of computing
  1. Information theory

Published In

SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data

June 2006

830 pages

ISBN:1595934340

DOI:10.1145/1142473

General Chairs: Clement Yu (University of Illinois at Chicago), Peter Scheuermann (Northwestern University)

Program Chair: Surajit Chaudhuri (Microsoft Research)

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

SIGMOD/PODS06

SIGMOD/PODS06: International Conference on Management of Data and Symposium on Principles of Database Systems

June 27 - 29, 2006

Chicago, IL, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics

Total Citations: 198
Total Downloads: 1,040
Downloads (last 12 months): 15
Downloads (last 6 weeks): 1

Reflects downloads up to 11 Jan 2025