DOI: 10.1145/1142473.1142499
Article

Injecting utility into anonymized datasets

Published: 27 June 2006

Abstract

Limiting disclosure in data publishing requires a careful balance between privacy and utility. Information about individuals must not be revealed, but a dataset should still be useful for studying the characteristics of a population. Privacy requirements such as k-anonymity and l-diversity are designed to thwart attacks that attempt to identify individuals in the data and to discover their sensitive information. On the other hand, the utility of such data has not been well-studied.

In this paper we will discuss the shortcomings of current heuristic approaches to measuring utility and we will introduce a formal approach to measuring utility. Armed with this utility metric, we will show how to inject additional information into k-anonymous and l-diverse tables. This information has an intuitive semantic meaning, it increases the utility beyond what is possible in the original k-anonymity and l-diversity frameworks, and it maintains the privacy guarantees of k-anonymity and l-diversity.
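
To make the two privacy requirements named in the abstract concrete, here is a minimal Python sketch of how one might check them on a small generalized table. It is an illustration under stated assumptions, not the paper's method: the helper names, the column names, and the sample rows are all hypothetical, and the l-diversity check uses the simplest "distinct values" reading, since the abstract does not say which variant of l-diversity is intended.

```python
from collections import defaultdict


def group_by_quasi_identifiers(rows, quasi_ids):
    """Group rows (dicts) into equivalence classes keyed by quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_ids)
        groups[key].append(row)
    return groups


def is_k_anonymous(rows, quasi_ids, k):
    """True if every equivalence class contains at least k rows."""
    groups = group_by_quasi_identifiers(rows, quasi_ids)
    return all(len(group) >= k for group in groups.values())


def is_distinct_l_diverse(rows, quasi_ids, sensitive, l):
    """True if every equivalence class has at least l distinct sensitive values.

    Assumption: this is the "distinct" reading of l-diversity, chosen for
    simplicity; stricter variants exist and may be what the paper relies on.
    """
    groups = group_by_quasi_identifiers(rows, quasi_ids)
    return all(
        len({row[sensitive] for row in group}) >= l
        for group in groups.values()
    )


# Hypothetical, already-generalized table (not data from the paper).
table = [
    {"age": "20-29", "zip": "148**", "disease": "flu"},
    {"age": "20-29", "zip": "148**", "disease": "cancer"},
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "flu"},
]

print(is_k_anonymous(table, ["age", "zip"], k=2))
print(is_distinct_l_diverse(table, ["age", "zip"], "disease", l=2))
```

In this sample, both equivalence classes contain two rows, so the table is 2-anonymous, but the second class holds only one distinct disease value, so it fails the distinct 2-diversity check. That gap is the kind of attack l-diversity is meant to close.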

Published In

SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data
June 2006
830 pages
ISBN:1595934340
DOI:10.1145/1142473

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. k-anonymity
  2. graphical models
  3. l-diversity
  4. loglinear models
  5. marginals

Conference

SIGMOD/PODS06

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
