
L-diversity: Privacy beyond k-anonymity

Published: 01 March 2007

Abstract

Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain identifying attributes.
In this article, we show using two simple attacks that a k-anonymized dataset has some subtle but severe privacy problems. First, an attacker can discover the values of sensitive attributes when there is little diversity in those sensitive attributes. This is a known problem. Second, attackers often have background knowledge, and we show that k-anonymity does not guarantee privacy against attackers using background knowledge. We give a detailed analysis of these two attacks, and we propose a novel and powerful privacy criterion called ℓ-diversity that can defend against such attacks. In addition to building a formal foundation for ℓ-diversity, we show in an experimental evaluation that ℓ-diversity is practical and can be implemented efficiently.
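
The first attack described above (little diversity among sensitive values) can be illustrated with a minimal sketch. It assumes the simplest reading of ℓ-diversity, in which every group of records sharing the same identifying attributes must contain at least ℓ distinct sensitive values. The abstract does not state the formal definition, so this "distinct" variant, the function names, and the toy table are all illustrative assumptions rather than the article's own construction.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that agree on every quasi-identifying attribute."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[attr] for attr in quasi_identifiers)
        groups[key].append(record)
    return groups


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every record shares its quasi-identifiers with >= k - 1 others."""
    groups = equivalence_classes(records, quasi_identifiers)
    return all(len(group) >= k for group in groups.values())


def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every group holds at least l distinct sensitive values.

    Assumption: this is the simplest 'distinct' variant, used here only
    for illustration.
    """
    groups = equivalence_classes(records, quasi_identifiers)
    return all(
        len({record[sensitive] for record in group}) >= l
        for group in groups.values()
    )


# Hypothetical generalized table: two groups of three records each.
table = [
    {"zip": "130**", "age": "<30", "condition": "heart disease"},
    {"zip": "130**", "age": "<30", "condition": "heart disease"},
    {"zip": "130**", "age": "<30", "condition": "heart disease"},
    {"zip": "148**", "age": ">=40", "condition": "cancer"},
    {"zip": "148**", "age": ">=40", "condition": "flu"},
    {"zip": "148**", "age": ">=40", "condition": "heart disease"},
]
QI = ["zip", "age"]

print(is_k_anonymous(table, QI, 3))                      # True
print(is_distinct_l_diverse(table, QI, "condition", 2))  # False
```

The table is 3-anonymous, yet anyone who knows a person falls in the first group learns that person's condition with certainty, which is why the second check fails.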


Published In

ACM Transactions on Knowledge Discovery from Data  Volume 1, Issue 1
March 2007
161 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/1217299
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. ℓ-diversity
  2. k-anonymity
  3. data privacy
  4. privacy-preserving data publishing
