DOI: 10.1145/1557019.1557079
research-article

On the tradeoff between privacy and utility in data publishing

Published: 28 June 2009

Abstract

In data publishing, anonymization techniques such as generalization and bucketization have been designed to provide privacy protection. At the same time, they reduce the utility of the data, so it is important to consider the tradeoff between privacy and utility. In a paper that appeared in KDD 2008, Brickell and Shmatikov proposed an evaluation methodology that compares the privacy gain with the utility gain resulting from anonymizing the data, and concluded that "even modest privacy gains require almost complete destruction of the data-mining utility". This conclusion seems to undermine existing work on data anonymization. In this paper, we analyze the fundamental characteristics of privacy and utility, and show that it is inappropriate to directly compare privacy with utility. We then observe that the privacy-utility tradeoff in data publishing is similar to the risk-return tradeoff in financial investment, and propose an integrated framework for considering the privacy-utility tradeoff, borrowing concepts from Modern Portfolio Theory for financial investment. Finally, we evaluate our methodology on the Adult dataset from the UCI Machine Learning Repository. Our results clarify several common misconceptions about data utility and provide data publishers with useful guidelines on choosing the right tradeoff between privacy and utility.

Supplementary Material

JPG File (p517-li.jpg)
MP4 File (p517-li.mp4)

References

[1] C. Aggarwal. On k-anonymity and the curse of dimensionality. In VLDB, pages 901-909, 2005.
[2] A. Asuncion and D. Newman. UCI machine learning repository, 2007.
[3] M. Barbaro and T. Zeller Jr. A face is exposed for AOL searcher no. 4417749. New York Times, 2006.
[4] R. J. Bayardo and R. Agrawal. Data privacy through optimal k-anonymization. In ICDE, pages 217-228, 2005.
[5] J. Brickell and V. Shmatikov. The cost of privacy: destruction of data-mining utility in anonymized data publishing. In KDD, pages 70-78, 2008.
[6] G. T. Duncan and D. Lambert. Disclosure-limited data dissemination. J. Am. Stat. Assoc., pages 10-28, 1986.
[7] C. Dwork. Differential privacy. In ICALP, pages 1-12, 2006.
[8] E. Elton and M. Gruber. Modern Portfolio Theory and Investment Analysis. John Wiley & Sons Inc, 1995.
[9] B. C. M. Fung, K. Wang, R. Chen, and P. S. Yu. Privacy-preserving data publishing: A survey on recent developments. ACM Computing Surveys, 2009.
[10] B. C. M. Fung, K. Wang, and P. S. Yu. Top-down specialization for information and privacy preservation. In ICDE, pages 205-216, 2005.
[11] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In SIGMOD, pages 1-12, 2000.
[12] V. S. Iyengar. Transforming data to satisfy privacy constraints. In KDD, pages 279-288, 2002.
[13] D. Kifer and J. Gehrke. Injecting utility into anonymized datasets. In SIGMOD, pages 217-228, 2006.
[14] N. Koudas, D. Srivastava, T. Yu, and Q. Zhang. Aggregate query answering on anonymized tables. In ICDE, pages 116-125, 2007.
[15] S. Kullback and R. A. Leibler. On information and sufficiency. Ann. Math. Stat., 22:79-86, 1951.
[16] D. Lambert. Measures of disclosure risk and harm. J. Official Stat., 9:313-331, 1993.
[17] K. LeFevre, D. DeWitt, and R. Ramakrishnan. Mondrian multidimensional k-anonymity. In ICDE, page 25, 2006.
[18] K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Workload-aware anonymization. In KDD, pages 277-286, 2006.
[19] N. Li, T. Li, and S. Venkatasubramanian. t-closeness: Privacy beyond k-anonymity and l-diversity. In ICDE, pages 106-115, 2007.
[20] T. Li and N. Li. Injector: Mining background knowledge for data anonymization. In ICDE, pages 446-455, 2008.
[21] A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. In ICDE, page 24, 2006.
[22] D. J. Martin, D. Kifer, A. Machanavajjhala, J. Gehrke, and J. Y. Halpern. Worst-case background knowledge for privacy-preserving data publishing. In ICDE, pages 126-135, 2007.
[23] M. E. Nergiz, M. Atzori, and C. Clifton. Hiding the presence of individuals from shared databases. In SIGMOD, pages 665-676, 2007.
[24] V. Rastogi, D. Suciu, and S. Hong. The boundary between privacy and utility in data publishing. In VLDB, pages 531-542, 2007.
[25] P. Samarati. Protecting respondent's privacy in microdata release. TKDE, 13(6):1010-1027, 2001.
[26] L. Sweeney. k-anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzz., 10(5):557-570, 2002.
[27] K. Wang, B. C. M. Fung, and P. S. Yu. Template-based privacy preservation in classification problems. In ICDM, pages 466-473, 2005.
[28] K. Wang, P. S. Yu, and S. Chakraborty. Bottom-up generalization: A data mining solution to privacy protection. In ICDM, pages 249-256, 2004.
[29] R. C.-W. Wong, J. Li, A. W.-C. Fu, and K. Wang. (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing. In KDD, pages 754-759, 2006.
[30] X. Xiao and Y. Tao. Anatomy: simple and effective privacy preservation. In VLDB, pages 139-150, 2006.
[31] Y. Xu, B. C. M. Fung, K. Wang, A. W.-C. Fu, and J. Pei. Publishing sensitive transactions for itemset utility. In ICDM, pages 1109-1114, 2008.

Published In

KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
June 2009
1426 pages
ISBN: 9781605584959
Proceedings DOI: 10.1145/1557019
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. anonymity
  2. data mining
  3. data publishing
  4. privacy

Qualifiers

  • Research-article

Conference

KDD09

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (Last 12 months): 132
  • Downloads (Last 6 weeks): 10
Reflects downloads up to 22 Sep 2024

Cited By

  • (2024) Exploring the tradeoff between data privacy and utility with a clinical data analysis use case. BMC Medical Informatics and Decision Making, 24:1. DOI: 10.1186/s12911-024-02545-9. Online publication date: 30-May-2024
  • (2024) Unveiling Privacy Vulnerabilities: Investigating the Role of Structure in Graph Data. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 4059-4070. DOI: 10.1145/3637528.3672013. Online publication date: 25-Aug-2024
  • (2024) Exploring the Tradeoff Between Privacy and Utility of Complete-count Census Data Using a Multiobjective Optimization Approach. Geographical Analysis, 56:3, pages 427-450. DOI: 10.1111/gean.12388. Online publication date: 30-Jan-2024
  • (2024) Location Privacy Protection Game Against Adversary Through Multi-User Cooperative Obfuscation. IEEE Transactions on Mobile Computing, 23:3, pages 2066-2077. DOI: 10.1109/TMC.2023.3249465. Online publication date: Mar-2024
  • (2024) The Economics of Privacy and Utility: Investment Strategies. IEEE Transactions on Information Forensics and Security, 19, pages 1744-1755. DOI: 10.1109/TIFS.2023.3341008. Online publication date: 2024
  • (2024) HySAAD – A Hybrid Selection Approach for Anonymization by Design in the Automotive Domain. 2024 25th IEEE International Conference on Mobile Data Management (MDM), pages 203-210. DOI: 10.1109/MDM61037.2024.00044. Online publication date: 24-Jun-2024
  • (2024) Protecting Your Online Persona: A Preferential Selective Encryption Approach for Enhanced Privacy in Tweets, Images, Memes, and Metadata. IEEE Access, 12, pages 86403-86424. DOI: 10.1109/ACCESS.2024.3415663. Online publication date: 2024
  • (2024) Defending novice user privacy: An evaluation of default web browser configurations. Computers & Security, article 103784. DOI: 10.1016/j.cose.2024.103784. Online publication date: Feb-2024
  • (2024) Incremental federated learning for traffic flow classification in heterogeneous data scenarios. Neural Computing and Applications. DOI: 10.1007/s00521-024-10281-4. Online publication date: 12-Aug-2024
  • (2024) A novel approach for constructing privacy-aware architecture utilizing Shannon's entropy. Concurrency and Computation: Practice and Experience, 36:11. DOI: 10.1002/cpe.8030. Online publication date: 21-Jan-2024
