research-article

Process-Driven Data Privacy

Authors:

Murat Kantarcioglu,

Raymond Heatherly,

Yevgeniy Vorobeychik,

Bradley Malin

CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management

Pages 1021 - 1030

https://doi.org/10.1145/2806416.2806580

Published: 17 October 2015

Abstract

The quantity of personal data gathered by service providers through our daily activities continues to grow at a rapid pace. Sharing and subsequently analyzing such data can support a wide range of activities, but privacy concerns often prompt an organization to transform the data to satisfy a formal protection model (e.g., k-anonymity or ε-differential privacy). These models, however, are based on simplistic adversarial frameworks, which can lead to both under- and over-protection. For instance, such models often assume that an adversary attacks a protected record exactly once. We introduce a principled approach that explicitly models the attack process as a series of steps. Specifically, we engineer a factored Markov decision process (FMDP) to optimally plan an attack from the adversary's perspective and assess the privacy risk accordingly. The FMDP captures the uncertainty in the adversary's belief (e.g., the number of identified individuals that match the de-identified data) and enables the analysis of real-world deterrence mechanisms beyond a traditional protection model, such as a penalty for committing an attack. We present an algorithm to solve the FMDP and illustrate its efficiency by simulating an attack on publicly accessible U.S. census records against a real identified resource of over 500,000 individuals in a voter registry. Our results demonstrate that, while traditional privacy models commonly expect an adversary to attack exactly once per record, an optimal attack in our model may involve exploiting none, one, or more individuals in the pool of candidates, depending on context.
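
To make the idea of a stepwise attack concrete, the following is a minimal sketch of a plain (not factored) Markov decision process, written as an illustration only and not as the authors' FMDP or solution algorithm. It assumes a hypothetical adversary who holds a pool of equally likely candidates for one de-identified record, may attack one candidate per step or stop, earns `gain` on a correct match, and pays `cost` per attempt (a single assumed number standing in for effort plus any expected penalty). The function name `plan_attack` and all parameters are illustrative assumptions.

```python
def plan_attack(pool_size, gain, cost):
    """Backward induction over the number of remaining candidates.

    Assumptions (illustrative only): candidates are equally likely to be
    the true match, a wrong guess removes that candidate from the pool,
    and the adversary may stop at any time for a payoff of zero.
    Returns (values, policy), both indexed by remaining pool size.
    """
    values = [0.0] * (pool_size + 1)      # values[0]: nothing left to attack
    policy = ["stop"] * (pool_size + 1)
    for n in range(1, pool_size + 1):
        p_hit = 1.0 / n                   # chance the next attack succeeds
        attack_value = (
            -cost
            + p_hit * gain
            + (1.0 - p_hit) * values[n - 1]  # continue optimally after a miss
        )
        if attack_value > 0.0:            # attack only if better than stopping
            values[n] = attack_value
            policy[n] = "attack"
    return values, policy


if __name__ == "__main__":
    # Cheap attacks: the adversary keeps attacking until a match is found.
    print(plan_attack(3, gain=10.0, cost=4.0)[1][1:])
    # Costlier attacks (e.g. a higher penalty): no attack on a pool of three.
    print(plan_attack(3, gain=10.0, cost=6.0)[1][1:])
```

Even in this toy setting, raising the per-attempt cost changes the plan for a pool of three candidates from repeated attacks to no attack at all, which mirrors the abstract's observation that the number of individuals exploited depends on context.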

Index Terms

Process-Driven Data Privacy
1. Security and privacy
  1. Human and societal aspects of security and privacy
2. Social and professional topics
  1. Computing / technology policy
    1. Privacy policies

Published In

CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management

October 2015

1998 pages

ISBN:9781450337946

DOI:10.1145/2806416

General Chairs: James Bailey (The University of Melbourne), Alistair Moffat (The University of Melbourne)

Program Chairs: Charu C. Aggarwal (IBM), Maarten de Rijke (University of Amsterdam), Ravi Kumar (Google), Vanessa Murdock (Microsoft), Timos Sellis (RMIT University), Jeffrey Xu Yu (Chinese University of Hong Kong)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM'15

CIKM'15: 24th ACM International Conference on Information and Knowledge Management

October 18 - 23, 2015

Melbourne, Australia

Acceptance Rates

CIKM '15 Paper Acceptance Rate 165 of 646 submissions, 26%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Cited By

Wiepert D, Malin B, Duffy J, Utianski R, Stricker J, Jones D, Botha H (2024). Reidentification of Participants in Shared Clinical Data Sets: Experimental Study. JMIR AI, 3 (e52054). Online publication date: 15-Mar-2024.
https://doi.org/10.2196/52054
Brown J, Clayton E, Matheny M, Kantarcioglu M, Vorobeychik Y, Malin B (2024). Robin Hood: A De-identification Method to Preserve Minority Representation for Disparities Research. Privacy in Statistical Databases, pages 67-83. Online publication date: 13-Sep-2024.
https://doi.org/10.1007/978-3-031-69651-0_5
Zhang X, Wan Z, Yan C, Brown J, Xia W, Gkoulalas-Divanis A, Kantarcioglu M, Malin B (2022). How Adversarial Assumptions Influence Re-identification Risk Measures: A COVID-19 Case Study. Privacy in Statistical Databases, pages 361-374. Online publication date: 14-Sep-2022.
https://doi.org/10.1007/978-3-031-13945-1_25
Samtani S, Kantarcioglu M, Chen H (2021). A Multi-Disciplinary Perspective for Conducting Artificial Intelligence-enabled Privacy Analytics. ACM Transactions on Management Information Systems, 12:1, pages 1-18. Online publication date: 17-Mar-2021.
https://dl.acm.org/doi/10.1145/3447507
Wan Z, Vorobeychik Y, Xia W, Liu Y, Wooders M, Guo J, Yin Z, Clayton E, Kantarcioglu M, Malin B (2021). Using game theory to thwart multistage privacy intrusions when sharing data. Science Advances, 7:50. Online publication date: 10-Dec-2021.
https://doi.org/10.1126/sciadv.abe9986
Anindya I, Roy H, Kantarcioglu M, Malin B, Lim E, Winslett M, Sanderson M, Fu A, Sun J, Culpepper S, Lo E, Ho J, Donato D, Agrawal R, Zheng Y, Castillo C, Sun A, Tseng V, Li C (2017). Building a Dossier on the Cheap. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pages 1549-1558. Online publication date: 6-Nov-2017.
https://dl.acm.org/doi/10.1145/3132847.3132951
Tsai Y, Wang S, Song C, Ting I (2016). Privacy and Utility Effects of k-anonymity on Association Rule Hiding. Proceedings of the 3rd Multidisciplinary International Social Networks Conference on SocialInformatics 2016, Data Science 2016, pages 1-6. Online publication date: 15-Aug-2016.
https://dl.acm.org/doi/10.1145/2955129.2955169
