
Preserving Patient Privacy When Sharing Same-Disease Data

Published: 06 October 2016

Abstract

Medical and health data are often collected to study a specific disease. For such same-disease microdata, a privacy disclosure occurs as soon as an individual is known to be in the microdata. Individuals in same-disease microdata are thus subject to higher disclosure risk than those in microdata covering different diseases. This problem has been overlooked in data-privacy research and practice, and no prior study has addressed it. In this study, we analyze the disclosure risk for individuals in same-disease microdata and propose a new metric that is appropriate for measuring disclosure risk in this situation. An efficient algorithm is designed and implemented for anonymizing same-disease data to minimize the disclosure risk while preserving as much data utility as possible. An experimental study was conducted on real patient and population data. The experimental results show that traditional reidentification risk measures underestimate the actual disclosure risk for individuals in same-disease microdata, and that the proposed approach is very effective in reducing the actual risk for same-disease data. This study suggests that privacy protection policy and practice for sharing medical and health data should consider not only the individuals’ identifying attributes but also the health and disease information contained in the data. It is recommended that data-sharing entities employ a statistical approach, instead of HIPAA's Safe Harbor policy, when sharing same-disease microdata.
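The gap between reidentification risk and same-disease disclosure risk can be illustrated with a minimal sketch. It is not the metric or algorithm proposed in the article, which the abstract does not specify. It assumes a simple membership-style measure (group size in the microdata divided by group size in the population), and all data, attribute values, and function names below are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: each record is a tuple of quasi-identifiers
# (age band, ZIP prefix). All values are invented for illustration.
population = (
    [("30-39", "850")] * 4
    + [("40-49", "850")] * 10
    + [("40-49", "852")] * 5
)
# Same-disease microdata: every record belongs to a patient with the disease.
microdata = (
    [("30-39", "850")] * 4
    + [("40-49", "850")] * 2
    + [("40-49", "852")] * 1
)


def risk_by_group(sample, pop):
    """Return {group: (reidentification_risk, membership_risk)}.

    reidentification_risk = 1 / f, where f is the group's size in the sample
    (the usual k-anonymity view of risk).
    membership_risk = f / F, where F is the group's size in the population:
    the chance that a person known to have these attributes appears in the
    sample, and therefore has the disease. This measure is an assumption
    made for illustration only.
    """
    f = Counter(sample)
    F = Counter(pop)
    return {g: (1 / f[g], f[g] / F[g]) for g in f}


risks = risk_by_group(microdata, population)
for group, (reid, member) in sorted(risks.items()):
    print(group, f"reid={reid:.2f}", f"membership={member:.2f}")
```

In this toy example the group ("30-39", "850") looks safe under a reidentification measure (four indistinguishable records, risk 0.25), yet everyone in the population with those attributes is in the microdata, so their disease status is disclosed with certainty.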

Published In

Journal of Data and Information Quality  Volume 7, Issue 4
Challenge Papers and Regular Papers
October 2016
57 pages
ISSN:1936-1955
EISSN:1936-1963
DOI:10.1145/3006343
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 October 2016
Accepted: 01 June 2016
Revised: 01 February 2016
Received: 01 October 2015
Published in JDIQ Volume 7, Issue 4

Author Tags

  1. Data sharing
  2. HIPAA
  3. disclosure risk

Qualifiers

  • Research-article
  • Research
  • Refereed
