How Adversarial Assumptions Influence Re-identification Risk Measures: A COVID-19 Case Study

  • Conference paper
Privacy in Statistical Databases (PSD 2022)

Abstract

The COVID-19 pandemic highlights the need for broad dissemination of case surveillance data. Local and global public health agencies have initiated efforts to do so, but the data available remain limited, due in part to concerns over privacy. As a result, current COVID-19 case surveillance data sharing policies are based on strong adversarial assumptions, such as the expectation that an attacker can readily re-identify individuals based on their distinguishability in a dataset. Various re-identification risk measures account for adversarial capabilities; however, the existing measures insufficiently account for real-world data challenges, particularly records missing from the identified resources that adversaries may rely upon to execute attacks (e.g., ten 50-year-old males in the de-identified dataset vs. five 50-year-old males in the identified dataset). In this paper, we introduce several approaches to amend such risk measures and assess re-identification risk in light of how an attacker's capabilities relate to missing records. We demonstrate the potential of these measures through a record linkage attack using COVID-19 case surveillance data and voter registration records in the state of Florida. Our findings demonstrate that adversarial assumptions, as realized in a risk measure, can dramatically affect re-identification risk estimation. Notably, we show that the re-identification risk is likely to be substantially smaller than the typical risk thresholds, which suggests that more detailed data could be shared publicly than is currently the case.
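To make the missing-records example in the abstract concrete, the following is a minimal sketch of how an assumption about the identified dataset's coverage can change a group-level risk estimate. It is not one of the measures introduced in the paper: the function names `naive_risk` and `coverage_adjusted_risk`, and the assumption that the identified records in a group are a subset of the people in the de-identified group, are hypothetical choices made purely for illustration.

```python
from fractions import Fraction


def naive_risk(identified_count: int) -> Fraction:
    """Chance of a correct link if the adversary picks uniformly among the
    matching identified records and the target is assumed to be among them."""
    if identified_count == 0:
        return Fraction(0)
    return Fraction(1, identified_count)


def coverage_adjusted_risk(deidentified_count: int, identified_count: int) -> Fraction:
    """Illustrative adjustment (an assumption, not the paper's measure):
    scale the naive risk by the chance that the target appears in the
    identified dataset at all, taken here as min(v, d) / d."""
    if deidentified_count == 0 or identified_count == 0:
        return Fraction(0)
    coverage = Fraction(min(identified_count, deidentified_count), deidentified_count)
    return coverage * naive_risk(identified_count)


# Group from the abstract's example: ten 50-year-old males in the
# de-identified dataset, five in the identified dataset.
naive = naive_risk(5)
adjusted = coverage_adjusted_risk(10, 5)
print(naive, adjusted)  # 1/5 1/10
```

Under these assumptions, accounting for the five missing identified records halves the estimated risk for that group, which shows the direction of the effect the abstract describes.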

Funding

This study was supported by grants CNS-2029651 and CNS-2029661 from the National Science Foundation and training grant T15LM007450 from the National Library of Medicine.

Author information

Corresponding author

Correspondence to Xinmeng Zhang.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, X. et al. (2022). How Adversarial Assumptions Influence Re-identification Risk Measures: A COVID-19 Case Study. In: Domingo-Ferrer, J., Laurent, M. (eds) Privacy in Statistical Databases. PSD 2022. Lecture Notes in Computer Science, vol 13463. Springer, Cham. https://doi.org/10.1007/978-3-031-13945-1_25

  • DOI: https://doi.org/10.1007/978-3-031-13945-1_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-13944-4

  • Online ISBN: 978-3-031-13945-1

  • eBook Packages: Computer Science, Computer Science (R0)
