
Online Algorithm for Differentially Private Genome-wide Association Studies

Published: 05 March 2021

Abstract

The digitization of healthcare records has produced a large volume of functional scientific data that can help researchers understand the behaviour of many diseases. However, the privacy implications of this data, particularly genomics data, have surfaced recently, because the collection, dissemination, and analysis of human genomics data are highly sensitive. Multiple privacy attacks have exploited the uniqueness of the human genome to reveal the presence of a participant or a certain group in a dataset. As a result, current data sharing policies rule out public dissemination and impose precautionary measures prior to genomics data release, which hinders timely scientific innovation. In this article, we investigate an approach that releases only the statistics from genomic data rather than the whole dataset, and we propose a generalized Differentially Private mechanism for Genome-wide Association Studies (GWAS). Our method provides a quantifiable privacy guarantee by adding noise to the intermediate outputs while ensuring satisfactory accuracy of the private results. Furthermore, the proposed method offers multiple adjustable parameters that data owners can set according to their privacy requirements. These parameters act as equalizers that balance the privacy and utility of the GWAS. The method also incorporates an Online Bin Packing technique [1], which further bounds the privacy loss: the loss grows linearly with the number of open bins and scales with the incoming queries. Finally, we implemented and benchmarked our approach using seven different GWAS to test the performance of the proposed methods. The experimental results demonstrate that for 1,000 arbitrary online queries, our algorithms are more than 80% accurate with reasonable privacy loss and exceed the state-of-the-art approaches on multiple studies (i.e., EigenStrat, LMM, TDT).
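
As a rough illustration of the two ingredients named in the abstract (noise added to released statistics, and bin-packing-style accounting of privacy loss for online queries), the following Python sketch may help. It is not the authors' algorithm: the use of the Laplace mechanism, the First-Fit packing rule, the idea of treating each query's epsilon as an item size, and the names `private_count`, `FirstFitBudget`, and `bin_capacity` are all assumptions made here for illustration.

```python
import random


def private_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Return true_count plus Laplace(0, sensitivity / epsilon) noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


class FirstFitBudget:
    """Hypothetical accountant: packs per-query epsilons into fixed-size bins."""

    def __init__(self, bin_capacity):
        self.bin_capacity = bin_capacity
        self.loads = []  # epsilon already spent in each open bin

    def add(self, epsilon):
        """Place a query's epsilon in the first bin with room; return its index."""
        if not 0 < epsilon <= self.bin_capacity:
            raise ValueError("epsilon must lie in (0, bin_capacity]")
        for i, load in enumerate(self.loads):
            if load + epsilon <= self.bin_capacity:
                self.loads[i] = load + epsilon
                return i
        self.loads.append(epsilon)  # no open bin fits, so open a new one
        return len(self.loads) - 1

    def spent(self):
        """Total epsilon spent under basic sequential composition."""
        return sum(self.loads)

    def upper_bound(self):
        """Spent budget can never exceed (open bins) * (bin capacity)."""
        return len(self.loads) * self.bin_capacity


# Usage: four online queries arrive one at a time (assumed epsilons and count).
budget = FirstFitBudget(bin_capacity=1.0)
rng = random.Random(0)  # fixed seed so the sketch is repeatable
for eps in [0.5, 0.75, 0.5, 0.25]:
    bin_index = budget.add(eps)
    noisy = private_count(true_count=120, epsilon=eps, rng=rng)
    print(f"eps={eps} bin={bin_index} noisy_count={noisy:.2f}")
print(budget.spent(), "<=", budget.upper_bound())
```

In this sketch the total privacy loss is bounded by the number of open bins times the bin capacity, which is one simple way to read the abstract's statement that the loss grows linearly with the number of open bins. The published mechanism may account for loss differently.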

Supplementary Material

a13-aziz-apndx.pdf (aziz.zip)
Supplemental movie, appendix, image and software files for Online Algorithm for Differentially Private Genome-wide Association Studies

References

[1]
Joan Boyar, Shahin Kamali, Kim S. Larsen, and Alejandro López-Ortiz. 2016. Online bin packing with advice. Algorithmica 74, 1 (2016), 507--527.
[2]
Robert H. Miller and Ida Sim. 2004. Physicians’ use of electronic medical records: Barriers and solutions. Health Affairs 23, 2 (2004), 116--126.
[3]
Guy Paré, Louis Raymond, Ana Ortiz de Guinea, Placide Poba-Nzaou, Marie-Claude Trudel, Josianne Marsan, and Thomas Micheneau. 2015. Electronic health record usage behaviors in primary care medical practices: A survey of family physicians in Canada. Int. J. Med. Inform. 84, 10 (2015), 857--867.
[4]
Muhammad Naveed, Erman Ayday, Ellen W. Clayton, Jacques Fellay, Carl A. Gunter, Jean-Pierre Hubaux, Bradley A. Malin, and XiaoFeng Wang. 2015. Privacy in the genomic era. ACM Comput. Surveys 48, 1 (2015), 6.
[5]
Md Momin Al Aziz, Md Nazmus Sadat, Dima Alhadidi, Shuang Wang, Xiaoqian Jiang, Cheryl L. Brown, and Noman Mohammed. 2019. Privacy-preserving techniques of genomic data—A survey. Brief. Bioinform. 20, 3 (2019), 887--895. https://doi.org/10.1093/bib/bbx139
[6]
Alexandros Mittos, Bradley Malin, and Emiliano De Cristofaro. 2019. Systematizing genome privacy research: A privacy-enhancing technologies perspective. Proc. Privacy Enhanc. Technol. 2019, 1 (2019), 87--107.
[7]
Bradley Malin, Kenneth Goodman et al. 2018. Between access and privacy: Challenges in sharing health data. Yearbook Med. Info. 27, 1 (2018), 055--059.
[8]
The Personal Information Protection and Electronic Documents Act (PIPEDA). [n.d.]. Retrieved from https://goo.gl/TScuoW.
[9]
Peter Kilbridge. 2003. The cost of HIPAA compliance. New England J. Med. 348, 15 (2003), 1423.
[10]
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography. Springer, 265--284.
[11]
Cynthia Dwork. 2006. Differential privacy. In Proceedings of the 33rd International Conference on Automata, Languages and Programming—Volume Part II (ICALP’06). 1--12.
[12]
J. Hsu, M. Gaboardi, A. Haeberlen, S. Khanna, A. Narayan, B. C. Pierce, and A. Roth. 2014. Differential privacy: An economic method for choosing epsilon. In Proceedings of the IEEE 27th Computer Security Foundations Symposium. 398--410.
[13]
Andreas Haeberlen, Benjamin C. Pierce, and Arjun Narayan. 2011. Differential privacy under fire. In Proceedings of the USENIX Security Symposium.
[14]
Matthew Fredrikson, Eric Lantz, Somesh Jha, Simon Lin, David Page, and Thomas Ristenpart. 2014. Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing. In Proceedings of the 23rd USENIX Security Symposium (USENIXSecurity’14). 17--32.
[15]
Md Momin Al Aziz, Reza Ghasemi, Md Waliullah, and Noman Mohammed. 2017. Aftermath of bustamante attack on genomic beacon service. BMC Med. Genom. 10, 2 (2017), 43.
[16]
Moritz Hardt and Guy N. Rothblum. 2010. A multiplicative weights mechanism for privacy-preserving data analysis. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS’10). IEEE, 61--70.
[17]
Fei Yu, Michal Rybar, Caroline Uhler, and Stephen E. Fienberg. 2014. Differentially-private logistic regression for detecting multiple-SNP association in GWAS databases. In Proceedings of the International Conference on Privacy in Statistical Databases. Springer, 170--184.
[18]
Shuang Wang, Noman Mohammed, and Rui Chen. 2014. Differentially private genome data dissemination through top-down specialization. BMC Med. Info. Decision Making 14, 1 (2014), S2.
[19]
Caroline Uhler, Aleksandra Slavković, and Stephen E. Fienberg. 2013. Privacy-preserving data sharing for genome-wide association studies. J. Privacy Confidential. 5, 1 (2013), 137.
[20]
Aaron Johnson and Vitaly Shmatikov. 2013. Privacy-preserving data exploration in genome-wide association studies. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1079--1087.
[21]
Yuichi Sei and Akihiko Ohsuga. 2017. Privacy-preserving Chi-squared testing for genome SNP databases. In Proceedings of the 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC’17). IEEE, 3884--3889.
[22]
Florian Tramèr, Zhicong Huang, Jean-Pierre Hubaux, and Erman Ayday. 2015. Differential privacy with bounded priors: Reconciling utility and privacy in genome-wide association studies. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. ACM, 1286--1297.
[23]
Sean Simmons and Bonnie Berger. 2016. Realizing privacy preserving genome-wide association studies. Bioinformatics 32, 9 (2016), 1293--1300.
[24]
Fei Yu, Stephen E. Fienberg, Aleksandra B. Slavković, and Caroline Uhler. 2014. Scalable privacy-preserving data sharing methodology for genome-wide association studies. J. Biomed. Inform. 50 (2014), 133--141.
[25]
Sean Simmons, Cenk Sahinalp, and Bonnie Berger. 2016. Enabling privacy-preserving GWASs in heterogeneous human populations. Cell Syst. 3, 1 (2016), 54--61.
[26]
Meng Wang, Zhanglong Ji, Shuang Wang, Jihoon Kim, Hai Yang, Xiaoqian Jiang, and Lucila Ohno-Machado. 2017. Mechanisms to protect the privacy of families when using the transmission disequilibrium test in genome-wide association studies. Bioinformatics 33, 23 (2017), 3716--3725.
[27]
Md Nazmus Sadat, Md Momin Al Aziz, Noman Mohammed, Feng Chen, Xiaoqian Jiang, and Shuang Wang. 2019. SAFETY: Secure GWAS in federated environment through a hYbrid solution. IEEE/ACM Trans. Comput. Biol. Bioinform. 16, 1 (2019), 93--102.
[28]
Junfeng Fan and Frederik Vercauteren. 2012. Somewhat practical fully homomorphic encryption. IACR Cryptol. ePrint Arch. 2012 (2012), 144.
[29]
Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9, 3–4 (2014), 211--407.
[30]
Frank D. McSherry. 2009. Privacy integrated queries: An extensible platform for privacy-preserving data analysis. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 19--30.
[31]
Indrajit Roy, Srinath T. V. Setty, Ann Kilzer, Vitaly Shmatikov, and Emmett Witchel. 2010. Airavat: Security and privacy for MapReduce. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI’10), Vol. 10. 297--312.
[32]
Jean Louis Raisaro, Juan Ramón Troncoso-Pastoriza, Mickaël Misbach, João Sá Sousa, Sylvain Pradervand, Edoardo Missiaglia, Olivier Michielin, Bryan Ford, and Jean-Pierre Hubaux. 2018. MedCo: Enabling secure and privacy-preserving exploration of distributed clinical and genomic data. IEEE/ACM Trans. Comput. Biol. Bioinform. 16, 4 (2018), 1328--1341.
[33]
Jean Louis Raisaro, Gwangbae Choi, Sylvain Pradervand, Raphael Colsenet, Nathalie Jacquemont, Nicolas Rosat, Vincent Mooser, and Jean-Pierre Hubaux. 2018. Protecting privacy and security of genomic data in I2B2 with homomorphic encryption and differential privacy. IEEE/ACM Trans. Comput. Biol. Bioinform. 15, 5 (2018), 1413--1426.
[34]
Greg Gibson. 2018. Population genetics and GWAS: A primer. PLoS Biol. 16, 3 (2018), e2005485.
[35]
A. J. Paverd, Andrew Martin, and Ian Brown. 2014. Modelling and automatically analysing privacy properties for honest-but-curious adversaries. Technical Report.
[36]
Harmonic Series. [n.d.]. Retrieved from https://en.wikipedia.org/wiki/Harmonic_series_(mathematics).
[37]
Eric W. Weisstein. [n.d.]. Block-Stacking problem. https://mathworld.wolfram.com/BookStackingProblem.html.
[38]
Peter Kairouz, Sewoong Oh, and Pramod Viswanath. 2017. The composition theorem for differential privacy. IEEE Trans. Info. Theory 63, 6 (2017), 4037--4049.
[39]
Stanley L. Warner. 1965. Randomized response: A survey technique for eliminating evasive answer bias. J. Amer. Statist. Assoc. 60, 309 (1965), 63--69.
[40]
Laura Clarke, Xiangqun Zheng-Bradley, Richard Smith, Eugene Kulesha, Chunlin Xiao, Iliana Toneva, Brendan Vaughan, Don Preuss, Rasko Leinonen, Martin Shumway, et al. 2012. The 1000 Genomes Project: Data management and community access. Nature Methods 9, 5 (2012), 459.
[41]
Differential Privacy GWAS-implementation. [n.d.]. Retrieved from https://github.com/mominbuet/DifferentialPrivacyGWAS.
[42]
Lon R. Cardon and Lyle J. Palmer. 2003. Population stratification and spurious allelic association. Lancet 361, 9357 (2003), 598--604.
[43]
Nour Almadhoun, Erman Ayday, and Özgür Ulusoy. 2020. Inference attacks against differentially private query results from genomic datasets including dependent tuples. Bioinformatics 36, Supplement 1 (2020), i136–i145.
[44]
William S. Bush and Jason H. Moore. 2012. Genome-wide association studies. PLoS Comput. Biol. 8, 12 (2012), e1002822.
[45]
Steven S. Seiden. 2002. On the online bin packing problem. J. ACM 49, 5 (2002), 640--671.
[46]
M. R. Garey and D. S. Johnson. 1981. Approximation algorithms for Bin packing problems: A survey. In Analysis and Design of Algorithms in Combinatorial Optimization. International Centre for Mechanical Sciences (Courses and Lectures), vol 266, G. Ausiello and M. Lucertini (Eds.). Springer.

Cited By

  • (2024) Genomic privacy preservation in genome-wide association studies: taxonomy, limitations, challenges, and vision. Briefings in Bioinformatics 25:5. DOI: 10.1093/bib/bbae356. Online publication date: 29-Jul-2024
  • (2022) Secure and distributed assessment of privacy-preserving GWAS releases. Proceedings of the 23rd ACM/IFIP International Middleware Conference (308-321). DOI: 10.1145/3528535.3565253. Online publication date: 7-Nov-2022
  • (2022) Generalized Genomic Data Sharing for Differentially Private Federated Learning. Journal of Biomedical Informatics (104113). DOI: 10.1016/j.jbi.2022.104113. Online publication date: Jun-2022

Published In

ACM Transactions on Computing for Healthcare  Volume 2, Issue 2
April 2021
226 pages
EISSN:2637-8051
DOI:10.1145/3446675
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 March 2021
Accepted: 01 October 2020
Revised: 01 September 2020
Received: 01 April 2020
Published in HEALTH Volume 2, Issue 2

Author Tags

  1. Online algorithm for privacy
  2. differentially private GWAS
  3. genomics data privacy

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • CPRIT Scholar in Cancer Research
  • NSERC Discovery Grants
  • National Institutes of Health (NIH)
