Quantifying Interdependent Risks in Genomic Privacy

Published: 06 February 2017

    Abstract

    The rapid progress in human-genome sequencing is leading to a high availability of genomic data. These data are notoriously sensitive, stable over time, and highly correlated among relatives. In this article, we study the implications of these familial correlations for kin genomic privacy. We formalize the problem and detail efficient reconstruction attacks based on graphical models and belief propagation. With our approach, an attacker can infer the genomes of the relatives of an individual whose genome or phenotype is observed, notably by relying on Mendel’s Laws, on statistical relationships between genomic variants, and on relationships between the genome and the phenotype. We evaluate the effect of these dependencies on privacy with respect to the number of observed variants and the relatives sharing them. We also study how the algorithmic performance evolves when we take these various relationships into account. Furthermore, to quantify the level of genomic privacy resulting from the proposed inference attack, we discuss possible definitions of genomic privacy metrics and compare their values and evolution. Genomic data reveal Mendelian disorders and the likelihood of developing severe diseases, such as Alzheimer’s. We also introduce the quantification of health privacy, specifically, a measure of how well the predisposition to a disease is concealed from an attacker. We evaluate our approach on actual genomic data from a pedigree and show the extent of the threat by combining data gathered from a genome-sharing website and an online social network.
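
To make the kind of inference described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It handles a single variant and a single parent-child pair, where the marginalization can be done by direct enumeration rather than by belief propagation on a full pedigree. The genotype encoding (0, 1, or 2 copies of the minor allele), the Hardy-Weinberg prior for the unobserved parent, the use of Shannon entropy as an uncertainty measure, and all function names are assumptions introduced for illustration.

```python
import math

# Probability that a parent carrying g copies of the minor allele
# (g in {0, 1, 2}) passes the minor allele to a child.
TRANSMIT = {0: 0.0, 1: 0.5, 2: 1.0}


def mendel(father, mother):
    """P(child genotype | both parents' genotypes) under Mendelian segregation."""
    pf, pm = TRANSMIT[father], TRANSMIT[mother]
    return [
        (1 - pf) * (1 - pm),            # child has 0 minor alleles
        pf * (1 - pm) + (1 - pf) * pm,  # child has 1 minor allele
        pf * pm,                        # child has 2 minor alleles
    ]


def hardy_weinberg(maf):
    """Population prior over genotypes for minor allele frequency `maf`.

    Assumption: the population is in Hardy-Weinberg equilibrium.
    """
    return [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]


def child_given_one_parent(observed_parent, maf):
    """P(child genotype | one observed parent), summing out the other parent."""
    prior = hardy_weinberg(maf)
    posterior = [0.0, 0.0, 0.0]
    for other_parent, weight in enumerate(prior):
        for genotype, prob in enumerate(mendel(observed_parent, other_parent)):
            posterior[genotype] += weight * prob
    return posterior


def entropy_bits(dist):
    """Shannon entropy in bits; here a stand-in for the attacker's uncertainty."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


if __name__ == "__main__":
    maf = 0.2  # hypothetical minor allele frequency
    before = hardy_weinberg(maf)
    after = child_given_one_parent(2, maf)
    print("prior over child genotype:    ", before)
    print("posterior given parent = 2:   ", after)
    print("entropy before / after (bits):", entropy_bits(before), entropy_bits(after))
```

In this toy setting, observing a parent who is homozygous for the minor allele rules out a child with zero minor alleles, so the attacker's uncertainty about the child drops even though the child shared nothing. A realistic attack along the lines the abstract describes would combine many such factors across a whole pedigree and across correlated variants.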

    Published In

    ACM Transactions on Privacy and Security, Volume 20, Issue 1
    February 2017
    99 pages
    ISSN:2471-2566
    EISSN:2471-2574
    DOI:10.1145/3038258
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 06 February 2017
    Accepted: 01 November 2016
    Revised: 01 June 2016
    Received: 01 December 2015
    Published in TOPS Volume 20, Issue 1

    Author Tags

    1. Genomic privacy
    2. inference
    3. kinship
    4. metrics

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie
    • Scientific and Technological Research Council of Turkey, TUBITAK

    Cited By

    • (2023) A game theoretic approach to balance privacy risks and familial benefits. Scientific Reports 13:1. DOI: 10.1038/s41598-023-33177-0. Online publication date: 28-Apr-2023
    • (2023) Privacy with Good Taste. Data Privacy Management, Cryptocurrencies and Blockchain Technology (103-119). DOI: 10.1007/978-3-031-25734-6_7. Online publication date: 24-Feb-2023
    • (2022) KGP Meter: Communicating Kin Genomic Privacy to the Masses. 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P) (410-429). DOI: 10.1109/EuroSP53844.2022.00033. Online publication date: Jun-2022
    • (2022) Digital DNA lifecycle security and privacy: an overview. Briefings in Bioinformatics 23:2. DOI: 10.1093/bib/bbab607. Online publication date: 31-Jan-2022
    • (2022) Sociotechnical safeguards for genomic data privacy. Nature Reviews Genetics 23:7 (429-445). DOI: 10.1038/s41576-022-00455-y. Online publication date: 4-Mar-2022
    • (2022) Birds of a Feather: Collective Privacy of Online Social Activist Groups. Computers & Security (102614). DOI: 10.1016/j.cose.2022.102614. Online publication date: Jan-2022
    • (2021) DyPS: Dynamic, Private and Secure GWAS. Proceedings on Privacy Enhancing Technologies 2021:2 (214-234). DOI: 10.2478/popets-2021-0025. Online publication date: 29-Jan-2021
    • (2021) Using game theory to thwart multistage privacy intrusions when sharing data. Science Advances 7:50. DOI: 10.1126/sciadv.abe9986. Online publication date: 10-Dec-2021
    • (2021) Methods of privacy-preserving genomic sequencing data alignments. Briefings in Bioinformatics 22:6. DOI: 10.1093/bib/bbab151. Online publication date: 21-May-2021
    • (2021) Interdependent Privacy (IDP). Encyclopedia of Cryptography, Security and Privacy (1-4). DOI: 10.1007/978-3-642-27739-9_1544-1. Online publication date: 10-Feb-2021
