DOI: 10.1145/1653662.1653703
research-article

Privacy-preserving genomic computation through program specialization

Published: 09 November 2009

    Abstract

    In this paper, we present a new approach to performing important classes of genomic computations (e.g., search for homologous genes) that makes a significant step towards privacy protection in this domain. Our approach leverages a key property of the human genome, namely that the vast majority of it is shared across humans (and hence public), and consequently relatively little of it is sensitive. Based on this observation, we propose a privacy-protection framework that partitions a genomic computation, distributing the part on sensitive data to the data provider and the part on the public data to the user of the data. Such a partition is achieved through program specialization that enables a biocomputing program to perform a concrete execution on public data and a symbolic execution on sensitive data. As a result, the program is simplified into an efficient query program that takes only sensitive genetic data as inputs. We prove the effectiveness of our techniques on a set of dynamic programming algorithms common in genomic computing. We develop a program transformation tool that automatically instruments a legacy program for specialization operations. We also demonstrate that our techniques can greatly facilitate secure multi-party computations on large biocomputing problems.
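
    The partitioning idea in the abstract can be illustrated with a small sketch. The Python code below is not the authors' tool or algorithm; it is a toy partial evaluator for edit distance, written under the assumption that the reference sequence is public except at a known set of sensitive positions. Cells of the dynamic programming table that depend only on public data are computed concretely, while cells that depend on a sensitive position become deferred expressions. The names `specialize_edit_distance`, `add`, `minimum`, and `lift` are hypothetical, and the `?` placeholder for a sensitive nucleotide is an assumption of this example.

```python
from typing import Callable, Dict, List, Set, Union

# Assumption of this sketch: sensitive data is a map from position to nucleotide.
Env = Dict[int, str]
# A value is either a concrete int or a deferred expression over the sensitive data.
Val = Union[int, Callable[[Env], int]]


def lift(v: Val) -> Callable[[Env], int]:
    """Turn a concrete value into a constant expression; leave expressions as-is."""
    return v if callable(v) else (lambda env, c=v: c)


def add(a: Val, b: Val) -> Val:
    """Add concretely when both operands are public, otherwise defer."""
    if not callable(a) and not callable(b):
        return a + b
    fa, fb = lift(a), lift(b)
    return lambda env: fa(env) + fb(env)


def minimum(*vals: Val) -> Val:
    """Take the minimum concretely when all operands are public, otherwise defer."""
    if not any(callable(v) for v in vals):
        return min(vals)
    fs = [lift(v) for v in vals]
    return lambda env: min(f(env) for f in fs)


def specialize_edit_distance(query: str, reference: str, sensitive: Set[int]) -> Val:
    """Run the edit-distance recurrence on public data and defer the rest.

    Characters of `reference` at positions in `sensitive` are placeholders
    and are never read. The result is an int if nothing sensitive was
    touched, otherwise a residual function of the sensitive nucleotides.
    """
    prev: List[Val] = list(range(len(reference) + 1))
    for i, q in enumerate(query, start=1):
        cur: List[Val] = [i]
        for j in range(1, len(reference) + 1):
            pos = j - 1
            if pos in sensitive:
                # Symbolic substitution cost: decided later by the data provider.
                cost: Val = lambda env, p=pos, ch=q: 0 if env[p] == ch else 1
            else:
                cost = 0 if reference[pos] == q else 1
            cur.append(minimum(add(prev[j], 1), add(cur[j - 1], 1), add(prev[j - 1], cost)))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    # Position 2 of the reference is sensitive; "?" is only a placeholder.
    residual = specialize_edit_distance("ACGT", "AC?T", {2})
    # The data provider evaluates the residual program on its private data.
    print(residual({2: "G"}))
    print(residual({2: "A"}))
```

    In this sketch the user of the data runs `specialize_edit_distance` on the public part, and only the returned residual function needs to be evaluated by the data provider. The deferred expressions are not simplified or memoized, so evaluation cost grows quickly with the number of deferred cells; the sketch is meant only to show the split between concrete and symbolic execution on small inputs.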

    Published In

    CCS '09: Proceedings of the 16th ACM conference on Computer and communications security
    November 2009
    664 pages
    ISBN: 9781605588940
    DOI: 10.1145/1653662

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. dynamic programming
    2. human genome
    3. privacy-preserving computation
    4. program specialization
    5. secure multi-party computation
    6. symbolic execution

    Qualifiers

    • Research-article

    Conference

    CCS '09

    Acceptance Rates

    Overall Acceptance Rate 1,261 of 6,999 submissions, 18%

    Cited By

    • (2024) Subjective Privacy Information: Concepts, Models and Characteristics. 2024 IEEE 14th International Conference on Electronics Information and Emergency Communication (ICEIEC), pages 94-97. DOI: 10.1109/ICEIEC61773.2024.10561732. Online publication date: 24-May-2024.
    • (2023) Efficient and Privacy-Preserving Similar Patients Query Scheme Over Outsourced Genomic Data. IEEE Transactions on Cloud Computing, 11:2, pages 1286-1302. DOI: 10.1109/TCC.2021.3131287. Online publication date: 1-Apr-2023.
    • (2023) dpUGC: Learn Differentially Private Representation for User Generated Contents (Best Paper Award, Third Place, Shared). Computational Linguistics and Intelligent Text Processing, pages 316-331. DOI: 10.1007/978-3-031-24337-0_23. Online publication date: 26-Feb-2023.
    • (2023) Self-adaptive Privacy Concern Detection for User-Generated Content. Computational Linguistics and Intelligent Text Processing, pages 153-167. DOI: 10.1007/978-3-031-23793-5_14. Online publication date: 26-Feb-2023.
    • (2022) DNA Similarity Search With Access Control Over Encrypted Cloud Data. IEEE Transactions on Cloud Computing, 10:2, pages 1233-1252. DOI: 10.1109/TCC.2020.2968893. Online publication date: 1-Apr-2022.
    • (2021) Differential Privacy Defenses and Sampling Attacks for Membership Inference. Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security, pages 193-202. DOI: 10.1145/3474369.3486876. Online publication date: 15-Nov-2021.
    • (2021) Differential Privacy-Based Genetic Matching in Personalized Medicine. IEEE Transactions on Emerging Topics in Computing, 9:3, pages 1109-1125. DOI: 10.1109/TETC.2020.2970094. Online publication date: 1-Jul-2021.
    • (2020) Privacy-Preserving Visual Content Tagging using Graph Transformer Networks. Proceedings of the 28th ACM International Conference on Multimedia, pages 2299-2307. DOI: 10.1145/3394171.3414047. Online publication date: 12-Oct-2020.
    • (2020) Quantum protocol for privacy preserving Hamming distance problem of DNA sequences. International Journal of Theoretical Physics. DOI: 10.1007/s10773-020-04483-4. Online publication date: 23-May-2020.
    • (2020) Security Count Query and Integrity Verification Based on Encrypted Genomic Data. Proceedings of the 9th International Conference on Computer Engineering and Networks, pages 647-654. DOI: 10.1007/978-981-15-3753-0_63. Online publication date: 1-Jul-2020.
