
Secure DNA Motif-Finding Method Based on Sampling Candidate Pruning

Published: 19 August 2021

Abstract

As genetic research advances, privacy risks have become a bottleneck that limits its development. DNA motif finding is important for understanding the regulation of gene expression; however, existing methods generally ignore the sensitive information that may be exposed in the process. In this work, we use the ε-differential privacy model to provide provable privacy guarantees that are independent of attackers’ background knowledge. Our method uses sample databases to prune the generated candidate motifs, which lowers the magnitude of the added noise. Furthermore, to improve the utility of the mining results, we design a threshold modification strategy that reduces propagation and random sampling errors in the mining process. Extensive experiments on real DNA databases confirm that our approach can privately find DNA motifs with high utility and efficiency.
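The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes exact-match k-mers as motifs, support defined as the number of sequences containing a k-mer, and the standard Laplace mechanism with noise scale equal to the number of released counts divided by ε. The function names (`support`, `prune_candidates`, `noisy_supports`) and the `min_sample_support` parameter are hypothetical. The sketch also ignores any privacy cost of the sampling step itself, which a real design would have to account for. It illustrates why pruning helps: fewer candidates means a smaller noise scale per released count.

```python
import random


def support(kmer, sequences):
    """Number of sequences that contain kmer at least once (exact match)."""
    return sum(1 for s in sequences if kmer in s)


def prune_candidates(candidates, sample, min_sample_support):
    """Keep candidates whose support in the sample reaches the threshold.

    min_sample_support is a hypothetical, possibly relaxed threshold; the
    paper's threshold modification strategy is not reproduced here.
    """
    return [c for c in candidates if support(c, sample) >= min_sample_support]


def noisy_supports(candidates, sequences, epsilon, rng):
    """Release candidate supports with Laplace noise.

    Assumption: adding or removing one sequence changes each count by at
    most 1, so the L1 sensitivity of the count vector is at most
    len(candidates). The noise scale is therefore len(candidates) / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not candidates:
        return {}
    scale = len(candidates) / epsilon
    released = {}
    for c in candidates:
        # The difference of two i.i.d. exponentials with mean `scale`
        # is Laplace-distributed with location 0 and that scale.
        noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        released[c] = support(c, sequences) + noise
    return released


# Toy usage with made-up sequences (not data from the paper).
rng = random.Random(0)
db = ["ACGTACGT", "TTACGTAA", "GGACGTCC", "ACGTTTTT", "CCCCCCCC", "ACGAACGA"]
sample = rng.sample(db, 3)
candidates = ["ACGT", "CCCC", "GGGG", "TTTT"]
kept = prune_candidates(candidates, sample, min_sample_support=1)
released = noisy_supports(kept, db, epsilon=1.0, rng=rng)
```

With four candidates and ε = 1 the noise scale would be 4; every candidate that pruning removes lowers that scale by 1/ε, at the risk of discarding a true motif that happens to be rare in the sample.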

Published In

ACM Transactions on Internet Technology, Volume 21, Issue 3
August 2021
522 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/3468071
Editor: Ling Liu

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Received: 01 December 2019
Revised: 01 January 2020
Accepted: 01 February 2020
Online AM: 07 May 2020
Published: 19 August 2021

Author Tags

1. Motif finding
2. Differential privacy
3. Data privacy
4. DNA computing

Qualifiers

• Research-article
• Refereed
