Short paper. DOI: 10.1145/3307339.3342179

A Sparse Convolutional Predictor with Denoising Autoencoders for Phenotype Prediction

Published: 04 September 2019

Abstract

Phenotype prediction is widely used to help understand disease risk and susceptibility and to improve the breeding cycles of plants and animals. Most phenotype prediction methods are based on regularized statistical approaches that consider only linear relationships among genetic features. Deep learning methods have recently been reported to handle regression problems on high-dimensional genomic data well. To explore deep learning for phenotype prediction, we propose a deep learning regression model, called Sparse Convolutional Predictor with Denoising Autoencoders (SCP_DAE), to predict quantitative traits. We constructed SCP_DAE by using a convolutional layer that can extract correlation or linkage patterns in the genotype data and by applying a sparse weight matrix resulting from L1 regularization to handle high-dimensional genotype data. To learn efficient and compressed hidden representations of genotype data, we pre-trained the convolutional layer and the first fully connected layer in SCP_DAE using denoising autoencoders. These pre-trained layers were then fine-tuned to improve the performance of the SCP_DAE model for phenotype prediction. We comprehensively evaluated the proposed method on a yeast dataset that contains well-assayed genotype profiles and quantitative traits. Our results showed that the proposed SCP_DAE method significantly outperforms regularized statistical approaches and similar deep learning models without pre-trained weights.
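
The building blocks named in the abstract (a convolution over neighbouring markers, an L1-penalized weight matrix, and denoising-autoencoder pre-training) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no layer sizes, noise type, or activation, so the single kernel, ReLU activations, masking noise, tied decoder weights, the choice of `W1` as the L1-penalized matrix, and every function and parameter name below are assumptions made for illustration. Only the forward computations and losses are shown, not the training loop.

```python
import numpy as np

def corrupt(x, drop_prob, rng):
    """Masking noise (assumed): zero each entry independently with probability drop_prob."""
    mask = rng.random(x.shape) >= drop_prob
    return x * mask

def relu(z):
    return np.maximum(z, 0.0)

def conv1d_valid(x, kernel):
    """Slide one kernel along the marker axis (stride 1, no padding).

    x has shape (n_samples, n_markers); the result has shape
    (n_samples, n_markers - kernel_size + 1).
    """
    k = kernel.shape[0]
    windows = np.lib.stride_tricks.sliding_window_view(x, k, axis=1)
    return windows @ kernel

def init_params(n_markers, kernel_size, hidden, rng):
    """Hypothetical parameter layout: one conv kernel, one dense layer, linear output."""
    n_conv = n_markers - kernel_size + 1
    return {
        "kernel": rng.normal(scale=0.1, size=kernel_size),
        "W1": rng.normal(scale=0.1, size=(n_conv, hidden)),
        "b1": np.zeros(hidden),
        "w_out": rng.normal(scale=0.1, size=hidden),
        "b_out": 0.0,
    }

def predict(x, params):
    """Convolution -> fully connected layer -> scalar trait prediction per sample."""
    h = relu(conv1d_valid(x, params["kernel"]))
    h = relu(h @ params["W1"] + params["b1"])
    return h @ params["w_out"] + params["b_out"]

def objective(x, y, params, l1_strength):
    """Mean squared error plus an L1 penalty on W1 (which layer is penalized is assumed)."""
    mse = np.mean((predict(x, params) - y) ** 2)
    return mse + l1_strength * np.abs(params["W1"]).sum()

def dae_reconstruction_loss(x, W, b, b_dec, drop_prob, rng):
    """Denoising autoencoder loss with tied weights (assumed):
    encode the corrupted input, then score the reconstruction against the clean input."""
    h = relu(corrupt(x, drop_prob, rng) @ W + b)
    x_hat = h @ W.T + b_dec
    return np.mean((x_hat - x) ** 2)
```

In this sketch, pre-training would minimize `dae_reconstruction_loss` for a layer on unlabeled genotypes, and fine-tuning would then minimize `objective` on genotype and trait pairs starting from the pre-trained weights.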

    Published In

    BCB '19: Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
    September 2019
    716 pages
    ISBN: 9781450366663
    DOI: 10.1145/3307339

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. autoencoder
    2. convolutional network
    3. deep learning
    4. genomics
    5. phenotype prediction
    6. sparse model

    Qualifiers

    • Short-paper

    Conference

    BCB '19
    Acceptance Rates

    BCB '19 Paper Acceptance Rate 42 of 157 submissions, 27%;
    Overall Acceptance Rate 254 of 885 submissions, 29%
