short-paper

A Sparse Convolutional Predictor with Denoising Autoencoders for Phenotype Prediction

Authors:

Xinghua Shi

BCB '19: Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics

Pages 217 - 222

https://doi.org/10.1145/3307339.3342179

Published: 04 September 2019

Abstract

Phenotype prediction is widely used to help understand disease risk and susceptibility and to improve the breeding cycles of plants and animals. Most phenotype prediction methods are based on regularized statistical approaches that consider only linear relationships among genetic features. Deep learning-based methods have recently been reported to address regression problems in high-dimensional genomic data well. To explore deep learning for phenotype prediction, we propose a deep learning regression model, called Sparse Convolutional Predictor with Denoising Autoencoders (SCP_DAE), to predict quantitative traits. We constructed SCP_DAE using a convolutional layer that can extract correlation or linkage patterns in the genotype data, and applied a sparse weight matrix resulting from L1 regularization to handle high-dimensional genotype data. To learn efficient and compressed hidden representations of genotype data, we pre-trained the convolutional layer and the first fully connected layer in SCP_DAE using denoising autoencoders. These pre-trained layers were then fine-tuned to improve the performance of the SCP_DAE model for phenotype prediction. We comprehensively evaluated our proposed method on a yeast dataset that contains well-assayed genotype profiles and quantitative traits. Our results showed that the proposed SCP_DAE method significantly outperforms regularized statistical approaches and similar deep learning models without pre-trained weights.
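The building blocks named in the abstract (a convolution over neighboring markers, an L1 penalty that encourages a sparse weight matrix, and denoising-autoencoder pre-training) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, layer sizes, ReLU activations, masking noise, and the choice to penalize the first fully connected layer are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def corrupt(x, drop_prob, rng):
    """Masking noise (assumed noise type): zero each entry with probability drop_prob."""
    mask = rng.random(x.shape) >= drop_prob
    return x * mask

def conv1d_valid(x, kernel):
    """Slide one kernel along the marker axis of x (samples x markers), no padding.
    Like most deep learning 'convolutions', this is a cross-correlation."""
    k = kernel.shape[0]
    windows = np.lib.stride_tricks.sliding_window_view(x, k, axis=1)
    return windows @ kernel  # shape: (samples, markers - k + 1)

def relu(z):
    return np.maximum(z, 0.0)

def l1_penalty(w, lam):
    """L1 term added to the loss; it pushes many entries of w toward zero."""
    return lam * np.abs(w).sum()

def predict(x, kernel, w1, w_out):
    """Convolution -> fully connected layer -> linear output for a quantitative trait."""
    h = relu(conv1d_valid(x, kernel))
    h = relu(h @ w1)
    return h @ w_out

def objective(y, y_hat, w1, lam):
    """Mean squared error plus the L1 penalty on the (hypothetical) sparse layer w1."""
    return np.mean((y - y_hat) ** 2) + l1_penalty(w1, lam)

def dae_reconstruction_loss(x, kernel, w_dec, drop_prob, rng):
    """Pre-training signal: encode a corrupted input, decode, compare to the clean input."""
    h = relu(conv1d_valid(corrupt(x, drop_prob, rng), kernel))
    x_hat = h @ w_dec
    return np.mean((x_hat - x) ** 2)

# Toy usage with random data (not the yeast dataset from the paper).
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 2, size=(8, 20)).astype(float)  # 8 samples, 20 markers
kernel = rng.normal(size=3)          # kernel width 3 -> 18 convolution outputs
w1 = rng.normal(size=(18, 4))
w_out = rng.normal(size=4)
w_dec = rng.normal(size=(18, 20))

y_hat = predict(genotypes, kernel, w1, w_out)
pretrain_loss = dae_reconstruction_loss(genotypes, kernel, w_dec, 0.2, rng)
```

In a full training loop, the reconstruction loss would be minimized first to initialize the convolutional and first fully connected layers, and the supervised objective would then fine-tune all weights; the gradient updates are omitted here to keep the sketch short.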

Index Terms

A Sparse Convolutional Predictor with Denoising Autoencoders for Phenotype Prediction
1. Applied computing
  1. Life and medical sciences
    1. Computational biology
      1. Computational genomics

Published In

BCB '19: Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics

September 2019

716 pages

ISBN:9781450366663

DOI:10.1145/3307339

General Chairs:
Xinghua (Mindy) Shi, Temple University, USA
Michael Buck, University of Buffalo, USA

Program Chairs:
Jian Ma, Carnegie Mellon University, USA
Pierangelo Veltri, University Magna Graecia of Catanzaro, Italy

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGBio: ACM Special Interest Group on Bioinformatics

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

National Science Foundation

Conference

BCB '19

Sponsor:

SIGBio

BCB '19: 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics

September 7 - 10, 2019

Niagara Falls, NY, USA

Acceptance Rates

BCB '19 Paper Acceptance Rate 42 of 157 submissions, 27%;

Overall Acceptance Rate 254 of 885 submissions, 29%

Bibliometrics

Total Citations: 6
Total Downloads: 623
Downloads (Last 12 months): 62
Downloads (Last 6 weeks): 5

Reflects downloads up to 10 Nov 2024

Citations

Cited By

Mowlaei M, Shi X. (2023). FSF-GA: A Feature Selection Framework for Phenotype Prediction Using Genetic Algorithms. Genes 14:5 (1059). Online publication date: 9-May-2023.
https://doi.org/10.3390/genes14051059
Huang L, Song M, Shen H, Hong H, Gong P, Deng H, Zhang C. (2023). Deep Learning Methods for Omics Data Imputation. Biology 12:10 (1313). Online publication date: 7-Oct-2023.
https://doi.org/10.3390/biology12101313
Narisetti N, Henke M, Seiler C, Junker A, Ostermann J, Altmann T, Gladilin E. (2021). Fully-automated root image analysis (faRIA). Scientific Reports 11:1. Online publication date: 6-Aug-2021.
https://doi.org/10.1038/s41598-021-95480-y
Guo Y, Ning X, Mathé E, Wang K, Li L, Zhang C, Zhao Z. (2020). Innovating Computational Biology and Intelligent Medicine: ICIBM 2019 Special Issue. Genes 11:4 (437). Online publication date: 17-Apr-2020.
https://doi.org/10.3390/genes11040437
Chen J, Mowlaei M, Shi X. (2020). Population-scale Genomic Data Augmentation Based on Conditional Generative Adversarial Networks. Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (1-6). Online publication date: 21-Sep-2020.
https://dl.acm.org/doi/10.1145/3388440.3412475
Saha S, Sheikh N, Banerjee B, Pendurkar S. (2020). Self-supervised Deep Learning for Flower Image Segmentation. 2020 14th International Conference on Innovations in Information Technology (IIT) (126-130). Online publication date: 17-Nov-2020.
https://doi.org/10.1109/IIT50501.2020.9298979
