DOI: 10.1145/3388440.3412412
research-article
Open access

Zero-shot imputations across species are enabled through joint modeling of human and mouse epigenomics

Published: 10 November 2020

Abstract

Recent large-scale efforts to characterize functional activity in human have produced thousands of genome-wide experiments that quantify various forms of biochemistry, such as histone modifications, protein binding, transcription, and chromatin accessibility. Although these experiments represent a small fraction of the possible experiments that could be performed, they also make human more comprehensively characterized than any other species. We propose an extension to the imputation approach Avocado that enables the model to leverage genome alignments and the large number of human genomics data sets when making imputations in other species. We found that not only does this extension result in improved imputation of mouse functional experiments, but that the extended model is able to make accurate imputations for protein binding assays that have been performed in human but not in mouse. This ability to make "zero-shot" imputations greatly increases the utility of such imputation approaches and enables comprehensive imputations to be made for species even when experimental data are sparse.

Cited By

  • (2022) Domain-adaptive neural networks improve cross-species prediction of transcription factor binding. Genome Research 32:3 (512-523). DOI: 10.1101/gr.275394.121. Online publication date: 18-Jan-2022
  • (2022) Asymmetric predictive relationships across histone modifications. Nature Machine Intelligence 4:3 (288-299). DOI: 10.1038/s42256-022-00455-x. Online publication date: 21-Mar-2022

Published In

BCB '20: Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
September 2020
193 pages
ISBN: 9781450379649
DOI: 10.1145/3388440
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. deep learning
2. functional genomics
3. tensor factorization

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

BCB '20

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%
