-
The Coming of Age of Nucleic Acid Vaccines during COVID-19
Authors:
Halie M. Rando,
Ronan Lordan,
Likhitha Kolla,
Elizabeth Sell,
Alexandra J. Lee,
Nils Wellhausen,
Amruta Naik,
Jeremy P. Kamil,
COVID-19 Review Consortium,
Anthony Gitter,
Casey S. Greene
Abstract:
In the 21st century, several emergent viruses have posed a global threat. Each pathogen has emphasized the value of rapid and scalable vaccine development programs. The ongoing SARS-CoV-2 pandemic has made the importance of such efforts especially clear. New biotechnological advances in vaccinology allow for vaccines that provide only the nucleic acid building blocks of an antigen, eliminating many safety concerns. During the COVID-19 pandemic, these DNA and RNA vaccines have facilitated the development and deployment of vaccines at an unprecedented pace. This success was attributable at least in part to broader shifts in scientific research relative to prior epidemics; the genome of SARS-CoV-2 was available as early as January 2020, facilitating global efforts in the development of DNA and RNA vaccines within two weeks of the international community becoming aware of the new viral threat. Additionally, these technologies that were previously only theoretical are not only safe but also highly efficacious. Although vaccine development has historically been a slow process, the rapid development of vaccines during the COVID-19 crisis reveals a major shift in vaccine technologies. Here, we provide historical context for the emergence of these paradigm-shifting vaccines. We describe several DNA and RNA vaccines in terms of their efficacy, safety, and approval status. We also discuss patterns in worldwide distribution. The advances made since early 2020 provide an exceptional illustration of how rapidly vaccine development technology has advanced in the last two decades in particular and suggest a new era in vaccines against emerging pathogens.
Submitted 24 January, 2023; v1 submitted 14 October, 2022;
originally announced October 2022.
-
Application of Traditional Vaccine Development Strategies to SARS-CoV-2
Authors:
Halie M. Rando,
Ronan Lordan,
Alexandra J. Lee,
Amruta Naik,
Nils Wellhausen,
Elizabeth Sell,
Likhitha Kolla,
COVID-19 Review Consortium,
Anthony Gitter,
Casey S. Greene
Abstract:
Over the past 150 years, vaccines have revolutionized the relationship between people and disease. During the COVID-19 pandemic, technologies such as mRNA vaccines have received attention due to their novelty and successes. However, more traditional vaccine development platforms have also yielded important tools in the worldwide fight against the SARS-CoV-2 virus. A variety of approaches have been used to develop COVID-19 vaccines that are now authorized for use in countries around the world. In this review, we highlight strategies that focus on the viral capsid and outwards, rather than on the nucleic acids inside. These approaches fall into two broad categories: whole-virus vaccines and subunit vaccines. Whole-virus vaccines use the virus itself, either in an inactivated or attenuated state. Subunit vaccines contain instead an isolated, immunogenic component of the virus. Here, we highlight vaccine candidates that apply these approaches against SARS-CoV-2 in different ways. In a companion manuscript, we review the more recent and novel development of nucleic-acid based vaccine technologies. We further consider the role that these COVID-19 vaccine development programs have played in prophylaxis at the global scale. Well-established vaccine technologies have proved especially important to making vaccines accessible in low- and middle-income countries. Vaccine development programs that use established platforms have been undertaken in a much wider range of countries than those using nucleic-acid-based technologies, which have been led by wealthy Western countries. Therefore, these vaccine platforms, though less novel from a biotechnological standpoint, have proven to be extremely important to the management of SARS-CoV-2.
Submitted 23 January, 2023; v1 submitted 16 August, 2022;
originally announced August 2022.
-
Molecular and Serologic Diagnostic Technologies for SARS-CoV-2
Authors:
Halie M. Rando,
Christian Brueffer,
Ronan Lordan,
Anna Ada Dattoli,
David Manheim,
Jesse G. Meyer,
Ariel I. Mundo,
Dimitri Perrin,
David Mai,
Nils Wellhausen,
COVID-19 Review Consortium,
Anthony Gitter,
Casey S. Greene
Abstract:
The COVID-19 pandemic has presented many challenges that have spurred biotechnological research to address specific problems. Diagnostics is one area where biotechnology has been critical. Diagnostic tests play a vital role in managing a viral threat by facilitating the detection of infected and/or recovered individuals. From the perspective of what information is provided, these tests fall into two major categories, molecular and serological. Molecular diagnostic techniques assay whether a virus is present in a biological sample, thus making it possible to identify individuals who are currently infected. Additionally, when the immune system is exposed to a virus, it responds by producing antibodies specific to the virus. Serological tests make it possible to identify individuals who have mounted an immune response to a virus of interest and therefore facilitate the identification of individuals who have previously encountered the virus. These two categories of tests provide different perspectives valuable to understanding the spread of SARS-CoV-2. Within these categories, different biotechnological approaches offer specific advantages and disadvantages. Here we review the categories of tests developed for the detection of the SARS-CoV-2 virus or antibodies against SARS-CoV-2 and discuss the role of diagnostics in the COVID-19 pandemic.
Submitted 28 April, 2022; v1 submitted 26 April, 2022;
originally announced April 2022.
-
Using genome-wide expression compendia to study microorganisms
Authors:
Alexandra J. Lee,
Taylor Reiter,
Georgia Doing,
Julia Oh,
Deborah A. Hogan,
Casey S. Greene
Abstract:
A gene expression compendium is a heterogeneous collection of gene expression experiments assembled from data collected for diverse purposes. The widely varied experimental conditions and genetic backgrounds across samples create a tremendous opportunity for gaining a systems-level understanding of the transcriptional responses that influence phenotypes. Variety in experimental design is particularly important for studying microbes, where the transcriptional responses integrate many signals and demonstrate plasticity across strains, including responses to which nutrients are available and which microbes are present. Advances in high-throughput measurement technology have made it feasible to construct compendia for many microbes. In this review we discuss how these compendia are constructed and analyzed to reveal transcriptional patterns.
Submitted 25 March, 2022;
originally announced March 2022.
-
Beyond Low Earth Orbit: Biological Research, Artificial Intelligence, and Self-Driving Labs
Authors:
Lauren M. Sanders,
Jason H. Yang,
Ryan T. Scott,
Amina Ann Qutub,
Hector Garcia Martin,
Daniel C. Berrios,
Jaden J. A. Hastings,
Jon Rask,
Graham Mackintosh,
Adrienne L. Hoarfrost,
Stuart Chalk,
John Kalantari,
Kia Khezeli,
Erik L. Antonsen,
Joel Babdor,
Richard Barker,
Sergio E. Baranzini,
Afshin Beheshti,
Guillermo M. Delgado-Aparicio,
Benjamin S. Glicksberg,
Casey S. Greene,
Melissa Haendel,
Arif A. Hamid,
Philip Heller,
Daniel Jamieson,
et al. (31 additional authors not shown)
Abstract:
Space biology research aims to understand fundamental effects of spaceflight on organisms, develop foundational knowledge to support deep space exploration, and ultimately bioengineer spacecraft and habitats to stabilize the ecosystem of plants, crops, microbes, animals, and humans for sustained multi-planetary life. To advance these aims, the field leverages experiments, platforms, data, and model organisms from both spaceborne and ground-analog studies. As research is extended beyond low Earth orbit, experiments and platforms must be maximally autonomous, light, agile, and intelligent to expedite knowledge discovery. Here we present a summary of recommendations from a workshop organized by the National Aeronautics and Space Administration on artificial intelligence, machine learning, and modeling applications which offer key solutions toward these space biology challenges. In the next decade, the synthesis of artificial intelligence into the field of space biology will deepen the biological understanding of spaceflight effects, facilitate predictive modeling and analytics, support maximally autonomous and reproducible experiments, and efficiently manage spaceborne data and metadata, all with the goal to enable life to thrive in deep space.
Submitted 22 December, 2021;
originally announced December 2021.
-
Beyond Low Earth Orbit: Biomonitoring, Artificial Intelligence, and Precision Space Health
Authors:
Ryan T. Scott,
Erik L. Antonsen,
Lauren M. Sanders,
Jaden J. A. Hastings,
Seung-min Park,
Graham Mackintosh,
Robert J. Reynolds,
Adrienne L. Hoarfrost,
Aenor Sawyer,
Casey S. Greene,
Benjamin S. Glicksberg,
Corey A. Theriot,
Daniel C. Berrios,
Jack Miller,
Joel Babdor,
Richard Barker,
Sergio E. Baranzini,
Afshin Beheshti,
Stuart Chalk,
Guillermo M. Delgado-Aparicio,
Melissa Haendel,
Arif A. Hamid,
Philip Heller,
Daniel Jamieson,
Katelyn J. Jarvis,
et al. (31 additional authors not shown)
Abstract:
Human space exploration beyond low Earth orbit will involve missions of significant distance and duration. To effectively mitigate myriad space health hazards, paradigm shifts in data and space health systems are necessary to enable Earth-independence, rather than Earth-reliance. Promising developments in the fields of artificial intelligence and machine learning for biology and health can address these needs. We propose an appropriately autonomous and intelligent Precision Space Health system that will monitor, aggregate, and assess biomedical statuses; analyze and predict personalized adverse health outcomes; adapt and respond to newly accumulated data; and provide preventive, actionable, and timely insights to individual deep space crew members and iterative decision support to their crew medical officer. Here we present a summary of recommendations from a workshop organized by the National Aeronautics and Space Administration, on future applications of artificial intelligence in space biology and health. In the next decade, biomonitoring technology, biomarker science, spacecraft hardware, intelligent software, and streamlined data management must mature and be woven together into a Precision Space Health system to enable humanity to thrive in deep space.
Submitted 22 December, 2021;
originally announced December 2021.
-
An Open-Publishing Response to the COVID-19 Infodemic
Authors:
Halie M. Rando,
Simina M. Boca,
Lucy D'Agostino McGowan,
Daniel S. Himmelstein,
Michael P. Robson,
Vincent Rubinetti,
Ryan Velazquez,
COVID-19 Review Consortium,
Casey S. Greene,
Anthony Gitter
Abstract:
The COVID-19 pandemic catalyzed the rapid dissemination of papers and preprints investigating the disease and its associated virus, SARS-CoV-2. The multifaceted nature of COVID-19 demands a multidisciplinary approach, but the urgency of the crisis, combined with the need for social distancing measures, presents unique challenges to collaborative science. We applied a massive online open publishing approach to this problem using Manubot. Through GitHub, collaborators summarized and critiqued COVID-19 literature, creating a review manuscript. Manubot automatically compiled citation information for referenced preprints, journal publications, websites, and clinical trials. Continuous integration workflows retrieved up-to-date data from online sources nightly, regenerating some of the manuscript's figures and statistics. Manubot rendered the manuscript into PDF, HTML, LaTeX, and DOCX outputs, immediately updating the version available online upon the integration of new content. Through this effort, we organized over 50 scientists from a range of backgrounds who evaluated over 1,500 sources and developed seven literature reviews. While many efforts from the computational community have focused on mining COVID-19 literature, our project illustrates the power of open publishing to organize both technical and non-technical scientists to aggregate and disseminate information in response to an evolving crisis.
Submitted 17 September, 2021;
originally announced September 2021.
-
Ten Quick Tips for Deep Learning in Biology
Authors:
Benjamin D. Lee,
Anthony Gitter,
Casey S. Greene,
Sebastian Raschka,
Finlay Maguire,
Alexander J. Titus,
Michael D. Kessler,
Alexandra J. Lee,
Marc G. Chevrette,
Paul Allen Stewart,
Thiago Britto-Borges,
Evan M. Cofer,
Kun-Hsing Yu,
Juan Jose Carmona,
Elana J. Fertig,
Alexandr A. Kalinin,
Beth Signal,
Benjamin J. Lengerich,
Timothy J. Triche Jr,
Simina M. Boca
Abstract:
Machine learning is a modern approach to problem-solving and task automation. In particular, machine learning is concerned with the development and applications of algorithms that can recognize patterns in data and use them for predictive modeling. Artificial neural networks are a particular class of machine learning algorithms and models that evolved into what is now described as deep learning. Given the computational advances made in the last decade, deep learning can now be applied to massive data sets and in innumerable contexts. Therefore, deep learning has become its own subfield of machine learning. In the context of biological research, it has been increasingly used to derive novel insights from high-dimensional biological data. To make the biological applications of deep learning more accessible to scientists who have some experience with machine learning, we solicited input from a community of researchers with varied biological and deep learning interests. These individuals collaboratively contributed to this manuscript's writing using the GitHub version control platform and the Manubot manuscript generation toolset. The goal was to articulate a practical, accessible, and concise set of guidelines and suggestions to follow when using deep learning. In the course of our discussions, several themes became clear: the importance of understanding and applying machine learning fundamentals as a baseline for utilizing deep learning, the necessity for extensive model comparisons with careful evaluation, and the need for critical thought in interpreting results generated by deep learning, among others.
Submitted 29 May, 2021;
originally announced May 2021.
-
A field guide to cultivating computational biology
Authors:
Anne E Carpenter,
Casey S Greene,
Piero Carnici,
Benilton S Carvalho,
Michiel de Hoon,
Stacey Finley,
Kim-Anh Le Cao,
Jerry SH Lee,
Luigi Marchionni,
Suzanne Sindi,
Fabian J Theis,
Gregory P Way,
Jean YH Yang,
Elana J Fertig
Abstract:
Biomedical research centers can empower basic discovery and novel therapeutic strategies by leveraging their large-scale datasets from experiments and patients. This data, together with new technologies to create and analyze it, has ushered in an era of data-driven discovery which requires moving beyond the traditional individual, single-discipline investigator research model. This interdisciplinary niche is where computational biology thrives. It has matured over the past three decades and made major contributions to scientific knowledge and human health, yet researchers in the field often languish in career advancement, publication, and grant review. We propose solutions for individual scientists, institutions, journal publishers, funding agencies, and educators.
Submitted 22 April, 2021;
originally announced April 2021.
-
Identification and Development of Therapeutics for COVID-19
Authors:
Halie M. Rando,
Nils Wellhausen,
Soumita Ghosh,
Alexandra J. Lee,
Anna Ada Dattoli,
Fengling Hu,
James Brian Byrd,
Diane N. Rafizadeh,
Ronan Lordan,
Yanjun Qi,
Yuchen Sun,
Christian Brueffer,
Jeffrey M. Field,
Marouen Ben Guebila,
Nafisa M. Jadavji,
Ashwin N. Skelly,
Bharath Ramsundar,
Jinhui Wang,
Rishi Raj Goel,
YoSon Park,
the COVID-19 Review Consortium,
Simina M. Boca,
Anthony Gitter,
Casey S. Greene
Abstract:
After emerging in China in late 2019, the novel Severe acute respiratory syndrome-like coronavirus 2 (SARS-CoV-2) spread worldwide and, as of early 2021, continues to significantly impact most countries. Only a small number of coronaviruses are known to infect humans, and only two others cause the severe outcomes associated with SARS-CoV-2: Severe acute respiratory syndrome-related coronavirus, a closely related species of SARS-CoV-2 that emerged in 2002, and Middle East respiratory syndrome-related coronavirus, which emerged in 2012. Both of these previous epidemics were controlled fairly rapidly through public health measures, and no vaccines or robust therapeutic interventions were identified. However, previous insights into the immune response to coronaviruses gained during the outbreaks of severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS) have proved beneficial to identifying approaches to the treatment and prophylaxis of novel coronavirus disease 2019 (COVID-19). A number of potential therapeutics against SARS-CoV-2 and the resultant COVID-19 illness were rapidly identified, and many clinical trials investigating a variety of possible therapeutic approaches were initiated early in the pandemic. As a result, a small number of therapeutics have already been authorized by regulatory agencies such as the Food and Drug Administration (FDA) in the United States, and many other therapeutics remain under investigation. Here, we describe a range of approaches for the treatment of COVID-19, along with their proposed mechanisms of action and the current status of clinical investigation into each candidate. The status of these investigations will continue to evolve, and this review will be updated as progress is made.
Submitted 10 September, 2021; v1 submitted 3 March, 2021;
originally announced March 2021.
-
Dietary Supplements and Nutraceuticals Under Investigation for COVID-19 Prevention and Treatment
Authors:
Ronan Lordan,
Halie M. Rando,
COVID-19 Review Consortium,
Casey S. Greene
Abstract:
Coronavirus disease 2019 (COVID-19) has caused global disruption and a significant loss of life. Existing treatments that can be repurposed as prophylactic and therapeutic agents could reduce the pandemic's devastation. Emerging evidence of potential applications in other therapeutic contexts has led to the investigation of dietary supplements and nutraceuticals for COVID-19. Such products include vitamin C, vitamin D, omega 3 polyunsaturated fatty acids, probiotics, and zinc, all of which are currently under clinical investigation. In this review, we critically appraise the evidence surrounding dietary supplements and nutraceuticals for the prophylaxis and treatment of COVID-19. Overall, further study is required before evidence-based recommendations can be formulated, but nutritional status plays a significant role in patient outcomes, and these products could help alleviate deficiencies. For example, evidence indicates that vitamin D deficiency may be associated with greater incidence of infection and severity of COVID-19, suggesting that vitamin D supplementation may hold prophylactic or therapeutic value. A growing number of scientific organizations are now considering recommending vitamin D supplementation to those at high risk of COVID-19. Because research in vitamin D and other nutraceuticals and supplements is preliminary, here we evaluate the extent to which these nutraceutical and dietary supplements hold potential in the COVID-19 crisis.
Submitted 3 February, 2021;
originally announced February 2021.
-
Pathogenesis, Symptomatology, and Transmission of SARS-CoV-2 through Analysis of Viral Genomics and Structure
Authors:
Halie M. Rando,
Adam L. MacLean,
Alexandra J. Lee,
Ronan Lordan,
Sandipan Ray,
Vikas Bansal,
Ashwin N. Skelly,
Elizabeth Sell,
John J. Dziak,
Lamonica Shinholster,
Lucy D'Agostino McGowan,
Marouen Ben Guebila,
Nils Wellhausen,
Sergey Knyazev,
Simina M. Boca,
Stephen Capone,
Yanjun Qi,
YoSon Park,
Yuchen Sun,
David Mai,
Joel D. Boerckel,
Christian Brueffer,
James Brian Byrd,
Jeremy P. Kamil,
Jinhui Wang,
et al. (9 additional authors not shown)
Abstract:
The novel coronavirus SARS-CoV-2, which emerged in late 2019, has since spread around the world and infected hundreds of millions of people with coronavirus disease 2019 (COVID-19). While this viral species was unknown prior to January 2020, its similarity to other coronaviruses that infect humans has allowed for rapid insight into the mechanisms that it uses to infect human hosts, as well as the ways in which the human immune system can respond. Here, we contextualize SARS-CoV-2 among other coronaviruses and identify what is known and what can be inferred about its behavior once inside a human host. Because the genomic content of coronaviruses, which specifies the virus's structure, is highly conserved, early genomic analysis provided a significant head start in predicting viral pathogenesis and in understanding potential differences among variants. The pathogenesis of the virus offers insights into symptomatology, transmission, and individual susceptibility. Additionally, prior research into interactions between the human immune system and coronaviruses has identified how these viruses can evade the immune system's protective mechanisms. We also explore systems-level research into the regulatory and proteomic effects of SARS-CoV-2 infection and the immune response. Understanding the structure and behavior of the virus serves to contextualize the many facets of the COVID-19 pandemic and can influence efforts to control the virus and treat the disease.
Submitted 3 December, 2021; v1 submitted 1 February, 2021;
originally announced February 2021.
-
Recommendations to enhance rigor and reproducibility in biomedical research
Authors:
Jaqueline J. Brito,
Jun Li,
Jason H. Moore,
Casey S. Greene,
Nicole A. Nogoy,
Lana X. Garmire,
Serghei Mangul
Abstract:
Computational methods have reshaped the landscape of modern biology. While the biomedical community is increasingly dependent on computational tools, the mechanisms ensuring open data, open software, and reproducibility are variably enforced by academic institutions, funders, and publishers. Publications may present academic software for which essential materials are or become unavailable, such as source code and documentation. Publications that lack such information compromise the role of peer review in evaluating technical strength and scientific contribution. Incomplete ancillary information for an academic software package may bias or limit any subsequent work produced with the tool. We provide eight recommendations across four different domains to improve reproducibility, transparency, and rigor in computational biology - precisely on the main values which should be emphasized in life science curricula. Our recommendations for improving software availability, usability, and archival stability aim to foster a sustainable data science ecosystem in biomedicine and life science research.
Submitted 27 July, 2020; v1 submitted 14 January, 2020;
originally announced January 2020.
-
Incorporating biological structure into machine learning models in biomedicine
Authors:
Jake Crawford,
Casey S. Greene
Abstract:
In biomedical applications of machine learning, relevant information often has a rich structure that is not easily encoded as real-valued predictors. Examples of such data include DNA or RNA sequences, gene sets or pathways, gene interaction or coexpression networks, ontologies, and phylogenetic trees. We highlight recent examples of machine learning models that use structure to constrain model architecture or incorporate structured data into model training. For machine learning in biomedicine, where sample size is limited and model interpretability is critical, incorporating prior knowledge in the form of structured data can be particularly useful. The area of research would benefit from performant open source implementations and independent benchmarking efforts.
Submitted 15 October, 2019;
originally announced October 2019.
-
Evaluating deep variational autoencoders trained on pan-cancer gene expression
Authors:
Gregory P. Way,
Casey S. Greene
Abstract:
Cancer is a heterogeneous disease with diverse molecular etiologies and outcomes. The Cancer Genome Atlas (TCGA) has released a large compendium of over 10,000 tumors with RNA-seq gene expression measurements. Gene expression captures the diverse molecular profiles of tumors and can be interrogated to reveal differential pathway activations. Deep unsupervised models, including Variational Autoencoders (VAE), can be used to reveal these underlying patterns. We compare a one-hidden-layer VAE to two alternative VAE architectures with increased depth. We determine that the additional capacity marginally improves performance. We train and compare the three VAE architectures to other dimensionality reduction techniques including principal components analysis (PCA), independent components analysis (ICA), non-negative matrix factorization (NMF), and analysis of gene expression by denoising autoencoders (ADAGE). We compare performance in a supervised learning task predicting gene inactivation pan-cancer and in a latent space analysis of high grade serous ovarian cancer (HGSC) subtypes. We do not observe substantial differences across algorithms in the classification task. VAE latent spaces offer biological insights into HGSC subtype biology.
Submitted 13 November, 2017;
originally announced November 2017.
-
An expanded evaluation of protein function prediction methods shows an improvement in accuracy
Authors:
Yuxiang Jiang,
Tal Ronnen Oron,
Wyatt T Clark,
Asma R Bankapur,
Daniel D'Andrea,
Rosalba Lepore,
Christopher S Funk,
Indika Kahanda,
Karin M Verspoor,
Asa Ben-Hur,
Emily Koo,
Duncan Penfold-Brown,
Dennis Shasha,
Noah Youngs,
Richard Bonneau,
Alexandra Lin,
Sayed ME Sahraeian,
Pier Luigi Martelli,
Giuseppe Profiti,
Rita Casadio,
Renzhi Cao,
Zhaolong Zhong,
Jianlin Cheng,
Adrian Altenhoff,
Nives Skunca,
et al. (122 additional authors not shown)
Abstract:
Background: The increasing volume and variety of genotypic and phenotypic data is a major defining characteristic of modern biomedical sciences. At the same time, the limitations in technology for generating data and the inherently stochastic nature of biomolecular events have led to the discrepancy between the volume of data and the amount of knowledge gleaned from it. A major bottleneck in our ability to understand the molecular underpinnings of life is the assignment of function to biological macromolecules, especially proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, accurately assessing methods for protein function prediction and tracking progress in the field remain challenging. Methodology: We have conducted the second Critical Assessment of Functional Annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. One hundred twenty-six methods from 56 research groups were evaluated for their ability to predict biological functions using the Gene Ontology and gene-disease associations using the Human Phenotype Ontology on a set of 3,681 proteins from 18 species. CAFA2 featured significantly expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis also compared the best methods participating in CAFA1 to those of CAFA2. Conclusions: The top performing methods in CAFA2 outperformed the best methods from CAFA1, demonstrating that computational function prediction is improving. This increased accuracy can be attributed to the combined effect of the growing number of experimental annotations and improved methods for function prediction.
Submitted 2 January, 2016;
originally announced January 2016.