Aging is one of the chief biomedical problems of the 21st century. After decades of basic research on biogerontology
(the science of aging), the aging process remains an enigma. Although hundreds of "theories" of aging have
been formulated, and many fundamental insights into age-related changes and into genetic as well as environmental
interventions that change the pace of aging have been gained, why and how we actually age is still not understood.
In the post-genomic era the volume of data is increasing exponentially. As a consequence, it is a challenge to use all
the available information and to derive meaningful knowledge about biological phenomena from it. No individual
scientist, group, or consortium is capable of keeping up even within their own field; all are overwhelmed by the
explosion of data. Machine learning applied to biological data has the potential to address this and to cause a paradigm
shift from hypothesis-driven research (which predominates in biological research, including biogerontology) to
data-driven research.
This dissertation addresses this problem. In particular, it proposes and carries out the use of machine learning on
currently existing data to predict drivers of aging (and thereby helps to distinguish causes from consequences),
interventions to counteract aging, and specific hypotheses that fill research gaps and require experimental
validation.
The objective of this project is therefore to build computational models that are based on data relevant to the
phenomenon of aging and to predict as many of its aspects and dimensions as possible (thus elucidating their relations
to each other). Different machine learning models are evaluated for converting between, and sorting within, the
dimensions that are relevant to aging. Once models are built, it can be determined how much of the different
aspects of aging they can explain. These models are also capable of specifying which features are most relevant for
prediction (in both classification and regression). It is possible to train models that incorporate age-related changes
based on transcriptomic, proteomic, metabolomic, epigenomic, and morphological data and their combinations.
Machine learning is further used to convert between and within these data types.
This work focuses on three types of predictors, which are subsequently used together with statistical and learning
algorithms to make discoveries. The first model (lifespan predictor) is trained to predict lifespan from genotype,
environment, and combinations thereof. It is useful for predicting lifespan-extending interventions at the population
level. The second model (age predictor) is trained to predict age from features measured on individuals. It is useful
for identifying biomarkers of aging and for determining the effects of interventions at the level of individuals. The
third model (function predictor) predicts the functions and regulation of biological entities with regard to the aging
process, based on heterogeneous data such as ontologies, diverse omics including time-series gene expression profiles
(which can be visualized as plots), and linked data. It is used to understand the role of genes and proteins and
possibly of other entities such as small molecules, including lipids and other metabolites. Protein functions that are
still unknown, especially those involved in yeast lipid metabolism and its regulation, can thus be predicted.
For this purpose we primarily use yeast as a model organism, together with data on humans. Other biomedical model
organisms might be added if found beneficial.
The novel aspects of this research include that 1) aging is investigated systematically in an unbiased,
data-driven approach, 2) lifespan is predicted as a continuous value, 3) age is predicted by combining multiple omics
data types, and 4) functions and regulation of biological entities such as genes are predicted with high confidence
from heterogeneous data sources.
This thesis found that genetics is the most important feature in lifespan determination. According to the
best-performing models, phenotypic features related to lipids and membranes, such as vacuolar morphology and
autophagy activity, are also important for lifespan determination. An age predictor based on transcriptomics and
proteomics can determine age with high accuracy. Its selected features are associated with both translation and lipid
metabolism. Among the top selected features are transcripts of genes whose deletion causes abnormal vacuolar
morphology, as well as targets of Opi1. Opi1 itself and its regulators were found to be differentially regulated
post-transcriptionally or post-translationally. Lastly, a function predictor for genes was created that achieved
exceptional accuracy in classifying aging genes. It learned, for instance, that piecemeal autophagy of the nucleus is
strongly predictive of aging-suppressor genes, while cytoplasmic translation is strongly predictive of gerontogenes.