Cellular Reprogramming

Alessandra Jordano Conforte; Nicolas Carels


With cellular reprogramming, it is possible to convert a cell from one phenotype to another without necessarily passing through a pluripotent state. This perspective is opening many interesting fields in research and biomedical applications. This essay provides a concise description of the subject by describing the purpose of this technique, its evolution, the mathematical models used and the methodologies applied. As examples, four areas in the biomedical field where cellular reprogramming can be applied with interesting perspectives are illustrated: disease modeling, drug discovery, precision medicine and regenerative medicine. Furthermore, the use of ordinary differential equations, Bayesian networks and Boolean networks is described in these contexts. These are the three main mathematical modeling strategies applied to gene regulatory networks to analyze the dynamic interactions between network nodes. Finally, their application in disease research is discussed considering their benefits and limitations.

Domenico Sgariglia1, Alessandra Jordano Conforte2, Luis Alfredo Vidal de Carvalho3, Nicolas Carels2, Fabricio Alves Barbosa da Silva4,*

1 Programa de Engenharia de Sistemas e Computação, COPPE-UFRJ, Rio de Janeiro, Brazil
2 Laboratório de Modelagem de Sistemas Biológicos, Centro de Desenvolvimento Tecnológico em Saúde, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
3 Departamento de Medicina Preventiva, Faculdade de Medicina, UFRJ, Rio de Janeiro, Brazil
4 Laboratório de Modelagem Computacional de Sistemas Biológicos, Programa de Computação Científica, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
* Corresponding author

1.1 – Introduction.

The concept of cellular reprogramming began in the 1960s with the idea of reversing the direction of cell differentiation, which had until then been conceived as occurring in a single irreversible direction.
The differentiation of cellular states was schematically described through the Waddington landscape, where the metaphorical valleys represent states of cellular stability and the hills around them represent the epigenetic barriers that prevent the transition from one state to another. The goal of cellular reprogramming is to induce cells to overcome these barriers by moving from one stable state (attractor) to another, following the techniques and simulations described in this chapter. Among the various scientific advances in this field, one may cite the work of Takahashi and Yamanaka on the generation of induced pluripotent stem cells (iPSCs) as an important reference in the progress of cellular reprogramming. The ability of a cell to move from one attractor to another in the epigenetic space in response to external or internal perturbations, for instance the overexpression of some genes, has opened a huge field of investigation. Different strategies have been followed with the aim of inducing phenotypic cell changes using the mathematical and biological modeling techniques available. The integration of techniques from different scientific areas, such as biology, mathematics, statistics and computational sciences, is essential for success in the simulation of cellular reprogramming. For this reason, the contribution of systems biology is decisive for this emerging field. The synergy between different disciplines is needed, for example, to model a gene regulatory network that describes the interactions influencing gene expression. This chapter first defines cellular reprogramming and its objective. Next, it provides a review of the methods used to achieve cellular reprogramming and the approaches used to build the network models analyzed.
Lastly, we discuss the applications of cellular reprogramming to diseases, highlighting the benefits and limitations of this technique and its potential application in different areas.

1.2 – What is cellular reprogramming?

1.2.1 – Premise.

We define cellular reprogramming as the conversion of one specific cell type to another. Eukaryotic cells transit from one state to another through changes in gene expression and, as a consequence, in protein levels, in response to signals coming from the extracellular environment. The goal of cellular reprogramming is to artificially induce changes in a cell phenotype through the perturbation of specific genes. Until a few years ago, cellular differentiation was thought of as ‘one-way traffic’, without any possibility of returning to a previous cellular state. The idea that a cell could be induced to reverse its differentiation state towards a less specialized one was not even imagined. The demonstration in 1963 of cell dedifferentiation in cultures of adult fibroblasts through interaction with stem cells of a mouse teratocarcinoma was a great step toward the concept that cellular differentiation is, indeed, reversible. In 2006, Takahashi and Yamanaka induced pluripotent stem cells (iPSCs) from cultures of adult mouse fibroblasts by incubation with the transcription factors POU5F1, SOX2, KLF4 and MYC. This remarkable discovery was a milestone for further advances and developments in the cellular reprogramming field. For the first time, it was shown to the scientific community that reversibility in the cell differentiation process was possible. Mature cells could be reverted to a previous pluripotent state, and it was possible to control the gene expression pattern with a small number of transcription factors.

1.2.2 – Meaning of cellular reprogramming.

We begin the description of the mechanism of cell reprogramming with the definition of epigenetics given by Conrad Waddington (Fig. 1.1): ‘Epigenetics is the branch of biology that studies the causal interactions between genes and their products which bring the phenotype into being’. He conceived the epigenetic landscape as an inclined surface with a cascade of branches, ridges and valleys.

Fig. 1.1. Waddington landscape representation of the epigenetic space, where the ball stands for a cell that can roll down from an undifferentiated cell state to a specialized state. The branches are the different potential states and the ridges are the epigenetic barriers that prevent a cell from taking a differentiation trajectory other than the one in which it is already engaged.

The goal of cellular reprogramming is to push the ball from a valley back to its starting point, which in biological terms means reverting the differentiated state of a cell back to its initial pluripotency. Following the same logic, it became clear that inducing a cell to move from one specialized cell state to another without necessarily passing through the pluripotent state is also possible. Indeed, the transition from a differentiated state toward a progenitor state is referred to as dedifferentiation, while the transition between two differentiated states is called transdifferentiation. Keeping in mind the Waddington landscape representation described above, we might answer the following two questions: What are the barriers we must overcome to move from one cellular state to another? How can we induce such cellular state transitions? Answering the first question, we know that a stable cell state can be seen as a high-dimensional attractor of the gene regulatory network. Attractors correspond to stable states associated with specific cell types. In this context, cell fates are determined by gene expression and epigenetic patterns controlled by multiple factors, such as DNA methylation and histone modifications. Both modifications can affect gene expression without inducing changes in the DNA sequence.
DNA methylation involves the addition of methyl groups to the DNA molecule, which usually results in the inhibition of eukaryotic gene transcription. Histone modifications are post-translational processes that occur in the histone tails and that inhibit or induce local gene expression depending on the modification type. Having illustrated the role of the epigenetic activity that controls cellular states, the second question can be answered: How can we induce state transitions? As outlined above, there are attractors corresponding to different cell fates and different epigenetic barriers that prevent transitions from one cell state to another. A stable cellular state is characterized by a given gene expression pattern. The perturbation of this pattern can induce cells to overcome these barriers by changing their steady state from one attractor to another in the epigenetic space. This transition has the consequence of changing the cell phenotype.

Fig. 1.2. Schematic representation of the cellular transition from one attractor to another by overcoming an epigenetic barrier between two cell states as a result of a specific perturbation.

As an example of such a perturbation, we can cite the positive regulation of transcription factors responsible for the regulation of a gene expression pattern. The scheme of Fig. 1.2 may represent both dedifferentiation and transdifferentiation processes. In general, we can think of this gene expression landscape as an energy configuration, where the cellular state is defined by the underlying transcriptional and epigenetic regulation.

1.2.3 – Applications.

There are four main areas where cellular reprogramming is or could be applied in biomedical research: (a) disease modeling, (b) drug discovery, (c) precision medicine and (d) regenerative medicine. With disease modeling (a), we may think about transforming a pathological cell into another desired cell condition, such as a healthy phenotype, a less aggressive phenotype or even cell death.
The benefit of this approach is to work with a human-specific representation that may not be available through cells coming from animal models. As an example, Dezonne et al. successfully generated astrocytes from human cerebral organoids. Astrocyte dysfunction is related to several neurological and degenerative diseases, and their cellular reprogramming provides potential for the investigation of developmental and evolutionary features of the human brain. Concerning drug discovery (b), new drug targets can be inferred from the model representation and tested for cell reprogramming in vitro and in vivo before they reach clinical trials. For example, induced pluripotent stem cells (iPSCs) can be reprogrammed into insulin-secreting pancreatic β cells, and their target genes could serve for drug development. Also, iPSCs from diabetic patients are being used to perform drug screening for new therapies against diabetes mellitus (DM). Precision medicine (c) aims to provide a treatment tailored to individual patients and diseases. A key factor in this context is pharmacogenomics, which studies the influence of an individual’s genetic characteristics on the body’s response to a drug. Succeeding in reprogramming a cell to a pluripotent state gives a chance to better understand the genotype-phenotype relationship at the individual level, which should allow an improvement in therapeutic efficacy. Regenerative medicine (d) is the process of replacing, engineering or regenerating human cells, tissues or organs to restore or establish normal function. In cell replacement therapies, the use of reprogrammed autologous cells can theoretically be a solution against the risk of graft rejection due to cellular mismatch between host and donor. In order to implement this idea in humans, non-human primates were studied regarding their potential to generate iPSCs through different cellular reprogramming techniques.

1.3 – Reprogramming methods.
By cell state, one means a cell's phenotypic identity as determined by the expression pattern of some of its key genes. Based on this definition, it is necessary to act on the expression of these key genes to change a cell’s phenotypic identity, which is the main purpose of cellular reprogramming. Consequently, one way to achieve this purpose is to modulate the regulation of the transcription factors that are responsible for the expression of those key genes. This method is discussed below, together with other cellular reprogramming techniques that have also been used.

1.3.1 – Cellular reprogramming through the overexpression of transcription factors.

The discovery that it is possible to change cellular fate by overexpressing just four transcription factors boosted the field of cellular reprogramming. After transformation, the cell was induced to a pluripotent state very similar to that of embryonic stem cells; this similarity concerned morphology, phenotype and epigenetics. The switch from a somatic cell phenotype to iPSCs through the modulation of transcription factor expression has an efficiency lower than 1%. Since the genomic sequences of the original and reprogrammed cells are mostly identical, the reason for the low performance of cell reprogramming may be related to epigenetic factors, which indicates that iPSCs have an epigenetic memory inherited from the previous cellular state. This process is called ‘transdifferentiation’, or lineage reprogramming. By overexpressing the Oct4, Sox2, Klf4 and c-Myc transcription factors in adult fibroblasts through transformation with retroviral vectors, Takahashi and Yamanaka (3) performed DNA integration randomly at multiple sites, which might knock down essential genes and increase the risk of oncogenicity. To avoid this risk, alternative transformation techniques were used, such as the combination of seven drug-like compounds that were able to generate iPSCs without the insertion of exogenous genes.
In addition to the drug-like treatment, this technique involves the repeated transfection of plasmids for transcription factor expression into mouse embryonic fibroblasts, but without any evidence of their genomic integration.

1.3.2 – Somatic cell nuclear transfer.

Somatic cell nuclear transfer (SCNT) is a technique in which the nucleus of a donor somatic cell is transferred into an enucleated egg cell. After insertion, the somatic cell nucleus is reprogrammed by the egg cell. With this method, it is possible to obtain embryonic stem cells (ESCs) as well as to induce the differentiation of a cell phenotype into a different one.

1.3.3 – Cell fusion.

It is possible to combine two nuclei within the same cell by the fusion of two cells. The dominant nucleus, the larger and more active one, imposes its pattern and consequently reprograms the somatic hybrid cell according to its dominant characteristics. It is worth noting that the cell fusion technique is not always efficient in achieving the desired result and that the reprogramming is often incomplete.

1.4 – Modeling cellular reprogramming.

Reprogramming is obtained by resetting the regulation of gene expression in somatic cells, which depends on knowledge of the key cellular proteins that may serve as targets to induce this process. The intracellular environment is continually subjected to stimuli from the extracellular environment, such as nutrient availability, mechanical injury, cell competition and cooperation. This type of stimulation affects the intracellular environment by changing the gene expression pattern in response to each stimulus. In this context, transcription factors are activated by external signals through transduction and promote the expression of specific genes and their respective pathways to set up a cellular response. This regulation process can be extended to include the induction of specific cell phenotypes.
Therefore, modeling the interactions between the proteins of a living system and the transcription factors that regulate their expression is essential to carry out cellular reprogramming. As an approach to model such a cellular system, we may consider genes as variables and their activation state as “on” or “off”. With these observations in mind, we may address some mathematical methodologies used to represent the relationships between these state variables.

1.4.1 – A new approach.

The development of new high-throughput technologies, along with the growing amount of available data, promoted computational frameworks that predict the reprogramming factors necessary to induce cell conversion. These frameworks are based on protein interaction networks integrated with different databases, such as (i) the FANTOM consortium, which contains data on promoter characterization, (ii) STRING, which provides protein-protein interactions (PPI), and (iii) MARA (Motif Activity Response Analysis), which provides interactions between proteins and DNA. In this context, Mogrify is a predictive system that integrates gene expression data and regulatory network information. It searches for differentially expressed transcription factors that regulate most of the genes differentially expressed between two cell types. This methodology has been validated and succeeded in completing transdifferentiations. Basically, one may represent a biological system through three different modeling strategies (Fig. 1.3).

1.4.2 – Ordinary differential equation.

In the context of a gene regulatory network, ordinary differential equations (ODE) are used to describe the quantitative relationships between variables, i.e., nodes. Theoretically, the use of ODE can provide a very accurate description of the interactions between system elements. In practice, the use of this technique is difficult, especially in complex networks, due to the large amount of data and the high number of parameters involved in the process.
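To make the ODE description more concrete, the sketch below integrates a two-gene mutual-repression circuit (a "toggle switch") with a simple Euler scheme. It is only an illustration under stated assumptions: the circuit, the Hill-type repression terms, the parameter values (`alpha = 4`, `n = 2`) and the function names `toggle_rates` and `simulate` are hypothetical choices made for this example and do not come from the chapter or from any of the studies it cites. The circuit has two stable steady states (attractors), and a transient overexpression of the second gene is enough to move the system from one to the other.

```python
# Toy ODE model of two mutually repressing genes x and y (assumed example).
#   dx/dt = alpha / (1 + y^n) - x
#   dy/dt = alpha / (1 + x^n) - y + extra_y
# extra_y mimics a transient, externally induced overexpression of gene y.

def toggle_rates(x, y, alpha=4.0, n=2, extra_y=0.0):
    """Return (dx/dt, dy/dt) for the toy toggle switch."""
    dx = alpha / (1.0 + y ** n) - x
    dy = alpha / (1.0 + x ** n) - y + extra_y
    return dx, dy


def simulate(x, y, t_end, dt=0.01, pulse=0.0, pulse_end=0.0):
    """Integrate with explicit Euler steps; apply `pulse` to gene y while t < pulse_end."""
    steps = int(round(t_end / dt))
    for k in range(steps):
        t = k * dt
        extra = pulse if t < pulse_end else 0.0
        dx, dy = toggle_rates(x, y, extra_y=extra)
        x, y = x + dt * dx, y + dt * dy
    return x, y


# 1) Let the system relax into the attractor where x is high and y is low.
x_a, y_a = simulate(2.0, 1.0, t_end=30.0)

# 2) Starting from that attractor, overexpress y for a limited time, then release.
x_b, y_b = simulate(x_a, y_a, t_end=30.0, pulse=5.0, pulse_end=10.0)

print("before perturbation:", round(x_a, 3), round(y_a, 3))
print("after perturbation: ", round(x_b, 3), round(y_b, 3))
```

With these assumed parameters, the two stable states can be found analytically as (2 + √3, 2 − √3) and its mirror image, so the simulation can be checked against them. The example also hints at the practical difficulty mentioned above: even this two-gene circuit needs a production rate, a cooperativity exponent and a degradation rate per gene, and the number of such parameters grows quickly with network size.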
The differential equation (formula 1) for each variable in the network is:

$$\frac{dx_i}{dt} = f_i(x_1, x_2, \ldots, x_n) \qquad (1)$$

where the right side of the equation represents the function of all variables linked to the gene $x_i$, and the left side is the variation in the expression of gene $x_i$ over time. ODE can be used to model cellular reprogramming by determining the rate of change of the concentration of a given substance within the cell, which determines a precise cellular state in response to some kind of cellular perturbation. For example, Mitra et al. used ordinary differential equations to show that time delays from chemical reactions are of crucial importance to understand cell differentiation and that they allow the introduction of a new system regime between two admissible steady states, with sustained oscillations due to feedback loops in gene regulation circuits.

1.4.3 – Bayesian network.

A Bayesian network is an example of network analysis that takes into consideration the random behavior inherent to biological networks. Bayesian networks are directed acyclic graphs G = (X, E), where X represents the network nodes and E the directed edges that represent the probabilistic dependence relationships between nodes. The relationships between the network’s nodes are governed by a conditional probability distribution (formula 2):

$$P(X_i \mid Pa(X_i)) \qquad (2)$$

where $Pa(X_i)$ is the set of parent nodes of the node $X_i$. A Bayesian network is a representation of a joint probability distribution (formula 3):

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i)) \qquad (3)$$

It allows an intuitive visualization of the conditional dependence structure of the network. Bayesian networks that model sequences of variables varying over time are called dynamic Bayesian networks (DBN). As proposed above, one may consider each protein in the network of a cell undergoing reprogramming as active or inactive. In this context, a DBN allows the inference of the likelihood of each network node state, which is necessary to calculate the probability of each cell state (an essential feature of cellular reprogramming). As an example, Chang et al. established a cell-state landscape that allowed the search for optimal reprogramming combinations in human embryonic stem cells (hESC) through the use of a DBN.

1.4.4 – Boolean network.

An alternative to differential equations and Bayesian networks for describing the relationships between variables in a gene regulatory network is the Boolean network. It is a qualitative and dynamical model that describes the change in the state of the system over time. Its representation of the system is easier to implement than that of differential equations and, consequently, it can process gene networks with a higher number of nodes. A Boolean network is a directed graph G(X, E), where X represents the nodes of the network and E the edges between them. The vector of formula 4:

$$S(t) = (x_1(t), x_2(t), \ldots, x_n(t)) \qquad (4)$$

describes the state of the network at any given time t. The Boolean value of a node, 1 or 0, represents the “on” or “off” state of the corresponding gene, i.e., active or inactive, respectively. The Boolean model is suitable to represent the evolution of biological systems over time and is relatively simple to implement and interpret. The greatest limitation of this type of network is that the state of a node, 0 or 1, is just an approximation of reality. The state updating of all the nodes across the entire system can be synchronous, asynchronous or probabilistic, depending on the modeling purpose and parameter availability.

Fig. 1.3. Schematic representation of an edge between two nodes by three different modeling methods, showing (i) a quantitative meaning for an ODE model, (ii) a qualitative meaning for a Boolean network, and (iii) a probabilistic interpretation for a Bayesian network.

1.5 – Cellular reprogramming using a Boolean network.

To address the problem of cellular reprogramming using a Boolean network in practice, one may use a modeling strategy for the gene regulatory network (GRN) that warrants relative simplicity in finding attractors.
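The relative simplicity of finding attractors in a Boolean network can be illustrated with a small sketch. The three-gene network below is a made-up toy (genes A and B repress each other, gene C simply follows A), and the names `step` and `find_attractors` are hypothetical helpers introduced here, not functions from any published tool. The code applies a synchronous update to every possible state and records the attractor each trajectory ends in, together with the size of its basin.

```python
from itertools import product


def step(state):
    """Synchronous update of the toy network: A' = NOT B, B' = NOT A, C' = A."""
    a, b, c = state
    return (int(not b), int(not a), a)


def find_attractors(update, n_nodes):
    """Map each attractor (a frozenset of states) to the number of states in its basin."""
    basins = {}
    for start in product((0, 1), repeat=n_nodes):
        seen = []
        state = start
        # Follow the trajectory until a state repeats; the repeated part is the attractor.
        while state not in seen:
            seen.append(state)
            state = update(state)
        attractor = frozenset(seen[seen.index(state):])
        basins[attractor] = basins.get(attractor, 0) + 1
    return basins


basins = find_attractors(step, 3)
for attractor, size in basins.items():
    print(sorted(attractor), "basin size:", size)
```

For this toy network the exhaustive search yields two fixed points, (1, 0, 1) and (0, 1, 0), which can be read as two stable cell states, plus a two-state cycle that arises from updating all nodes at the same time. Exhaustive enumeration visits 2^n states, so it is only practical for small networks; this is one reason why the network reduction strategies discussed in this section matter.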
It should be noted, however, that detailed information on the interactions between the elements of the network is not taken into account by this approach, since kinetic parameters or affinity terms may take different values according to the different components of the network. As seen above, gene interactions can be modeled based on knowledge of the relationships between the genes of a set that should be perturbed, activated or inactivated, to achieve cellular reprogramming. Therefore, it is crucial to identify the specific transcription factors that regulate these genes in order to enable a cell to perform a transition between its current state and the desired state. Different cell types are defined as stable states, and a stable steady state is called an attractor. An attractor is characterized by an exclusive gene expression pattern, and its perturbation can induce a transition from one stable cellular state to another. It was shown that the number of genes to be perturbed is relatively low compared to the high number of genes differentially expressed between two cellular states. Considering that the complexity of a gene regulatory network increases with its number of nodes, and that a phenotypic transition requires a low number of genes to be perturbed, different strategies are being used to reduce the number of network nodes to be analyzed. Iterative network pruning can be used to contextualize the network to the biological condition under which the expression data were obtained. Pruning algorithms compare lists of genes and interactions from a literature-based network with lists of genes differentially expressed between two cellular phenotypes in a bench experiment, and then search for compatibility between both data sets. This comparison produces a score for each pruned network sample in order to identify the genes to be perturbed according to the data pair that best matches the cell steady state regarded as a phenotype.
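The observation that only a few genes need to be perturbed, even when many genes differ between two stable states, can be sketched on a toy Boolean network. Everything in the example below is an assumption made for illustration: the five-gene network, its update rules, and the helper names `settle` and `minimal_perturbation` are hypothetical and do not reproduce any published reprogramming algorithm. The code flips subsets of genes in the source attractor, smallest subsets first, and reports the first subset after which the network settles in the target attractor.

```python
from itertools import combinations

GENES = ("A", "B", "C", "D", "E")


def step(state):
    """Toy synchronous rules: A and B repress each other; C follows A, D follows B, E follows C."""
    a, b, c, d, e = state
    return (int(not b), int(not a), a, b, c)


def settle(state):
    """Run the dynamics until a state repeats and return the attractor reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return frozenset(seen[seen.index(state):])


def minimal_perturbation(source, target):
    """Brute-force search for the smallest set of gene flips leading from source to target."""
    for size in range(1, len(source) + 1):
        for idx in combinations(range(len(source)), size):
            flipped = tuple(1 - v if i in idx else v for i, v in enumerate(source))
            if settle(flipped) == frozenset([target]):
                return tuple(GENES[i] for i in idx)
    return None


source = (1, 0, 1, 0, 1)  # fixed point with A, C and E active
target = (0, 1, 0, 1, 0)  # fixed point with B and D active
print(minimal_perturbation(source, target))
```

In this toy case the two fixed points differ in all five genes, yet flipping only the two upstream regulators A and B is sufficient, because the downstream genes then follow on their own. The brute-force search grows combinatorially with the number of genes, which is why approaches that first reduce the network or exploit its topology are used in practice.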
The topological relationships between the elements of a specific attractor in a network can be used to construct a cell reprogramming protocol. Based on topological configuration data, it is possible to establish a hierarchical organization of strongly connected components (SCC), identify their respective differentially expressed positive circuits (DEPC) and identify determinant genes capable of promoting the transition from one stable cellular state to another. The choice of genes to be perturbed can also be made based on dynamic simulation, through the combination of transcriptomic profiling and analyses of network stability, in order to find the minimum number of DEPCs that need to be perturbed to complete the cellular transition.

1.6 – Application of cellular reprogramming to diseases.

All human diseases are intrinsically multifactorial and characterized by dysregulated processes in gene regulatory networks. Knowledge of the GRN is important to understand how the robustness of a molecular network may lead malignant cells to overcome the inactivation of single protein targets by therapeutic treatment, either by using alternative pathways or by network propagation until the system accommodates to a new equilibrium. Thus, network pharmacology and cellular reprogramming are promising methods for the identification of protein combinations with the potential to disarticulate a key subnetwork that correlates with a disease and to achieve an efficient therapeutic result. The methods described in the previous section are able to overcome a typical problem of bias toward well-known related pathways and generic models that do not consider individual and new aspects of the studied case. They integrate gene expression data with regulatory models in order to work with a more accurate GRN, indicating specific aspects of each case, such as which genes are present and which pathways are dysregulated.
The use of gene expression data from both diseased and healthy cells is also important to identify the differentially expressed genes and to target the ones preferentially expressed in diseased cells, in order to minimize the negative side effects of target inactivation on healthy cells. The Mogrify methodology considers all these features. However, it has great potential to cause two negative effects if applied to patients in the context of a therapeutic treatment. First, with this methodology, one searches for differentially expressed transcription factors responsible for the regulation of genes related to the establishment of the disease phenotype. The problem is that a transcription factor might be responsible for the regulation of hundreds of genes, and these are probably not all significantly more expressed in diseased cells than in healthy cells. The perturbation of hundreds of genes, even if most of them are differentially expressed in diseased cells, may affect genes essential to cell maintenance and cause serious side effects. Second, this methodology requires the induction of gene expression through cell transfection. As already discussed above, the insertion of a plasmid into DNA occurs randomly and might knock down some key genes, which increases the risk of oncogenicity. The most common approach applied in patients is the inhibition of a protein target with drugs. Even innovative alternative therapies based on bioagents such as RNA interference, aptamers, peptides or antibodies also target proteins with the aim of inactivating their function. These limitations need to be considered when applying cellular reprogramming strategies in a disease context because they may exclude a number of possible alternative solutions. Once attractors for cell reprogramming have been considered, it is important to emphasize that focusing on the full reprogramming of a cell in order to reach a given steady state is not necessary.
All stable attractors have a basin of attraction, in which trajectories spontaneously converge to the steady-state attractor (43). The concept of a basin of attraction should simplify the application of cellular reprogramming in diseases, since it reduces the number of perturbations needed to achieve the desired stable state. The perturbation capable of overcoming an epigenetic barrier and bringing a cell from a disease attractor to a desired one, considered to match a healthy or at least a less aggressive condition for the patient, needs to be carried out in a subspace where therapeutic options overlap with the basin of attraction. As examples, we now propose putative applications of cellular reprogramming in two different diseases, cancer (a cellular disease) and malaria (an infectious disease). Cancer cells accumulate malignant mutations during their development and, as a result, present a different network topology compared to healthy cells. Due to the accumulation of mutations and its consequences on genome dysregulation, it would be impossible to control a cell in order to bring it back from its malignant attractor toward its healthy one. However, the key genes involved in the malignant attractor can be analyzed in the light of malignant features, such as continuous proliferation and death avoidance. In addition, both malignant and healthy conditions can be analyzed in terms of attractor phenotype differences. This would allow the identification of key genes able to reprogram dysregulated cellular processes and achieve proliferation control and/or the induction of apoptosis in malignant cells. The vaccines used against malaria use live attenuated salivary gland sporozoites (SPZ) and cannot be produced on a large scale due to hurdles associated with obtaining SPZ. It is known that SPZ development occurs in three main stages according to the insect organ that is infected: midgut, hemolymph and/or salivary gland.
Therefore, considering the salivary gland tissue, cellular reprogramming analysis should allow the identification of key genes related to this tissue by comparison with the other two stages. The understanding of salivary gland SPZ genesis and maturation is crucial to develop a laboratory culture system and produce SPZ in vitro for large-scale vaccine production. Many advances have already been made in cell reprogramming, and it is effective for a number of purposes. However, much still needs to be done with regard to diseases and patient treatment. A clear example is that, unfortunately, there is as yet no efficient general method to identify basins of attraction.

1.7 – Conclusion.

The concept of cell reprogramming has evolved considerably during the last decades. The development of high-throughput technologies has also promoted more accurate applications of cell reprogramming through its integration with gene expression data. Currently, there are great prospects for its application in multiple biomedical areas, such as drug screening and regenerative medicine. Nevertheless, there is still much to do in order to understand and predict the behavior of complex systems such as biological ones.

References: Sgariglia D, Conforte AJ, de Carvalho LAV, Carels N, da Silva FAB. Cellular Reprogramming. In: Theoretical and Applied Aspects of Systems Biology. Eds. da Silva FAB, Carels N, Paes Silva Jr F. Computational Biology. Springer International Publishing, 2018; 27:41-55. doi: 10.1007/978-3-319-74974-7_3.
