
Inference of chromosome 3D structures from GAM data by a physics computational approach

Methods, 2019
Journal Pre-proofs

Inference of chromosome 3D structures from GAM data by a physics computational approach

Luca Fiorillo, Simona Bianco, Andrea M. Chiariello, Mariano Barbieri, Andrea Esposito, Carlo Annunziatella, Mattia Conte, Alfonso Corrado, Antonella Prisco, Ana Pombo, Mario Nicodemi

PII: S1046-2023(18)30485-7
DOI: https://doi.org/10.1016/j.ymeth.2019.09.018
Reference: YMETH 4805
To appear in: Methods
Received Date: 15 March 2019
Revised Date: 2 August 2019
Accepted Date: 27 September 2019

Please cite this article as: L. Fiorillo, S. Bianco, A.M. Chiariello, M. Barbieri, A. Esposito, C. Annunziatella, M. Conte, A. Corrado, A. Prisco, A. Pombo, M. Nicodemi, Inference of chromosome 3D structures from GAM data by a physics computational approach, Methods (2019), doi: https://doi.org/10.1016/j.ymeth.2019.09.018

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Elsevier Inc. All rights reserved.
Inference of chromosome 3D structures from GAM data by a physics computational approach

Luca Fiorillo 1*, Simona Bianco 1*#, Andrea M. Chiariello 1*, Mariano Barbieri 2, Andrea Esposito 1,2, Carlo Annunziatella 1, Mattia Conte 1, Alfonso Corrado 1, Antonella Prisco 3, Ana Pombo 2 and Mario Nicodemi 1,4#

Addresses
1 Dipartimento di Fisica, Università di Napoli Federico II, and INFN Napoli, Complesso Universitario di Monte Sant’Angelo, 80126 Naples, Italy.
2 Berlin Institute for Medical Systems Biology, Max-Delbrück Centre for Molecular Medicine, Robert-Rössle Strasse, Berlin-Buch 13092, Germany.
3 Institute of Genetics and Biophysics, Consiglio Nazionale Delle Ricerche (CNR).
4 Berlin Institute of Health (BIH), MDC-Berlin, Germany.
* Equal contribution
# Corresponding authors: biancos@na.infn.it, mario.nicodemi@na.infn.it

Abstract

The combination of modelling and experimental advances can provide deep insights for understanding chromatin 3D organization and ultimately its underlying mechanisms. In particular, models of polymer physics can help comprehend the complexity of genomic contact maps, such as those emerging from technologies like Hi-C, GAM or SPRITE. Here we discuss a method to reconstruct 3D structures from Genome Architecture Mapping (GAM) data, based on PRISMR, a computational approach introduced to find the minimal polymer model best describing Hi-C input data using only polymer physics. After recapitulating the PRISMR procedure, we describe how we extended it to treat GAM data. We successfully test the method on a 6Mb region around the Sox9 gene and, at a lower resolution, on the whole chromosome 7 in mouse embryonic stem cells. The PRISMR-derived 3D structures from GAM co-segregation data are finally validated against independent Hi-C contact maps. The method proves versatile and robust, suggesting that it can be similarly applied to different experimental data, such as SPRITE or microscopy distance data.

1. INTRODUCTION

The three-dimensional (3D) folding of the genome is crucial for the proper functioning of the cell, as it enables looping between gene promoters and enhancers, which when disrupted can lead to gene misexpression and disease [1, 2]. Recent technologies such as Hi-C [3], GAM [4] and SPRITE [5] have made it possible to detect the frequency of contact between genomic loci across the genome, revealing striking and complex interaction patterns, including topologically associated domains (TADs) [6, 7] with their internal structures [8, 9], and a higher-order hierarchical organization in meta-TADs [10, 11] encompassing A/B compartments [3] and lamina associated domains [12]. Strong and widespread chromatin loops have been discovered between pairs of CTCF-occupied chromatin sites [13], often at TAD edges, while active and poised Pol-II are known to bridge distal genes [14]. Inter-chromosomal contact hubs have been discovered around nuclear bodies [5] and in specialized cell types [15].
Despite the large amount of information from experimental assays that map chromatin conformation, the actual three-dimensional structure of chromatin has only been partially discovered. To help explore 3D genome topology in more holistic terms, different approaches have been proposed that employ polymer physics models and heavy computation. Several computational methods that aim to infer chromosomes’ 3D structures are based on the optimization of objective functions of contact data, such as 5C or Hi-C [15–22]. Principled approaches have also been proposed based on models of the physical mechanisms underlying chromosome folding [3, 23–27]. Here, we focus on the application of the Strings and Binders Switch (SBS) polymer model of chromatin [28, 29], which quantifies the biological scenario in which diffusing molecules, such as transcription factors or chromatin proteins, promote the looping of specific chromatin sites by bridging distal cognate binding sites. Previous applications of the SBS model successfully captured features of chromatin organisation that were subsequently validated by FISH and Hi-C data [14, 28, 30].
To further develop the application of the SBS model to gain insights from experimental assays that map 3D genome structure, we explored the application of PRISMR (polymer-based recursive statistical inference method) [32], a recently developed polymer-physics-based approach, to extract chromatin 3D structures from GAM data at the single-allele level. We first describe the approach using as an example a specific genomic region around the Sox9 gene in mouse embryonic stem cells (mESCs), an important and architecturally well characterized genomic locus [33]. We show that the PRISMR-inferred 3D conformations reproduce GAM data with high accuracy. We also discuss the robustness of the method. Next, we apply PRISMR to GAM data to describe the 3D structure of the whole mouse chromosome 7 in mESCs, and we discuss how 3D models can be employed to extract important information about genome folding, including physical distances or relative positioning of loci. Finally, we validate the 3D structures inferred by PRISMR from GAM data by deriving in-silico Hi-C contact maps, which compare successfully against independent Hi-C experiments.

2. 3D modelling of chromatin from GAM data

2.1 The PRISMR method

PRISMR (polymer-based recursive statistical inference method) [29, 31] is a computational tool developed to reconstruct the 3D conformations of a genomic region of interest at the single-allele level, consistent with a given set of experimental contact data and a given model of polymer physics. For clarity, we employ the SBS model as the underlying model of chromatin behaviour, but PRISMR can be used with any other model of polymer physics. The SBS model describes a chromatin filament as a self-avoiding-walk polymer made of beads and embedded in an aqueous solution with a certain concentration of diffusing molecules, or binders [28, 33].
Each binder can interact simultaneously with more than one specific cognate bead, promoting contact between them and driving the polymer’s organization. The specificity of the bead-binder interaction is represented by a colour, i.e. only beads and binders having the same colour can interact (Fig. 1a). Non-interacting, “inert” beads are also considered and depicted in grey. Coloured beads are defined as binding sites, while the set of all the beads featuring the same colour is called a binding domain. The number of colours and the distribution of the corresponding binding sites along the polymer are the factors regulating its folding dynamics and determining its possible equilibrium configurations [14, 30–32, 35]. The key idea is that the complex 3D structure of any real genomic locus can be well represented by the folded equilibrium conformations of an SBS polymer built by aptly choosing binders and binding sites and letting the laws of physics work on the polymer-binders system until it reaches thermodynamic equilibrium.

In this framework, PRISMR aims to derive, in an unbiased way, the minimal number of colours and the best arrangement of their binding sites that describes the folding of the locus of interest with an SBS polymer. PRISMR infers the minimal arrangement of binding sites that best describes the 3D organization of a genomic region by employing a simulated annealing Monte-Carlo procedure [32], originally developed to use Hi-C data as input. For a given locus, PRISMR searches for the distribution of binding sites in the SBS polymer which minimizes a cost function that takes into account the distance between the input experimental contact matrix of the given genomic region and the contact matrix that the ensemble of SBS polymers at folded thermodynamic equilibrium would yield (Fig.1b). To reduce overfitting, the cost function also includes a regularization term penalizing the addition of binding sites (a chemical potential in Statistical Mechanics) [32].
Schematically, PRISMR involves the following steps:
1. consider an SBS polymer model, i.e. a given sequence of binding sites;
2. derive a thermodynamic ensemble of its equilibrium 3D structures;
3. compute their average contact matrix;
4. evaluate the cost function comparing experimental and model contact matrices;
5. change the SBS polymer model accordingly;
6. repeat steps 1-5 until convergence of the cost function.

To take into account the possible influence of initial conditions, a number (up to 5×10^2) of runs of the described algorithm is performed starting from independent, random initial configurations of the polymer model, and the absolute minimum of the cost function is retained as the final output [32]. The average contact matrix computed at step 3 must be of the same type as the input matrix, to allow for step 4. For instance, if PRISMR is informed by a Hi-C matrix, it will have to calculate a simulated Hi-C matrix from the thermodynamic ensemble generated at step 2 [32]. Once convergence is reached at step 6, PRISMR returns its optimal average contact matrix and the optimal SBS polymer model describing the input contact data of the genomic locus of interest. Next, from the output polymer model, the full ensemble of the 3D configurations of the locus can be derived by Molecular Dynamics (MD) simulations, giving access to information on the folding, including physical distances between regulatory regions, multi-way contacts, and others. The PRISMR approach has been successfully employed to describe, from Hi-C data, the folding of a number of genomic loci [29–31, 35, 36], and it also proved effective in predicting the effects of structural variants on chromatin spatial organization [32]. Here, we extend PRISMR to investigate 3D topology determined experimentally using GAM.
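The six steps of Section 2.1 can be sketched as a short annealing loop. What follows is a toy illustration only, not the PRISMR implementation: steps 2 and 3 (deriving the equilibrium ensemble and its average contact matrix) are replaced by a hypothetical surrogate, `toy_contacts`, in which same-colour beads contact with a fixed probability. The function names, the penalty weight `lam`, the linear cooling schedule and the single-bead move are all assumptions made for illustration.

```python
import math
import random


def toy_contacts(colours):
    """Hypothetical stand-in for steps 2-3: same-colour beads contact with
    probability 0.9, any other pair with probability 0.1."""
    n = len(colours)
    return [[0.9 if colours[i] is not None and colours[i] == colours[j] else 0.1
             for j in range(n)] for i in range(n)]


def cost(exp, colours, lam):
    """Step 4: squared distance between matrices plus a per-site penalty."""
    model = toy_contacts(colours)
    n = len(exp)
    dist = sum((exp[i][j] - model[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))
    n_sites = sum(c is not None for c in colours)  # inert beads are free
    return dist + lam * n_sites


def anneal(exp, n_colours, lam=0.01, steps=2000, t0=1.0, seed=0):
    """Steps 1-6: propose single-bead colour changes, accept by Metropolis."""
    rng = random.Random(seed)
    colours = [None] * len(exp)          # start from an all-inert polymer
    current = cost(exp, colours, lam)
    best, best_cost = list(colours), current
    options = [None] + list(range(n_colours))
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9   # linear cooling (assumed)
        i = rng.randrange(len(colours))
        old = colours[i]
        colours[i] = rng.choice(options)          # step 5: change the model
        new = cost(exp, colours, lam)
        if new <= current or rng.random() < math.exp((current - new) / temp):
            current = new
            if new < best_cost:
                best, best_cost = list(colours), new
        else:
            colours[i] = old                      # reject the move
    return best, best_cost
```

Calling `anneal(toy_contacts([0, 0, 0, 1, 1, 1]), n_colours=2)` searches for a colouring that reproduces a two-domain target matrix. In the actual method the surrogate would be replaced by the equilibrium ensemble of the SBS polymer, and many independent runs would be compared.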
2.2 PRISMR for GAM

The GAM technology measures how often DNA loci are included in randomly oriented slices taken from an ensemble of nuclei, by extracting and sequencing the DNA of each slice independently [4]; these are called segregation frequencies. By keeping track of the DNA content of each slice, it is possible to extract the frequencies with which two, three or any number of loci are found together in the same slice, called the co-segregation frequencies. For example, for pairwise contacts, the output of a GAM experiment is a two-dimensional matrix of the frequencies with which pairs of loci are found in the same GAM nuclear slice. To filter out experimental biases, such as differences in locus sampling or sequence mappability, a GAM co-segregation matrix can be normalized by a variety of approaches, such as a linkage disequilibrium normalization procedure, yielding a matrix called D’ [4].

For the reader’s ease, we recall here how the D’ matrix is obtained from the segregation data. Indicating with $f_{ij}$ the observed co-segregation frequency of loci i and j, and with $f_i$ the segregation frequency of locus i, the linkage disequilibrium $d_{ij}$ between the pair of loci is defined as

$$d_{ij} = f_{ij} - f_i f_j .$$

Then, the normalized linkage disequilibrium $D_{ij}$ is computed as follows:

$$D_{ij} = \frac{d_{ij}}{d_{max}}, \quad \text{with} \quad d_{max} = \begin{cases} \min\left( f_i f_j,\ (1 - f_i)(1 - f_j) \right) & \text{if } d_{ij} < 0 \\ \min\left( f_i (1 - f_j),\ f_j (1 - f_i) \right) & \text{if } d_{ij} > 0 \end{cases}$$

Arranging the $D_{ij}$ for every possible pair of loci in a two-dimensional matrix yields D’. The term $f_i f_j$ gives the expected co-segregation frequency of loci i and j if they were statistically independent; thus, a positive or negative $d_{ij}$ indicates that the loci co-segregate respectively more or less than they would if they were uncorrelated. Dividing by $d_{max}$ ensures that $D_{ij}$ ranges from -1 to 1, as $d_{max}$ is the maximum absolute value that the linkage disequilibrium between the two loci can assume given their segregation frequencies.
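The D’ normalization recalled above can be written in a few lines of plain Python. This is a minimal sketch, assuming the segregation data are given as one row of 0/1 flags per nuclear slice (one flag per locus); the function name `normalized_linkage` and the convention of returning 0 when $d_{max}$ vanishes are our assumptions, not part of the original procedure.

```python
def normalized_linkage(slices):
    """Return the D' matrix from a slices-by-loci table of 0/1 detections."""
    n_slices = len(slices)
    n_loci = len(slices[0])
    # Segregation frequency f_i: fraction of slices containing locus i.
    f = [sum(row[i] for row in slices) / n_slices for i in range(n_loci)]
    dprime = [[0.0] * n_loci for _ in range(n_loci)]
    for i in range(n_loci):
        for j in range(n_loci):
            # Co-segregation frequency f_ij: both loci in the same slice.
            f_ij = sum(row[i] * row[j] for row in slices) / n_slices
            d = f_ij - f[i] * f[j]
            if d < 0:
                d_max = min(f[i] * f[j], (1 - f[i]) * (1 - f[j]))
            else:
                d_max = min(f[i] * (1 - f[j]), f[j] * (1 - f[i]))
            # Assumed convention: D' = 0 where d_max is zero (undefined ratio).
            dprime[i][j] = d / d_max if d_max > 0 else 0.0
    return dprime
```

For instance, two loci that are always detected together give $D_{ij} = 1$, while two loci that are never detected in the same slice but each appear in half of the slices give $D_{ij} = -1$.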
To make PRISMR work with GAM data as input, its cost function needs to compare experimental and polymer-model-derived co-segregation matrices (see PRISMR step 4 in the previous section). To this aim, we implemented an algorithm to extract a GAM matrix in silico from an ensemble of 3D SBS polymer conformations. The GAM technique consists of cutting single nuclear slices in random orientations out of a population of nuclei, detecting the DNA loci caught in the slices, calculating the segregation frequencies for the chromatin region of interest and from those obtaining the pairwise and multi-way co-segregation frequencies. Focusing on the pairwise detection, we developed a Python algorithm to implement the GAM process over a population of SBS 3D configurations (see scheme in Fig.1c): each SBS 3D conformation is randomly placed inside a sphere representing the cell nucleus; a slice is cut at a random orientation, and all the polymer beads falling into it are detected and counted. The cell nucleus is approximated as a sphere for the sake of simplicity and because this approximates the ESC nucleus examined in the GAM dataset; our in-silico method can in principle be extended to accommodate any nuclear shape. The slice is implemented as a plane generated at a random orientation and passing through a randomly chosen point belonging to the sphere; the plane is assumed to be the middle plane of the slice, so, given the slice thickness, all the beads closer than half of the thickness to the plane are assumed to have fallen in the slice. As in the experiment, where a slice does not necessarily contain portions of the region of interest, a slice in our algorithm can turn out to be empty by chance. By repeating this slicing procedure over all the 3D conformations of the ensemble, the segregation and co-segregation frequencies are computed and the pairwise co-segregation matrix is generated. The D’ matrix can then be easily obtained.
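The slicing procedure described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' code: conformations are lists of (x, y, z) bead coordinates in micrometers, "random placement" is taken to mean moving the polymer's centre of mass to a uniformly chosen point in the nucleus, and one slice is cut per conformation. All function names are ours.

```python
import math
import random


def random_direction(rng):
    """Uniformly distributed unit vector (normal of the slicing plane)."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(1.0 - z * z)
    return (s * math.cos(phi), s * math.sin(phi), z)


def random_point_in_sphere(radius, rng):
    """Uniform point inside the nucleus, by rejection sampling."""
    while True:
        p = tuple(rng.uniform(-radius, radius) for _ in range(3))
        if sum(c * c for c in p) <= radius * radius:
            return p


def in_slice(bead, normal, point, thickness):
    """True if the bead lies within thickness/2 of the slice's middle plane."""
    dist = abs(sum(n * (b - p) for n, b, p in zip(normal, bead, point)))
    return dist < thickness / 2.0


def slice_conformation(beads, radius, thickness, rng):
    """Place one conformation in the nucleus and cut one random slice."""
    n = len(beads)
    com = [sum(b[k] for b in beads) / n for k in range(3)]
    shift = random_point_in_sphere(radius, rng)   # assumed placement rule
    placed = [tuple(b[k] - com[k] + shift[k] for k in range(3)) for b in beads]
    normal = random_direction(rng)
    point = random_point_in_sphere(radius, rng)
    return [1 if in_slice(b, normal, point, thickness) else 0 for b in placed]


def cosegregation(conformations, radius=4.5, thickness=0.22, seed=0):
    """Pairwise co-segregation frequencies; the diagonal holds the
    segregation frequency of each bead."""
    rng = random.Random(seed)
    n = len(conformations[0])
    counts = [[0] * n for _ in range(n)]
    for beads in conformations:
        flags = slice_conformation(beads, radius, thickness, rng)
        for i in range(n):
            for j in range(n):
                counts[i][j] += flags[i] * flags[j]
    total = len(conformations)
    return [[c / total for c in row] for row in counts]
```

The default radius and thickness are the mESC values quoted in the text (4.5 micrometers and 220 nm). Slices that miss the polymer simply contribute zeros, mirroring the empty slices of the experiment.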
The algorithm requires as input the radius of the nucleus of the considered cell type (here approximated as a sphere) and the thickness of a GAM nuclear slice. For the mESC nucleus, we chose a radius of 4.5 micrometers and a slice thickness of 220 nm, as indicated in [4].

3. The 3D structure of the Sox9 locus in mESC resulting from GAM data

3.1. Inference of the Sox9 optimal polymer model and robustness of the approach

To test the performance of the extended PRISMR method on GAM data, we first considered the 6Mb genomic region centered around Sox9 (mm9, chr11:109000000-115000000), which we will refer to as the Sox9 locus, and a publicly available GAM dataset from mESCs [4]. The Sox9 locus is a biologically well characterized genomic region which is associated with serious congenital diseases such as skeletal malformation and sex-reversal syndromes [38]. The GAM co-segregation and D’ matrices of the locus were obtained at a resolution of 40kb. As explained above, we described this locus by employing the SBS polymer model of chromatin [28, 33], and PRISMR was used to infer the most suitable arrangement of its binding sites (Fig.1b). We applied the extended PRISMR to both the co-segregation and D’ GAM matrices to test the robustness of the approach, finding that PRISMR reproduces the input matrix well regardless of its type (Fig.2a,b). Indeed, the Pearson correlations between the PRISMR matrices and the GAM co-segregation and D’ matrices are r=0.93 and r=0.86, respectively. However, Pearson correlation may be biased by different factors, primarily the decay of contacts with genomic distance. For the comparison of Hi-C matrices, this issue has been overcome by the introduction of sophisticated bias-corrected metrics, like the ones in [54,55]. Each of these metrics handles matrix comparison in a different manner, with different kinds of outcomes.
Even though they were designed specifically for Hi-C comparison, we employed two of these metrics: the HiC-spector [39] reproducibility score (Q), ranging from 0 to 1, with 1 indicating the highest similarity, and the HiCRep [40] stratum-adjusted correlation coefficient (SCC), ranging from -1 to 1 like the standard Pearson correlation. For the co-segregation matrices (PRISMR against experimental), we obtained Q=0.81 and SCC=0.96; for the D’ matrices, Q=0.69 and SCC=0.97. These values overall confirm the Pearson correlation outcomes, indicating good PRISMR performance for both GAM co-segregation and D’ matrices and adding strong quantitative support to the visual inspection (Fig.2a,b). We also found that the number and distribution of binding domains that PRISMR assigned to the output polymer model are the same in the two cases. To quantify such similarity, we measured the genomic overlap q between pairs of the binding domains derived from the co-segregation and D’ data. The overlap q between a pair of binding domains is defined as the normalized integral over the Sox9 locus of the product of the numbers of their binding sites [32]. The most overlapping binding domain pairs found from D’ and from co-segregation data have the same colour in Fig.2a,b. We found a mean overlap q=0.86 between the 30% best matching binding domains and an overall mean overlap between matching domains of q=0.70. A control overlap distribution was obtained by randomizing the binding domains found for the co-segregation case (10^3 independent random realizations), by bootstrapping the positions of their binding sites, and evaluating the overlaps between all possible bootstrapped domain pairs (mean q_rand=0.39). We found that the overlaps between matching binding domains were significantly higher than in the random control case (p-value=2e-9, Mann-Whitney U test). Analogous results were obtained by bootstrapping the domains found for the D’ case.
Moreover, we also considered another, more stringent control case, obtained by bootstrapping all the co-segregation and D’ binding domains at the same time, and by computing their overlaps only after re-matching the most overlapping pairs. As before, we generated 10^3 random cases and obtained a mean control overlap of q_rand=0.47, which is still significantly lower than the real case (p-value=3e-6, Mann-Whitney U test). Finally, we checked the stability of the binding domains found by PRISMR when different runs of the algorithm are performed starting from different initial states, as already shown for PRISMR applied to Hi-C data [32]. To this aim, we compared the best 10 minima found among hundreds of independent PRISMR runs. For definiteness, we consider the co-segregation case, but similar results are also expected to hold for the D’ case. We found a mean overlap between corresponding binding domains from different runs of q=0.75, significantly higher than the more stringent random model described above (p-value<1e-5, Mann-Whitney U test), and an overlap around 0.90 when considering only the top 30% overlapping binding domains. All these results show that PRISMR is effective on GAM data and demonstrate PRISMR’s robustness and versatility across different input data, such as Hi-C (see e.g. [30, 32]), GAM co-segregation and GAM D’.

The predicted binding domains are arranged in a complex way along the polymer chain, often extending along the whole locus and overlapping with each other. Their arrangement gives rise to the complex contact pattern of the locus. For definiteness, let us focus on the co-segregation case. To better visualize how the different binding domains in the model contribute to the formation of the specific contact pattern of the Sox9 region, we constructed a matrix with the most contributing colour to each pairwise contact (Fig.2c).
Since interactions are possible only between pairs of cognate binding sites, the contribution of each binding domain to a contact between two loci is simply calculated as the number of its binding-site pairs between the two loci. It turns out that binding domain 13 (dark orange) makes a major contribution to the folding of the locus, forming a large metaTAD containing the Sox9 gene and being responsible for diffuse long-range contacts; binding domain 2 forms a more separate TAD immediately upstream of the Sox9 TAD, while the other domains form finer, internal structures.

3.2. 3D conformations of the Sox9 locus

To make sense of the patterns present in the GAM contact matrix and to infer the corresponding 3D structures of the locus, we performed Molecular Dynamics simulations of the PRISMR-derived optimal SBS polymer model, sampling the ensemble of its thermodynamic states. For definiteness, we focus on the SBS polymer deduced from the co-segregation matrix. Specifically, we generated a population of 100 replicas of the optimal SBS polymer identified by PRISMR, prepared them in a self-avoiding-walk state and ran MD to make the polymers fold until thermodynamic equilibrium [30]. To check that every polymer attained an equilibrium folded state, we followed during the simulation the temporal evolution of its gyration radius, Rg [41], which gives an estimate of the polymer compactness. In Fig.S1 we show the temporal evolution of Rg averaged over the 100 replicas: a plateau is reached at a value which is about one third of the initial self-avoiding-walk one, as expected from micro-phase separation (see below) [41]. In Fig.S1 we also show, for a given replica, reconstructions of the 3D configurations at different time points from the initial self-avoiding-walk state. It can be seen how the polymer, initially open and widely spread, finally shrinks to a denser folded organization.
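For reference, the gyration radius used above as a compactness measure is the root-mean-square distance of the beads from their centre of mass. A minimal sketch, assuming equal-mass beads given as (x, y, z) tuples; the function name is ours:

```python
import math


def gyration_radius(beads):
    """Root-mean-square distance of equal-mass beads from their centre of mass."""
    n = len(beads)
    com = [sum(b[k] for b in beads) / n for k in range(3)]
    sq = sum(sum((b[k] - com[k]) ** 2 for k in range(3)) for b in beads)
    return math.sqrt(sq / n)
```

Evaluating this function on each saved frame of a simulation and averaging over replicas gives a curve of the kind used to check that a plateau has been reached.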
Importantly, the presence of different types of binding sites (colours) having homotypic interactions with corresponding, cognate molecular factors allows the formation of complex 3D structures rather than simple spherical globules, where sites of the same type tend to form separate clusters through a physical mechanism known as microphase separation [42]. In Fig.2d, we show an example of a 3D equilibrium configuration of the Sox9 locus as seen from two different viewpoints. The 3D structure is coloured according to the colour code shown in Fig.2c, which follows the TADs called for the Sox9 region in [6]. It can be observed that each of the different regions tends to fold onto itself, giving rise to the enrichment of contacts which is typical of TADs (see in particular the dark red, red and green regions); on the other hand, the dark orange region is in extensive contact with almost all the other colours, producing the abundant long-range contacts detected in the GAM matrix. In blue we marked the region containing the Sox9 gene (chr11:112.643-112.649Mb).

4. The 3D structure of chromosome 7 in mESC resulting from GAM data

4.1. 3D conformations of chromosome 7

We also tested our novel PRISMR procedure on the whole chromosome 7 in mESC. We illustrate our results when PRISMR is applied to the GAM D’ matrix of chr7 at 250kb resolution [4], but similar results are obtained if GAM co-segregation data are used (see above). In Fig.3a, we show the experimental matrix and the one derived by PRISMR with the corresponding binding domains. Their Pearson and HiCRep correlations are r=0.67 and SCC=0.96, and the HiC-spector score is Q=0.49. This holds even though chromosome 7 is around 30 times longer than the previously considered Sox9 region and the corresponding matrix at 250kb is about 16 times larger than the 40kb Sox9 matrix.
As for the much smaller Sox9 region, the binding domains have a complex arrangement along chromosome 7, overlapping each other and some extending along the whole chromosome. The most contributing binding domains to each pairwise interaction are shown in Fig.3b, highlighting a major role for domains 3 (light blue), 4 (blue) and 10 (dark grey). Domains 3 and 10 appear to be responsible for the formation of large metadomains at opposite sides of the chromosome, while binding domain 4 seems to be involved mainly in long-range contacts. Following the same steps explained in the previous section for the Sox9 locus, we also extracted the 3D equilibrium configurations of chromosome 7. In Fig.3c an example 3D structure of chromosome 7 is shown. As above (Section 3.2), chromatin folding within the chr7 SBS model takes place through a classical polymer physics mechanism of microphase separation [42]. Thus, the good agreement of our models with experimental contact data for both the Sox9 region and chromosome 7 suggests that microphase separation could be involved in chromatin 3D organization both at the Mb scale and at the chromosome level, as also supported by recent experimental results [43–45].

4.2. Single-allele 3D location of the mouse orthologue of the human 16p11.2 locus

One striking possible application of the PRISMR-inferred 3D structures is locating genomic regions of interest in their architectural context, e.g. peripheral or internal. This can be instrumental, for instance, to assess whether a locus can form contacts with other chromosomes or the nuclear lamina. With this aim, we focused on a 2Mb region on chromosome 7 containing the mouse orthologue of the human 16p11.2 locus. This region is of great biomedical interest, as copy number variations (CNVs, i.e. deletions and duplications) occurring there have been associated with phenotypes including severe cognitive disorders, such as autism [46–48].
Cis- and trans-chromatin interactions of the locus have moreover been identified with distal genomic regions associated with similar phenotypes [48]. The study of the 16p11.2 locus with polymer physics models could help to investigate a possible link [48] between such rearrangements and the disruption of chromatin contacts, which could contribute to the observed phenotypes more than the genes directly affected by the CNV. In the 3D snapshot of Fig. 3c, we marked in red the considered 2Mb region (133-135Mb) containing the mouse orthologue of the human 16p11.2 locus (133.84-134.24Mb). In the illustrated conformation, the 16p locus projects outward; however, we also explored the variability of its radial position over the whole ensemble of 3D structures. We computed, for each conformation, the distance of the centre of mass of the 16p locus from the centre of mass of the whole chromosome, normalizing this distance by the gyration radius. We then extracted the distribution of the normalized distance, which we labelled as r/Rg. We did the same for 3×10^2 loci randomly selected over chromosome 7, of the same size as the 16p locus, thus obtaining a random control distribution of normalized radial positions. We report in Fig.3d the 16p distribution against the control one. The distributions are broad to a similar extent, with standard deviations around 34% and 38% of the mean values for the 16p and the control case respectively, indicating a high variability of the loci positions, ranging from r/Rg near 0.0 up to 2.0. Nevertheless, the two distributions differ significantly (p-value = 0.01, Mann-Whitney U test), the 16p one being slightly shifted toward higher r/Rg values. In particular, if we reasonably assume that values of r/Rg larger than 1.5 indicate peripheral positions, then this happens roughly 10% of the time for the 16p locus.
The non-negligible frequency of peripheral positions indicates that the locus can be involved in trans interactions, as actually detected by experiments [48].

5. In-silico simulation of independent experimental techniques

5.1. Validation of the inferred 3D polymer models

As shown in the previous sections, the 3D polymer models inferred by PRISMR reproduce the input GAM data well. To validate such 3D models, however, we tested their ability to predict the outcome of experiments not given as input to the algorithm. As a first test, from the generated ensemble of 3D conformations of chromosome 7, derived by PRISMR with GAM D’ data as input, we extracted our predicted co-segregation matrix by employing our in-silico GAM algorithm. The experimental matrix and the predicted one compare as follows (see Fig. 4a): r=0.64, Q=0.45 and SCC=0.62. All these metrics indicate a similarity between our model and experiment, showing that by applying PRISMR to the D’ matrix the main features of the co-segregation matrix can be reconstructed. In other words, PRISMR allowed the generation of an ensemble of polymers which represent the conformations of chromosome 7 that emerge from GAM data. Next, as a more stringent test, we simulated a Hi-C experiment over our population of 3D structures derived from the Sox9 GAM co-segregation data. The algorithm employed to simulate the Hi-C technique is the same used in [30], where the average contact frequency matrix over the ensemble of 3D structures is calculated by considering as interacting all pairs of loci within a threshold distance. We obtained an in-silico contact matrix of the Sox9 locus and compared it with independent Hi-C experimental data of the same region in mESCs, at the same 40kb resolution, taken from [6]. Strikingly, our simulated matrix is very similar to the experimental one (Fig. 4b), with r=0.90, Q=0.75 and SCC=0.51.
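The threshold-based in-silico Hi-C step and the Pearson comparison mentioned above can be illustrated as follows. This is a hedged sketch, not the implementation of [30]: the toy random-walk ensemble, the threshold value and the function names are assumptions made for the example, and the HiC-spector and HiCRep scores are not reproduced.

```python
import numpy as np


def insilico_hic(ensemble, threshold):
    """Fraction of conformations in which each pair of beads is closer than threshold."""
    n_beads = ensemble[0].shape[0]
    contacts = np.zeros((n_beads, n_beads))
    for coords in ensemble:
        # All pairwise bead-bead distances of this conformation.
        diff = coords[:, None, :] - coords[None, :, :]
        contacts += np.linalg.norm(diff, axis=-1) < threshold
    return contacts / len(ensemble)


def pearson_offdiagonal(a, b):
    """Pearson r between the strict upper triangles of two symmetric matrices."""
    iu = np.triu_indices_from(a, k=1)
    return np.corrcoef(a[iu], b[iu])[0, 1]


rng = np.random.default_rng(1)
# Toy ensemble (assumption): 200 random-walk chains of 60 beads.
ensemble = [np.cumsum(rng.normal(size=(60, 3)), axis=0) for _ in range(200)]

# Two independent halves of the ensemble should give similar contact maps.
map_a = insilico_hic(ensemble[:100], threshold=3.0)
map_b = insilico_hic(ensemble[100:], threshold=3.0)
print(f"Pearson r between the two maps: {pearson_offdiagonal(map_a, map_b):.2f}")
```

In a real comparison, one of the two maps would be replaced by the experimental contact matrix binned at the same resolution as the model.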
Thus, we found a good match with Hi-C data by simulating a Hi-C experiment over a population of 3D structures derived from GAM data, which is a completely independent and different kind of technique. This indicates that PRISMR yields 3D configurations that can be considered representative of the real states in which a locus may be found. As a further and final test, we compared the Sox9 3D structures inferred from GAM co-segregation data with the ones previously obtained [30] by applying PRISMR to the Hi-C data [6] of the same region. The two independently derived ensembles of 3D structures are similar to each other (Fig. 5). For example, a quantitative comparison of the degree of compaction of the 3D conformations, as measured by the gyration radius distribution, is shown in Fig. 5b for the GAM-derived and Hi-C-derived conformations. Although the GAM distribution is slightly shifted toward lower values with respect to the Hi-C one, the two distributions are very similar in shape. This is a notable result, as the GAM and Hi-C matrices given as input to PRISMR have profoundly different data structures, and it further supports the robustness of PRISMR when changing input data type.

5.2. Exploring in-silico the performance of GAM and Hi-C with 3D polymer models

In the previous sections, the different experimental techniques have been simulated in silico in an ideal manner. A real Hi-C experiment includes a number of steps, such as crosslinking, digestion, biotinylation and ligation, each having a far from perfect efficiency. In simulations, however, Hi-C-like contact frequencies are calculated by simply considering in contact each pair of loci on a folded polymer that lie within a distance threshold. In this way, we obtain an ideally perfect detection of contacts, much more efficient than if we had, e.g., to cross-link each time only one pair of loci among all contacting ones.
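For GAM, the number of slices enters an in-silico experiment as a plain parameter, which makes such idealised settings easy to vary. The sketch below is a simplified illustration and not the algorithm used in this work: it assumes a spherical nucleus with the polymer centred in it, a slab of fixed thickness with random orientation and position, and toy random-walk conformations; all names and numerical values are hypothetical.

```python
import numpy as np


def insilico_gam(ensemble, n_slices, thickness, nucleus_radius, rng):
    """Co-segregation frequency of bead pairs over randomly oriented thin slices."""
    n_beads = ensemble[0].shape[0]
    cosegregation = np.zeros((n_beads, n_beads))
    for _ in range(n_slices):
        # Each slice samples one conformation (one "nucleus") of the ensemble.
        coords = ensemble[rng.integers(len(ensemble))]
        # Random slice orientation: a unit vector uniformly distributed on the sphere.
        normal = rng.normal(size=3)
        normal /= np.linalg.norm(normal)
        # Bead heights along the slice normal, polymer centred in the nucleus (assumption).
        height = (coords - coords.mean(axis=0)) @ normal
        # Random slice position across the nuclear diameter.
        z0 = rng.uniform(-nucleus_radius, nucleus_radius)
        in_slice = (np.abs(height - z0) < thickness / 2).astype(float)
        # A pair co-segregates when both beads fall in the same slice.
        cosegregation += np.outer(in_slice, in_slice)
    return cosegregation / n_slices


rng = np.random.default_rng(2)
# Toy ensemble (assumption): 100 random-walk chains of 80 beads.
ensemble = [np.cumsum(rng.normal(size=(80, 3)), axis=0) for _ in range(100)]

few = insilico_gam(ensemble, n_slices=408, thickness=2.0, nucleus_radius=15.0, rng=rng)
many = insilico_gam(ensemble, n_slices=20000, thickness=2.0, nucleus_radius=15.0, rng=rng)
print(f"mean absolute difference, 408 vs 20000 slices: {np.abs(few - many).mean():.4f}")
```

Repeating the call with different values of `n_slices` gives a rough view of how statistical fluctuations in the co-segregation matrix depend on the number of slices.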
The same holds for the in-silico GAM reconstruction, where we simulate ideal, perfectly efficient experiments. For example, we cut in silico a very large number of slices (tens of thousands in the previously reported simulations), since there is no limit to the number of slices that can be simulated apart from the available computational time. Interestingly, however, more realistic in-silico simulations could be easily implemented, which would allow the behaviour of experimental techniques to be investigated under different conditions, such as a different number of cut slices for GAM, a different cell population or number of single-molecule conformations, or different efficiencies for each experimental step. As a first exploration only, we used our in-silico method to derive what the co-segregation matrix would look like if a different number of slices were cut. As an example, we report in Fig. S2 a co-segregation matrix predicted by cutting in silico exactly 408 slices, as in the original GAM experiment [4]. Its Pearson correlation with the GAM experimental co-segregation matrix is r=0.54, the HiC-spector score is Q=0.47 and the HiCRep one amounts to SCC=0.40. Compared with the ideal case presented in Section 4.1, the considered similarity metrics are generally lower. That is presumably due to the increased noise level, leading to sizeable statistical fluctuations in a 408-slice matrix, which are expected to be much weaker at tens of thousands of slices. Further investigation of the minimal number of slices necessary for the co-segregation matrix to stabilise would give an important indication for the design of future GAM experiments. More generally, this kind of in-silico prediction could be used to understand, for instance, the optimal settings of Hi-C or GAM without actually performing the experiments many times, thus overcoming technical and economic limitations.

6. CONCLUSIONS

PRISMR is a computational, polymer-physics-based technology by which the folding properties of any locus can be inferred, starting from its Hi-C pairwise contact matrix [32]. Here we showed that PRISMR can be used to derive chromosome architecture from GAM co-segregation or D’ matrices, provided that the proper modifications discussed here are implemented. We tested the extended version of PRISMR on the Sox9 locus and on the entire chromosome 7 in mESCs, with successful results. This also shows the versatility and robustness of the PRISMR tool, as GAM and Hi-C are two very different techniques, suggesting that PRISMR could be tuned to work on other types of input as well, like SPRITE or microscopy data. In particular, we showed that by simulating an in-silico Hi-C experiment on the 3D structures which PRISMR inferred from GAM co-segregation data, we reproduced with high accuracy real Hi-C data from independent experiments. Additionally, a comparison of these 3D structures with the ones previously derived by applying PRISMR to Hi-C data [6, 30] showed that they are strikingly consistent, despite coming from completely different experiments. As an application of the results from PRISMR, we discussed how its derived 3D structures can be exploited to explore the performance of experimental techniques under different conditions and to guide the design of more accurate experiments. We also discussed how the PRISMR-derived 3D structures can be employed to make sense of the patterns emerging from 2D contact matrices or to assess the positions and reciprocal distances of interesting loci. In particular, we investigated the radial position of the 16p11.2 locus, which is linked to cognitive disorders. We found a high single-conformation variability, which indicates that the locus may be engaged in different cis- and trans-contacts.
Further modelling of disease-associated rearrangements at this locus could help to investigate the possible rewiring of contacts with distal genes and their regulators, as a possible phenotype-causing mechanism. In this sense PRISMR represents a useful resource to extract new information and to shed new light on experimental data. Recently, an approach named GEM [49] has been developed to reconstruct the genome 3D organization. Within GEM, a chromatin segment of N windows is modelled as a Gaussian chain of N beads with pairwise harmonic interactions [50]. The NxN pairwise coupling coefficients are the model parameters: they must be chosen to best fit a given input NxN chromatin contact map. The Gaussian nature of the model implies that its scaling properties do not match those of real polymers, yet it has the main advantage that it can be easily and fully solved to derive the set of NxN coupling coefficients that best reproduce, at thermodynamic equilibrium, the input experimental contact data. That makes GEM extremely efficient from a computational point of view, albeit approximate. One of the advantages of PRISMR is that the number of its parameters is orders of magnitude smaller, as they are the genomic locations of the model binding sites, which scale linearly with the length N of the considered region. Importantly, from those parameters alone the experimental contact matrix, scaling as NxN, can be fully reconstructed. PRISMR also has the advantage of employing more advanced models for chromatin, which take into account the self-avoiding nature of polymers [41] and hence respect their correct scaling properties [30]. Indeed, in the GEM model the inclusion of the excluded volume leads to a loss of accuracy [49]. The price to pay for PRISMR is that heavier computations are required. Here, we applied PRISMR to find the optimal SBS model of chromatin, as the SBS model has been successfully used to fit Hi-C and GAM data [4, 29, 30] and is comparatively simple.
Nevertheless, it neglects a number of complications arising in real chromatin, as emerging from the physics of complex systems, such as off-equilibrium, jamming, segregation and stress anomaly effects [51–60]. While improvements are certainly required in our modelling of chromatin, PRISMR is a powerful tool to reconstruct the 3D architecture of the genome and to interpret complex experimental data, such as pairwise contact matrices, in the light of the underlying fundamental molecular mechanisms. We believe that the combination of modelling and experimental advancements can provide deeper and critical insights into the mechanisms of chromatin 3D organization and ultimately into the development of medical tools for the diagnosis and treatment of diseases associated with chromosome mis-folding.

Acknowledgements

M.N. acknowledges grants from the National Institutes of Health Common Fund 4D Nucleome Program (1U54DK107977-01), the EU H2020 Marie Curie ITN (813282), the Einstein BIH Fellowship Award (EVF-BIH-2016-282), CINECA ISCRA (HP10CRTY8P), Regione Campania SATIN Project 2018-2020, and computer resources from the INFN, CINECA, ENEA CRESCO/ENEAGRID [61] and SCoPE/ReCaS at the University of Naples. A.P. acknowledges support from the National Institutes of Health Common Fund 4D Nucleome Program grant U54DK107977, the Helmholtz Association, and the Berlin Institute of Health (CRG2b-TP3).

Author contributions

S.B. and M.N. designed the project. L.F., S.B., A.M.C., M.B. developed the modeling part; L.F., S.B., A.E., C.A., M.C., A.C. ran the computer simulations and performed analyses. A.Pr. and A.Po. gave conceptual advice. L.F., S.B., A.Po., M.N. wrote the manuscript.

REFERENCES

[1] J. Dekker and L. Mirny, “The 3D Genome as Moderator of Chromosomal Communication,” Cell, 2016.
[2] M. Spielmann, D. G. Lupiáñez, and S. Mundlos, “Structural variation in the 3D genome,” Nature Reviews Genetics, 2018.
[3] E. Lieberman-Aiden et al., “Comprehensive mapping of long-range interactions reveals folding principles of the human genome,” Science, 2009.
[4] R. A. Beagrie et al., “Complex multi-enhancer contacts captured by genome architecture mapping,” Nature, 2017.
[5] S. A. Quinodoz et al., “Higher-Order Inter-chromosomal Hubs Shape 3D Genome Organization in the Nucleus,” Cell, 2018.
[6] J. R. Dixon et al., “Topological domains in mammalian genomes identified by analysis of chromatin interactions,” Nature, vol. 485, no. 7398, pp. 376–380, 2012.
[7] E. P. Nora et al., “Spatial partitioning of the regulatory landscape of the X-inactivation centre,” Nature, 2012.
[8] T. Sexton et al., “Three-dimensional folding and functional organization principles of the Drosophila genome,” Cell, 2012.
[9] J. E. Phillips-Cremins et al., “Architectural protein subclasses shape 3D organization of genomes during lineage commitment,” Cell, 2013.
[10] J. Fraser et al., “Hierarchical folding and reorganization of chromosomes are linked to transcriptional changes in cellular differentiation,” Mol. Syst. Biol., 2015.
[11] A. M. Chiariello, S. Bianco, C. Annunziatella, A. Esposito, and M. Nicodemi, “The scaling features of the 3D organization of chromosomes are highlighted by a transformation a la Kadanoff of Hi-C data,” EPL, 2017.
[12] B. van Steensel and A. S. Belmont, “Lamina-Associated Domains: Links with Chromosome Architecture, Heterochromatin, and Gene Repression,” Cell, 2017.
[13] S. S. P. Rao et al., “A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping,” Cell, 2014.
[14] M. Barbieri et al., “Active and poised promoter states drive folding of the extended HoxB locus in mouse embryonic stem cells,” Nat. Struct. Mol. Biol., vol. 24, no. 6, pp. 515–524, 2017.
[15] K. Monahan, A. Horta, and S. Lomvardas, “LHX2- and LDB1-mediated trans interactions regulate olfactory receptor choice,” Nature, 2019.
[16] D. Baú et al., “The three-dimensional folding of the α-globin gene domain reveals formation of chromatin globules,” Nat. Struct. Mol. Biol., 2011.
[17] R. Kalhor, H. Tjong, N. Jayathilaka, F. Alber, and L. Chen, “Genome architectures revealed by tethered chromosome conformation capture and population-based modeling,” Nat. Biotechnol., 2012.
[18] Z. Zhang, G. Li, K.-C. Toh, and W.-K. Sung, “3D Chromosome Modeling with Semi-Definite Programming and Hi-C Data,” J. Comput. Biol., 2013.
[19] L. Giorgetti et al., “Predictive polymer modeling reveals coupled fluctuations in chromosome conformation and transcription,” Cell, 2014.
[20] F. Serra et al., “Restraint-based three-dimensional modeling of genomes and genomic domains,” FEBS Lett., 2015.
[21] B. Adhikari, T. Trieu, and J. Cheng, “Chromosome3D: Reconstructing three-dimensional chromosomal structures from Hi-C interaction frequency data using distance geometry simulated annealing,” BMC Genomics, 2016.
[22] P. Szalaj et al., “3D-GNOME: an integrated web service for structural modeling of the 3D genome,” Nucleic Acids Res., 2016.
[23] T. Trieu and J. Cheng, “3D genome structure modeling by Lorentzian objective function,” Nucleic Acids Res., 2017.
[24] A. L. Sanborn et al., “Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes,” Proc. Natl. Acad. Sci., 2015.
[25] A. Goloborodko, J. F. Marko, and L. A. Mirny, “Chromosome Compaction by Active Loop Extrusion,” Biophys. J., 2016.
[26] G. Fudenberg, M. Imakaev, C. Lu, A. Goloborodko, N. Abdennur, and L. A. Mirny, “Formation of Chromosomal Domains by Loop Extrusion,” Cell Rep., 2016.
[27] C. A. Brackley et al., “Nonequilibrium Chromosome Looping via Molecular Slip Links,” Phys. Rev. Lett., 2017.
[28] A. Esposito, C. Annunziatella, S. Bianco, A. M. Chiariello, L. Fiorillo, and M. Nicodemi, “Models of polymer physics for the architecture of the cell nucleus,” Wiley Interdisciplinary Reviews: Systems Biology and Medicine, 2018.
[29] M. Barbieri et al., “Complexity of chromatin folding is captured by the strings and binders switch model,” Proc. Natl. Acad. Sci., 2012.
[30] A. M. Chiariello, C. Annunziatella, S. Bianco, A. Esposito, and M. Nicodemi, “Polymer physics of chromosome large-scale 3D organisation,” Sci. Rep., 2016.
[31] C. Annunziatella, A. M. Chiariello, S. Bianco, and M. Nicodemi, “Polymer models of the hierarchical folding of the Hox-B chromosomal locus,” Phys. Rev. E, 2016.
[32] S. Bianco et al., “Polymer physics predicts the effects of structural variants on chromatin architecture,” Nat. Genet., 2018.
[33] M. Franke et al., “Formation of new chromatin domains determines pathogenicity of genomic duplications,” Nature, vol. 538, no. 7624, pp. 265–269, 2016.
[34] M. Nicodemi and A. Prisco, “Thermodynamic pathways to genome spatial organization in the cell nucleus,” Biophys. J., 2009.
[35] C. Annunziatella, A. M. Chiariello, A. Esposito, S. Bianco, L. Fiorillo, and M. Nicodemi, “Molecular Dynamics simulations of the Strings and Binders Switch model of chromatin,” Methods, 2018.
[36] A. M. Chiariello et al., “A polymer physics investigation of the architecture of the murine orthologue of the 7q11.23 human locus,” Frontiers in Neuroscience, 2017.
[37] B. K. Kragesteen et al., “Dynamic 3D chromatin architecture contributes to enhancer specificity and limb morphogenesis,” Nat. Genet., 2018.
[38] M. Franke et al., “Formation of new chromatin domains determines pathogenicity of genomic duplications,” Nature, 2016.
[39] K. K. Yan, G. G. Yardımcı, C. Yan, W. S. Noble, and M. Gerstein, “HiC-spector: A matrix library for spectral and reproducibility analysis of Hi-C contact maps,” Bioinformatics, 2017.
[40] T. Yang et al., “HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient,” Genome Res., 2017.
[41] P. G. de Gennes, Scaling concepts in polymer physics. Ithaca, NY: Cornell University Press, 1979.
[42] L. Leibler, “Theory of Microphase Separation in Block Copolymers,” Macromolecules, 1980.
[43] D. Hnisz, K. Shrinivas, R. A. Young, A. K. Chakraborty, and P. A. Sharp, “A Phase Separation Model for Transcriptional Control,” Cell, 2017.
[44] A. G. Larson et al., “Liquid droplet formation by HP1α suggests a role for phase separation in heterochromatin,” Nature, 2017.
[45] A. R. Strom, A. V. Emelyanov, M. Mir, D. V. Fyodorov, X. Darzacq, and G. H. Karpen, “Phase separation drives heterochromatin domain formation,” Nature, vol. 547, no. 7662, pp. 241–245, 2017.
[46] S. E. McCarthy et al., “Microduplications of 16p11.2 are associated with schizophrenia,” Nat. Genet., 2009.
[47] J. L. Stein, “Copy number variation and brain structure: Lessons learned from chromosome 16p11.2,” Genome Med., 2015.
[48] M. N. Loviglio et al., “Chromosomal contacts connect loci associated with autism, BMI and head circumference phenotypes,” Mol. Psychiatry, 2017.
[49] G. Le Treut, F. Képès, and H. Orland, “A Polymer Model for the Quantitative Reconstruction of Chromosome Architecture from HiC and GAM Data,” Biophys. J., 2018.
[50] M. Bohn, D. W. Heermann, and R. Van Driel, “Random loop model for long polymers,” Phys. Rev. E Stat. Nonlinear, Soft Matter Phys., 2007.
[51] E. Caglioti, A. Coniglio, H. J. Herrmann, V. Loreto, and M. Nicodemi, “Segregation of granular mixtures in the presence of compaction,” Europhys. Lett., 1998.
[52] M. Nicodemi, B. Panning, and A. Prisco, “A thermodynamic switch for chromosome colocalization,” Genetics, 2008.
[53] M. Nicodemi, A. Fierro, and A. Coniglio, “Segregation in hard-sphere mixtures under gravity. An extension of Edwards approach with two thermodynamical parameters,” Europhys. Lett., 2002.
[54] M. Nicodemi and A. Coniglio, “Macroscopic glassy relaxations and microscopic motions in a frustrated lattice gas,” Phys. Rev. E - Stat. Physics, Plasmas, Fluids, Relat. Interdiscip. Top., 1998.
[55] M. P. Ciamarra, R. Pastore, M. Nicodemi, and A. Coniglio, “Jamming phase diagram for frictional particles,” Phys. Rev. E - Stat. Nonlinear, Soft Matter Phys., 2011.
[56] M. P. Ciamarra, M. Nicodemi, and A. Coniglio, “Recent results on the jamming phase diagram,” Soft Matter, 2010.
[57] D. S. Grebenkov, M. P. Ciamarra, M. Nicodemi, and A. Coniglio, “Flow, ordering, and jamming of sheared granular suspensions,” Phys. Rev. Lett., 2008.
[58] M. Nicodemi, “Force correlations and arch formation in granular assemblies,” Phys. Rev. Lett., 1998.
[59] A. Coniglio, A. De Candia, A. Fierro, and M. Nicodemi, “Universality in glassy systems,” J. Phys. Condens. Matter, 1999.
[60] D. Hamon, M. Nicodemi, and H. J. Jensen, “Continuously driven OFC: A simple model of solar flare statistics,” Astron. Astrophys., 2003.
[61] G. Ponti et al., “The role of medium size facilities in the HPC ecosystem: The case of the new CRESCO4 cluster integrated in the ENEAGRID infrastructure,” in Proceedings of the 2014 International Conference on High Performance Computing and Simulation, HPCS 2014, 2014.

Fig. 1: The SBS model and the PRISMR algorithm for GAM data.
a) A cartoon of the SBS polymer model. Beads of the same type (colour) along the chain can be bridged by diffusing cognate binders. b) A scheme of the PRISMR algorithm (adapted from [32]): PRISMR infers the minimal, optimal SBS polymer whose equilibrium folded structures best reproduce an input contact matrix. c) Representation of the in-silico GAM algorithm: an SBS polymer is randomly placed inside a surface (representing the cell nucleus), then a slice is cut through it and the beads found in that slice are counted as co-segregated.

Fig. 2: The PRISMR polymer model of the Sox9 locus from GAM mESC data.
a) The GAM co-segregation matrix [4] of the Sox9 locus (top, mESC chr11:109-115Mb, 408 slices, 40kb) is compared with the corresponding PRISMR-derived one (bottom). Their Pearson correlation is r=0.93, the HiC-spector score is Q=0.81 and the HiCRep correlation is SCC=0.96. The distribution of the PRISMR-inferred binding sites of the Sox9 polymer model is shown in the middle panel: 15 different types (colours) are identified. b) The GAM D’ matrix of the Sox9 locus (top) is compared to the PRISMR one (bottom). They have r=0.86, Q=0.69, SCC=0.97. The distribution of the PRISMR-inferred binding sites is shown. Consistently with the case in a), 15 colours are identified. The mean genomic overlap (see text) between the distributions of binding sites in the two cases is q=70%. Overall, PRISMR deals equally well with both kinds of experimental data, GAM co-segregation and D’ matrices. c) The binding domain contributing most to each pairwise interaction is shown for the PRISMR model derived from the GAM co-segregation matrix. The colour bar right above the matrix represents the TADs of the Sox9 locus as found in [6]. They are coloured matching as closely as possible the colour of the binding domain contributing most to each TAD. d) A snapshot of the 3D conformation of the Sox9 locus (seen from two angles) derived from our MD simulations of the SBS model inferred from GAM co-segregation data. The employed colour code is shown in panel c. The region containing the Sox9 gene is highlighted in blue.

Fig. 3: The PRISMR polymer model of chr7 from GAM mESC data.
a) The GAM D’ matrix (top) of mESC chr7 at 250kb resolution [4], where dark blue stripes represent regions with mapping issues or biases, is compared to the corresponding PRISMR matrix (bottom). Their Pearson correlation coefficient is r=0.67, the HiC-spector score is Q=0.49 and the HiCRep correlation is SCC=0.96. The inferred model binding domains (10 different types were found) are shown in the middle.
b) The binding domain contributing most to each pairwise interaction for the PRISMR-derived D’ matrix of chromosome 7. The colour bar above is also used for the visualization in Fig. 3c. c) A snapshot of the 3D configuration of the inferred model of chr7. The colour code is shown in panel b). The region containing the 16p11.2 mouse orthologue locus (133.84:134.28Mb) is highlighted in red. d) A histogram (blue) showing the distribution of r/Rg, i.e., the position of the centre of mass of the 16p11.2 locus, r, normalized by the gyration radius, Rg, of the chromosome. The superimposed grey histogram shows the same distribution for 3×10² randomly chosen loci of the same size as the 16p11.2 locus. The standard deviation to mean ratios are 34% for the 16p case and 38% for the control, highlighting a high variability across single-chromosome conformations. The two distributions are not compatible (p-value = 0.01, Mann-Whitney U test), as the 16p locus distribution is slightly shifted toward more peripheral positions.

Fig. 4: PRISMR inferred 3D structures are validated against independent GAM and Hi-C data.
a) The PRISMR co-segregation matrix for mESC chr7, predicted from the ensemble of 3D structures inferred from GAM D’ data (top), is compared to the experimental GAM co-segregation data (bottom, [4]). The comparison yields Pearson correlation r=0.64, HiC-spector score Q=0.45 and HiCRep correlation SCC=0.62. b) The Hi-C-like contact matrix for the mESC Sox9 locus, predicted from the ensemble of 3D configurations inferred by PRISMR from GAM co-segregation data (top), is compared to independent Hi-C data (bottom, [6]). The comparison yields Pearson correlation r=0.90, HiC-spector score Q=0.75 and HiCRep correlation SCC=0.51.

Fig. 5: PRISMR 3D structures from GAM data are consistent with the ones inferred from Hi-C data.
a) Sox9 locus 3D structures derived by PRISMR from GAM co-segregation data [4] (left; top-left, the same conformation already shown in Fig. 2d; bottom-left, another example from the same ensemble, described in Section 3.2) are compared with 3D structures previously derived by PRISMR (in [30]) from mESC Hi-C data [6] of the same Sox9 region at the same 40kb resolution (right). The colouring is as in Fig. 2d. b) Gyration radius distributions for the GAM-derived (red) and Hi-C-derived 3D structures. The gyration radius is expressed in units of the SBS bead diameter (σ).

HIGHLIGHTS
• A computational method, based on polymer physics, is introduced to extract 3D structures of genomic loci from GAM data.
• The method is robustly applied to co-segregation and linkage disequilibrium (D’) GAM data.
• The method is applied to describe the 3D folding of a 6Mb region around the Sox9 gene and of the whole chromosome 7 in mouse ES cells.
• The inferred 3D structures from GAM data are successfully compared against independent Hi-C experiments.