Journal Pre-proofs
Inference of chromosome 3D structures from GAM data by a physics computational approach
Luca Fiorillo, Simona Bianco, Andrea M. Chiariello, Mariano Barbieri, Andrea
Esposito, Carlo Annunziatella, Mattia Conte, Alfonso Corrado, Antonella
Prisco, Ana Pombo, Mario Nicodemi
PII: S1046-2023(18)30485-7
DOI: https://doi.org/10.1016/j.ymeth.2019.09.018
Reference: YMETH 4805
To appear in: Methods
Received Date: 15 March 2019
Revised Date: 2 August 2019
Accepted Date: 27 September 2019
Please cite this article as: L. Fiorillo, S. Bianco, A.M. Chiariello, M. Barbieri, A. Esposito, C. Annunziatella, M.
Conte, A. Corrado, A. Prisco, A. Pombo, M. Nicodemi, Inference of chromosome 3D structures from GAM data
by a physics computational approach, Methods (2019), doi: https://doi.org/10.1016/j.ymeth.2019.09.018
© 2019 Elsevier Inc. All rights reserved.
Inference of chromosome 3D structures from GAM data by a physics
computational approach
Luca Fiorillo1*, Simona Bianco1*#, Andrea M. Chiariello1*, Mariano Barbieri2, Andrea Esposito1,2, Carlo
Annunziatella1, Mattia Conte1, Alfonso Corrado1, Antonella Prisco3, Ana Pombo2 and Mario Nicodemi1,4#
Addresses
1Dipartimento di Fisica, Università di Napoli Federico II, and INFN Napoli, Complesso Universitario di Monte
Sant’Angelo, 80126 Naples, Italy.
2Berlin Institute for Medical Systems Biology, Max-Delbrück Centre for Molecular Medicine, Robert-Rössle
Strasse, Berlin-Buch 13092, Germany.
3Institute of Genetics and Biophysics, Consiglio Nazionale Delle Ricerche (CNR).
4Berlin Institute of Health (BIH), MDC-Berlin, Germany.
* Equal contribution
# Corresponding authors: biancos@na.infn.it, mario.nicodemi@na.infn.it
Abstract
The combination of modelling and experimental advances can provide deep insights into chromatin 3D
organization and, ultimately, its underlying mechanisms. In particular, polymer physics models can help
make sense of the complexity of genomic contact maps, such as those emerging from technologies like Hi-C,
GAM or SPRITE. Here we discuss a method to reconstruct 3D structures from Genome Architecture Mapping
(GAM) data, based on PRISMR, a computational approach introduced to find the minimal polymer model
that best describes Hi-C input data using only polymer physics. After recapitulating the PRISMR procedure,
we describe how we extended it to treat GAM data. We successfully test the method on a 6Mb region
around the Sox9 gene and, at a lower resolution, on the whole chromosome 7 in mouse embryonic stem
cells. The 3D structures that PRISMR derives from GAM co-segregation data are finally validated against
independent Hi-C contact maps. The method proves to be versatile and robust, suggesting that it can be
similarly applied to different experimental data, such as SPRITE or microscopy distance data.
1. INTRODUCTION
The three-dimensional (3D) folding of the genome is crucial for the proper functioning of the cell, as it
enables looping between gene promoters and enhancers, whose disruption can lead to gene misexpression
and disease [1, 2]. Recent technologies such as Hi-C [3], GAM [4] and SPRITE [5] have made it possible to
detect the frequency of contact between genomic loci across the genome, revealing striking and complex
interaction patterns, including topologically associated domains (TADs) [6, 7] with their internal structures
[8, 9], and a higher-order hierarchical organization in meta-TADs [10, 11] encompassing A/B compartments
[3] and lamina associated domains [12]. Strong and widespread chromatin loops have been discovered
between pairs of CTCF-occupied chromatin sites [13], often at TAD edges, while active
and poised Pol-II are known to bridge distal genes [14]. Inter-chromosomal contact hubs have been
discovered around nuclear bodies [5] and in specialized cell types [15].
Despite the large amount of information from experimental assays that map chromatin conformation, the
actual three-dimensional structure of chromatin has only been partially discovered. To help explore 3D
genome topology in more holistic terms, different approaches have been proposed that employ polymer
physics models and heavy computation. Several computational methods that aim to infer chromosomes’
3D structures are based on the optimization of objective functions of contact data, such as 5C or Hi-C [15–
22]. Principled approaches have also been proposed based on models of the physical mechanisms
underlying chromosome folding [3, 23–27]. Here, we will focus on the application of the String and Binders
Switch (SBS) polymer model of chromatin [28, 29] which quantifies the biological scenario in which
diffusing molecules, such as transcription factors or chromatin proteins, promote the looping of specific
chromatin sites by bridging distal cognate binding sites. Previous applications of the SBS model successfully
captured features of chromatin organisation that were subsequently validated by FISH and Hi-C data [14,
28, 30].
To further develop the application of the SBS model to gain insights from experimental assays that map 3D
genome structure, we explored the application of PRISMR (polymer-based recursive statistical inference
method) [32], a recently developed polymer-physics-based approach, to extract chromatin 3D structures
from GAM data at the single-allele level. We first describe the approach using as an example a specific
genomic region around the Sox9 gene in mouse embryonic stem cells (mESCs), an important and
architecturally well-characterized genomic locus [33]. We show that the PRISMR-inferred 3D conformations
reproduce GAM data with high accuracy. We also discuss the robustness of the method. Next, we apply
PRISMR to GAM data to describe the 3D structure of the whole mouse chromosome 7 in mESCs and we
discuss how 3D models can be employed to extract important information about genome folding, including
physical distances or relative positioning of loci. Finally, we validate the 3D structures inferred by PRISMR
from GAM data by deriving from them in-silico Hi-C contact maps, which compare well against independent
Hi-C experiments.
2. 3D modelling of chromatin from GAM data
2.1 The PRISMR method
PRISMR (polymer-based recursive statistical inference method) [29, 31] is a computational tool developed
to reconstruct the 3D conformations of a genomic region of interest at the single-allele level, consistent
with a given set of experimental contact data and a given model of polymer physics. For clarity, we employ
the SBS model of chromatin as the underlying model of chromatin behaviour, but PRISMR can be used with
any other model of polymer physics.
The SBS model describes a chromatin filament as a self-avoiding-walk polymer made of beads and
embedded in an aqueous solution with a certain concentration of diffusing molecules, or binders [28, 33].
Each binder can interact simultaneously with more than one specific cognate bead, promoting contact
between them and driving the polymer’s organization. The specificity of bead-binder interaction is
represented by a colour, i.e. only beads and binders having the same colour can interact (Fig. 1a).
Non-interacting, “inert” beads are also considered and depicted in grey. Coloured beads are defined as binding
sites, while the set of all the beads featuring the same colour is called a binding domain. The number of
colours and the distribution of the corresponding binding sites along the polymer are the factors regulating
its folding dynamics and determining its possible equilibrium configurations [14, 30–32, 35]. The key idea is
that the complex 3D structure of any real genomic locus can be well represented by the folded equilibrium
conformations of an SBS polymer built by aptly choosing binders and binding sites and letting the laws of
physics drive the polymer-binders system to thermodynamic equilibrium.
In this framework, PRISMR aims to derive, in an unbiased way, the minimal number of colours and the best
arrangement of their binding sites that describes the folding of the locus of interest with an SBS polymer.
To do so, it employs a simulated annealing Monte Carlo procedure [32], originally developed to
use Hi-C data as input. For a given locus, PRISMR searches for the distribution of binding sites in the SBS
polymer which minimizes a cost function measuring the distance between the input experimental
contact matrix of the given genomic region and the contact matrix that the ensemble of SBS polymers at
thermodynamic equilibrium would yield (Fig.1b). To reduce overfitting, the cost function also
includes a regularization term penalizing the addition of binding sites (a chemical potential in Statistical
Mechanics) [32]. Schematically, PRISMR involves the following steps:
1. consider an SBS polymer model, i.e. a given sequence of binding sites;
2. derive a thermodynamic ensemble of its equilibrium 3D structures;
3. compute their average contact matrix;
4. evaluate the cost function comparing experimental and model contact matrices;
5. change the SBS polymer model accordingly;
6. repeat steps 1-5 until convergence of the cost function.
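The six steps above can be illustrated with a generic simulated-annealing loop. The following is only a minimal sketch under strong assumptions, not the PRISMR implementation: `toy_contacts` is a hypothetical stand-in for steps 2-3 (in PRISMR the model contact matrix comes from an equilibrium ensemble of SBS polymers), and the names `cost`, `anneal`, the penalty weight `lam` and the linear cooling schedule are illustrative choices.

```python
import numpy as np

def toy_contacts(sites):
    """Hypothetical stand-in for steps 2-3. sites[c, i] is the number of
    binding sites of colour c at bead i; contacts are taken proportional
    to the number of shared colours (NOT an SBS polymer simulation)."""
    m = sites.T @ sites
    return m / max(m.max(), 1)

def cost(sites, target, lam):
    """Step 4: squared distance to the target matrix plus a penalty on the
    total number of binding sites (the regularization term)."""
    return np.sum((toy_contacts(sites) - target) ** 2) + lam * sites.sum()

def anneal(target, n_colours, lam=0.01, n_steps=5000, t0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n = target.shape[0]
    sites = np.zeros((n_colours, n), dtype=int)       # step 1: initial model
    current = cost(sites, target, lam)
    best, best_cost = sites.copy(), current
    for step in range(n_steps):                       # step 6: iterate
        temp = t0 * (1.0 - step / n_steps) + 1e-6     # assumed linear cooling
        trial = sites.copy()                          # step 5: local change
        c, i = rng.integers(n_colours), rng.integers(n)
        trial[c, i] = 1 - trial[c, i]                 # add or remove one site
        new = cost(trial, target, lam)
        # Metropolis rule: accept improvements, sometimes accept worse moves
        if new <= current or rng.random() < np.exp((current - new) / temp):
            sites, current = trial, new
            if current < best_cost:
                best, best_cost = sites.copy(), current
    return best, best_cost
```

In practice the loop would be restarted from several independent random initial models and the lowest cost retained, as described in the text.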
To account for the possible influence of initial conditions, a number (up to 5×10^2) of runs of the
described algorithm is performed starting from independent, random initial configurations of the polymer
model, and the absolute minimum of the cost function is retained as the final output [32]. The average
contact matrix computed at step 3 must be of the same type as the input matrix, to allow for step 4. For
instance, if PRISMR is informed by a Hi-C matrix, it has to calculate a simulated Hi-C matrix from the
thermodynamic ensemble generated at step 2 [32]. Once convergence is reached at step 6, PRISMR
returns its optimal average contact matrix and the optimal SBS polymer model to describe the input
contact data of the genomic locus of interest. Next, from the output polymer model, the full ensemble of
the 3D configurations of the locus can be derived by Molecular Dynamics (MD) simulations, giving access to
information on the folding, including physical distances between regulatory regions and multi-way contacts.
The PRISMR approach has been successfully employed to describe, from Hi-C data, the folding of a
number of genomic loci [29–31, 35, 36] and it also proved effective in predicting the effects of structural
variants on chromatin spatial organization [32]. Here, we extend PRISMR to investigate 3D topology
determined experimentally using GAM.
2.2 PRISMR for GAM
The GAM technology detects the number of times DNA loci are found in randomly cut slices of an
ensemble of nuclei, by extracting and sequencing the DNA of each slice independently [4]; these
are called segregation frequencies. By keeping track of the DNA content of each slice, it is possible to
extract the frequencies with which two, three or any number of loci are found together in the same slice,
called the co-segregation frequencies. For example, for pairwise contacts, the output of a GAM experiment
is a 2-dimensional matrix of the frequencies with which pairs of loci are found in the same GAM nuclear slice.
To filter out experimental bias, such as differences in locus sampling or sequence mappability, a GAM
co-segregation matrix can be normalized by a variety of approaches, such as a linkage disequilibrium
normalization procedure, yielding a matrix called D’ [4]. For the reader’s convenience, we recall here how
the D’ matrix is obtained from the segregation data. Denoting by $f_{ij}$ the observed co-segregation
frequency of the loci i and j, and by $f_i$ the segregation frequency of locus i, the linkage disequilibrium
$d_{ij}$ between such a pair of loci is defined as
$$d_{ij} = f_{ij} - f_i f_j .$$
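Then, the normalized linkage disequilibrium $D_{ij}$ is computed as follows:
$$D_{ij} = \frac{d_{ij}}{d_{max}}, \quad \text{with} \quad
d_{max} = \begin{cases}
\min\left( f_i f_j,\ (1 - f_i)(1 - f_j) \right) & \text{if } d_{ij} < 0 \\
\min\left( f_i (1 - f_j),\ f_j (1 - f_i) \right) & \text{if } d_{ij} > 0
\end{cases}$$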
Arranging the $D_{ij}$ for every possible pair of loci in a two-dimensional matrix yields the D’ matrix. The
term $f_i f_j$ gives the expected co-segregation frequency of the loci i and j if they were statistically
independent; thus, a positive or negative $d_{ij}$ indicates that the loci co-segregate respectively more or
less than they would if they were uncorrelated. Dividing by $d_{max}$ ensures that $D_{ij}$ ranges from -1
to 1, as $d_{max}$ is the maximum absolute value that the linkage disequilibrium between two loci can
assume, given their segregation frequencies.
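The normalization just described can be written compactly with NumPy. This is a minimal sketch, assuming the segregation data are available as a boolean table with one row per slice and one column per locus; the function name `normalized_linkage` and the choice of returning NaN where $d_{max}$ vanishes are assumptions made here for illustration.

```python
import numpy as np

def normalized_linkage(segregation):
    """segregation: array (n_slices, n_loci), True where a locus was
    detected in a slice. Returns (co-segregation matrix, D' matrix)."""
    s = np.asarray(segregation, dtype=float)
    f = s.mean(axis=0)                       # f_i: segregation frequencies
    f_pair = (s.T @ s) / s.shape[0]          # f_ij: co-segregation frequencies
    d = f_pair - np.outer(f, f)              # d_ij = f_ij - f_i f_j
    d_max_neg = np.minimum(np.outer(f, f), np.outer(1 - f, 1 - f))
    d_max_pos = np.minimum(np.outer(f, 1 - f), np.outer(1 - f, f))
    d_max = np.where(d < 0, d_max_neg, d_max_pos)
    # D' is undefined where d_max == 0 (locus never or always detected)
    with np.errstate(divide="ignore", invalid="ignore"):
        d_prime = np.where(d_max > 0, d / d_max, np.nan)
    return f_pair, d_prime
```

For two loci that are always detected together the function returns D’ = 1, and for two loci that are never detected together (each with frequency 0.5) it returns D’ = -1, matching the stated range.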
To make PRISMR work with GAM data as input, its cost function needs to compare experimental and
polymer-model-derived co-segregation matrices (see PRISMR step 4 in the previous section). To this aim,
we implemented an algorithm to extract a GAM matrix in silico from an ensemble of 3D SBS polymer
conformations. The GAM technique consists in cutting single nuclear slices at random orientations out of a
population of nuclei, detecting the DNA loci caught in the slices, calculating the
segregation frequencies for the chromatin region of interest and, from those, obtaining the pairwise and
multi-way co-segregation frequencies. Focusing on pairwise detection, we developed a Python
algorithm to implement the GAM process over a population of SBS 3D configurations (see scheme in
Fig.1c): each SBS 3D conformation is randomly placed inside a sphere representing the cell nucleus; a slice
is cut at a random orientation, and all the polymer beads falling into it are detected and counted. The cell
nucleus is approximated as a sphere for the sake of simplicity and because this shape approximates the ESC
nucleus examined in the GAM dataset; our in silico method can in principle be extended to accommodate any
nuclear shape. The slice is implemented as a plane generated at a random orientation and passing through a
randomly chosen point belonging to the sphere; the plane is assumed to be the middle plane of the slice,
so, given the slice thickness, all the beads closer to the plane than half of the thickness are assumed
to have fallen in the slice. As in the experiment, where a slice does not necessarily contain portions of the
region of interest, in our algorithm a slice can turn out to be empty by chance. By repeating this slicing
procedure over all the 3D conformations of the ensemble, the segregation and co-segregation frequencies
are computed and the pairwise co-segregation matrix is generated. The D’ matrix can then be easily obtained.
The algorithm requires as input the radius of the nucleus of the considered cell type (here approximated as
a sphere) and the thickness of a GAM nuclear slice. For the mESC nucleus, here we chose a radius of 4.5
micrometers and a thickness of a slice of 220 nm, as indicated in [4].
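The slicing procedure described above can be sketched as follows. This is an illustrative reconstruction from the text, not the authors' code: the function names are hypothetical, coordinates are assumed to be in micrometres, the polymer is placed by drawing its centre of mass uniformly in the sphere (so some beads may fall outside the nucleus), and the plane point is also drawn uniformly in the sphere. The default radius and thickness are the values quoted in the text.

```python
import numpy as np

def random_unit_vector(rng):
    v = rng.normal(size=3)
    return v / np.linalg.norm(v)

def random_point_in_sphere(radius, rng):
    # Uniform in volume: random direction, radius scaled by a cube root
    return random_unit_vector(rng) * radius * rng.random() ** (1 / 3)

def slice_conformation(coords, nucleus_radius=4.5, thickness=0.22, rng=None):
    """coords: (n_beads, 3) bead positions in micrometres.
    Returns a boolean vector, True for beads caught in the slice."""
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: place the polymer centre of mass at a random point
    centred = coords - coords.mean(axis=0)
    placed = centred + random_point_in_sphere(nucleus_radius, rng)
    # Mid-plane of the slice: random normal through a random point
    normal = random_unit_vector(rng)
    point = random_point_in_sphere(nucleus_radius, rng)
    distance = np.abs((placed - point) @ normal)
    return distance < thickness / 2

def in_silico_gam(ensemble, n_slices_per_conf=1, seed=0, **kwargs):
    """Returns the segregation table, shape (n_slices, n_beads)."""
    rng = np.random.default_rng(seed)
    rows = [slice_conformation(c, rng=rng, **kwargs)
            for c in ensemble for _ in range(n_slices_per_conf)]
    return np.array(rows)
```

Many rows of the resulting table are expected to be empty, since a thin slice rarely intersects a small locus; segregation and co-segregation frequencies follow from column means and pairwise products of this table.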
3. The 3D structure of the Sox9 locus in mESC resulting from GAM data
3.1. Inference of the Sox9 optimal polymer model and robustness of the approach
To test the performance of the extended PRISMR method on GAM data, we first considered the genomic
region of 6Mb centered around Sox9 (mm9, chr11:109000000-115000000), which we will refer to as the
Sox9 locus, and a publicly available GAM dataset from mESCs [4]. The Sox9 locus is a biologically well
characterized genomic region which is associated with serious congenital diseases such as skeletal
malformation and sex-reversal syndromes [38]. The locus GAM co-segregation and D’ matrices were
obtained at a resolution of 40kb. As explained above, we described this locus by employing the SBS
polymer model of chromatin [28, 33] and PRISMR was used to infer the most suitable arrangement of its
binding sites (Fig.1b).
We applied the extended PRISMR to both the co-segregation and D’ GAM matrices to test the robustness of
the approach, finding that PRISMR reproduces the input matrix well regardless of its type
(Fig.2a,b). Indeed, the Pearson correlations between the PRISMR matrices and the GAM co-segregation
and D’ matrices are r=0.93 and r=0.86, respectively. However, Pearson correlation may be biased by different
factors, primarily the decay of contacts with genomic distance. For the comparison of Hi-C matrices, this
issue has been overcome by the introduction of sophisticated bias-corrected metrics, like the ones in [54,55].
Each of these metrics handles matrix comparison in a different manner, with different kinds of outcomes.
Even though they were designed specifically for Hi-C comparison, we employed two of these metrics: the
HiC-spector [39] reproducibility score (Q), ranging from 0 to 1, with 1 indicating the highest similarity, and the
HiCRep [40] stratum-adjusted correlation coefficient (SCC), ranging from -1 to 1 like the standard Pearson
correlation. For the co-segregation matrices (PRISMR against experimental), we obtained Q=0.81 and
SCC=0.96; for the D’ matrices, Q=0.69 and SCC=0.97. These values overall confirm the Pearson correlation
outcomes, indicating good PRISMR performance for both GAM co-segregation and D’ matrices and adding
quantitative support to the visual inspection (Fig.2a,b).
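To illustrate why the decay of contacts with genomic distance can inflate a plain Pearson correlation, the sketch below compares two matrices before and after subtracting the mean of each diagonal. This is a deliberately simplified stand-in and is neither HiCRep's SCC nor the HiC-spector score; the function names are hypothetical.

```python
import numpy as np

def pearson_upper(a, b, k=1):
    """Pearson correlation over the entries above the k-th diagonal."""
    iu = np.triu_indices_from(a, k=k)
    return float(np.corrcoef(a[iu], b[iu])[0, 1])

def remove_distance_trend(m):
    """Subtract from each diagonal its mean, removing the average decay
    of the signal with genomic separation."""
    m = np.asarray(m, float)
    out = np.zeros_like(m)
    n = m.shape[0]
    for s in range(n):
        idx = np.arange(n - s)
        diag = m[idx, idx + s]
        out[idx, idx + s] = diag - diag.mean()
        out[idx + s, idx] = out[idx, idx + s]   # keep the matrix symmetric
    return out
```

Two matrices that share nothing but the same distance decay are highly correlated in the raw comparison and nearly uncorrelated after the trend is removed, which is the bias the dedicated metrics are designed to handle.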
We also found that the number and distribution of binding domains that PRISMR assigned to the output
polymer model are similar in the two cases. To quantify such similarity, we measured the genomic overlap
q between pairs of binding domains derived from the co-segregation and D’ data. The overlap q
between a pair of binding domains is defined as the normalized integral over the Sox9 locus of the product
of the numbers of their binding sites [32]. The most overlapping binding domain pairs found from D’ and
from co-segregation data have the same colour in Fig.2a,b. We found a mean overlap q=0.86 between the
30% best matching binding domains and an overall mean overlap between matching domains of q=0.70. A
control overlap distribution was obtained by randomizing the binding domains found for the co-segregation
case (10^3 independent random realizations), by bootstrapping the positions of their binding sites, and
evaluating the overlaps between all possible bootstrapped domain pairs (mean q_rand=0.39). We found that
the overlaps between matching binding domains were significantly higher than in the random control case
(p-value=2e-9, Mann-Whitney U test). Analogous results were obtained by bootstrapping the domains found
for the D’ case. Moreover, we considered another, more stringent control case, obtained by
bootstrapping at the same time all the co-segregation and D’ binding domains, and by computing their
overlaps only after re-matching the most overlapping pairs. As before, we generated 10^3 random cases and
obtained a mean control overlap of q_rand=0.47, which is still significantly lower than in the real case
(p-value=3e-6, Mann-Whitney U test).
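A possible way to compute the overlap q and a randomized control is sketched below. The text does not spell out the normalization, so the geometric-mean normalization used here (which gives q=1 for identical domains) is an assumption, as is the use of a random permutation of bin positions in place of the bootstrap procedure; the function names are hypothetical.

```python
import numpy as np

def overlap(a, b):
    """a, b: number of binding sites per genomic bin for two binding domains.
    Normalizing by the geometric mean of the self-overlaps is an assumption."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    norm = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / norm) if norm > 0 else 0.0

def shuffled_control(a, b, n_random=1000, seed=0):
    """Overlaps obtained after randomly permuting the bin positions of
    domain `a` (a simple stand-in for the bootstrap described in the text)."""
    rng = np.random.default_rng(seed)
    return np.array([overlap(rng.permutation(a), b) for _ in range(n_random)])
```

The observed overlaps of matched domains can then be compared against the control distribution with a rank test such as the Mann-Whitney U test.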
Finally, we checked the stability of the binding domains found by PRISMR when different runs of the
algorithm are performed starting from different initial states, as already shown for PRISMR applied to Hi-C
data [32]. To this aim, we compared the best 10 minima found among hundreds of independent PRISMR
runs. For definiteness, we consider the co-segregation case, but similar results are expected to hold for
the D’ case. We found a mean overlap between corresponding binding domains from different runs of q=0.75,
significantly higher than in the more stringent random model described above (p-value<1e-5, Mann-Whitney
U test), and an overlap around 0.90 when considering only the top 30% overlapping binding domains.
Together, these results show that PRISMR is effective on GAM data and robust to the type of
input data, such as Hi-C (see e.g. [30, 32]), GAM co-segregation and
GAM D’.
The predicted binding domains are arranged in a complex way along the polymer chain, often extending
along the whole locus and overlapping with each other. Their arrangement gives rise to the complex
contact pattern of the locus. For definiteness, let us focus on the co-segregation case. To
better visualize how the different binding domains in the model contribute to the formation of the specific
contact pattern of the Sox9 region, we constructed a matrix showing, for each pairwise contact, the colour
that contributes most to it (Fig.2c). Since interactions are possible only between pairs of cognate binding
sites, the contribution of each binding domain to a contact between two loci is simply calculated as the
number of pairs of its binding sites between the two loci. It turns out that binding domain 13 (dark orange)
makes a major contribution to the folding of the locus, forming a large metaTAD containing the Sox9 gene
and being responsible for diffuse long-range contacts; binding domain 2 forms a more separate TAD
immediately upstream of the Sox9 TAD, while the other domains form finer, internal structures.
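The most-contributing-colour matrix can be obtained directly from the binding-site counts. A minimal sketch, assuming the model is stored as an array `sites[c, i]` giving the number of binding sites of domain c in genomic bin i (the array layout and the function name are assumptions):

```python
import numpy as np

def dominant_domain(sites):
    """sites[c, i]: number of binding sites of domain c in bin i.
    Returns an (n_bins, n_bins) matrix with the index of the domain having
    the most binding-site pairs for each pair of bins, or -1 where no
    domain contributes. Ties go to the lowest domain index."""
    sites = np.asarray(sites)
    # pairs[c, i, j] = n_c(i) * n_c(j): site pairs of domain c between bins i, j
    pairs = sites[:, :, None] * sites[:, None, :]
    winner = pairs.argmax(axis=0)
    winner[pairs.max(axis=0) == 0] = -1
    return winner
```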
3.2. 3D conformations of the Sox9 locus
To make sense of the patterns present in the GAM contact matrix and to infer the corresponding 3D
structures of the locus, we ran Molecular Dynamics simulations of the PRISMR-derived optimal SBS
polymer model to sample the ensemble of its thermodynamic states. For definiteness, we focus on the SBS
polymer deduced from the co-segregation matrix. Specifically, we generated a population of 100 replicas of
the optimal SBS polymer identified by PRISMR, prepared them in a self-avoiding-walk state and ran MD to
make the polymers fold until thermodynamic equilibrium [30]. To check that every polymer attained an
equilibrium folded state, we followed the temporal evolution of its gyration radius, Rg [41], which gives an
estimate of the polymer compactness, during the simulation. In Fig.S1 we show the temporal evolution of Rg
averaged over the 100 replicas: a plateau is reached at a value which is about one third of the initial
self-avoiding-walk one, as expected from microphase separation (see below) [41]. In Fig.S1 we also show,
for a given replica, reconstructions of the 3D configurations at different time points from the initial
self-avoiding-walk state. It can be seen how the polymer, initially open and widely spread, finally shrinks to
a denser folded organization. Importantly, the presence of different types of binding sites (colours) having
homotypic interactions with corresponding, cognate molecular factors allows the formation of complex 3D
structures, rather than simple spherical globules, in which sites of the same type tend to form separate
clusters through a physical mechanism known as microphase separation [42]. In Fig.2d, we show an example
of a 3D equilibrium configuration of the Sox9 locus as seen from two different viewpoints. The 3D structure is
coloured according to the colour code shown in Fig.2c, which follows the TADs called for the Sox9 region
in [6]. It can be observed that each of the different regions tends to fold onto itself, giving rise to the
enrichment of contacts which is typical of TADs (see in particular the dark red, red and green regions); on
the other hand, the dark orange region is in extensive contact with almost all the other colours, producing
the abundant long-range contacts detected in the GAM matrix. In blue we marked the region containing the
Sox9 gene (chr11:112.643:112.649Mb).
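The equilibration check based on the gyration radius can be sketched as follows, assuming equal bead masses and trajectories stored as an array of shape (replicas, frames, beads, 3); the array layout and function names are assumptions, not the authors' code.

```python
import numpy as np

def gyration_radius(coords):
    """Root-mean-square distance of the beads from their centre of mass
    (equal bead masses assumed). coords: (n_beads, 3)."""
    coords = np.asarray(coords, float)
    centred = coords - coords.mean(axis=0)
    return float(np.sqrt(np.mean(np.sum(centred ** 2, axis=1))))

def mean_rg_trajectory(trajectories):
    """trajectories: (n_replicas, n_frames, n_beads, 3).
    Returns Rg averaged over replicas at each frame."""
    traj = np.asarray(trajectories, float)
    centred = traj - traj.mean(axis=2, keepdims=True)
    rg = np.sqrt(np.mean(np.sum(centred ** 2, axis=3), axis=2))
    return rg.mean(axis=0)
```

A plateau in the replica-averaged curve returned by `mean_rg_trajectory` is the signature of equilibration described in the text.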
4. The 3D structure of chromosome 7 in mESC resulting from GAM data
4.1. 3D conformations of chromosome 7
We also tested our novel PRISMR procedure on the whole chromosome 7 in mESC. We illustrate our results
when PRISMR is applied to the GAM D’ matrix of chr7 at 250kb resolution [4], but similar results are obtained
if GAM co-segregation data are used (see above). In Fig.3a, we show the experimental matrix and the one
derived by PRISMR with the corresponding binding domains. Their Pearson and HiCRep correlations are
r=0.67 and SCC=0.96, and the HiC-spector score is Q=0.49. This is achieved even though chromosome 7 is
around 30 times longer than the previously considered Sox9 region and the corresponding matrix at 250kb
is about 16 times larger than the 40kb Sox9 matrix.
As for the much smaller Sox9 region, the binding domains have a complex arrangement along chromosome
7, overlapping each other and some extending along the whole chromosome. The binding domains
contributing most to each pairwise interaction are shown in Fig.3b, highlighting a major role for domains 3
(light blue), 4 (blue) and 10 (dark grey). Domains 3 and 10 appear to be responsible for the formation of
large meta-domains at opposite ends of the chromosome, while domain 4 seems to be involved
mainly in long-range contacts.
Following the same steps explained in the previous section for the Sox9 locus, we also extracted the 3D
equilibrium configurations of chromosome 7. An example 3D structure of chromosome 7 is
shown in Fig.3c. As above (Section 3.2), chromatin folding within the chr7 SBS model takes place through a
classical polymer physics mechanism of microphase separation [42]. Thus, the good agreement of our
models with experimental contact data for both the Sox9 region and chromosome 7 suggests that microphase
separation could be involved in chromatin 3D organization both at the Mb scale and at the chromosome
level, as also supported by recent experimental results [43–45].
4.2. Single-allele 3D location of the mouse orthologue of the human 16p.11.2 locus
A potentially striking application of the PRISMR-inferred 3D structures is locating genomic regions of interest
in their architectural context, e.g. peripheral or internal. This can be instrumental, for instance, to
assess whether a locus can form contacts with other chromosomes or the nuclear lamina. With this aim, we
focused on a 2Mb region on chromosome 7, containing the mouse orthologue of the human 16p.11.2
locus. This region is of great interest in biomedicine, as copy number variations (CNVs, i.e. deletions and
duplications) occurring there have been associated with phenotypes including severe cognitive disorders,
such as autism [46–48]. Moreover, cis- and trans- chromatin interactions of the locus have been identified
with distal genomic regions associated with similar phenotypes [48]. The study of the 16p.11.2 locus with
polymer physics models could help to investigate a possible link [48] between such rearrangements and
the disruption of chromatin contacts, which could lead to the observed phenotypes more than the genes
directly affected by the CNV.
In the 3D snapshot of Fig. 3c, we marked in red the considered 2Mb region (133:135Mb) containing the
mouse orthologue of the human 16p.11.2 locus (133.84:134.24Mb). In the illustrated conformation, the
16p locus projects outwards; however, we explored the variability of its radial position over the whole
ensemble of 3D structures. We computed, for each conformation, the distance of the centre of mass of the
16p locus from the centre of mass of the whole chromosome, normalizing this distance by the gyration
radius. We then extracted the distribution of the normalized distance, which we labelled as r/Rg. We did
the same for 3×10^2 loci randomly selected over chromosome 7, of the same size as the 16p locus, thus
getting a control random distribution of normalized radial positions. We report in Fig.3d the 16p distribution
against the control one. The distributions are broad to a similar extent, with standard deviations around
34% and 38% of the mean values for the 16p and the control case respectively, indicating a high variability
of the loci positions, ranging from r/Rg near 0.0 up to 2.0. Nevertheless, the two distributions differ
significantly (p-value = 0.01, Mann-Whitney U test), the 16p one being slightly shifted toward higher r/Rg
values. In particular, if we assume that values of r/Rg larger than 1.5 indicate peripheral
positions, then this happens roughly 10% of the time for the 16p locus. This non-negligible frequency of
peripheral positioning indicates that the locus may be involved in trans interactions, as
actually detected by experiments [48].
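The normalized radial position r/Rg and its random control can be computed along these lines. This is a sketch under the assumptions that each conformation is an array of bead coordinates, that a locus corresponds to a contiguous range of beads, and that beads have equal mass; the function names and the `[start, stop)` bead-index convention are hypothetical.

```python
import numpy as np

def normalized_radial_position(coords, start, stop):
    """Distance of the centre of mass of beads [start, stop) from the centre
    of mass of the whole polymer, in units of the polymer gyration radius."""
    coords = np.asarray(coords, float)
    centre = coords.mean(axis=0)
    rg = np.sqrt(np.mean(np.sum((coords - centre) ** 2, axis=1)))
    locus_centre = coords[start:stop].mean(axis=0)
    return float(np.linalg.norm(locus_centre - centre) / rg)

def radial_distribution(ensemble, start, stop):
    """r/Rg of one locus over all conformations of the ensemble."""
    return np.array([normalized_radial_position(c, start, stop) for c in ensemble])

def random_control(ensemble, size, n_loci=300, seed=0):
    """r/Rg values for randomly positioned loci of the same size."""
    rng = np.random.default_rng(seed)
    n_beads = len(ensemble[0])
    starts = rng.integers(0, n_beads - size + 1, size=n_loci)
    return np.concatenate([radial_distribution(ensemble, s, s + size) for s in starts])
```

The fraction of peripheral conformations under the threshold used in the text is then `(radial_distribution(ensemble, start, stop) > 1.5).mean()`.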
5. In-silico simulation of independent experimental techniques
5.1. Validation of the inferred 3D polymer models
As shown in the previous sections, the 3D polymer models inferred by PRISMR reproduce the input
GAM data well. To validate such 3D models, however, we tested their ability to predict the outcome of
experiments not given as input to the algorithm.
As a first test, from the generated ensemble of 3D conformations of chromosome 7, derived by PRISMR
with GAM D’ data as input, we extracted our predicted co-segregation matrix by employing our in-silico
GAM algorithm. The experimental matrix and the predicted one compare as follows (see Fig.4a): r=0.64,
Q=0.45 and SCC=0.62. These values indicate that model and experiment are similar,
showing that by applying PRISMR to the D’ matrix the main features of the co-segregation
matrix can be reconstructed. In other words, PRISMR allowed the generation of an ensemble of polymers
which represent the conformations of chromosome 7 that emerge from GAM data.
Next, as a more stringent test, we simulated a Hi-C experiment over our population of 3D structures
derived from the Sox9 GAM co-segregation data. The algorithm employed to simulate the Hi-C
technique is the same used in [30], where the average contact frequency matrix over the ensemble of 3D
structures is calculated by considering as interacting all loci within a threshold distance. We obtained an
in-silico contact matrix of the Sox9 locus and compared it with independent Hi-C experimental data of the
same region in mESCs, at the same 40kb resolution, taken from [6]. Strikingly, our simulated matrix is very
similar to the experimental one (Fig.4b), with r=0.90, Q=0.75 and SCC=0.51. Thus, we found a good match
with Hi-C data by simulating a Hi-C experiment over a population of 3D structures derived from GAM data,
that is, from a completely independent and different kind of technique. This is an indication that PRISMR yields
3D configurations that can be considered representative of the real possible states a locus may be found in.
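The threshold-based in-silico Hi-C described above amounts to averaging binary contact maps over the ensemble. A minimal sketch, with hypothetical function names; the threshold value and the bead-to-bin coarse-graining factor are left to the caller, since the text does not specify them.

```python
import numpy as np

def contact_matrix(ensemble, threshold):
    """ensemble: iterable of (n_beads, 3) arrays; threshold in the same units.
    Returns the fraction of conformations in which each pair of beads lies
    closer than `threshold`."""
    total, count = None, 0
    for coords in ensemble:
        coords = np.asarray(coords, float)
        diff = coords[:, None, :] - coords[None, :, :]
        dist = np.sqrt(np.sum(diff ** 2, axis=-1))
        contacts = (dist < threshold).astype(float)
        total = contacts if total is None else total + contacts
        count += 1
    return total / count

def coarse_grain(matrix, factor):
    """Average blocks of factor x factor beads to match the bin size of the
    experimental matrix (trailing beads that do not fill a block are dropped)."""
    n = (matrix.shape[0] // factor) * factor
    m = matrix[:n, :n].reshape(n // factor, factor, n // factor, factor)
    return m.mean(axis=(1, 3))
```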
As a further and final test, we compared the Sox9 3D structures inferred from GAM co-segregation data
with the ones previously obtained [30] by applying PRISMR to the Hi-C data [6] of the same region. The two
independently derived ensembles of 3D structures are similar to each other (Fig. 5). For example, a
quantitative comparison of the degree of compaction of the 3D conformations, as measured by the
gyration radius distribution, for the GAM-derived and Hi-C-derived conformations is shown in Fig. 5b.
Although the GAM distribution is slightly shifted toward lower values than the Hi-C one, the
distributions are very similar in shape. This is a notable result, as the GAM and Hi-C matrices given as input
to PRISMR have a profoundly different data structure, and it further supports the robustness of PRISMR to
changes in input data type.
5.2. Exploring in-silico the performance of GAM and Hi-C with 3D polymer models
In the previous sections, the different experimental techniques have been simulated in silico in an idealized
manner. In fact, Hi-C includes a number of steps such as crosslinking, digestion, biotinylation and ligation,
each one having a far from perfect efficiency. In simulations, however, Hi-C-like contact frequencies are
calculated by just considering in contact each pair of loci on a folded polymer that lies within a distance
threshold. In this way, we obtain an ideally perfect detection of contacts, much more efficient than if we
had, e.g., to cross-link each time only one pair of loci among all contacting ones. The same holds for the GAM
in-silico reconstruction, where we simulate ideal, perfectly efficient experiments. For example, we cut in silico
a very large number of slices (tens of thousands in the previously reported simulations), since there is no
limit to the number of slices that can be simulated, apart from the available computational time.
Interestingly, however, more realistic in-silico simulations could be easily implemented, which would allow
the behaviour of experimental techniques to be investigated under different conditions, such as a different
number of cut slices for GAM, a different cell population or number of single-molecule conformations, or
different efficiencies for each experimental step.
As a first exploration, we used our in-silico method to derive what the co-segregation matrix would
look like if a different number of slices were cut. As an example, we report in Fig.S2 a co-segregation matrix
predicted by cutting in silico exactly 408 slices, as in the original GAM experiment [4]. Its Pearson
correlation with the GAM experimental co-segregation matrix is r=0.54, the HiC-spector score is Q=0.47 and
the HiCRep score is SCC=0.40. Compared with the ideal case presented in Section 4.1, the considered
similarity metrics are generally lower. That is presumably due to the increased noise level, which leads to
sizeable statistical fluctuations in a 408-slice matrix; these are expected to be much weaker at tens of
thousands of slices. Further investigation of the minimal number of slices necessary for the co-segregation
matrix to stabilise would give an important indication for the design of future GAM experiments. More generally,
this kind of in-silico prediction could be used to understand, for instance, the optimal settings of Hi-C or GAM
without actually performing the experiments many times, thus overcoming technical and economic
limitations.
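The dependence of the co-segregation matrix on the number of slices can be explored with a simplified sketch of in-silico slicing. It rests on several assumptions that are not part of the method as described: each slice is modelled as a slab of given thickness with a random orientation and a random position within a spherical region of given radius, each slice is cut from a structure drawn at random from the ensemble, and detection is perfectly efficient. The function name `insilico_gam` and its parameters are hypothetical.

```python
import numpy as np

def insilico_gam(structures, n_slices, thickness, radius, rng=None):
    """Co-segregation matrix from `n_slices` randomly oriented slabs.

    structures: array of shape (n_structures, n_loci, 3) (assumed layout).
    thickness, radius: slab thickness and radius of the sampled region,
    both hypothetical parameters in the same length units as the coordinates.
    """
    rng = np.random.default_rng(rng)
    structures = np.asarray(structures, dtype=float)
    n_structures, n_loci, _ = structures.shape
    detected = np.zeros((n_slices, n_loci), dtype=bool)
    for s in range(n_slices):
        # Each slice is cut from one randomly chosen structure.
        coords = structures[rng.integers(n_structures)]
        # Random slicing direction, uniform on the unit sphere.
        direction = rng.normal(size=3)
        direction /= np.linalg.norm(direction)
        # Position of the slab centre along that direction.
        offset = rng.uniform(-radius, radius)
        height = coords @ direction
        # A locus is detected if it falls inside the slab.
        detected[s] = np.abs(height - offset) <= thickness / 2
    # Co-segregation: fraction of slices in which both loci are detected.
    d = detected.astype(float)
    return d.T @ d / n_slices
```

Calling such a function once with 408 slices and once with tens of thousands of slices on the same ensemble, and then correlating the two matrices, would give a rough estimate of how much of the difference reported above is due to sampling noise alone.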
6. CONCLUSIONS
PRISMR is a computational, polymer-physics-based technology by which the folding properties of any
locus can be inferred, starting from its Hi-C pairwise contact matrix [32]. Here we showed that PRISMR can
be used to derive chromosome architecture from GAM co-segregation or D’ matrices, provided that proper
modifications are implemented, as discussed here. We tested the extended version of PRISMR on the Sox9
locus and on the entire chromosome 7 in mESCs, with successful results. This also shows the versatility and
robustness of the PRISMR tool, as GAM and Hi-C are two very different techniques, and suggests that PRISMR
could be tuned to work on other types of input as well, such as SPRITE or microscopy data. In particular, we
showed that by simulating an in-silico Hi-C experiment on the 3D structures that PRISMR inferred
from GAM co-segregation data, we reproduced with high accuracy real Hi-C data from
independent experiments. Additionally, a comparison of these 3D structures with the ones previously
derived by applying PRISMR to Hi-C data [6, 30] showed that they are strikingly consistent, despite coming
from completely different experiments. As an application of the results from PRISMR, we discussed how its
derived 3D structures can be exploited to explore the performance of experimental techniques under
different conditions and to guide the design of more accurate experiments. We also discussed how the
PRISMR-derived 3D structures can be employed to make sense of the patterns emerging from 2D contact
matrices or to assess the positions and reciprocal distances of loci of interest. In particular, we investigated
the radial position of the 16p11.2 locus, which is linked to cognitive disorders. We found a high single-conformation variability, which indicates that the locus may be engaged in different
cis- and trans- contacts. Further modelling of disease-associated rearrangements at this locus could help to
investigate the possible rewiring of contacts with distal genes and their regulators, as a possible phenotype-causing mechanism. In this sense, PRISMR represents a useful resource to extract new information from, and to
shed new light on, experimental data.
Recently, an approach named GEM [49] has been developed to reconstruct the genome 3D organization.
Within GEM, a chromatin segment of N windows is modelled as a Gaussian chain of N beads with pairwise
harmonic interactions [50]. The NxN pairwise coupling coefficients are the model parameters: they must
be chosen to best fit a given input NxN chromatin contact map. The Gaussian nature of the model implies
that its scaling properties do not match those of real polymers, yet its main advantage is that it can be easily and
fully solved to derive the set of NxN coupling coefficients that best reproduce, at thermodynamic
equilibrium, the input experimental contact data. That makes GEM extremely efficient from a
computational point of view, albeit approximate. One of the advantages of PRISMR is that the number of its
parameters is orders of magnitude smaller: they are the genomic locations of the model binding sites, whose number
scales linearly with the length N of the considered region. Importantly, from those parameters alone the
experimental contact matrix, which scales as NxN, can be fully reconstructed. PRISMR also has the advantage
of employing more advanced models of chromatin, which take into account the self-avoiding nature of polymers
[41] and hence respect their correct scaling properties [30]. Indeed, in the GEM model the inclusion of
excluded volume leads to a loss of accuracy [49]. The price to pay for PRISMR is that heavier computations
are required.
Here, we applied PRISMR to find the optimal SBS model of chromatin, as the SBS has been successfully used
to fit Hi-C and GAM data [4, 29, 30] and for its comparative simplicity. Nevertheless, it neglects a number of
complications arising in real chromatin, as emerging from the physics of complex systems, such as off-equilibrium, jamming, segregation and stress anomaly effects [51–60]. While improvements are certainly
required in our modelling of chromatin, PRISMR is a powerful tool to reconstruct the 3D architecture of the
genome and to interpret complex experimental data, such as pairwise contact matrices, in the light of the
underlying fundamental molecular mechanisms. We believe that the combination of modelling and
experimental advancements can provide deeper and critical insights into the mechanisms of chromatin 3D
organization and ultimately contribute to the development of medical tools for the diagnosis and treatment of diseases
associated with chromosome misfolding.
Acknowledgements
M.N. acknowledges grants from the National Institutes of Health Common Fund 4D Nucleome Program
(1U54DK107977-01), the EU H2020 Marie Curie ITN (813282), the Einstein BIH Fellowship Award
(EVF-BIH-2016-282), CINECA ISCRA (HP10CRTY8P), Regione Campania SATIN Project 2018-2020, and
computer resources from the INFN, CINECA, ENEA CRESCO/ENEAGRID [61] and SCoPE/ReCaS at the
University of Naples. A.P. acknowledges support from the National Institutes of Health Common Fund 4D
Nucleome Program grant U54DK107977, the Helmholtz Association, and from the Berlin Institute of Health
(CRG2b-TP3).
Author contributions
S.B. and M.N. designed the project. L.F., S.B., A.M.C., M.B. developed the modeling part; L.F., S.B., A.E., C.A.,
M.C., A.C. ran the computer simulations and performed analyses. A.Pr. and A.Po. gave conceptual advice.
L.F., S.B., A.Po., M.N. wrote the manuscript.
REFERENCES
[1] J. Dekker and L. Mirny, “The 3D Genome as Moderator of Chromosomal Communication,” Cell, 2016.
[2] M. Spielmann, D. G. Lupiáñez, and S. Mundlos, “Structural variation in the 3D genome,” Nature Reviews Genetics, 2018.
[3] E. Lieberman-Aiden et al., “Comprehensive mapping of long-range interactions reveals folding principles of the human genome,” Science, 2009.
[4] R. A. Beagrie et al., “Complex multi-enhancer contacts captured by genome architecture mapping,” Nature, 2017.
[5] S. A. Quinodoz et al., “Higher-Order Inter-chromosomal Hubs Shape 3D Genome Organization in the Nucleus,” Cell, 2018.
[6] J. R. Dixon et al., “Topological domains in mammalian genomes identified by analysis of chromatin interactions,” Nature, vol. 485, no. 7398, pp. 376–380, 2012.
[7] E. P. Nora et al., “Spatial partitioning of the regulatory landscape of the X-inactivation centre,” Nature, 2012.
[8] T. Sexton et al., “Three-dimensional folding and functional organization principles of the Drosophila genome,” Cell, 2012.
[9] J. E. Phillips-Cremins et al., “Architectural protein subclasses shape 3D organization of genomes during lineage commitment,” Cell, 2013.
[10] J. Fraser et al., “Hierarchical folding and reorganization of chromosomes are linked to transcriptional changes in cellular differentiation,” Mol. Syst. Biol., 2015.
[11] A. M. Chiariello, S. Bianco, C. Annunziatella, A. Esposito, and M. Nicodemi, “The scaling features of the 3D organization of chromosomes are highlighted by a transformation a la Kadanoff of Hi-C data,” EPL, 2017.
[12] B. van Steensel and A. S. Belmont, “Lamina-Associated Domains: Links with Chromosome Architecture, Heterochromatin, and Gene Repression,” Cell, 2017.
[13] S. S. P. Rao et al., “A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping,” Cell, 2014.
[14] M. Barbieri et al., “Active and poised promoter states drive folding of the extended HoxB locus in mouse embryonic stem cells,” Nat. Struct. Mol. Biol., vol. 24, no. 6, pp. 515–524, 2017.
[15] K. Monahan, A. Horta, and S. Lomvardas, “LHX2- and LDB1-mediated trans interactions regulate olfactory receptor choice,” Nature, 2019.
[16] D. Baú et al., “The three-dimensional folding of the α-globin gene domain reveals formation of chromatin globules,” Nat. Struct. Mol. Biol., 2011.
[17] R. Kalhor, H. Tjong, N. Jayathilaka, F. Alber, and L. Chen, “Genome architectures revealed by tethered chromosome conformation capture and population-based modeling,” Nat. Biotechnol., 2012.
[18] Z. Zhang, G. Li, K.-C. Toh, and W.-K. Sung, “3D Chromosome Modeling with Semi-Definite Programming and Hi-C Data,” J. Comput. Biol., 2013.
[19] L. Giorgetti et al., “Predictive polymer modeling reveals coupled fluctuations in chromosome conformation and transcription,” Cell, 2014.
[20] F. Serra et al., “Restraint-based three-dimensional modeling of genomes and genomic domains,” FEBS Lett., 2015.
[21] B. Adhikari, T. Trieu, and J. Cheng, “Chromosome3D: Reconstructing three-dimensional chromosomal structures from Hi-C interaction frequency data using distance geometry simulated annealing,” BMC Genomics, 2016.
[22] P. Szalaj et al., “3D-GNOME: an integrated web service for structural modeling of the 3D genome,” Nucleic Acids Res., 2016.
[23] T. Trieu and J. Cheng, “3D genome structure modeling by Lorentzian objective function,” Nucleic Acids Res., 2017.
[24] A. L. Sanborn et al., “Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes,” Proc. Natl. Acad. Sci., 2015.
[25] A. Goloborodko, J. F. Marko, and L. A. Mirny, “Chromosome Compaction by Active Loop Extrusion,” Biophys. J., 2016.
[26] G. Fudenberg, M. Imakaev, C. Lu, A. Goloborodko, N. Abdennur, and L. A. Mirny, “Formation of Chromosomal Domains by Loop Extrusion,” Cell Rep., 2016.
[27] C. A. Brackley et al., “Nonequilibrium Chromosome Looping via Molecular Slip Links,” Phys. Rev. Lett., 2017.
[28] A. Esposito, C. Annunziatella, S. Bianco, A. M. Chiariello, L. Fiorillo, and M. Nicodemi, “Models of polymer physics for the architecture of the cell nucleus,” Wiley Interdisciplinary Reviews: Systems Biology and Medicine, 2018.
[29] M. Barbieri et al., “Complexity of chromatin folding is captured by the strings and binders switch model,” Proc. Natl. Acad. Sci., 2012.
[30] A. M. Chiariello, C. Annunziatella, S. Bianco, A. Esposito, and M. Nicodemi, “Polymer physics of chromosome large-scale 3D organisation,” Sci. Rep., 2016.
[31] C. Annunziatella, A. M. Chiariello, S. Bianco, and M. Nicodemi, “Polymer models of the hierarchical folding of the Hox-B chromosomal locus,” Phys. Rev. E, 2016.
[32] S. Bianco et al., “Polymer physics predicts the effects of structural variants on chromatin architecture,” Nat. Genet., 2018.
[33] M. Franke et al., “Formation of new chromatin domains determines pathogenicity of genomic duplications,” Nature, vol. 538, no. 7624, pp. 265–269, 2016.
[34] M. Nicodemi and A. Prisco, “Thermodynamic pathways to genome spatial organization in the cell nucleus,” Biophys. J., 2009.
[35] C. Annunziatella, A. M. Chiariello, A. Esposito, S. Bianco, L. Fiorillo, and M. Nicodemi, “Molecular Dynamics simulations of the Strings and Binders Switch model of chromatin,” Methods, 2018.
[36] A. M. Chiariello et al., “A polymer physics investigation of the architecture of the murine orthologue of the 7q11.23 human locus,” Frontiers in Neuroscience, 2017.
[37] B. K. Kragesteen et al., “Dynamic 3D chromatin architecture contributes to enhancer specificity and limb morphogenesis,” Nat. Genet., 2018.
[38] M. Franke et al., “Formation of new chromatin domains determines pathogenicity of genomic duplications,” Nature, 2016.
[39] K. K. Yan, G. G. Yardımcı, C. Yan, W. S. Noble, and M. Gerstein, “HiC-spector: A matrix library for spectral and reproducibility analysis of Hi-C contact maps,” Bioinformatics, 2017.
[40] T. Yang et al., “HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient,” Genome Res., 2017.
[41] P. G. de Gennes, Scaling concepts in polymer physics. Ithaca, NY: Cornell University Press, 1979.
[42] L. Leibler, “Theory of Microphase Separation in Block Copolymers,” Macromolecules, 1980.
[43] D. Hnisz, K. Shrinivas, R. A. Young, A. K. Chakraborty, and P. A. Sharp, “A Phase Separation Model for Transcriptional Control,” Cell, 2017.
[44] A. G. Larson et al., “Liquid droplet formation by HP1α suggests a role for phase separation in heterochromatin,” Nature, 2017.
[45] A. R. Strom, A. V. Emelyanov, M. Mir, D. V. Fyodorov, X. Darzacq, and G. H. Karpen, “Phase separation drives heterochromatin domain formation,” Nature, vol. 547, no. 7662, pp. 241–245, 2017.
[46] S. E. McCarthy et al., “Microduplications of 16p11.2 are associated with schizophrenia,” Nat. Genet., 2009.
[47] J. L. Stein, “Copy number variation and brain structure: Lessons learned from chromosome 16p11.2,” Genome Med., 2015.
[48] M. N. Loviglio et al., “Chromosomal contacts connect loci associated with autism, BMI and head circumference phenotypes,” Mol. Psychiatry, 2017.
[49] G. Le Treut, F. Képès, and H. Orland, “A Polymer Model for the Quantitative Reconstruction of Chromosome Architecture from HiC and GAM Data,” Biophys. J., 2018.
[50] M. Bohn, D. W. Heermann, and R. Van Driel, “Random loop model for long polymers,” Phys. Rev. E - Stat. Nonlinear, Soft Matter Phys., 2007.
[51] E. Caglioti, A. Coniglio, H. J. Herrmann, V. Loreto, and M. Nicodemi, “Segregation of granular mixtures in the presence of compaction,” Europhys. Lett., 1998.
[52] M. Nicodemi, B. Panning, and A. Prisco, “A thermodynamic switch for chromosome colocalization,” Genetics, 2008.
[53] M. Nicodemi, A. Fierro, and A. Coniglio, “Segregation in hard-sphere mixtures under gravity. An extension of Edwards approach with two thermodynamical parameters,” Europhys. Lett., 2002.
[54] M. Nicodemi and A. Coniglio, “Macroscopic glassy relaxations and microscopic motions in a frustrated lattice gas,” Phys. Rev. E - Stat. Physics, Plasmas, Fluids, Relat. Interdiscip. Top., 1998.
[55] M. P. Ciamarra, R. Pastore, M. Nicodemi, and A. Coniglio, “Jamming phase diagram for frictional particles,” Phys. Rev. E - Stat. Nonlinear, Soft Matter Phys., 2011.
[56] M. P. Ciamarra, M. Nicodemi, and A. Coniglio, “Recent results on the jamming phase diagram,” Soft Matter, 2010.
[57] D. S. Grebenkov, M. P. Ciamarra, M. Nicodemi, and A. Coniglio, “Flow, ordering, and jamming of sheared granular suspensions,” Phys. Rev. Lett., 2008.
[58] M. Nicodemi, “Force correlations and arch formation in granular assemblies,” Phys. Rev. Lett., 1998.
[59] A. Coniglio, A. De Candia, A. Fierro, and M. Nicodemi, “Universality in glassy systems,” J. Phys. Condens. Matter, 1999.
[60] D. Hamon, M. Nicodemi, and H. J. Jensen, “Continuously driven OFC: A simple model of solar flare statistics,” Astron. Astrophys., 2003.
[61] G. Ponti et al., “The role of medium size facilities in the HPC ecosystem: The case of the new CRESCO4 cluster integrated in the ENEAGRID infrastructure,” in Proceedings of the 2014 International Conference on High Performance Computing and Simulation, HPCS 2014, 2014.
Fig. 1: The SBS model and the PRISMR algorithm for GAM data. a) A cartoon of the SBS polymer model.
Beads of the same type (colour) along the chain can be bridged by diffusing cognate binders. b) A scheme
of the PRISMR method algorithm (adapted from [32]): PRISMR infers the minimal, optimal SBS polymer,
whose equilibrium folded structures best reproduce an input contact matrix. c) Representation of the in-silico GAM algorithm: an SBS polymer is randomly placed inside a surface (representing the cell nucleus),
then a slice is cut through and the beads found in that slice are counted as co-segregated.
Fig. 2: The PRISMR polymer model of the Sox9 locus from GAM mESC data. a) GAM co-segregation matrix
[4] of the Sox9 locus (top, mESC chr11:109-115Mb, 408 slices, 40kb) is compared with the corresponding
PRISMR derived one (bottom). Their Pearson correlation is r=0.93, the HiC-spector score is Q=0.81 and the
HiCRep SCC=0.96. The PRISMR-inferred distribution of binding sites of the Sox9 polymer model is shown in the
middle panel: 15 different types (colours) are identified. b) The GAM D’ matrix of the Sox9 locus (top) is
compared to the PRISMR one (bottom). They have r=0.86, Q=0.69, SCC=0.97. The PRISMR-inferred
distribution of binding sites is shown. Consistently with the case in a), 15 colours are identified. The mean
genomic overlap (see text) between the distributions of binding sites in the two cases is q=70%. Overall,
PRISMR deals equally well with both kinds of experimental data, GAM co-segregation and D’ matrices. c)
The binding domain contributing most to each pairwise interaction is shown for the PRISMR model derived
from the GAM co-segregation matrix. The colour bar right above the matrix represents the
TADs of the Sox9 locus as found in [6]. They are coloured to match as closely as possible the colour of the
binding domain contributing most to each TAD. d) A snapshot of the 3D conformation of the Sox9 locus
(seen from two angles) derived from our MD simulations of the SBS model from GAM co-segregation data.
The employed colour code is shown in panel c. The region containing the Sox9 gene is highlighted in blue.
Fig. 3: The PRISMR polymer model of chr7 from GAM mESC data. a) The GAM D’ matrix (top) of mESC chr7
at 250kb resolution [4], where dark blue stripes represent regions with mapping issues or biases, is
compared to the corresponding PRISMR matrix (bottom). Their Pearson correlation coefficient is r=0.67,
the HiC-spector score is Q=0.49, the HiCRep correlation is SCC=0.96. The inferred model binding domains
(10 different types were found) are shown in the middle. b) The binding domain contributing most to each
pairwise interaction for the PRISMR-derived D’ matrix of chromosome 7. The colour bar above is used
for the visualization in Fig.3c. c) A snapshot of the 3D configuration of the inferred model of chr7. The colour
code is shown in panel b). The region containing the 16p11.2 mouse orthologue locus (133.84:134.28Mb)
is highlighted in red. d) A histogram (blue) showing the distribution of r/Rg, i.e., the position of the center
of mass of the 16p11.2 locus, r, normalized by the gyration radius, Rg, of the chromosome. The superimposed
grey histogram shows the same distribution for 3*10^2 randomly chosen loci of the same size as
the 16p11.2 locus. Standard deviation to mean ratios are 34% for the 16p11.2 case and 38% for the control,
highlighting a high variability across single-chromosome conformations. The two distributions are not
compatible (p-value = 0.01, Mann-Whitney U test), as the 16p11.2 locus distribution is slightly shifted toward
more peripheral positions.
Fig. 4: PRISMR inferred 3D structures are validated against independent GAM and Hi-C data. a) The
PRISMR co-segregation matrix for mESC chr7, predicted from the ensemble of 3D structures inferred from
GAM D’ data (top) is compared to the experimental GAM co-segregation data (bottom, [4]). The
comparison yields Pearson correlation r=0.64, HiC-spector score Q=0.45 and HiCRep correlation SCC=0.62.
b) The Hi-C-like contact matrix for the mESC Sox9 locus predicted from the ensemble of 3D configurations
inferred by PRISMR from GAM co-segregation data (top) is compared to independent Hi-C data (bottom,
[6]). The comparison yields Pearson correlation r=0.90, HiC-spector score Q=0.75 and HiCRep correlation
SCC=0.51.
Fig.5: PRISMR 3D structures from GAM data are consistent with the ones inferred from Hi-C data. a) Sox9
locus 3D structures derived by PRISMR from GAM co-segregation data [4] (on the left: top-left, the same
conformation already shown in Fig.2d; bottom-left, another example from the same ensemble,
described in section 3.2) are compared with 3D structures previously derived by PRISMR (in [30]) from
mESC Hi-C data [6] of the same Sox9 region at the same 40kb resolution (on the right). The colouring is
as in Fig. 2d. b) Gyration radius distributions for the GAM-derived (red) and Hi-C-derived 3D
structures. Gyration radius is expressed in units of the SBS bead diameter (σ).
HIGHLIGHTS
• A computational method, based on polymer physics, is introduced to extract 3D structures
of genomic loci from GAM data.
• The method is robustly applied to co-segregation and linkage disequilibrium (D’) GAM data.
• The method is applied to describe the 3D folding of a 6Mb region around the Sox9 gene
and of the whole chromosome 7 in mouse ES cells.
• The inferred 3D structures from GAM data are successfully compared against independent
Hi-C experiments.