A Scalable Computational Approach for Simulating Complexes of Multiple Chromosomes

Antonio B. Oliveira Junior; Vinícius G. Contessoto; Matheus F. Mello; José N. Onuchic

Research Article A Scalable Computational Approach for Simulating Complexes of Multiple Chromosomes Antonio B. Oliveira Junior 1,2†, Vinı́cius G. Contessoto 1,3†, Matheus F. Mello 1,4 and José N. Onuchic 1,2⇑ 1 - Center for Theoretical Biological Physics, Rice University, Houston, TX, USA 2 - ICTP South American Institute for Fundamental Research, Instituto de Fı´sica Teórica, UNESP – 01140-070, São Paulo, SP, Brazil 3 - Instituto de Biociências, Letras e Ciências Exatas, UNESP - Univ. Estadual Paulista, Departamento de Fı´sica, São José do Rio Preto, SP, Brazil 1 4 - Chemical Engineering Department, Military Institute of Engineering, Rio de Janeiro, RJ, Brazil Correspondence to José N. Onuchic: Center for Theoretical Biological Physics, Rice University, Houston, TX, USA. antonio.oliveira@rice.edu (A.B. Oliveira Junior), vinicius.contessoto@rice.edu (V.G. Contessoto), jonuchic@ rice.edu (J.N. Onuchic) https://doi.org/10.1016/j.jmb.2020.10.034 Edited by Anna Panchenko Abstract Significant efforts have been recently made to obtain the three-dimensional (3D) structure of the genome with the goal of understanding how structures may affect gene regulation and expression. Chromosome conformational capture techniques such as Hi-C, have been key in uncovering the quantitative information needed to determine chromatin organization. Complementing these experimental tools, co-polymers theoretical methods are necessary to determine the ensemble of three-dimensional structures associated to the experimental data provided by Hi-C maps. Going beyond just structural information, these theoretical advances also start to provide an understanding of the underlying mechanisms governing genome assembly and function. Recent theoretical work, however, has been focused on single chromosome structures, missing the fact that, in the full nucleus, interactions between chromosomes play a central role in their organization. To overcome this limitation, MiChroM (Minimal Chromatin Model) has been modified to become capable of performing these multi-chromosome simulations. It has been upgraded into a fast and scalable software version, which is able to perform chromosome simulations using GPUs via OpenMM Python API, called Open-MiChroM. To validate the efficiency of this new version, analyses for GM12878 individual autosomes were performed and compared to earlier studies. This validation was followed by multi-chain simulations including the four largest human chromosomes (C1-C4). These simulations demonstrated the full power of this new approach. Comparison to Hi-C data shows that these multiple chromosome interactions are essential for a more accurate agreement with experimental results. Without any changes to the original MiChroM potential, it is now possible to predict experimentally observed inter-chromosome contacts. This scalability of Open-MiChroM allow for more audacious investigations, looking at interactions of multiple chains as well as moving towards higher resolution chromosomes models. Ó 2020 Elsevier Ltd. All rights reserved. 0022-2836/Ó 2020 Elsevier Ltd. All rights reserved. Journal of Molecular Biology (2020) xxx, xxx–xxx Please cite tts article as: A.B. Oliveira Junior, Vinı́cius G. Contessoto, M.F. Mello, et al., A Scalable Computational Approach for Simulating Complexes of Multiple ChromosomesJournal of Molecular Biology, https://doi.org/10.1016/j.jmb.2020.10.034 A.B. Oliveira Junior, Vinı´cius G. Contessoto, M.F. Mello, et al. Journal of Molecular Biology xxx (xxxx) xxx factor in simulating large groups of chromosomes. A simulation for the whole human genome, for example, would include more than 150,000 beads at a 50 kb resolution. To overcome this challenge, we introduce OpenMiChroM, a software tool that implements the MiChroM energy function for chromosome simulation using the OpenMM Python API.27 Open-MiChroM allows high-performance simulations through Graphics Processing Unit (GPU) acceleration and allows the simulation of more complex and larger systems. Initially, OpenMiChroM is used to generate an ensemble of 3D structures for all individual chromosomes of the cell line GM12878. Using this ensemble of structures, we generate the in silico Hi-C maps for each chromosome to validate and determine the efficiency of the new method. This was followed by simulating multiple chromosomes together to demonstrate the scalability power of Open-MiChroM. Initial simulations included chromosomes 1 to 4, generating the in silico Hi-C maps for over eight million different structures. The Open-MiChroM code and the trajectories generated for this work are deposited and freely available at the Nucleome Data Bank (https://ndb.rice.edu/). A tutorial page about Open-MiChroM, containing the pipeline to run the chromosome simulations and generate the in silico Hi-C maps, is also provided at the NDB website.28 Comparison to Hi-C data shows that the inclusion of interactions among chromosomes is essential to be able to predict experimental Hi-C maps beyond single chromosome but also reproducing all the experimentally observed contacts between different chromosomes. Introduction Since the human DNA sequence was decoded1 in early 2000s, significant efforts have been made to understand the three-dimensional (3D) structure of the genome and its role in gene regulation and expression. This challenge gave rise to a plethora of experimental techniques needed to investigate the complexity of chromatin folding.2–8 Hi-C is central in this effort and it uses a genome-wide highthroughput sequencing technique that provides the contact frequency between all genomic loci pairs. Hi-C data supports the existence of chromosome territories and also identified that the overall genome organization can be described by two major compartments (called A and B) and their respective sub-compartments. Compartment A has a strong presence of genes, thus higher expression, and, as expected, it is more accessible. On the other hand, compartment B is more densely packed and tends to be more located in the core of the chromosomes.9 Another genome feature identified from these Hi-C maps is the topologically associating domains (TADs), selfinteracting partitions in order of mega bases.10 Additionally, Rao and co-workers identified local domains associated with distinct patterns of histone marks resulting in six sub-compartments; they also observed the presence of loops and observed that loops frequently link promoters and enhancers, indicating their key role in gene activation.11 Although Hi-C maps provide information about the genome’s organization, the data obtained is an average over a large number cells. It produces a 2D average contact map and not the much desired 3D ensemble of structures. Advances in high-resolution microscopy and polymer modeling, however, shed light on how chromosomes fold in the 3D space.12–17 Many theoretical models based on polymer physics were designed along the years, which helps in explaining mechanisms such as loops extrusion and phase separation.18–25 We highlight the Minimal Chromatin Model (MiChroM), which was the first of all these methods and has great prediction power. MiChroM is a theoretical energy landscape model for chromatin folding, which uses the maximum entropy principle.18,26 MiChroM energy function was applied to polymer chromosome simulations to generate ensembles of individual human chromosome structures in interphase by using the sub-compartments chromatin types and position of loop anchor loci as inputs. The Hi-C contact maps obtained from the ensemble of chromosome 3D structures reproduced in detail the maps from Hi-C at a 50 kb resolution.18 Although the model was also used to simulate the combined pair of chromosomes 17 and 18 from human GM12 878 lymphoblastoid cells successfully, the computational cost for the simulations increased substantially with the size of the simulation, This has been a limiting Results Open-MiChroM as tool to simulate human genome Comparison of Open-MiChroM to previous simulations were performed to validate this new tool and to verify the capabilities and efficiency of this new method. Open-MiChroM simulations reproduced the experimental Hi-C map from the human B-lymphoblastoid cells GM1287811,18 as shown in Figure 1(A) (upper triangle). The Pearson’s correlation between the simulation (lower triangle) and experimental results (upper triangle) is R = 0.96 for chromosome 1 at a 50 kb resolution. It is worthwhile to mention that the parameters were obtained from a training set using only chromosome 10 from the GM12878 cell line18,11 (details about how obtain these parameters are in SI). Chromosome 10 was utilized because of its diversity in patterns of chromosome interactions. We show that these trained parameters are not only transferable to all other chromosomes, as discussed in our earlier work, but also to interactions between chromosomes. This transferability 2 A.B. Oliveira Junior, Vinı´cius G. Contessoto, M.F. Mello, et al. Journal of Molecular Biology xxx (xxxx) xxx Figure 1. Open-MiChroM simulations of chromosome 1 of the human cell line GM12878. (A) Comparison between the Open-MiChroM in silico Hi-C simulations (lower triangle) and experimental Hi-C map (upper triangle).11 Pearson’s correlation between the maps is 0:96. (B) The first eigenvector obtained from the correlation matrix extracted from the Hi-C maps.9 The orange and blue curves represent experimental data and the Open-MiChroM simulation, respectively. (C) Contact probability as a function of genomic distance. The color scheme is the same as in (B). (D) A representative structure of chromosome 1 highlighting the sub-compartment annotation, following the color scheme shown in (A). (E) The same structure presented in (D) colored by the index locus position. (F) Knot formation distribution for Open-MiChroM and the associated homopolymer. enables high computing performance by using GPU to run the simulations. As presented here, this hardware acceleration is fundamental for simulating large chromosome complexes such as the human full nucleus to be investigated in future works. Comparison to previous single chromosome simulations18,11 were performed for all chromosomes in GM12878 and also for additional human cell lines during interphase.16,28 Figure 1(B) shows the eigenvectors extracted from the in silico, and from the experimental Hi-C results. The agreement between them corroborates the accuracy of Open-MiChroM in determining the compartment annotations A and B. The eigenvector analysis distinguishes the chromatin types A and B: positive and negative values represent the chromatin type A and type B, respectively. Figure 1(C) shows contact probability as a function of the genomic distance. Previous MiChroM simulations exhibited an unprecedented accuracy in reproducing experimental Hi-C maps.18 Open-MiChroM uses the same MiChroM energy function with one further improvement. Since it is a much more efficient methods, there is no need for including a truncation length in the Ideal Chromosome (IC) term. In the original MiChroM potential, the Ideal Chromosome term was trained with a d cutoff ¼ 200 in genome distance and simulated to reproduce the Hi-C maps using a d cutoff ¼ 500. Open-MiChroM has an effective way to embed the IC term by using the beads indexation information. Hence, it is now possible to implement the IC term calculation for the entire chain without the need for truncation and, in addition, using less computational resources. The removal of this cutoff leads to an improvement of the polymer scaling when compared to experiments as shown in Figure 1(C). Figure 1(D) shows a representative structure of chromosome 1. The color of the beads is related to the chromatin subcompartments annotation. Figure 1(E) shows the same chromosome 1 but now the structure is colored by the locus index in the genomic sequence at a 50 kb resolution. Furthermore, Open3 A.B. Oliveira Junior, Vinı´cius G. Contessoto, M.F. Mello, et al. MiChroM has implemented in its software core a function that calculates the knot formation in the chromosome polymer. Reproducing the previous results, Open-MiChroM calculates the Minimal Rope Length29,30 for each frame of the simulation (details can be found in SI). Figure 1(F) shows the knot distributions for the same chromosome simulated using Open-MiChroM and for a the associated homopolymer. Although it still has a few knots, the chromosome is less entangled than the homopolymer. The human chromosome simulations follow a procedure described in Methods Section. OpenMiChroM receives a sequence including chromatin types annotation for each chromosome as an input. An initial open configuration of the chromosome chain is built for the minimization step. The structure is collapsed at a high temperature (T ¼ 5 in reduced units) for randomizing the initial state for the equilibration and sampling simulations. T ¼ 1 was determined in previous work18 as the temperature that the simulation matches as the temperature during the Hi-C experiments production; the chromosome is folded but still dynamical. After this initial collapse, an annealing procedure is performed, decreasing the temperature from T ¼ 5 to T ¼ 1. This strategy creates random and uncorrelated configurations for different simulations. The sampling simulation is carried out over 5 107 steps via Langevin dynamics with a damping coefficient of 10:0s, where s is the time unit. A time step of Dt ¼ 0:01s was used for all the simulations, storing a structure every 1 103 steps. Figure 1(A) (lower triangle) shows the in silico Hi-C for chromosome 1 obtained over 40 sampling simulation replicas generating a total of 2 105 different conformations. The multi-chain simulations follow the same protocol described in Section “Open-MiChroM as tool to simulate human genome”. The structure initialization and the polymer collapse step were performed for each chromosome separately. Once the initial configuration for each chromosome was obtained, the equilibration step starts. The chromosome chains are distributed using the spherical Fibonacci lattices.31 Each of the 40 replicas has a different initial chromosome distribution. This strategy helps to randomize the neighboring interactions in the chromosome interface. The simulations were carried over 2 108 steps and saving a frame every 1 103 steps. In total, an ensemble of 8 million chromosomes structures were obtained from 40 replicas. The chromosome trajectories store the XYZ position of each bead of each chromosome at every frame. The data is deposited in the NDB server following the .ndb file format. Open-MiChroM introduces a reading and writing tool for creating a binary version of the .ndb file format, called .cndb. Open-MiChroM also provides a tool for reading the polymer trajectories and generate the in silico Hi-C maps and the ensemble of 3D structures. Figure 2A shows the in silico Hi-C map(bottom) from the multi-chain simulation. Open-MiChroM reproduces intra- and inter-chain loci contact observed in the experimental Hi-C maps (upper triangle).11 The colored squares highlight the intra-chain contacts for the four simulated chromosomes. The Pearson’s correlation between the in silico and experimental Hi-C is on average r ¼ 0:94. Figure 2(B) shows representative structures of the four chromosomes. The color intensity highlights the formation of chromosome territories following the more evident intra-chromosome contacts in Figure 2(A). Open-MiChroM allows simulations of multiple chromosomes chains Open-MiChroM captures the phase separation between chromatin types in a single and multiple interacting chromosomes The advance of Open-MiChroM makes it possible simulations of multiple chromosome chains, thus allowing us to analyze more than a single chromosome. To determine how multiple chromosome interactions affect the final folding, we have simulated the first 4 human chromosomes of the cell line GM12878. Table 1 describes the number of beads for each polymer chain and the total beads for the multi-chain simulations. As previously reported, chromatin types A and B tend to phase separate within a chromosome chain.18,17 This observation is consisted with the need to expose active genes to be expressed. Active genes exist mostly in regions of chromatin type A.16,19,9 Open-MiChroM has implemented an analysis of the radial density profile for each subcompartment annotation (The details about RDF calculations are presented in SI). Figure 3A shows the radial distribution for each sub-type (A1, A2, B1, B2, and B3) for a single-chain simulation of chromosome 1 using an ensemble of 2 105 structures. As expected, the B sub-type annotations are more often observed in the inner chromosome shell while chromatin types A are mostly present on the chromosome surface. The AB phase separation in the simulations is perturbed when neighboring chromosomes are introduced. The introduction of inter-chain interactions competes with the interac- Table 1 Details of the chromosomes set up used in OpenMiChroM multi-chain simulations C1 C2 C3 C4 Total Genome Length (Mb) Number of Beads (50 kb) 249.50 243.25 198.20 191.40 882.35 4990 4865 3964 3828 17,647 Journal of Molecular Biology xxx (xxxx) xxx 4 A.B. Oliveira Junior, Vinı´cius G. Contessoto, M.F. Mello, et al. Journal of Molecular Biology xxx (xxxx) xxx Figure 2. Open-MiChroM multi-chain Hi-C maps and associated representative structures for the combined simulation of the first 4 human chromosomes of the cell line GM12878. (A) Hi-C maps from Open-MiChroM simulations (lower triangle) and experiments (upper triangle). The squared contours highlight the intra-chain contacts for each chromosome. (B) Representative structure of the Open-MiChroM multi-chain simulation containing the chromosomes 1 to 4. The color scheme matches the squared contours used in (A). Figure 3. Open-MiChroM single- and multi-chain radial distribution analysis. (A) Open-MiChroM single-chain radial distribution of chromosome 1. The colors represent the different sub-compartment annotations. On the right, it is shown a representative structure highlighting the A-type (red) chromatin located preferentially on the surface and Btype (blue) chromatin buried inside of the chromosome. The gray color represents the centromere region of the chromosome structure. (B) Open-MiChroM radial distribution of chromosomes 1 in a multi-chain simulation. The color scheme is the same as in (A). On the right, it is shown a representative structure of chromosome 1 in the multi-chain simulation.Notice that the structural perturbation due to neighboring interactions is accompanied by a global A-type and B-type phase separation. (C) Open-MiChroM multi-chain radial distribution of the complex of chromosomes 1 to 4. Again, the color scheme is the same as in (A). On the right, it is shown a representative structure of the first four simulated chromosomes. The AB phase separation is observed among different chromosome polymer chains. The radial distribution profile calculation is described in SI. 5 A.B. Oliveira Junior, Vinı´cius G. Contessoto, M.F. Mello, et al. tions of the single chain. The result is a distortion on the spherical shape of the polymer chain observed in the single-chain simulation since A types tend to move to the surface and B types to the core. Therefore, the chromosome complex maintains independent territories for individual chromosomes and a global AB phase separation, as observed in Figure 3(B) and (C). It is interesting to notice that this distortion in structure for individual chromosomes allow the chromosome complex to satisfy both of these conditions. The liquid-like behavior of chromosomes32 in interphase allows the distribution chromatin type A on the chromosomes surface, even with the polymer connectivity constraint. Journal of Molecular Biology xxx (xxxx) xxx the knots calculations, radial density distribution, and the in silico Hi-C contact map. Open-MiChroM is also efficient in storing 3D structures of the chromosome polymer chain. The XYZ position of each bead can be exported into different file formats, including the binary format .ndb, named here as .cndb. Python scripts are developed to read and write .cndb files and the conversion of this file format to others commonly used ones. These scripts are made available at the NDB server. Additionally, a tutorial web page, describing, step-by-step usage of Open-MiChroM is included in the NDB server. Expansion of Open-MiChroM for different resolutions will be implemented in the near future. Going beyond human cell lines and tissues, the model is also being expanded to incorporate experimental data from different species and different cell phases. Finally, it is noteworthy that Open-MiChroM is freely available deposited as an MIT license 4, and any contributions to the code are welcome. Conclusion This new theoretical development and computational tool remove existent barriers and allow for extensions of the current simulation beyond individual chromosomes. They represent the first step towards a detailed simulation of the full nucleus genome. In addition, it also allows for future modifications to current models to improve their resolution beyond 50 kb. These advances also make possible to expand our studies to a large number of cell lines. Here we have shown the joint simulation of the four large chromosomes and have shown how inter-chromosome interactions affects the final structure. Although the single chromosome structures still maintain strong similarity to the individual chromosome ones, it is import to highlight that the AB phase separation observed in individual chromosomes is maintained in this large complex. To achieve these goals, Open-MiChroM is introduced as a software tool to perform dynamical simulations of chromosomes using the Minimal Chromatin Model energy function. OpenMiChroM uses the Python toolkit OpenMM, which enables high computing performance by using GPU to run the simulations. This acceleration is the key for performing simulations of large chromosome complexes and moving towards simulating the full nucleus. There are many other challenges to simulate the human full nucleus, such as determining appropriate initial conditions for the simulations and the effects of interactions with Lamin-A in the nucleus wall, but the theoretical and computational tool towards this goal is now available. Comparison to previous single chromosome simulations18 were performed to show the power of the method. for example, by using Open-MiChroM, one can now remove the cutoff distance in the Ideal Chromosome function (needed before due to computational limitations) leading to a much improved polymer scaling when compared to experiments. To facilitate result analysis, Open-MiChroM expands the analysis tools in the code core. More specifically, these tools include Methods Open-MiChroM is a software tool developed to simulate and analyze the genome organization. Open-MiChroM implements the Minimal Chromatin Model (MiChroM) energy function to simulate chromosome dynamics using the Python API OpenMM software. There are significant advantages in using Open-MiChroM, which are described in Section “Results”. Details about MiChroM, OpenMM, code implementation, and applications are described next. MiChroM potential The Minimal Chromatin Model (MiChroM) is a copolymer physics model that uses the maximum entropy principle constrained by three assumptions to incorporate the structural interactions between genomics loci.20 The energy function takes the following form: X U MiChroM ð~ r Þ ¼ U HP ð~ rÞ þ k Pl k; l 2 Types þv X ði;jÞ2fLoopSitesg X akl f ð~ r ij Þ þ f ð~ r ij Þ i 2 fTypekg j 2 fTypelg dX cutoff d ¼3 cðd Þ X f ð~ r i;iþd Þ; ð1Þ i where U HP ð~ r Þ is the potential energy of a generic homopolymer with an additional soft-core term to allow for sporadic chain crossing (see SI for details). The following terms result from the constraints applied when implementing the maximum entropy approach. The second term in Eq. (1) corresponds to type-to-type interactions that give rise to the compartmentalization pattern found in Hi-C maps. The third term represents the loop anchor points, to account for the CTCF binding sites. The last term is the translational invariant compaction term, called the ideal chromosome 6 A.B. Oliveira Junior, Vinı´cius G. Contessoto, M.F. Mello, et al. Journal of Molecular Biology xxx (xxxx) xxx The Open-MiChroM code uses Python 3 with standard scientific libraries. Figure 4 shows a diagram of the Open-MiChroM simulation pipeline. The code was implemented to accommodate a single or multiple chromosomes chains. Analyses tools and the necessary steps to generate in silico Hi-C maps are implemented in the software core. Open-MiChroM initially creates an initial chromosome chain with the chromatin type annotation sequence as an input. Additionally, the loop anchor information is optional and can be provided as a file containing the CTCF loci observed experimentally. The chromatin sequence and the loop anchor points can be provided as text files, or the users can also use the chromosome information provided in the .ndb file. The initial simulation structure can take several forms: a straight line, a random walk structure, a helicoidal shape, or any structure provided by the user in a .ndb file format. In the case of multiple chains, each chain’s initial position is defined using a Fibonacci Lattice distribution. After this initial preparation step, Open-MiChroM is ready to perform simulations potential, which assumes that, every time a pair of loci comes into contact, there is a gain/loss of effective free energy, cðd Þ, that is a function of genomic distance d. The spatial range of all interactions is controlled by the function f ð~ r ij Þ ¼ 12 f1 þ tanh½lðr c r ij Þg, which varies between one at short distances and zero at large separations. All parameters used in this work were obtained from the previous work,18 see section ”GM12878 Hi-C data usage” in SI for more details. Open-MiChroM implementation and simulation work using OpenMM OpenMM is a software toolkit for highperformance molecular dynamics simulations. OpenMM allows users to add customized forces with novel energy functions, such as those implemented in Open-MiChroM. OpenMM receives the customize energy function expression as an input and generates an optimized code for high-performance computing. OpenMM supports different hardware architectures, including both CPUs and GPUs.27 Figure 4. Open-MiChroM implementation diagram. Open-MiChroM combines the Minimal Chromatin Model (MiChroM) with the Python API OpenMM. This implementation allows for chromosome simulations of genomes with high-performance computing by using the hardware acceleration of GPUs. The Open-MiChroM code accommodates simulations of a single or multi chromosome chains. Open-MiChroM reproduces experimental Hi-C maps. The 3D ensemble of chromosome structures is stored in trajectory files, saving them as .ndb (human-readable text file) or as .cndb (binary version of .ndb). 7 A.B. Oliveira Junior, Vinı´cius G. Contessoto, M.F. Mello, et al. Journal of Molecular Biology xxx (xxxx) xxx in a range of 1 103 to 4 105 beads, comparable to sizes of the human chromosome 21 to the full wheat genome. The simulation pipeline follows the scheme presented in Figure 4. A sequence of the chromatin compartment annotation, types A and B, is provided as an input. For benchmarking purposes, the sequences used in this section were randomly generated. The simulations have 2 steps: 1) an initial conformation, either a straight line or a more compact helicoidal structure is chosen. This is followed by an equilibration run needed for compacting the polymer chain and to randomize the initial 3D structure for the next step. 2) Chromosome simulations are performed to sample different conformations which are needed to obtain the 3D structural ensemble of the chromosomes. For this performance analysis, the simulations go over 2 105 steps using a time step Dt ¼ 0:01s, where s is the time unit described in Section “Open-MiChroM as tool to simulate human genome”. Figure 5 shows the performance of OpenMiChroM for different polymer chain lengths. For both CPUs and GPUs, the simulations were averaged over 20 independent runs. The error bars in this benchmarking are negligible. The CPU yellow curve is truncated at the valued of the longest simulation which is shorter than for GPUs (green line). The performance difference between CPUs and GPUs using Open-MiChroM is similar to those obtained in the OpenMM performance analysis in chromosome simulations. The GPU efficiency using Open-MiChroM allows for the investigation of much larger systems. The comparison presented in Figure 5 shows that using the energy function described in Section “MiChroM potential”. Open-MiChroM equilibration step uses a spherical constraint term for the initial structure collapse. This constraint is optional and is used to speed up the generation of the initial configuration. In the case of multiple chain simulations, equilibration is required until the polymer chains get in contact. After this initial collapse step, this constraint is turned off, and a second equilibration is performed. The second equilibration simulation is a strategy to randomize the initial configuration prior to the sampling simulation. For the simulations presented in this work, the sampling step was carried out over 40 replicas with 5 107 steps, storing a frame every 1 103 steps that generates a total of two million 3D structures. These numbers were used to produce the in silico Hi-C maps which are compared to experimental ones. The comparison for all human autosomes of the cell line GM12878 are presented in the SI. Trajectory files are stored as .ndb or .cndb files. Analysis parameters such as the potential energies, radius of gyration, knot calculation are outputs when requested with the corresponding flag during the Open-MiChroM simulation. These analyses can also be performed by reading the trajectory files after the simulations. Open-MiChroM performance analysis in CPUs and GPUs The Open-MiChroM performance was evaluated based on the simulation time in both CPUs and GPUs. Simulations were performed for chromosome polymer chains at a 50 kb resolution Figure 5. Open-MiChroM performance analyses in CPUs and GPUs. The semi-log plot shows the running time as a function of the polymer size. The times plotted are an average over 20 replicas of 2 105 time steps, using a CPU (Intel(R) Xeon(R) Gold 6230 CPU) (yellow) and a GPU (NVIDIA(R) Tesla(R) V100) (green). The shaded regions represent the time scale in seconds (green region) and hours (yellow region). The icons represent the polymer size compared to a full nucleus genome simulation of different organisms. 8 A.B. Oliveira Junior, Vinı´cius G. Contessoto, M.F. Mello, et al. simulations of a full human genome size with the 46 chromosomes at a 50 kb resolution are now possible using minutes or less of computer time. This also opens the possibility of simulating plant genomes, such as wheat, which have much bigger genomes. It is important to comment that these comparisons only show the ability of OpenMiChroM to scale up in genome size with suitable performance. The results containing the biological comparisons such as between in silico and experimental Hi-C maps are presented in Section “Open-MiChroM as tool to simulate human genome”. Journal of Molecular Biology xxx (xxxx) xxx CAPES (Higher Education Personnel Improvement Coordination) Grants 2016/13998-8 and 2017/09662-7. Appendix A. Supplementary material Supplementary data associated with this article can be found, in the online version, at https://doi. org/10.1016/j.jmb.2020.10.034. Received 26 August 2020; Accepted 31 October 2020; Available online xxxx Keywords: genome architecture; Hi-C; OpenMM; chromosome simulations CRediT authorship contribution statement Antonio B. Oliveira Junior: Conceptualization, Methodology, Investigation, Data curation, Software, Visualization, Writing - review & editing. Vinı́cius G. Contessoto: Conceptualization, Methodology, Investigation, Data curation, Software, Visualization, Writing - review & editing. Matheus F. Mello: Software, Visualization, Writing - review & editing. José N. Onuchic: Conceptualization, Methodology, Investigation, Data curation, Project administration, Resources, Funding acquisition, Writing - review & editing. † These authors contributed equally. References 1. Lander, E.S. et al, (2001). Nature, 409, 860–921. https:// doi.org/10.1038/35057062. ISSN 1476-4687. 2. Dekker, J., Rippe, K., Dekker, M., Kleckner, N., (2002). Capturing chromosome conformation. Science, 295 (5558), 1306–1311. 3. Zhao, Z., Tavoosidana, G., Sjölinder, M., Göndör, A., Mariano, P., Wang, S., Kanduri, C., Lezcano, M., et al., (2006). Circular chromosome conformation capture (4C) uncovers extensive networks of epigenetically regulated intra-and interchromosomal interactions. Nat. Genet., 38 (11), 1341. 4. Simonis, M., Klous, P., Splinter, E., Moshkin, Y., Willemsen, R., De Wit, E., Van Steensel, B., De Laat, W., (2006). Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture–on-chip (4C). Nature Genet., 38 (11), 1348. 5. Dostie, J., Richmond, T.A., Arnaout, R.A., Selzer, R.R., Lee, W.L., Honan, T.A., Rubio, E.D., Krumm, A., et al., (2006). Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome Res., 16 (10), 1299– 1309. 6. Beagrie, R.A., Scialdone, A., Schueler, M., Kraemer, D.C., Chotalia, M., Xie, S.Q., Barbieri, M., de Santiago, I., et al., (2017). Complex multi-enhancer contacts captured by genome architecture mapping. Nature, 543 (7646), 519. 7. Quinodoz, S.A., Ollikainen, N., Tabak, B., Palla, A., Schmidt, J.M., Detmar, E., Lai, M.M., Shishkin, A.A., et al., (2018). Higher-order inter-chromosomal hubs shape 3D genome organization in the nucleus. Cell, 174 (3), 744–757. 8. Olivares-Chauvet, P., Mukamel, Z., Lifshitz, A., Schwartzman, O., Elkayam, N.O., Lubling, Y., Deikus, G., Sebra, R.P., et al., (2016). Capturing pairwise and multiway chromosomal conformations using chromosomal walks. Nature, 540 (7632), 296. 9. Lieberman-Aiden, E., Van Berkum, N.L., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. DECLARATION OF COMPETING INTEREST The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Acknowledgments We want to thank Michele Di Pierro, Ryan Cheng and Peter Wolynes for important discussions and suggestions during the development of this work. JNO wants to thank the hospitality and support from the ICTP/SAIFR and Instituto de Fı́sica Teórica, UNESP where much of this research was performed. During this period, JNO wants also to thank the FAPESP (São Paulo State Research Foundation) for the support of his sabbatical. This research was supported by the Center for Theoretical Biological Physics sponsored by the NSF (Grants PHY-2019745 and CHE-1614101) and by the Welch Foundation (Grant C-1792). JNO is a Cancer Prevention and Research Institute of Texas (CPRIT) Scholar in Cancer Research. ABOJ was funded by FAPESP (São Paulo State Research Foundation and Higher Education Personnel) Grant 2016/01343-7 and acknowledges the Robert A. Welch Postdoctoral Fellow program. VGC is funded by FAPESP (São Paulo State Research Foundation and Higher Education Personnel), and 9 A.B. Oliveira Junior, Vinı´cius G. Contessoto, M.F. Mello, et al. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. R., et al., (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326 (5950), 289–293. Dixon, J.R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J.S., et al., (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature, 485 (7398), 376–380. Rao, S.S.P., Huntley, M.H., Durand, N.C., Stamenova, E. K., Bochkov, I.D., Robinson, J.T., Sanborn, A.L., Machol, I., et al., (2014). A 3D map of the human genome at Kilobase resolution reveals principles of chromatin looping. Cell, 159 (7), 1665–1680. https://doi.org/10.1016/ j.cell.2014.11.021. ISSN 0092-8674. Nir, G., Farabella, I., Estrada, C.P., Ebeling, C.G., Beliveau, B.J., Sasaki, H.M., Lee, S.H., Nguyen, S.C., et al., (2018). Walking along chromosomes with superresolution imaging, contact maps, and integrative modeling. PLoS Genet., 14 (12), e1007872. Bintu, B., Mateo, L.J., Su, J.-H., Sinnott-Armstrong, N.A., Parker, M., Kinrot, S., Yamaya, K., Boettiger, A.N., et al., (2018). Super-resolution chromatin tracing reveals domains and cooperative interactions in single cells. Science, 362 (6413), eaau1783. Boettiger, A.N., Bintu, B., Moffitt, J.R., Wang, S., Beliveau, B.J., Fudenberg, G., Imakaev, M., Mirny, L.A., et al., (2016). Super-resolution imaging reveals distinct chromatin folding for different epigenetic states. Nature, 529 (7586), 418–422. https://doi.org/10.1038/nature16496. ISSN 14764687. Di Pierro, M., (2019). Inner workings of gene folding. Proc. Nat. Acad. Sci., 116 (11), 4774–4775. Cheng, R.R., Contessoto, V.G., Lieberman Aiden, E., Wolynes, P.G., Di Pierro, M., Onuchic, J.N., (2020). Exploring chromosomal structural heterogeneity across multiple cell lines. eLife, 9, e60312. https://doi.org/ 10.7554/eLife.60312. ISSN 2050-084X. Gürsoy, G., Liang, J., (2016). Three-dimensional chromosome structures from energy landscape. Proc. Nat. Acad. Sci., 113 (43), 11991–11993. Di Pierro, M., Zhang, B., Aiden, E.L., Wolynes, P.G., Onuchic, J.N., (2016). Transferable model for chromosome architecture. Proc. Natl. Acad. Sci. U.S.A., 113 (43), 12168–12173. https://doi.org/10.1073/pnas.1613607113. ISSN 0027-8424. Di Pierro, M., Cheng, R.R., Aiden, E.L., Wolynes, P.G., Onuchic, J.N., (2017). De novo prediction of human chromosome structures: epigenetic marking patterns encode genome architecture. Proc. Nat. Acad. Sci., 114 (46), 12126–12131. Zhang, B., Wolynes, P.G., (2015). Topology, structures, and energy landscapes of human chromosomes. Proc. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 10 Journal of Molecular Biology xxx (xxxx) xxx Natl. Acad. Sci. U.S.A., 112 (19), 6062–6067. https://doi. org/10.1073/pnas.1506257112. ISSN 0027-8424. Brackley, C.A., Johnson, J., Michieletto, D., Morozov, A.N., Nicodemi, M., Cook, P.R., Marenduzzo, D., (2017). Nonequilibrium chromosome looping via molecular slip links. Phys. Rev. Lett., 119 (13), 138101. MacPherson, Q., Beltran, B., Spakowitz, A.J., (2018). Bottom–up modeling of chromatin segregation due to epigenetic modifications. Proc. Nat. Acad. Sci., 115 (50), 12739–12744. Mirny, L.A., (2011). The fractal globule as a model of chromatin architecture in the cell. Chromosome Res., 19 (1), 37–51. Fudenberg, G., Imakaev, M., Lu, C., Goloborodko, A., Abdennur, N., Mirny, L.A., (2016). Formation of chromosomal domains by loop extrusion. Cell Rep., 15 (9), 2038–2049. Krepel, D., Cheng, R.R., Di Pierro, M., Onuchic, J.N., (2018). Deciphering the structure of the condensin protein complex. Proc. Nat. Acad. Sci., 115 (47), 11911–11916. Di Pierro, M., Cheng, R.R., Zhang, B., Onuchic, J.N., Wolynes, P.G., (2019). Learning genomic energy landscapes from experiments. Model. 3D Conform. Genomes,, 305. Eastman, P., Swails, J., Chodera, J.D., McGibbon, R.T., Zhao, Y., Beauchamp, K.A., Wang, L.-P., Simmonett, A.C., et al., (2017). OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLoS Comput. Biol., 13 (7), e1005659. https://doi.org/10.1371/ journal.pcbi.1005659. ISSN 1553-7358. Contessoto, V.G., Cheng, R.R., Hajitaheri, A., DoderoRojas, E., Mello, M.F., Lieberman-Aiden, E., Wolynes, P. G., Di Pierro, M., et al., (2020). The Nucleome Data Bank: web-based resources to simulate and analyze the threedimensional genome. Nucleic Acids Res., gkaa818., ISSN 1362–4962, doi:10.1093/nar/gkaa818. Lua, R.C., (2012). PyKnot: a PyMOL tool for the discovery and analysis of knots in proteins. Bioinformatics, 28 (15), 2069–2071. https://doi.org/10.1093/bioinformatics/bts299. ISSN 1367-4803. Stasiak, A., Katritch, V., Kauffman, L.H., (1998). Ideal Knots jSeries on Knots and Everything. Vol. 19, World Scientific Publishing Company, ISBN 978-981-02-3530-7, doi:10.1142/3843. Swinbank, R., Purser, R.J., (2006). Fibonacci grids: a novel approach to global modelling. Q.J.R. Meteorolog. Soc., 132 (619), 1769–1793. https://doi.org/10.1256/qj.05.227. ISSN 0035-9009. Maeshima, K., Ide, S., Hibino, K., Sasai, M., (2016). Liquidlike behavior of chromatin. Curr. Opin. Genet. Develop., 37, 36–45.

RELATED PAPERS

RELATED TOPICS

Log In

A Scalable Computational Approach for Simulating Complexes of Multiple Chromosomes

A Scalable Computational Approach for Simulating Complexes of Multiple Chromosomes

Related Papers

RELATED PAPERS

RELATED TOPICS