-
BioSimulators: a central registry of simulation engines and services for recommending specific tools
Authors:
Bilal Shaikh,
Lucian P. Smith,
Dan Vasilescu,
Gnaneswara Marupilla,
Michael Wilson,
Eran Agmon,
Henry Agnew,
Steven S. Andrews,
Azraf Anwar,
Moritz E. Beber,
Frank T. Bergmann,
David Brooks,
Lutz Brusch,
Laurence Calzone,
Kiri Choi,
Joshua Cooper,
John Detloff,
Brian Drawert,
Michel Dumontier,
G. Bard Ermentrout,
James R. Faeder,
Andrew P. Freiburger,
Fabian Fröhlich,
Akira Funahashi,
Alan Garny, et al. (46 additional authors not shown)
Abstract:
Computational models have great potential to accelerate bioscience, bioengineering, and medicine. However, it remains challenging to reproduce and reuse simulations, in part, because the numerous formats and methods for simulating various subsystems and scales remain siloed by different software tools. For example, each tool must be executed through a distinct interface. To help investigators find and use simulation tools, we developed BioSimulators (https://biosimulators.org), a central registry of the capabilities of simulation tools and consistent Python, command-line, and containerized interfaces to each version of each tool. The foundation of BioSimulators is standards, such as CellML, SBML, SED-ML, and the COMBINE archive format, and validation tools for simulation projects and simulation tools that ensure these standards are used consistently. To help modelers find tools for particular projects, we have also used the registry to develop recommendation services. We anticipate that BioSimulators will help modelers exchange, reproduce, and combine simulations.
Submitted 13 March, 2022;
originally announced March 2022.
-
SED-ML Validator: tool for debugging simulation experiments
Authors:
Bilal Shaikh,
Andrew Philip Freiburger,
Matthias König,
Frank T. Bergmann,
David P. Nickerson,
Herbert M. Sauro,
Michael L. Blinov,
Lucian P. Smith,
Ion I. Moraru,
Jonathan R. Karr
Abstract:
Summary: More sophisticated models are needed to address problems in bioscience, synthetic biology, and precision medicine. To help facilitate the collaboration needed for such models, the community developed the Simulation Experiment Description Markup Language (SED-ML), a common format for describing simulations. However, the utility of SED-ML has been hampered by limited support for SED-ML among modeling software tools and by different interpretations of SED-ML among the tools that support the format. To help modelers debug their simulations and to push the community to use SED-ML consistently, we developed a tool for validating SED-ML files. We have used the validator to correct the official SED-ML example files. We plan to use the validator to correct the files in the BioModels database so that they can be simulated. We anticipate that the validator will be a valuable tool for developing more predictive simulations and that the validator will help increase the adoption and interoperability of SED-ML.
Availability: The validator is freely available as a webform, HTTP API, command-line program, and Python package at https://run.biosimulations.org/utils/validate and https://pypi.org/project/biosimulators-utils. The validator is also embedded into interfaces to 11 simulation tools. The source code is openly available as described in the Supplementary data.
Contact: karr@mssm.edu
Submitted 1 June, 2021;
originally announced June 2021.
-
Centralizing data to unlock whole-cell models
Authors:
Yin Hoon Chew,
Jonathan R. Karr
Abstract:
Despite substantial potential to transform bioscience, medicine, and bioengineering, whole-cell models remain elusive. One of the biggest challenges to whole-cell models is assembling the large and diverse array of data needed to model an entire cell. Thanks to rapid advances in experimentation, much of the necessary data is becoming available. Furthermore, investigators are increasingly sharing their data due to increased emphasis on reproducibility. However, the scattered organization of this data continues to hamper modeling. Toward more predictive models, we highlight the challenges to assembling the data needed for whole-cell modeling and outline how we can overcome these challenges by working together to build a central data warehouse.
Submitted 21 May, 2021;
originally announced May 2021.
-
Practical Resources for Enhancing the Reproducibility of Mechanistic Modeling in Systems Biology
Authors:
Michael L. Blinov,
John H. Gennari,
Jonathan R. Karr,
Ion I. Moraru,
David P. Nickerson,
Herbert M. Sauro
Abstract:
Although reproducibility is a core tenet of the scientific method, it remains challenging to reproduce many results. Surprisingly, this also holds true for computational results in domains such as systems biology where there have been extensive standardization efforts. For example, Tiwari et al. recently found that they could only repeat 50% of published simulation results in systems biology. Toward improving the reproducibility of computational systems research, we identified several resources that investigators can leverage to make their research more accessible, executable, and comprehensible by others. In particular, we identified several domain standards and curation services, as well as powerful approaches pioneered by the software engineering industry that we believe many investigators could adopt. Together, we believe these approaches could substantially enhance the reproducibility of systems biology research. In turn, we believe enhanced reproducibility would accelerate the development of more sophisticated models that could inform precision medicine and synthetic biology.
Submitted 9 April, 2021;
originally announced April 2021.
-
Exact Parallelization of the Stochastic Simulation Algorithm for Scalable Simulation of Large Biochemical Networks
Authors:
Arthur P. Goldberg,
David R. Jefferson,
John A. P. Sekar,
Jonathan R. Karr
Abstract:
Comprehensive simulations of the entire biochemistry of cells have great potential to help physicians treat disease and help engineers design biological machines. But such simulations must model networks of millions of molecular species and reactions.
The Stochastic Simulation Algorithm (SSA) is widely used for simulating biochemistry, especially systems with species populations small enough that discreteness and stochasticity play important roles. However, existing serial SSA methods are prohibitively slow for comprehensive networks, and existing parallel SSA methods, which use periodic synchronization, sacrifice accuracy.
To enable fast, accurate, and scalable simulations of biochemistry, we present an exact parallel algorithm for SSA that partitions a biochemical network into many SSA processes that simulate in parallel. Our parallel SSA algorithm exactly coordinates the interactions among these SSA processes and the species state they share by structuring the algorithm as a parallel discrete event simulation (DES) application and using an optimistic parallel DES simulator to synchronize the interactions. We anticipate that our method will enable unprecedented biochemical simulations.
Submitted 20 May, 2020; v1 submitted 11 May, 2020;
originally announced May 2020.
-
ObjTables: structured spreadsheets that promote data quality, reuse, and integration
Authors:
Jonathan R. Karr,
Wolfram Liebermeister,
Arthur P. Goldberg,
John A. P. Sekar,
Bilal Shaikh
Abstract:
A central challenge in science is to understand how systems behaviors emerge from complex networks. This often requires aggregating, reusing, and integrating heterogeneous information. Supplementary spreadsheets to articles are a key data source. Spreadsheets are popular because they are easy to read and write. However, spreadsheets are often difficult to reanalyze because they capture data ad hoc without schemas that define the objects, relationships, and attributes that they represent. To help researchers reuse and compose spreadsheets, we developed ObjTables, a toolkit that makes spreadsheets human- and machine-readable by combining spreadsheets with schemas and an object-relational mapping system. ObjTables includes a format for schemas; markup for indicating the class and attribute represented by each spreadsheet and column; numerous data types for scientific information; and high-level software for using schemas to read, write, validate, compare, merge, revision, and analyze spreadsheets. By making spreadsheets easier to reuse, ObjTables could enable unprecedented secondary meta-analyses. By making it easy to build new formats and associated software for new types of data, ObjTables can also accelerate emerging scientific fields.
Submitted 6 August, 2020; v1 submitted 11 May, 2020;
originally announced May 2020.
-
Organizing genome engineering for the gigabase scale
Authors:
Bryan A. Bartley,
Jacob Beal,
Jonathan R. Karr,
Elizabeth A. Strychalski
Abstract:
Engineering the entire genome of an organism enables large-scale changes in organization, function, and external interactions, with significant implications for industry, medicine, and the environment. Improvements to DNA synthesis and organism engineering are already enabling substantial changes to organisms with megabase genomes, such as Escherichia coli and Saccharomyces cerevisiae. Simultaneously, recent advances in genome-scale modeling are increasingly informing the design of metabolic networks. However, major challenges remain for integrating these and other relevant technologies into workflows that can scale to the engineering of gigabase genomes.
In particular, we find that a major under-recognized challenge is coordinating the flow of models, designs, constructs, and measurements across the large teams and complex technological systems that will likely be required for gigabase genome engineering. We recommend that the community address these challenges by 1) adopting and extending existing standards and technologies for representing and exchanging information at the gigabase genomic scale, 2) developing new technologies to address major open questions around data curation and quality control, 3) conducting fundamental research on the integration of modeling and design at the genomic scale, and 4) developing new legal and contractual infrastructure to better enable collaboration across multiple institutions.
Submitted 3 September, 2019;
originally announced September 2019.
-
BpForms and BcForms: Tools for concretely describing non-canonical polymers and complexes to facilitate comprehensive biochemical networks
Authors:
Paul F. Lang,
Yassmine Chebaro,
Xiaoyue Zheng,
John A. P. Sekar,
Bilal Shaikh,
Darren A. Natale,
Jonathan R. Karr
Abstract:
Although non-canonical residues, caps, crosslinks, and nicks play an important role in the function of many DNA, RNA, proteins, and complexes, we do not fully understand how networks of non-canonical macromolecules generate behavior. One barrier is our limited formats, such as IUPAC, for abstractly describing macromolecules. To overcome this barrier, we developed BpForms and BcForms, a toolkit of ontologies, grammars, and software for abstracting the primary structure of polymers and complexes as combinations of residues, caps, crosslinks, and nicks. The toolkit can help quality control, exchange, and integrate information about the primary structure of macromolecules into fine-grained global networks of intracellular biochemistry.
Submitted 3 September, 2019; v1 submitted 24 March, 2019;
originally announced March 2019.
-
Emerging whole-cell modeling principles and methods
Authors:
Arthur P. Goldberg,
Balázs Szigeti,
Yin Hoon Chew,
John A. P. Sekar,
Yosef D. Roth,
Jonathan R. Karr
Abstract:
Whole-cell computational models aim to predict cellular phenotypes from genotype by representing the entire genome, the structure and concentration of each molecular species, each molecular interaction, and the extracellular environment. Whole-cell models have great potential to transform bioscience, bioengineering, and medicine. However, numerous challenges remain to achieve whole-cell models. Nevertheless, researchers are beginning to leverage recent progress in measurement technology, bioinformatics, data sharing, rule-based modeling, and multi-algorithmic simulation to build the first whole-cell models. We anticipate that ongoing efforts to develop scalable whole-cell modeling tools will enable dramatically more comprehensive and more accurate models, including models of human cells.
Submitted 8 December, 2017; v1 submitted 6 October, 2017;
originally announced October 2017.