PyCoTools - A Python Toolbox For COPASI
doi.10.1093/bioinformatics/xxxxxx
Advance Access Publication Date: Day Month Year
Manuscript Category
Systems Biology
Abstract
Motivation: COPASI is an open source software package for constructing, simulating and analysing
dynamic models of biochemical networks. COPASI is primarily intended to be used with a graphical user
interface but often it is desirable to be able to access COPASI features programmatically, with a high level
interface.
Results: PyCoTools is a Python package aimed at providing a high level interface to COPASI tasks with
an emphasis on model calibration. PyCoTools enables the construction of COPASI models and the
execution of a subset of COPASI tasks including time courses, parameter scans and parameter estimations.
Additional ‘composite’ tasks which use COPASI tasks as building blocks are available for increasing
parameter estimation throughput, performing identifiability analysis and performing model selection.
PyCoTools supports exploratory data analysis on parameter estimation data to assist with troubleshooting
model calibrations. We demonstrate PyCoTools by posing a model selection problem designed to showcase
PyCoTools within a realistic scenario. The aim of the model selection problem is to test the feasibility
of three alternative hypotheses in explaining experimental data derived from neonatal dermal fibroblasts
in response to TGF-β over time. PyCoTools is used to critically analyse the parameter estimations and
propose strategies for model improvement.
Availability: PyCoTools can be downloaded from the Python Package Index (PyPI) using the command
‘pip install pycotools’ or directly from GitHub (https://github.com/CiaranWelsh/pycotools).
Documentation is available at http://pycotools.readthedocs.io.
Contact:
Supplementary information: Supplementary data are available at Bioinformatics.
tools are Data2Dynamics (Raue et al. (2015)), Systems Biology Workbench (Sauro et al. (2003)),
AMIGO (Balsa-Canto and Banga (2011)), SBpipe (Dalle Pezze and Le Novère (2017)), libRoadRunner
(Sauro et al. (2013); Somogyi et al. (2015)), Antimony (Smith et al. (2009)), Tellurium
(Choi et al. (2016)), Ecell (Takahashi et al. (2003)), PyDSTool
(http://www2.gsu.edu/~matrhc/PyDSTool.htm), PySCeS (Olivier (2005)), ABC-SysBio
(Liepe et al. (2010)), Condor Copasi (Kent et al. (2012)) and COPASI (Hoops et al. (2006)).
COPASI is a widely used tool in modelling biological systems because it supports a variety of
modelling applications including deterministic, stochastic and hybrid model solvers, parameter
estimation, optimisation, parameter scans, steady state analysis, local sensitivity analysis and
metabolic control analysis. COPASI has a graphical user interface (GUI) which makes the tool
accessible to non-expert programmers and mathematicians, but also has a command line interface for
batch processing and an application programming interface (API) for several programming languages.
These APIs have been used for integrating the COPASI framework with custom software, for example in
JigCell Run Manager (Palmisano et al., 2015), CellDesigner (Matsuoka et al., 2014), ManyCell
(Dada and Mendes, 2012) and ModelMage (Flöttmann et al. (2008)).
The Python programming language is useful for scientific computing because of its concise syntax and
the availability of open source toolboxes such as pandas (https://pandas.pydata.org/), numpy
(http://www.numpy.org/), scipy (http://www.scipy.org/), sklearn (Pedregosa et al., 2011) and
matplotlib (Hunter, 2007), which together provide a series of well-documented, easy-to-use,
high-level tools for interacting with and manipulating numerical data. Development of further tools
in Python is enabled by the Python Package Index (PyPI) where code can be made freely available to
other developers. As a result, Python has an extensive publicly available code base for scientific
computing that competes well with other commercial and non-commercial environments such as Matlab
and R.
Here we present PyCoTools, an open-source Python package which provides a high level interface to
COPASI tasks with an emphasis on model calibration. COPASI tasks are integrated with the Python
environment to provide additional features which are non-native to COPASI. Features include: the
construction of COPASI models with Antimony (Smith et al. (2009)); the automation of repeat
parameter estimation configurations, chaser parameter estimations and parameter estimations for
multiple models (e.g. model selection); automation of the profile likelihood method of
identifiability analysis (Raue et al., 2013; Schaber, 2012) with visualisation facilities which are
flexible enough to support model reduction (Maiwald et al. (2016)); visualisation of time courses
from ensembles of parameter sets and multiple ways of visualising parameter estimation data. We
demonstrate PyCoTools by defining a model selection problem to introduce a known negative feedback
into a previously published model of TGF-β signalling (Zi and Klipp (2007)) using new data.
2 Methods
2.1 Experimental
2.1.1 Cell Lines and Treatment
Neonatal human dermal fibroblasts (HDFn, Life Technologies, C-004-5C) were cultured as per
manufacturer guidelines in M106 (Life Technologies M-106-500) supplemented with LSGS (Life
Technologies S-003-10). HDFn were seeded at a density of 10,000 cells/cm² into 12 well plates
(Greiner 665180) in 4 ml complete M106 and cultured for 3 days. Media was aspirated, cells were
washed twice with DPBS, the media was replaced with 4 ml M106 without LSGS and cells were serum
starved for 24 hours. HDFn were treated with 5 ng ml⁻¹ TGF-β1 (Life Technologies, PHG9211) in M106
media without LSGS for 0, 1, 2, 4, 8 and 12 hours. To harvest, media was aspirated, cells were
washed twice in DPBS and then lysed in 350 µl RLT buffer (Qiagen 79216).
2.1.2 High-throughput qPCR
Lysates were snap frozen in liquid nitrogen and stored at −80 °C prior to quantification. Cell
lysates were thawed at 4 °C and then RNA was isolated using the Biomek FxP and the RNAdvance Tissue
Isolation kit (Beckman Coulter, p/n A32646). The resulting RNA was quantified using the Nanodrop
8000 (Nanodrop, ND-8000). cDNA was generated using 500 ng of total RNA and Applied Biosystems High
Capacity cDNA with Reverse Transcription kit (Applied Biosystems p/n 4368814).
cDNA, assays, and dilutions of Applied Biosystems Taqman Fast Advanced MasterMix (Applied
Biosystems, p/n 4444965) were plated onto a Wafergen MyDesign SmartChip (TakaraBio, p/n 640036)
using the Wafergen Nanodispenser. The chip was then loaded into the SmartChip cycler and qPCR was
performed using the following conditions: hold stage 50 °C for 2 minutes, 95 °C for 10 minutes; PCR
stage 95 °C for 15 seconds and 60 °C for 1 minute. After 40 cycles the reaction was stopped and the
data were exported for analysis.
Prior to use for fitting, cycle threshold (CT) values were normalised using the
$2^{-\Delta\Delta C_T}$ method of quantitative PCR normalisation to the geometric mean of four
reference genes (B2M, PPIA, GAPDH, ACTB) per sample (Livak and Schmittgen (2001)).
2.2 Computational
2.2.1 PyCoTools Availability and Installation
PyCoTools was developed partially on Windows 7 and partially on Ubuntu 16.04.2 with the Anaconda
distribution of Python 2.7 and COPASI versions 4.19.158 and 4.21.166. PyCoTools can be installed
with ‘pip’, Python’s native package manager, using the command ‘pip install pycotools’. PyCoTools
can also be downloaded directly from source at https://github.com/CiaranWelsh/pycotools. More
detailed instructions on installation and PyCoTools usage can be found in the PyCoTools
documentation (http://pycotools.readthedocs.io).
2.2.2 Definition of the Model Selection Problem
All models were built by downloading the Zi and Klipp (2007) model from BioModels
(ID:BIOMD0000000163) and modifying it as appropriate using the COPASI user interface for each
model. The models are available in the supplementary content as SBML files. Model selection was
performed by calibrating each model to the same experimental data and then evaluating model
selection criteria. The Ski mRNA and Smad7 mRNA profiles were measured whilst protein level data
were derived by assuming that Smad7 and Ski protein appear 30 minutes after the mRNA and at 100
times the magnitude. Since the experimental data units are arbitrary and the Zi and Klipp (2007)
model simulates in nanomoles per litre, the experimental data were mapped to the model via an
observation function (Equation 1).
$$X_{Obs}(t) = \frac{X(t)}{X_{SF}} \qquad (1)$$
where:
$X_{Obs}(t)$ = A mapping between experimental and simulated data
$X(t)$ = Amount of model species X at time t
$X_{SF}$ = Scale factor for species X = 100
$X \in \{\text{Smad7 mRNA}, \text{Ski mRNA}, \text{Smad7 Protein}, \text{Ski Protein}\}$
All scale factors were set to 100 which is a reasonable value to ensure new profiles were of the
same order of magnitude as the original. The initial
[Fig. 1b plot: Smads_Complex_n and Smads_Complex_c; x-axis: Time (min), y-axis: Concentration (nmol)]
(b) Output from published Zi and Klipp (2007) model, without changes
Fig. 1: Network representation of ODE networks used in model selection problem. (a) The Zi and Klipp (2007) model is a common component of each
model variant. (b) Simulation output from the Zi and Klipp (2007) model. (c-e) The model variable ‘Smads_Complex_n’ is responsible for transcription reactions
in model variants while ‘LRC_Cave’ is degraded by Smad7 protein, thus completing the explicit representation of the Smad7 negative feedback loop. In
(c) Model 1, Smad7 participates in but is not consumed by the reaction with LRC_Cave while in (d) Model 2, Smad7 is consumed by this process. In (e)
Model 3, the same topology as Model 2 is assumed but it also incorporates second order mass action degradation kinetics for Ski protein.
concentrations of Smad7 and Ski protein were set to 100 times that of the corresponding mRNA and all
new kinetic parameters were estimated. All parameters from the original Zi et al. (2011) model were
fixed at the published values, including initial concentration parameters. Initial concentrations
of Smad7 mRNA and Ski mRNA were set using Equation 2:
$$X(t_0) = X(\mu, t_0) \cdot X_{SF} \qquad (2)$$
where:
example with short execution times that parallels the main model selection problem and provides code
that users can run themselves. Specifically, in this alternative model selection problem we create
three models (a negative feedback motif, a positive feedback motif and a feed-forward motif) using
the Antimony interface. Analogous to the main problem defined above, we then perform model selection
using synthetic experimental data from the negative feedback topology, visualise the results and
run an identifiability analysis.
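The scaling in Equations 1 and 2 can be sketched in a few lines of plain Python. This is an illustration only and not PyCoTools code: the function names and the example measurement are hypothetical, and X(µ,t0) is assumed here to be the experimental value at the first time point, since its definition does not survive in the text.

```python
# Sketch of Equations 1 and 2; helper names are hypothetical, not PyCoTools API.
SCALE_FACTOR = 100.0  # X_SF, set to 100 for every species in the text


def observe(model_amount, scale_factor=SCALE_FACTOR):
    """Equation 1: map a simulated amount X(t) onto the experimental scale."""
    return model_amount / scale_factor


def initial_amount(measured_t0, scale_factor=SCALE_FACTOR):
    """Equation 2: set the model's initial amount from the measurement at t0.

    Assumption: X(mu, t0) is the experimental value at the first time point.
    """
    return measured_t0 * scale_factor


# Invented measurement in arbitrary units, for illustration only
smad7_mrna_t0 = 0.05
x0 = initial_amount(smad7_mrna_t0)
```

Applying `observe` to `x0` returns the original measurement, which is the sense in which the two equations are consistent with each other.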
[Fig. 2 plot residue: panel (a) Smad7 mRNA; x-axis: Time (h), y-axis: Signal (AU)]
[Fig. 3 plot residue: panel (a) Violin plot comparing AICc scores in each model; table header: Model Selection Criteria, Model 1, Model 2, Model 3]
[Fig. 4 plot: rows Model 1, Model 2 and Model 3; columns Smad7 mRNA, Smad7 Protein, Ski mRNA and Ski Protein observed profiles; x-axis: Time (min), y-axis: Concentration (nM)]
Fig. 4: Ensemble time courses produced with ‘viz.PlotTimeCourseEnsemble’. The top 10 best parameter sets for each model were sequentially inserted
into their respective models. Time courses were simulated with each parameter set and averaged. Red profiles indicate experimental data while solid blue
lines are simulated profiles. Shaded areas represent 95% confidence intervals.
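A minimal sketch of the averaging step described in the Fig. 4 caption, written in plain Python rather than with PyCoTools. The function name `ensemble_summary`, the example trajectories and the normal-approximation interval (mean ± 1.96 standard errors) are assumptions for illustration; the caption does not state how ‘viz.PlotTimeCourseEnsemble’ computes its 95% confidence intervals.

```python
from math import sqrt
from statistics import mean, stdev


def ensemble_summary(trajectories, z=1.96):
    """Summarise simulated time courses from several parameter sets.

    trajectories: list of equal-length lists, one simulated profile per
    parameter set. Returns one (mean, lower, upper) tuple per time point,
    using a normal-approximation interval (an assumption of this sketch).
    """
    n = len(trajectories)
    summary = []
    for values in zip(*trajectories):  # iterate over time points
        m = mean(values)
        half_width = z * stdev(values) / sqrt(n)
        summary.append((m, m - half_width, m + half_width))
    return summary


# Three invented trajectories sampled at three time points
sims = [
    [0.0, 1.0, 2.0],
    [0.0, 1.2, 2.2],
    [0.0, 0.8, 1.8],
]
bands = ensemble_summary(sims)
```

Each tuple in `bands` gives the averaged profile and the bounds of the shaded region at one time point.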
The shape of this profile is then compared to a confidence threshold based on the likelihood ratio
statistic (Raue et al., 2009).
A profile likelihood typically has one of three interpretations. If the profile does not exceed the
threshold in one or both directions and is not flat, the parameter is practically non-identifiable.
In this case, the trajectory of the other model components over the profile may be used to direct
model reduction strategies (Maiwald et al., 2016). If a profile is completely flat the parameter is
structurally non-identifiable, which means the parameter is algebraically related to another. To
resolve structural non-identifiabilities, one can fix one of the parameters in a relationship to an
arbitrary value. Of note, one must be cautious about using profile likelihoods to declare a
parameter structurally non-identifiable because the profile likelihood method only samples the
parameter space. It is possible that the profile appears flat but only on the scale of the sampled
profile. Therefore, structurally non-identifiable parameters should be further investigated to
determine any relationships which might exist. Finally, if the profile exceeds this threshold in
both directions the parameter is identifiable and the parameter values at which the profile exceeds
the threshold are the upper and lower confidence boundaries for the parameter (Raue et al., 2009).
Ideally, for precise model predictions, every estimated parameter in a defined parameter estimation
problem should be identifiable. In reality, limited data and overly complex model structures often
lead to identifiability issues.
Maiwald et al. (2016) extended the usefulness of profile likelihood from assessing identifiability
to model reduction. A practical non-identifiability exists because the optimisation does not have
enough data to inform model parameters, or put another way, the model is too complex for the data.
Viewing the paths traced by other parameters in a profile likelihood analysis (e.g. putting the
trajectory of another parameter on the y-axis rather than the objective function value) provides
information about the relationship between the parameter of interest on the x-axis and the parameter
on the y-axis. Identifying this relationship enables steps to be taken to resolve the problem by
fixing parameters or replacing non-identifiable species or parameters with algebraic equations.
Profile likelihoods are therefore useful in a data-driven approach to iteratively refine an
optimisation problem, fixing parameters where possible and modifying the topology as necessary
until the model fits the experimental data.
Profile likelihood calculations are computationally intensive and, to be useful, require that the
starting parameter set is optimal, or at least very close to optimal, with respect to the data. It
is therefore prudent to assess this condition before conducting a profile likelihood analysis. The
performance of an optimisation problem can be evaluated by plotting the sorted objective function
value (i.e. residual sum of squares (RSS) or likelihood) for each parameter estimation iteration
against its rank of best fit (herein referred to as a ‘likelihood-ranks’ plot). In these plots the
best case scenario is either a flat line when there is only a single global minimum or, more
commonly, a monotonically increasing step-like function where each step marks a different minimum
(Raue et al., 2013). Horizontal lines in the likelihood-ranks plot indicate that many iterations of
the same optimisation problem have located the same minimum, which
[Fig. 5 plots, panels (a)-(c); x-axis: Rank of Best Fit, y-axis: RSS]
(a) Model 1: Smad7 not consumed (b) Model 2: Smad7 consumed (c) Model 3: Ski second order degradation
Fig. 5: A ‘likelihood-ranks’ plot. The residual sum of squares objective function value is plotted against the rank of best fit for each parameter estimation
iteration for each model (a-c). Graphs were produced with ‘viz.LikelihoodRanks’.
increases our confidence that the problem is well-posed. In contrast a smooth curve indicates that
estimations have not converged to a minimum. If the likelihood-ranks plot shows a smooth curve, it
is a good idea to rerun the parameter estimation using either a different algorithm or different
algorithm settings. Alternatively, while others (Raue et al., 2013) employ a multi-start
Latin-hypercube strategy with a local optimiser to ensure strategic and uniform sampling of the
parameter space, given the choice of algorithms in COPASI it is easy to first run a global and then
switch to a local algorithm. This strategy, here referred to as a ‘chaser estimation’, can be
performed on all or a subset of parameter sets to drive them closer to their respective minima.
In addition to profile likelihoods and time course ensembles, viewing distributions of parameter
estimation data and correlations between parameters can provide information about an optimisation
problem. Box plots provide immediate information about the range of parameter estimates and how
they compare to other parameters. Often a box plot can provide clues to a parameter’s
identifiability status. Histograms on the other hand provide a more detailed view of parameter
distributions and can identify behaviour (e.g. bimodal parameters) that would not be identified
with box plots. Moreover, a combination of Pearson’s correlation heat maps and scatter graphs can
be used to locate linear or log-linear relationships between parameters.
An important aspect of visualising parameter estimation data is that not all parameter sets fit the
model equally well. Parameter sets with higher objective function values can distort the
distribution of better performing parameter sets or the shape of a relationship. For this reason
PyCoTools implements flexible means of subsetting parameter estimation data before plotting.
3.2 A Demonstration: Extending the Zi and Klipp (2007) Model
To demonstrate PyCoTools, we define a model selection problem to extend a published model of
canonical TGF-β signalling (Zi and Klipp, 2007) (Figure 1). As an alternative demonstration, we
also provide another model selection problem in the supplementary content, as described in the
methods.
TGF-β binds to the autophosphorylated homodimeric type 2 TGF-β receptors which phosphorylate and
heterodimerise with homodimers of type 1 TGF-β receptors (De Crescenzo et al., 2001). This event
leads to internalisation of the ligand-receptor complex into one of two types of membrane bound
intracellular compartment: early endosomes or caveolae. Evidence in Di Guglielmo et al. (2003)
suggests that ligand-receptor complexes in the early endosome, rather than the caveolae, are
responsible for conveying the TGF-β signal, via phosphorylation, to the Smad second messenger
system. Phosphorylated Smad2/3 binds to Smad4, translocates to the nucleus and induces
transcription of TGF-β responsive genes (Schmierer et al., 2008). Smad7 is a well characterised
negative regulator of the Smad system and is transiently produced in response to TGF-β
(Nakao et al. (1997); Hayashi et al. (1997)). Multiple mechanisms of negative regulation by Smad7
have been reported, including the recruitment of E3 ubiquitin ligases to either Smad2/3 in
competition with Smad4 (Yan et al., 2016) or to activated TGF-β receptors in caveolae
(Kavsak et al., 2000; Di Guglielmo et al., 2003). Many biological entities have been proposed as
regulators of this process, including PPM1A (Lin et al., 2006), NEDD4L (Gao et al., 2009), SnoN
(Stroschein et al., 1999) and Ski. Ski acts as a co-repressor at Smad regulated genes by recruiting
histone deacetylases which leads to epigenetic constriction of Smad-responsive genes
(Akiyoshi et al., 1999).
The Zi and Klipp (2007) model (Figure 1a) combines work by Vilar et al. (2006) describing TGF-β
receptor internalisation and recycling dynamics with a Smad nuclear-cytoplasmic translocation
module. In this model, an explicit representation of the Smad7 negative feedback was not included,
but was instead incorporated into the rate law for the reaction describing the degradation of the
activated ligand-receptor complexes from within caveolar compartments (‘LRC_Cave’ in Figure 1a).
The purpose of the model selection problem presented here is to investigate the feasibility of
three alternative mechanisms of negative regulation (Figure 1) in explaining the experimental data
(Figure 2).
After calibration, the ‘viz.ModelSelection’ class was used to calculate and visualise the Akaike
information criterion (AIC) corrected for small sample sizes (AICc) (Figure 3a) and the Bayesian
information criterion (BIC) (Figure S1). With these statistics, a lower value indicates a better
agreement with the data and thus a better model. In the current problem, a closer inspection of the
best model selection values (Figure 3b) indicates that from a purely statistical perspective, the
topologies of Models 1 and 2 are indistinguishable in terms of the experimental data (Figure 2)
while Model 3 is worse.
The simulated profiles for each model (Figure 4) support the model selection results. While the
Smad7 mRNA and Ski mRNA profiles are slightly greater in Model 1 and Model 3 respectively, all
profiles are virtually indistinguishable between all the models. It is likely that the difference
in the Ski mRNA profile in Model 3 accounts for the difference observed in the best model selection
criteria (Figure 3b). Regardless of this slight difference, the same qualitative interpretation
holds for each model: the speed and magnitude of both Smad7 and Ski mRNA induction profiles are
overestimated while the protein level data fit each model to a high degree of confidence.
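The model selection criteria can be illustrated with a short sketch. It assumes the common least-squares forms AIC = n ln(RSS/n) + 2k, AICc = AIC + 2k(k+1)/(n−k−1) and BIC = n ln(RSS/n) + k ln(n), where n is the number of data points and k the number of estimated parameters. The text does not state which forms ‘viz.ModelSelection’ uses, and the function names and example numbers below are invented for illustration.

```python
from math import log


def aic(rss, n, k):
    """Akaike information criterion for a least-squares fit (assumed form)."""
    return n * log(rss / n) + 2 * k


def aicc(rss, n, k):
    """AIC with the small-sample correction; requires n > k + 1."""
    return aic(rss, n, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(rss, n, k):
    """Bayesian information criterion for a least-squares fit (assumed form)."""
    return n * log(rss / n) + k * log(n)


# Invented values: two candidate models fitted to the same 24 data points
n_points, n_params = 24, 10
scores = {
    "Model A": aicc(rss=600.0, n=n_points, k=n_params),
    "Model B": aicc(rss=650.0, n=n_points, k=n_params),
}
best = min(scores, key=scores.get)  # a lower score indicates a better model
```

Because both hypothetical models have the same number of parameters, the ranking here is decided by the RSS alone; the penalty terms only matter when k differs between models.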
When looking at model predictions it is important to consider whether the parameter sets used to
produce them are actually the best parameter sets. This is important because it is quite common for
parameter estimation algorithms to find sub-optimal parameters. Here, while improvements can still
be made, the algorithm and settings were reasonably well-chosen because the likelihood-ranks plot
produced a step-like shape for each model (Figure 5), heuristically mapping out where the local and
global minima are.
Profile likelihoods are only meaningful when calculated from a minimum with respect to the data.
For this reason the best three parameter sets from the stochastic genetic algorithm in Model 2 were
‘chased’ with a Hooke & Jeeves algorithm (tolerance=1e−10 and iteration limit=1000) using the
‘PyCoTools.tasks.ChaserParameterEstimations’ class. Profile likelihoods were then computed around
these three parameter sets, again using the Hooke & Jeeves algorithm (tolerance=1e−6 and iteration
limit=50). Sampling was conducted on a log10 scale over 6 orders of magnitude, 1e3 times above and
below the best estimated parameter values. For brevity, profile likelihoods for Models 1 and 3 are
not discussed. The identifiability analysis shows that seven of the ten parameters are identifiable
and the remaining three are practically non-identifiable (Figure 6 and Figure S2).
To investigate the source of these non-identifiabilities, two strategies were employed: Pearson’s
correlation analysis and the ‘profile likelihood model reduction’ approach as described in
Maiwald et al. (2016). The Pearson’s correlation approach identified several parameter pairs as
putative linear correlations (Figure S3). Of these, only the most correlated pair, the km and I50
parameters of Smad7 transcription, was verified to be log-linearly related in both scatter graphs
(Figure 7a) and profile likelihood traces (Figure 7b). To resolve this issue, one could replace one
of the free parameters in the relationship with the algebraic equation resulting from the fit of a
linear model to the profile likelihood trace (Figure 7b). The other putative relationships
suggested by the Pearson’s correlation analysis (Figure S3) were also investigated but the
relationships were more difficult to interpret. As an example, Figure S4 shows the relationship
between ‘(SkiDeg).k1’ and ‘(SkimRNADeg).k1’ parameters. While the scatter graph shows a reasonable
linear correlation (Figure S4a), it is defined on a very small interval and the profile likelihood
is clearly non-linear, albeit linear on a sub-domain of the parameter space (Figure S4b).
Lastly, distributions of parameter estimates were visualised using box plots (Figure S5) and
histograms (Figure S6). Despite being presented last, these are computationally inexpensive to
generate and are good to view prior to more involved analyses such as profile likelihoods. To
demonstrate the effect of sub-optimal parameter sets, a comparison is made between box plots
generated for Model 2 using all parameter estimation data (Figure S5a) to those using only the top
10% ranking parameter sets (Figure S5b). Figure S5 demonstrates that suboptimal parameter sets can
distort the insight that can be gained from visually exploring parameter estimation data. Without
truncating the parameter estimation data, the observation that the distributions of parameters from
the best parameter sets reflect the identifiability status of the model would be missed.
4 Discussion
PyCoTools is an open source Python package designed to assist COPASI users in the task of modelling
biological systems. PyCoTools offers an alternative high level interface to COPASI tasks including
time courses, parameter scans and parameter estimations. While COPASI implements the heavy
computation, PyCoTools automates task configuration and execution, thereby promoting efficiency,
organisation and reproducibility.
PyCoTools bridges COPASI with the Python environment allowing users to take advantage of Python’s
numerical computation, visualisation, file management and code development facilities. One tool in
particular, the Jupyter notebook, allows annotation of code blocks with rich text elements and is a
powerful environment from which to develop and share annotated workflows. The combination of
Jupyter notebooks, COPASI and PyCoTools therefore enables the production of reproducible and
shareable models that are annotated with justifications.
PyCoTools supports model editing using both an object-oriented approach and with Antimony, a model
specification language for building SBML models (Smith et al. (2009)). Antimony and the COPASI user
interface are complementary and can be used together to enhance the modelling process. For example,
models in Antimony format can be used as a ‘hard copy’ while a parallel COPASI model can be used
for exploratory changes that are ‘committed’ to the hard copy when satisfactory.
PyCoTools supports the configuration of ‘composite’ tasks, which combine other tasks. These tasks
can be configured using the COPASI user interface but doing so generally takes time and is
vulnerable to human error. For example, users can automatically configure repeat parameter
estimations, chaser parameter estimations and model selection problems, thereby circumventing the
requirement for manual configuration.
Another composite task supported by PyCoTools is the profile likelihood method of identifiability
analysis (Raue et al. (2009)). Models with non-identifiable parameters are common in systems
biology and it is useful to have a means of assessing which parameters are reliably defined by an
estimation problem. PyCoTools automates the procedure outlined by Schaber (2012) for conducting
profile likelihoods in COPASI, thereby
Fig. 6: Profile likelihoods were calculated using the ‘tasks.ProfileLikelihood’ class for the top three parameter sets of Model 2 and visualised using
‘viz.PlotProfileLikelihood’. The black stars indicate the best estimated parameter. The dotted green line indicates the 95% confidence level and the red
spots are the minimum RSS value achieved after re-optimisation of all parameters except the parameter of interest (x-axis). Lines between red spots have
been interpolated using a cubic spline.
enabling COPASI users to perform an identifiability analysis more efficiently and in a way less
prone to errors than manual configuration. PyCoTools also enables users to calculate profile
likelihoods from multiple parameter sets thereby enabling users to address one of the shortcomings
of the profile likelihood approach: that it is a local method of identifiability analysis.
One alternative to COPASI and PyCoTools is Data2Dynamics (Raue et al. (2015)). While Data2Dynamics
provides an excellent range of model analysis tools, the transfer of files between COPASI and
Data2Dynamics is imperfect, often necessitating that a COPASI user redefine their model within the
Data2Dynamics environment. PyCoTools allows COPASI users to stay within the COPASI environment,
thereby making profile likelihood analysis more accessible to COPASI users.
In this work we have demonstrated PyCoTools by posing a model selection problem to discriminate
between three model topologies (Figure 1) with respect to some experimental data in response to
TGF-β (Figure 2). Rather than using synthetic data, our aim was to demonstrate in a ‘real world’
scenario how PyCoTools can be used together with COPASI to calibrate a set of models and
discriminate between them.
As this is primarily a software demonstration and not a biological investigation, the model
selection problem proposed was designed to be as simple as possible whilst still being non-trivial.
Mechanistically the three models (Figure 1) are alternative hypotheses which attempt to address the
dynamics of the Smad7 (Figure 2) negative feedback. Model alternatives were based on a published
dynamic model of TGF-β signalling (Zi and Klipp (2007)) that was adapted to incorporate Smad7.
Since the decay of Smad7 is transient and fast (Figure 2a), the simplest mechanism involving only
Smad7 with first order mass action degradation kinetics would not be able to account for the
observed decline in Smad7. Therefore Smad7 degradation was assumed to be an active process. Since
Ski is a known Smad co-repressor (Akiyoshi et al. (1999)) and Smad7 is a Smad responsive gene
(Hayashi et al. (1997)), Ski was proposed to be transcribed in response to TGF-β (Figure 2b) and
inhibit Smad7 transcription. The model alternatives are slightly different representations of this
hypothesis (Figure 1).
In this model selection problem it is clear that the model topologies chosen are too similar to be
discriminated with the experimental data and therefore the models are virtually indistinguishable
(Figure 4). Generally, with model selection, the strongest statement that can be made about a model
is a rejection, since accepting the hypothesis does not necessarily guarantee that it is correct.
By comparing the performance of multiple models in model calibration it is possible to reject one
or more topologies in favour of another. Here, however, because the models are so similar, it was
not possible to provide support for any model being worse than any other, despite the minor
differences in model selection criteria for Model 3 (Figure 3a). In a more comprehensive
investigation many more topologies would be similarly compared to iteratively reject topologies
until the model is capable of making useful, validatable predictions.
Regardless of the biological interpretation, we have demonstrated the process of using PyCoTools
and COPASI to discriminate between model alternatives and to critically assess the parameter
estimation process. Model calibration is an essential part of systems modelling investigations, but
it is often limited by a vast, underdetermined parameter space and therefore procedures that
provide a measure of uncertainty are valuable. In PyCoTools, we have implemented a number of
features aimed towards gauging confidence and uncertainty in the optimisation process so that
COPASI users can diagnose problems and make better informed decisions based on their parameter
estimation output. These tools include: the likelihood-ranks plot (Figure 5) which enables
evaluation of an optimisation algorithm and settings on a specific problem (Raue et al., 2013);
ensemble time courses (Figure 4) which calculate confidence intervals from predictions made from
multiple best parameter sets and propagate uncertainty from parameter estimates to model
predictions; profile likelihoods for assessing identifiability (Figure 6, Figure S2) and for model
reduction (Figure 7b) (Maiwald et al. (2016)); Pearson’s correlation heat maps (Figure S3) and
scatter graphs (Figure 7a) for identifying relationships; and box plots (Figure S5) and histograms
(Figure S6) for visualising distributions of parameter estimates. Together these tools provide
detailed information about an optimisation problem that can be used to guide the modelling process.
5 Conclusion
PyCoTools is an open-source and extensible Python package designed to facilitate the use of COPASI,
particularly for model calibration. PyCoTools supports a range of tools which are either wrappers
around COPASI tasks, an ordered workflow of task configurations, or plotting facilities for
exploratory data analysis on parameter estimation data. Use of PyCoTools can enhance the
effectiveness with which one can calibrate models to experimental data and discriminate between
alternate hypotheses.
Funding
This work was funded by Procter & Gamble. The contribution from AGM and CJP was supported by the Medical Research Council (https://www.mrc.ac.uk/) and Arthritis Research UK (http://www.arthritisresearchuk.org/) as part of the MRC-Arthritis Research UK Centre for Integrated research into Musculoskeletal Ageing (CIMA) (MR/K006312/1). The work builds on a BBSRC LINK grant awarded to SAB (BB/K019260/1).

[...] and White, M. R. (2009). Pulsatile stimulation determines timing and specificity of nf-κb-dependent transcription. Science, 324(5924), 242–6.
Balsa-Canto, E. and Banga, J. R. (2011). Amigo, a toolbox for advanced model identification in systems biology using global optimization. Bioinformatics, 27(16), 2311–2313.
Choi, K., Medley, J. K., Cannistra, C., König, M., Smith, L., Stocking, K., and Sauro, H. M. (2016). Tellurium: a python based modeling and reproducibility platform for systems biology. bioRxiv, page 054601.
Dada, J. O. and Mendes, P. (2012). ManyCell: A Multiscale Simulator for Cellular Systems, pages 366–369. Springer Berlin Heidelberg, Berlin, Heidelberg.
Dalle Pezze, P. and Le Novère, N. (2017). Sbpipe: a collection of pipelines for automating repetitive simulation and analysis tasks. BMC systems biology, 11(1), 46.
Dalle Pezze, P., Sonntag, A. G., Thien, A., Prentzell, M. T., Gödel, M., Fischer, S., Neumann-Haefelin, E., Huber, T. B., Baumeister, R., and Shanley, D. P. (2012). A dynamic network model of mtor signaling reveals tsc-independent mtorc2 regulation. Sci. Signal., 5(217), ra25–ra25.
Dalle Pezze, P., Ruf, S., Sonntag, A. G., Langelaar-Makkinje, M., Hall, P., Heberle, A. M., Navas, P. R., Van Eunen, K., Tolle, R. C., and Schwarz, J. J. (2016). A systems study reveals concurrent activation of ampk and mtor by amino acids. Nature communications, 7, 13254.
De Crescenzo, G., Grothe, S., Zwaagstra, J., Tsang, M., and O’Connor-McCourt, M. D. (2001). Real-time monitoring of the interactions of transforming growth factor-β (tgf-β) isoforms with latency-associated protein and the ectodomains of the tgf-β type ii and iii receptors reveals different kinetic models and stoichiometries of binding. Journal of Biological Chemistry, 276(32), 29632–29643.
Di Guglielmo, G. M., Le Roy, C., Goodfellow, A. F., and Wrana, J. L. (2003). Distinct endocytic pathways regulate tgf-β receptor signalling and turnover. Nature cell biology, 5(5), 410–421.
Flöttmann, M., Schaber, J., Hoops, S., Klipp, E., and Mendes, P. (2008). Modelmage: a tool for automatic model generation, selection and management. In Genome Informatics 2008: Genome Informatics Series Vol. 20, pages 52–63. World Scientific.
Gao, S., Alarcón, C., Sapkota, G., Rahman, S., Chen, P.-Y., Goerner, N., Macias, M. J., Erdjument-Bromage, H., Tempst, P., and Massagué, J. (2009). Ubiquitin ligase nedd4l targets activated smad2/3 to limit tgf-β signaling. Molecular cell, 36(3), 457–468.
Hayashi, H., Abdollah, S., Qiu, Y., Cai, J., Xu, Y.-Y., Grinnell, B. W., Richardson, M. A., Topper, J. N., Gimbrone, M. A., Wrana, J. L., et al. (1997). The mad-related protein smad7 associates with the tgfβ receptor and functions as an antagonist of tgfβ signaling. Cell, 89(7), 1165–1173.
Hoops, S., Sahle, S., Gauges, R., Lee, C., Pahle, J., Simus, N., Singhal, M., Xu, L., Mendes, P., and Kummer, U. (2006). Copasi—a complex pathway simulator. Bioinformatics, 22(24), 3067–3074.
Hunter, J. D. (2007). Matplotlib: A 2d graphics environment. Computing In Science & Engineering, 9(3), 90–95.
Kavsak, P., Rasmussen, R. K., Causing, C. G., Bonni, S., Zhu, H., Thomsen, G. H., and Wrana, J. L. (2000). Smad7 binds to smurf2 to form an e3 ubiquitin ligase that targets the tgfβ receptor for degradation. Molecular cell, 6(6), 1365–1375.
Kent, E., Hoops, S., and Mendes, P. (2012). Condor-copasi: high-throughput computing for biochemical networks. BMC systems biology, 6(1), 91.
Liepe, J., Barnes, C., Cule, E., Erguler, K., Kirk, P., Toni, T., and Stumpf, M. P. H. (2010). Abc—sysbio—approximate bayesian computation in python with gpu support. Bioinformatics, 26(14), 1797–1799.
Lin, X., Duan, X., Liang, Y.-Y., Su, Y., Wrighton, K. H., Long, J., Hu, M., Davis, C. M., Wang, J., Brunicardi, F. C., et al. (2006). Ppm1a functions as a smad phosphatase to terminate tgfβ signaling. Cell, 125(5), 915–928.
Livak, K. J. and Schmittgen, T. D. (2001). Analysis of relative gene expression data using real-time quantitative pcr and the 2^(-ΔΔCT) method. Methods, 25(4), 402–408.
Maiwald, T., Hass, H., Steiert, B., Vanlier, J., Engesser, R., Raue, A., Kipkeew, F., Bock, H. H., Kaschek, D., Kreutz, C., et al. (2016). Driving the model to its limit: Profile likelihood based model reduction. PloS one, 11(9), e0162366.
Matsuoka, Y., Funahashi, A., Ghosh, S., and Kitano, H. (2014). Modeling and simulation using celldesigner. Methods Mol Biol, 1164, 121–45.
Nakao, A., Afrakhte, M., Morén, A., Nakayama, T., Christian, J. L., Heuchel, R., Itoh, S., Kawabata, M., Heldin, N.-E., Heldin, C.-H., et al. (1997). Identification of smad7, a tgfβ-inducible antagonist of tgf-β signalling. Nature, 389(6651), 631–635.
Nelson, D., Ihekwaba, A., Elliott, M., Johnson, J., Gibney, C., Foreman, B., Nelson, G., See, V., Horton, C., and Spiller, D. (2004). Oscillations in nf-κb signaling control the dynamics of gene expression. Sci Signal, 306.
Olivier, B. G. (2005). Simulation and database software for computational systems biology: PySCeS and JWS Online. Ph.D. thesis, University of Stellenbosch.
Palmisano, A., Hoops, S., Watson, L. T., Jones, T. C., J., Tyson, J. J., and Shaffer, C. A. (2015). Jigcell run manager (jc-rm): a tool for managing large sets of biochemical model parametrizations. BMC Syst Biol, 9, 95.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al. (2011). Scikit-learn: Machine learning in python. Journal of Machine Learning Research, 12(Oct), 2825–2830.
Purvis, J. E., Karhohs, K. W., Mock, C., Batchelor, E., Loewer, A., and Lahav, G. (2012). p53 dynamics control cell fate. Science, 336(6087), 1440–1444.
Raue, A., Kreutz, C., Maiwald, T., Bachmann, J., Schilling, M., Klingmüller, U., and Timmer, J. (2009). Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics, 25(15), 1923–1929.
Raue, A., Schilling, M., Bachmann, J., Matteson, A., Schelker, M., Kaschek, D., Hug, S., Kreutz, C., Harms, B. D., Theis, F. J., et al. (2013). Lessons learned from quantitative dynamical modeling in systems biology. PloS one, 8(9), e74335.
Raue, A., Steiert, B., Schelker, M., Kreutz, C., Maiwald, T., Hass, H., Vanlier, J., Tönsing, C., Adlung, L., Engesser, R., Mader, W., Heinemann, T., Hasenauer, J., Schilling, M., Höfer, T., Klipp, E., Theis, F., Klingmüller, U., Schöberl, B., and Timmer, J. (2015). Data2dynamics: a modeling environment tailored to parameter estimation in dynamical systems. Bioinformatics.
Sauro, H. M., Hucka, M., Finney, A., Wellock, C., Bolouri, H., Doyle, J., and Kitano, H. (2003). Next generation simulation tools: the systems biology workbench and biospice integration. Omics A Journal of Integrative Biology, 7(4), 355–372.
Sauro, H. M., Karlsson, T. T., Swat, M., Galdzicki, M., and Somogyi, A. (2013). libroadrunner: A high performance sbml compliant simulator. bioRxiv.
Schaber, J. (2012). Easy parameter identifiability analysis with copasi. Biosystems, 110(3), 183–185.
Schmierer, B., Tournier, A. L., Bates, P. A., and Hill, C. S. (2008). Mathematical modeling identifies smad nucleocytoplasmic shuttling as a dynamic signal-interpreting system. Proceedings of the National Academy of Sciences, 105(18), 6608–6613.
Smith, L. P., Bergmann, F. T., Chandran, D., and Sauro, H. M. (2009). Antimony: a modular model definition language. Bioinformatics, 25(18), 2452–2454.
Somogyi, E. T., Bouteiller, J.-M., Glazier, J. A., König, M., Medley, J. K., Swat, M. H., and Sauro, H. M. (2015). libroadrunner: a high performance sbml simulation and analysis library. Bioinformatics, 31(20), 3315–3321.
Stroschein, S. L., Wang, W., Zhou, S., Zhou, Q., and Luo, K. (1999). Negative feedback regulation of tgf-β signaling by the snon oncoprotein. Science, 286(5440), 771–774.
Sun, T., Yang, W., Liu, J., and Shen, P. (2011). Modeling the basal dynamics of p53 system. PLoS ONE, 6(11), e27882.
Takahashi, K., Ishikawa, N., Sadamoto, Y., Sasamoto, H., Ohta, S., Shiozawa, A., Miyoshi, F., Naito, Y., Nakayama, Y., and Tomita, M. (2003). E-cell 2: multi-platform e-cell simulation system. Bioinformatics, 19(13), 1727–1729.
Vilar, J. M. G., Jansen, R., and Sander, C. (2006). Signal processing in the tgf-β superfamily ligand-receptor network. PLoS computational biology, 2(1), e3.
Wang, J., Tucker-Kellogg, L., Ng, I. C., Jia, R., Thiagarajan, P. S., White, J. K., and Yu, H. (2014). The self-limiting dynamics of tgf-beta signaling in silico and in vitro, with negative feedback through ppm1a upregulation. PLoS computational biology, 10(6), e1003573.
Yan, X., Liao, H., Cheng, M., Shi, X., Lin, X., Feng, X.-H., and Chen, Y.-G. (2016). Smad7 protein interacts with receptor-regulated smads (r-smads) to inhibit transforming growth factor-β (tgf-β)/smad signaling. Journal of Biological Chemistry, 291(1), 382–392.
Zi, Z. and Klipp, E. (2007). Constraint-based modeling and kinetic analysis of the smad dependent tgf-β signaling pathway. PloS one, 2(9), e936.
Zi, Z., Feng, Z., Chapnick, D. A., Dahl, M., Deng, D., Klipp, E., Moustakas, A., and Liu, X. (2011). Quantitative analysis of transient and sustained transforming growth factor-β signaling dynamics. Molecular systems biology, 7(1), 492.