Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Machine Learning for the identification of phase-transitions
in interacting agent-based systems: a Desai-Zwanzig example

Nikolaos Evangelou1,2, Dimitris G. Giovanis3,4,
George A. Kevrekidis2, Grigorios A. Pavliotis5, Ioannis G. Kevrekidis1,2,⋆
1Department of Chemical and Biomolecular Engineering, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA 2Department of Applied Mathematics and Statistics, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA 3Department of Civil and Systems Engineering, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD, 21218, USA 4Hopkins Extreme Materials Institute, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA 5Department of Mathematics, Imperial College London, London SW7 2AZ, United Kingdom yannisk@jhu.edu
Abstract

Deriving closed-form, analytical expressions for reduced-order models, and judiciously choosing the closures leading to them, has long been the strategy of choice for studying phase- and noise-induced transitions for agent-based models (ABMs). In this paper, we propose a data-driven framework that pinpoints phase transitions for an ABM- the Desai-Zwanzig model - in its mean-field limit, using a smaller number of variables than traditional closed-form models. To this end, we use the manifold learning algorithm Diffusion Maps to identify a parsimonious set of data-driven latent variables, and show that they are in one-to-one correspondence with the expected theoretical order parameter of the ABM. We then utilize a deep learning framework to obtain a conformal reparametrization of the data-driven coordinates that facilitates, in our example, the identification of a single parameter-dependent ODE in these coordinates. We identify this ODE through a residual neural network inspired by a numerical integration scheme (forward Euler). We then use the identified ODE – enabled through an odd symmetry transformation– to construct the bifurcation diagram exhibiting the phase transition.

interacting particle systems, machine-learning, dynamical systems, phase-transitions, system identification
preprint: APS/123-QED

I Introduction:

Complex dynamic phenomena are ubiquitous in natural sciences, social sciences, and engineering [1, 2, 3, 4]. In many cases, their study has been performed using agent-based models (ABM) also known as interacting particle systems (IPS). The dynamics of those models in the thermodynamic (mean-field) limit undergo phase transitions – bifurcations in non-linear dynamics terminology  [5, 6]. The exploration of the dynamics of these systems typically involves extensive numerical simulation scenarios for a very large number, N𝑁Nitalic_N, of interacting agents that can be truly challenging and impedes the widespread utilization of these models. Therefore, coarse-graining methodologies become necessary in order to reduce this complexity. Reducing the dimensionality of an ABM can be achieved by defining collective variables that are capable of accurately describing the full dynamics of such large systems in terms of a relatively small number of observables [7, 8, 9]; or discovering those by data mining [10, 11, 12].

In our previous work [9], we developed and successfully tested a model reduction approach based on the cumulants of the single-agent probability distribution for such large ABM systems. The proposed coarse-graining framework was based on an analytical closure methodology of the infinite hierarchy of equations for the moments or, equivalently, cumulants of the probability distribution of the infinite-dimensional system. The basic steps for building a reduced-order Desai-Zwanzig (DZ) model are: (i) Consider the mean-field ansatz by writing the N𝑁Nitalic_N-particle distribution function, the solution of the Nlimit-from𝑁N-italic_N -particle Fokker-Planck equation, as the product of one-particle distribution functions ρ(x,t)𝜌𝑥𝑡\rho(x,t)italic_ρ ( italic_x , italic_t ) [13]. (ii) Represent ρ(x,t)𝜌𝑥𝑡\rho(x,t)italic_ρ ( italic_x , italic_t ) either in terms of its moments (DZ [14]) or in terms of its Fourier coefficients (Smoluchowski/noisy Kuramoto model [15]), thus obtaining an infinite system of ordinary differential equations (ODEs) that is exactly equivalent to the McKean-Vlasov PDE. Truncating the equations for the moments (or the Fourier coefficients), while choosing an appropriate closure scheme gives the reduced-order model described and analyzed in [9]. However, selecting the correct closure scheme for the truncated moments’ system is the trickiest part of the coarse-graining procedure.

Refer to caption
Figure 1: Schematic of the overall workflow. (I) Sample data from the Agent-Based Model (ABM) across multiple initial conditions and parameter values, (II) Compute histograms for each snapshot of the ABM. (III) Apply the Diffusion Maps algorithm on the computed histograms to discover a reduced latent embedding. (IV) Use a conformal autoencoder (AE) to find a conformal reparametrization of the latent space. (V) Identify a data-driven ODE in terms of the latent coordinate ν2subscript𝜈2\nu_{2}italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT of the AE. (VI) Construct the bifurcation diagram (enabled via a symmetry transformation) in terms of the latent coordinate ν2subscript𝜈2\nu_{2}italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT.

This analytical approximation requires choosing the “correct” observable(s), the right level of observation/analytical closure, and –importantly – the level at which we attempt the closure. To overcome these constraints, this paper proposes a data-driven framework for studying phase transitions of ABMs by (i) discovering the coarse observables in a data-driven fashion, (ii) determining the level of closure, and (iii) identifying the reduced dynamics directly from data, (iv) utilizing the data-driven reduced model and applying an odd symmetry transformation to it for the construction of the bifurcation diagram. To this end, we employ Diffusion Maps [16], a manifold learning algorithm, conformal autoencoders [17] and residual neural networks inspired by numerical integrators of ordinary differential equations [18, 19, 20, 21, 12]. This allows us to circumvent the difficulties arising from the choice of an analytical model and the selection of its closure. The main steps of our proposed data-driven workflow, are illustrated in Figure 1.

As an example, we apply our data-driven framework to the generalization of the DZ model with multiplicative noise considered in [9]; this very well-studied ABM model features a second-order (continuous) phase transition, which enables us to compare the performance of our data-driven approach with well-known analytical and computational results. We emphasize, however, that our method is quite general, since it does not depend on the detailed features of the interaction, and can be applied to many different ABMs that exhibit phase transitions in the thermodynamic limit.

II The Agent-Based Model:

We consider a model describing an ensemble of N𝑁Nitalic_N identical interacting agents subject to multiplicative noise. For this model, the agents are coupled via a mean reverting force and a second-order phase transition exists at a critical temperature that can be calculated analytically [9][Eqn. A12]. This transition appears as a symmetric pitchfork bifurcation of the mean-field dynamics. The dynamics of each agent, xisubscript𝑥𝑖x_{i}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, are described by a stochastic differential equation (SDE) of the form

dxi=[xi3+(α+νσm2)xiθ(xix¯)]dt+σ2+σm2xi2dWi;dsubscript𝑥𝑖delimited-[]superscriptsubscript𝑥𝑖3𝛼𝜈superscriptsubscript𝜎𝑚2subscript𝑥𝑖𝜃subscript𝑥𝑖¯𝑥𝑑𝑡superscript𝜎2superscriptsubscript𝜎𝑚2superscriptsubscript𝑥𝑖2dsubscript𝑊𝑖\mathrm{d}x_{i}=\bigg{[}-x_{i}^{3}+(\alpha+\nu\sigma_{m}^{2})x_{i}-\theta(x_{i% }-\overline{x})\bigg{]}dt+\sqrt{\sigma^{2}+\sigma_{m}^{2}x_{i}^{2}}\mathrm{d}W% _{i};roman_d italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = [ - italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT + ( italic_α + italic_ν italic_σ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_θ ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - over¯ start_ARG italic_x end_ARG ) ] italic_d italic_t + square-root start_ARG italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_σ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG roman_d italic_W start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ; (1)

where σ𝜎\sigmaitalic_σ is the bifurcation parameter, θ𝜃\thetaitalic_θ denotes the interaction strength, α𝛼\alphaitalic_α is a parameter characterizing the amplitude of the multiplicative noise, ν𝜈\nuitalic_ν corresponds to different mathematical prescriptions of the SDEs (Itô, Stratonovich, etc.) and, σmsubscript𝜎𝑚\sigma_{m}italic_σ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT is a rectifying parameter: when σm0subscript𝜎𝑚0\sigma_{m}\neq 0italic_σ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ≠ 0 the phase-transition is pushed to higher values of σ.𝜎\sigma.italic_σ . The agents are coupled through x¯¯𝑥\bar{x}over¯ start_ARG italic_x end_ARG, which denotes the center of mass of the system (equal to the first moment M1subscript𝑀1M_{1}italic_M start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT). Furthermore, dWi,i=1N𝑑subscript𝑊𝑖𝑖1𝑁dW_{i},i=1\dots Nitalic_d italic_W start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_i = 1 … italic_N denotes independent Brownian motions. The values of the parameters were set to α=1𝛼1\alpha=1italic_α = 1, θ=4𝜃4\theta=4italic_θ = 4 , σm=0.8subscript𝜎𝑚0.8\sigma_{m}=0.8italic_σ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT = 0.8, ν=12𝜈12\nu=\frac{1}{2}italic_ν = divide start_ARG 1 end_ARG start_ARG 2 end_ARG, and N=12,000𝑁12000N=12,000italic_N = 12 , 000 as in [9].

For this model, it was demonstrated in [9] that, away from the phase transition at σ1.890𝜎1.890\sigma\cong 1.890italic_σ ≅ 1.890, the first moment, i.e., the order parameter of the system, is sufficient for an accurate description of the mean-field dynamics. As we get closer to the phase transition/the bifurcation, more moments need to be taken into account in order to accurately locate the bifurcation. More specifically, it was numerically demonstrated that at least a four-moment truncation is necessary in order to accurately predict the bifurcation/phase transition. In this work, by observing the eigenvalues of the Jacobian at the steady state (based on the moments equation proposed in [9]), we confirm that a clear separation of time scales prevails in the neighborhood of the bifurcation, and that the long-term dynamics live on a one-dimensional manifold. Our goal here, discussed in the next section, is to find a data-driven parametrization of this one-dimensional manifold, and identify the dynamics on it.

III Numerical Results:

III.1 Latent data-driven coordinates

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 2: (a) The first two moments (M1subscript𝑀1M_{1}italic_M start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, M2subscript𝑀2M_{2}italic_M start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT) estimated from the ABM simulations colored with the parameter value σ𝜎\sigmaitalic_σ at which they were obtained. (b-c) Diffusion Maps on histograms for a single value of the parameter σ𝜎\sigmaitalic_σ: (b) One-to-one relation between leading histogram moment M1subscript𝑀1M_{1}italic_M start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and leading Diffusion Maps coordinate ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT; (c) shows the residual (rksubscript𝑟𝑘r_{k}italic_r start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT) based on the local linear regression algorithm indicating that ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT might be enough to parametrize the data (larger value of rksubscript𝑟𝑘r_{k}italic_r start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT). (d-f) Diffusion Maps on collected histograms from the ABM simulation across multiple values of the parameter σ𝜎\sigmaitalic_σ: (d-e) show the non-harmonic diffusion map coordinates (ψ1,ψ2)subscript𝜓1subscript𝜓2(\psi_{1},\psi_{2})( italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) colored with σ𝜎\sigmaitalic_σ and M1subscript𝑀1M_{1}italic_M start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT respectively; (f) shows the residual (rksubscript𝑟𝑘r_{k}italic_r start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT) based on the local linear regression algorithm indicating that ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and ψ2subscript𝜓2\psi_{2}italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT might be enough to parametrize the data (larger values of rksubscript𝑟𝑘r_{k}italic_r start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT).

We start by sampling data on an equidistant grid of 18181818 distinct values of the parameter σ𝜎\sigmaitalic_σ in the interval σ[0.5,2.2]𝜎0.52.2\sigma\in[0.5,2.2]italic_σ ∈ [ 0.5 , 2.2 ]. A more detailed description of the sampling strategy is discussed in Section A.1.

Given the sampled data from the ABM we compute the first four moments M1,M2,M3,M4subscript𝑀1subscript𝑀2subscript𝑀3subscript𝑀4M_{1},M_{2},M_{3},M_{4}italic_M start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_M start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_M start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_M start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT as Mk=1Ni=1Nxik(t)subscript𝑀𝑘1𝑁superscriptsubscript𝑖1𝑁superscriptsubscript𝑥𝑖𝑘𝑡M_{k}=\frac{1}{N}\sum_{i=1}^{N}x_{i}^{k}(t)italic_M start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_N end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ( italic_t ), where xiksuperscriptsubscript𝑥𝑖𝑘x_{i}^{k}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT denotes the i𝑖iitalic_i-th agent at a fixed time raised in the power k𝑘kitalic_k, Mksubscript𝑀𝑘M_{k}italic_M start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is the k𝑘kitalic_k-th moment of the population from the sampled ABM data. The computation of the moments serves as a comparison between our data-driven framework and the one proposed in [9] where an analytical system of ODEs based on the four moments is proposed. The choice of four moment is also supported by studying the time-scale separation of the analytical reduced model proposed in [9] between the four-moment truncation and higher-order moment truncations of the dynamics, see A.5 in the Appendix. In Figure 2 we plot the first two moments M1subscript𝑀1M_{1}italic_M start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and M2subscript𝑀2M_{2}italic_M start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT colored with the parameter σ𝜎\sigmaitalic_σ. Based on this, we argue that the data are at least two-dimensional, and that the parameter σ𝜎\sigmaitalic_σ appears to correlate with the moments M1,M2subscript𝑀1subscript𝑀2M_{1},M_{2}italic_M start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_M start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT.

We then proceed by computing Diffusion Maps (see Section A.2 for a description of the algorithm) using data from the sampled ABM simulation. The Diffusion Maps algorithm was computed using: (a) histograms obtained for a single value of the parameter σ𝜎\sigmaitalic_σ, (b) histograms obtained across our grid of 18 distinct σ𝜎\sigmaitalic_σ values, (c) on the four computed moments from data sampled across our 18 values of σ𝜎\sigmaitalic_σ; those computations are reported in Section A.2.3 of the Appendix. We would like to reiterate that the Diffusion Maps computations on the moments are not necessary for the overall framework they serve as a comparison of our approach and the one by [9] et al.

Computing Diffusion Maps (based on histograms) for single values of σ𝜎\sigmaitalic_σ (far from the phase transition) gave, a one-dimensional manifold parametrized by ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT. In Figure 2 we plot the leading Diffusion Maps coordinate ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT against the first moment M1subscript𝑀1M_{1}italic_M start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and demonstrate that they are one-to-one. This suggests that Diffusion Maps is capable of discovering a coordinate that is one-to-one with the known order parameter M1subscript𝑀1M_{1}italic_M start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT for this model [14, 22, 9]. The claim that the manifold is one-dimensional in this case is also supported from the results of the local-linear algorithm shown in Figure 2; the residual rksubscript𝑟𝑘r_{k}italic_r start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT of the first eigenvector ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT is larger than the one of the remaining. In Section A.2.3 of the Appendix we illustrate the relation of the Diffusion Maps coordinate with the subsequent moments M2,M3,M4subscript𝑀2subscript𝑀3subscript𝑀4M_{2},M_{3},M_{4}italic_M start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_M start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_M start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT.

We proceed with the Diffusion Maps computation on data sampled over multiple parameter values of σ𝜎\sigmaitalic_σ. Details for the Diffusion Maps algorithm applied on histograms are provided in Section A.2.2. Here in Figures 2, 2 we show the two, non-harmonic eigenvectors that parameterize different eigendirections ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and ψ2subscript𝜓2\psi_{2}italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT colored with the parameter σ𝜎\sigmaitalic_σ and the first moment M1subscript𝑀1M_{1}italic_M start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT. The selection of the non-harmonic eigenvectors was made by using the local linear regression algorithm proposed in [23] and implemented by the datafold Python package [24]. The residual rksubscript𝑟𝑘r_{k}italic_r start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT of the first two eigenvectors, shown in Figure 2, is greater than the residual of the remaining eigenvectors. This suggests that the first two eigenvectors (ψ1,ψ2subscript𝜓1subscript𝜓2\psi_{1},\psi_{2}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT) are non-harmonic, while the remaining ones are harmonic (functions) of the first two.

The Diffusion Maps results for the case of using collected data across multiple parameter values, shown in Figures 2, 2, and 2, suggest that the mean-field dynamics of the ABM live on a two-dimensional manifold. However, the parameter σ𝜎\sigmaitalic_σ appears to correlate with the behavior of the moments observed at that σ𝜎\sigmaitalic_σ value. This suggests that even though the data can be parametrized by two coordinates ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, ψ2subscript𝜓2\psi_{2}italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, one might be able to disentangle this coupled two-dimensional description into a factorized (a single state)×\times×(a parameter) description. In the next section, we illustrate how a data-driven reparametrization of the Diffusion Maps coordinates can be achieved that will allow us to disentangle the effect of the parameter σ𝜎\sigmaitalic_σ from that of the latent variable.

III.2 Y-shaped conformal autoencoder

Refer to caption

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 3: (a) A schematic of the Y-shaped conformal autoencoder. The inputs to the network (ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and ψ2subscript𝜓2\psi_{2}italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT) are shown as green nodes. The outputs of the autoencoder (ψ^1subscript^𝜓1\hat{\psi}_{1}over^ start_ARG italic_ψ end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and ψ^2subscript^𝜓2\hat{\psi}_{2}over^ start_ARG italic_ψ end_ARG start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT) and the estimator are shown as red nodes. The latent variables (ν1subscript𝜈1\nu_{1}italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and ν2subscript𝜈2\nu_{2}italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT) are shown as light blue nodes. (b-c) The obtained latent coordinates ν1,ν2subscript𝜈1subscript𝜈2\nu_{1},\nu_{2}italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT colored with σ𝜎\sigmaitalic_σ and M1subscript𝑀1M_{1}italic_M start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT respectively. We can see that the σ𝜎\sigmaitalic_σ appears to vary only across ν1subscript𝜈1\nu_{1}italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, compared to Fig. 2 (c) in which σ𝜎\sigmaitalic_σ varied across both ψ1,ψ2subscript𝜓1subscript𝜓2\psi_{1},\psi_{2}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. Similarly, M1subscript𝑀1M_{1}italic_M start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT appears to vary only across ν2subscript𝜈2\nu_{2}italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. (d) The parameter σ𝜎\sigmaitalic_σ is plotted against the latent coordinate ν1subscript𝜈1\nu_{1}italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT indicating a strong dependence. (e) Contour lines representing level sets of ν1subscript𝜈1\nu_{1}italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and ν2subscript𝜈2\nu_{2}italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT are plotted in the Diffusion Maps (ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT,ψ2subscript𝜓2\psi_{2}italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT) providing a visual illustration of the obtained conformality.

In this section, we illustrate the use of the Y-shaped conformal autoencoder, originally introduced in our previous work [17] as a means of disentangling the effect of different parameter combinations on model outputs in the context of parameter nonidentifiability. The Y-shaped conformal autoencoder seeks a reparametrization of the Diffusion Maps coordinates in which it disentangles the effect of the parameter σ𝜎\sigmaitalic_σ from that of the latent coordinates of the autoencoder. A more detailed description of the overall framework, the different loss function components of the network, and the training procedure we followed are discussed in Section A.3. A schematic of the Y-shaped conformal autoencoder is shown in Figure 3. The coordinates ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and ψ2subscript𝜓2\psi_{2}italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, obtained by computing the Diffusion Maps algorithm on estimated histograms of the ABM, were the ones used as input to the Y-shaped conformal autoencoder.

The bottleneck latent variables ν1,ν2subscript𝜈1subscript𝜈2\nu_{1},\nu_{2}italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT are shown in Figures 3 and 3 colored with the parameter σ𝜎\sigmaitalic_σ and the first moment M1subscript𝑀1M_{1}italic_M start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT (that is the known coarse variable for this model [9]). We see that the parameter σ𝜎\sigmaitalic_σ appears to be varying only along ν1subscript𝜈1\nu_{1}italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, compared to Figure 2 in which σ𝜎\sigmaitalic_σ varied across both ψ1,ψ2subscript𝜓1subscript𝜓2\psi_{1},\psi_{2}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. Similarly, M1subscript𝑀1M_{1}italic_M start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT varies only along ν2subscript𝜈2\nu_{2}italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. The latent variable ν1subscript𝜈1\nu_{1}italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT shows a strong correlation with the parameter σ𝜎\sigmaitalic_σ, as depicted by Figure 3, indicating that they are effectively one-to-one. The conformality efficiency of the autoencoder becomes visually pronounced in Figure 3 where level sets of ν1subscript𝜈1\nu_{1}italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and ν2subscript𝜈2\nu_{2}italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT are plotted in the Diffusion Maps coordinates. Furthermore, the average value of the cosine of the angle, cosθ=ν1ν2ν1ν2𝜃subscript𝜈1subscript𝜈2normsubscript𝜈1normsubscript𝜈2\cos\theta=\frac{\nabla\nu_{1}\cdot\nabla\nu_{2}}{||\nabla\nu_{1}||||\nabla\nu% _{2}||}roman_cos italic_θ = divide start_ARG ∇ italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ ∇ italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG start_ARG | | ∇ italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT | | | | ∇ italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT | | end_ARG, calculated over the test set, we found it to be 5×1045superscript1045\times 10^{-4}5 × 10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT.

This reparametrization of the latent coordinates is crucial since, as we show in the next section, it allows us to identify a single-state dimension ODE depending on σ𝜎\sigmaitalic_σ (rather than a needlessly two-dimensional ODE).

In Section A.3 we provide additional results obtained for the Y-shaped autoencoder illustrating the reconstruction of its input data (ψ1,ψ2subscript𝜓1subscript𝜓2\psi_{1},\psi_{2}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT). Remarkably, in addition, the Y-shaped autoencoder allows us to estimate σ𝜎\sigmaitalic_σ from previously unseen histogram observation through ν1subscript𝜈1\nu_{1}italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT.

III.3 Identifying parameter dependent ODEs

Using the latent variable ν2subscript𝜈2\nu_{2}italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT obtained from the Y-shaped conformal autoencoder as the state variable, we now identify a σ𝜎\sigmaitalic_σ-dependent ODE; remember that ν2subscript𝜈2\nu_{2}italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT is conformal to ν1subscript𝜈1\nu_{1}italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT (and therefore to σ𝜎\sigmaitalic_σ). The identification of the parameter-dependent ODE was achieved through a residual neural network inspired by the forward Euler numerical integration scheme. A schematic of the neural network is shown in Figure 4. The inputs to this network are the state variable at time t𝑡titalic_t, ν2(t)subscript𝜈2𝑡\nu_{2}(t)italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t ), and the parameter σ𝜎\sigmaitalic_σ, and the output is the state variable evolved to time t+h𝑡t+hitalic_t + italic_h, ν2(t+h)subscript𝜈2𝑡\nu_{2}(t+h)italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t + italic_h ). We provide a more detailed description of the forward Euler network and the constructed loss function scheme in Section A.4.

We test the ability of the ODE to produce accurate paths (trajectories), in ν2subscript𝜈2\nu_{2}italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT space, for seven unseen values of the parameter σ={0.57,0.85,1.11,1.75,1.9,2.06,2.25}𝜎0.570.851.111.751.92.062.25\sigma=\{0.57,0.85,1.11,1.75,1.9,2.06,2.25\}italic_σ = { 0.57 , 0.85 , 1.11 , 1.75 , 1.9 , 2.06 , 2.25 }, see Section A.1 for more details. In Figure 4 we illustrate the predicted values ν^2(t+h)subscript^𝜈2𝑡\hat{\nu}_{2}(t+h)over^ start_ARG italic_ν end_ARG start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t + italic_h ) from the Euler neural network against the ground-truth (projected trajectories in ν2subscript𝜈2\nu_{2}italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT generated by the ABM) for all seven unseen values of the parameter σ𝜎\sigmaitalic_σ. The estimated MSE in this case was MSE = 6×1046superscript1046\times 10^{-4}6 × 10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT. In addition, in Figure 4 we provide a visual comparison between simulated by the Euler Neural Network trajectories and the ABM model (projected in ν2subscript𝜈2\nu_{2}italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT) for the test parameter values σ={1.11,1.75,1.9}𝜎1.111.751.9\sigma=\{1.11,1.75,1.9\}italic_σ = { 1.11 , 1.75 , 1.9 }. The paths generated by the ABM and those generated by the neural network-identified ODE show excellent visual agreement across different values of σ𝜎\sigmaitalic_σ and different initial conditions. This suggests that the identified right-hand-side provides a successful approximation of the ground-truth dynamics.

Refer to caption
Refer to caption
Refer to caption
Figure 4: (a) A schematic of the forward Euler residual neural network. The state variable ν2(t)subscript𝜈2𝑡\nu_{2}(t)italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t ), the parameter σ𝜎\sigmaitalic_σ, and the time step hhitalic_h are inputs to the neural network that estimates the right-hand side fθsubscript𝑓𝜃f_{\theta}italic_f start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT of the ODE. The right-hand-side is then used to estimate the state variable ν2^(t+h)^subscript𝜈2𝑡\hat{\nu_{2}}(t+h)over^ start_ARG italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG ( italic_t + italic_h ) by using a forward Euler step. (b) The predicted value for ν^2(t+h)subscript^𝜈2𝑡\hat{\nu}_{2}(t+h)over^ start_ARG italic_ν end_ARG start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t + italic_h ) estimated from the Euler neural network is plotted against the true value (projected ABM trajectories in ν2subscript𝜈2\nu_{2}italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT space) for values of the parameter σ={0.57,0.85,1.11,1.75,1.9,2.06,2.25}𝜎0.570.851.111.751.92.062.25\sigma=\{0.57,0.85,1.11,1.75,1.9,2.06,2.25\}italic_σ = { 0.57 , 0.85 , 1.11 , 1.75 , 1.9 , 2.06 , 2.25 } not included in the training set. (c) For three values, of the parameter not included in the training set we contrast generated paths from the ODE (solid lines) with paths simulated by the ABM (dashed lines) embedded in ν2subscript𝜈2\nu_{2}italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. The colors black, blue (gray) and green (light-gray) corresponds to σ=1.11𝜎1.11\sigma=1.11italic_σ = 1.11, σ=1.75𝜎1.75\sigma=1.75italic_σ = 1.75 and σ=2.25𝜎2.25\sigma=2.25italic_σ = 2.25 respectively.

III.4 Bifurcation Diagram: phase transition

Refer to caption
Refer to caption
Figure 5: (a) A representative bifurcation diagram constructed by the identified right-hand-side of the Euler neural network suggests a “perturbed” pitchfork. (b) A representative bifurcation diagram after applying equation (2) to the identified right-hand-side shows a symmetric pitchfork.

In this section, we now proceed to test whether the parameter-dependent ODE identified in a data-driven way through our neural network can qualitatively and quantitatively capture the phase transition. To assess the robustness of our approach, we also estimate the error associated with the phase transition by training 5,000 neural networks in parallel, using different splits of the data for training, validation, and testing. All hyperparameters were kept the same across models during this computation. For each of the trained networks given the identified right-hand-side, we compute the steady-states across a range of values of the parameter σ𝜎\sigmaitalic_σ and construct the bifurcation diagram. For values of σ[0.5,2.2]𝜎0.52.2\sigma\in[0.5,2.2]italic_σ ∈ [ 0.5 , 2.2 ] a representative constructed bifurcation diagram is shown in Figure 5. The bifurcation diagram shows that the model identifies the existence of three steady states, two stable and one unstable, for σ<1.84𝜎1.84\sigma<1.84italic_σ < 1.84. For σ1.84𝜎1.84\sigma\geq 1.84italic_σ ≥ 1.84 a unique stable steady state exists. However, the computed bifurcation diagram clearly possesses the flip symmetry of the generic pitchfork bifurcation. We have identified a perturbed pitchfork, in which the upper branch remains permanently stable and the lower branch exhibits a turning point where a stable and an unstable component collide. The consistency of identifying a perturbed pitchfork was maintained across the different models we trained.

To make sure we recover a generically symmetric pitchfork bifurcation diagram, a simple transformation suffices: given the identified right-hand-side fθ(ν2(t);σ)subscript𝑓𝜃subscript𝜈2𝑡𝜎f_{\theta}(\nu_{2}(t);\sigma)italic_f start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t ) ; italic_σ ) of the neural network we compute

g(ν2;σ)=fθ(ν2(t);σ)fθ(ν2(t);σ)2.𝑔subscript𝜈2𝜎subscript𝑓𝜃subscript𝜈2𝑡𝜎subscript𝑓𝜃subscript𝜈2𝑡𝜎2g(\nu_{2};\sigma)=\frac{f_{\theta}(\nu_{2}(t);\sigma)-f_{\theta}(-\nu_{2}(t);% \sigma)}{2}.italic_g ( italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ; italic_σ ) = divide start_ARG italic_f start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t ) ; italic_σ ) - italic_f start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( - italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t ) ; italic_σ ) end_ARG start_ARG 2 end_ARG . (2)

The vector field g(ν2;σ)𝑔subscript𝜈2𝜎g(\nu_{2};\sigma)italic_g ( italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ; italic_σ ) is then used to construct the bifurcation diagram, shown in Figure 5 correctly capturing the symmetry. Note that the computation of Equation (2) does not require a second neural network but only evaluations of the trained Euler neural network discussed in the previous section. The average critical parameter of the identified ODE (based on the 5000500050005000 trained neural networks) was estimated at σ=1.837superscript𝜎1.837\sigma^{*}=1.837italic_σ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = 1.837 and the standard deviation was 0.0260.0260.0260.026; the true critical transition has been computed to be at σ=1.890superscript𝜎1.890\sigma^{*}=1.890italic_σ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = 1.890 [9]–see also[14][Eqn. 3.52] for the case of additive noise. This suggests that the identified ODE provides a qualitatively correct and arguably quantitatively accurate approximation of the phase transition. This small discrepancy can likely be attributed to the limited range of σ𝜎\sigmaitalic_σ values used in our training data. With only 18181818 distinct values for σ𝜎\sigmaitalic_σ (details on sampling in Section A.1), the closest ones to the true bifurcation point are σ=1.80𝜎1.80\sigma=1.80italic_σ = 1.80 and σ=1.90𝜎1.90\sigma=1.90italic_σ = 1.90. Using a finer grid of parameter values we believe it would have led to even smaller discrepancy.

IV Summary and Conclusions

We presented a data-driven framework for identifying qualitatively and approximating quantitatively phase transitions of interacting agent systems through an interplay between direct simulations and machine-learning-assisted, coarse-grained system identification and bifurcation analysis. We demonstrated that our framework is capable of identifying the phase transition of the DZ system with a single, interpretable data-driven variable, in contrast to the derived closed-form model, proposed in [9] where a system of four approximate ODEs were required for the task. To discover the effective coarse (collective) variables that parameterize collected data from the ABM, the Diffusion Maps manifold learning algorithm [16] was used. We showed that, for a constant parameter value (σ=1𝜎1\sigma=1italic_σ = 1), away from the phase transition, the Diffusion Maps algorithm discovers a one-dimensional manifold parametrized by the data-driven coordinate ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT; and that ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT is one-to-one with the theoretical order parameter M1subscript𝑀1M_{1}italic_M start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT. We then illustrated that, while data collected over a range of parameter values (bracketing the critical value) lie on a two-dimensional manifold, this manifold can be “disentangled” (factored) into a one-dimensional state-variability manifold “cross” a one-dimensional parameter variability manifold. To disentangle the variability due to the parameter σ𝜎\sigmaitalic_σ from the state variability, in terms of the latent Diffusion Maps coordinates (ψ1,ψ2subscript𝜓1subscript𝜓2\psi_{1},\psi_{2}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT), we introduced a Y-shaped conformal autoencoder, initially proposed in [17] for addressing parameter non-identifiability. We illustrated that the autoencoder’s latent variables (ν1,ν2subscript𝜈1subscript𝜈2\nu_{1},\nu_{2}italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT) form a conformal set of coordinates that ultimately allow us to learn a single effective nonlinear ODE, in terms of a single state variable ν2subscript𝜈2\nu_{2}italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and depending on a single parameter σ𝜎\sigmaitalic_σ (the latter being one-to-one with ν1subscript𝜈1\nu_{1}italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT). This effective nonlinear ODE was identified via a residual neural network, templated on the forward Euler numerical integrator. We compared generated paths of the identified ODE for test values of the parameter σ𝜎\sigmaitalic_σ to paths generated with the full ABM embedded in the latent coordinate ν2subscript𝜈2\nu_{2}italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and demonstrated good agreement between the two. To construct the bifurcation diagram from the identified right-hand-side of the ODE, we used an odd symmetry transformation of the right-hand-side. This transformation provides a symmetric vector field and captures the pitchfork bifurcation that denotes the phase transition. Imposing known physical symmetries in the data-driven models in order to enhance accuracy and improve generalizability is currently an active area of research [25, 26, 27, 28, 29, 30, 31, 32, 33].

Our proposed framework can be extended to a broad class of complex, multiscale dynamical systems for which a fine scale, atomistic/stochastic mode exists, but for which accurate closed form macroscopic equations at the coarse-grained level are not explicitly known. We are particularly interested in models from the social sciences, where no physics-informed natural choice for the order parameter(s) exists. As an example, we mention the noisy Hegselmann-Krause model for opinion dynamics [34], for which it has been rigorously proved that a discontinuous disorder (no consensus)/order (consensus) phase transition exists [6][Prop. 6.2]. It would be interesting to identify the phenomenological order parameter introduced in [34] using our approach. Other examples for which our approach is expected to be applicable are the Keller-Segel model for chemotaxis, see the work in [35, 36, 37, 38], the Fitzhugh–Nagumo from Lattice Boltzmann in [39, 40], the mean-field Fitzhugh–Nagumo model for Neurons  [41][Sec. 5.6], and the Dean-Kawasaki stochastic PDE [3, 42], that takes into account finite-particle effects and fluctuations.

Acknowledgements.
I.G.K. acknowledges partial support from the US AFOSR FA9550-21-0317 and the US Department of Energy SA22-0052-S001. G.A.P. is partially supported by the Frontier Research Advanced Investigator Grant ERC grant Machine-aided general framework for fluctuating dynamic density functional theory. The authors are grateful to N. Zagli and A. Zanoni for useful discussions for making available the code from  [9] and [41], respectively. Many thanks to T. Gaskin for helping set up the collaboration with Imperial College London. He acknowledges the hospitality of the Johns Hopkins group.

Codes

The code used to generate the results for this paper will become available as a public repository at https://gitlab.com/nicolasevangelou/ml_phase_transition upon acceptance of the paper.

Appendix A

A.1 Data Collection

Given the ABM model described in Section II in the main text, we generated trajectories at 18 equidistant values of σ[0.5,2.2]𝜎0.52.2\sigma\in[0.5,2.2]italic_σ ∈ [ 0.5 , 2.2 ] [9]. For each value of σ𝜎\sigmaitalic_σ we sampled 100 trajectories for different initial conditions (agents’ distribution) drawn from the Pearson distribution (implemented in Matlab) with prescribed mean, standard deviation, skewness, and kurtosis. The values for the mean, standard deviation, skewness, and kurtosis were chosen randomly from a prescribed grid of equidistant points: The mean was chosen from the range [2.0,2.0]2.02.0[-2.0,2.0][ - 2.0 , 2.0 ], with increments of 0.20.20.20.2. The standard deviation was selected from the range [0.0,2.0]0.02.0[0.0,2.0][ 0.0 , 2.0 ], with increments of 0.10.10.10.1. The skewness was chosen from the range [2.0,2.0]2.02.0[-2.0,2.0][ - 2.0 , 2.0 ], with increments of 0.20.20.20.2 and the kurtosis was selected from the range [0.0,15]0.015[0.0,15][ 0.0 , 15 ], with increments of 0.720.720.720.72. This scheme ensured a dense sampling both, in parameter and state space.

The integration time of the ABM, containing N=12,000𝑁12000N=12,000italic_N = 12 , 000 agents, was set to tf=10subscript𝑡𝑓10t_{f}=10italic_t start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT = 10 with a time-step dt=0.005𝑑𝑡0.005dt=0.005italic_d italic_t = 0.005. Data were collected every five snapshots which led to a total of 400400400400 snapshots per trajectory. Therefore, for a single value of the parameter, the total number of initial data is n=40,000𝑛40000n=40,000italic_n = 40 , 000 (400×100400100400\times 100400 × 100) while for multiple values of the parameter n=720,000𝑛720000n=720,000italic_n = 720 , 000 (400×100×1840010018400\times 100\times 18400 × 100 × 18). Note that a number of trajectories explodes and thus, are omitted from any further computations.

An additional preprocessing step was applied to the data before the Diffusion Maps computation that includes discarding a short transient tcutsubscript𝑡𝑐𝑢𝑡t_{cut}italic_t start_POSTSUBSCRIPT italic_c italic_u italic_t end_POSTSUBSCRIPT for each trajectory. This ensures that the fast transients have decayed and the collected data contains only the long-term dynamics (that live on the slow manifold). When dealing with multiple parameter values we chose tcut=1subscript𝑡𝑐𝑢𝑡1t_{cut}=1italic_t start_POSTSUBSCRIPT italic_c italic_u italic_t end_POSTSUBSCRIPT = 1 while for a single value of the parameter σ=1𝜎1\sigma=1italic_σ = 1 we choose tcut=0.5subscript𝑡𝑐𝑢𝑡0.5t_{cut}=0.5italic_t start_POSTSUBSCRIPT italic_c italic_u italic_t end_POSTSUBSCRIPT = 0.5 to make sure that enough transients are close to the unstable steady state, otherwise, the manifold would appear as two clusters. As a test set, we sampled nine trajectories for each of the seven values of σ={0.57,0.85,1.11,1.75,1.9,2.06,2.25}𝜎0.570.851.111.751.92.062.25\sigma=\{0.57,0.85,1.11,1.75,1.9,2.06,2.25\}italic_σ = { 0.57 , 0.85 , 1.11 , 1.75 , 1.9 , 2.06 , 2.25 } not included in the training set. The selection of these test values was made to validate the predictions of our Euler neural network for dynamics before and after the bifurcation.

A.2 Diffusion Maps

The Diffusion Maps algorithm, introduced by Coifman and Lafon [16] can be used to discover a low-dimensional parameterization of high-dimensional data 𝐗={𝒙i}i=1n𝐗superscriptsubscriptsubscript𝒙𝑖𝑖1𝑛\mathbf{X}=\{\bm{x}_{i}\}_{i=1}^{n}bold_X = { bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT with each 𝒙imsubscript𝒙𝑖superscript𝑚\bm{x}_{i}\in\mathbb{R}^{m}bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT. Diffusion Maps constructs a weighted graph 𝐊n×n𝐊superscript𝑛𝑛\mathbf{K}\in\mathbb{R}^{n\times n}bold_K ∈ blackboard_R start_POSTSUPERSCRIPT italic_n × italic_n end_POSTSUPERSCRIPT between the sampled data points by using a kernel function. A common choice, also used in our case, is the Gaussian kernel

K(𝒙i,𝒙j)=exp(𝒙i𝒙j222ε),𝐾subscript𝒙𝑖subscript𝒙𝑗superscriptsubscriptnormsubscript𝒙𝑖subscript𝒙𝑗222𝜀K(\bm{x}_{i},\bm{x}_{j})=\exp\bigg{(}\frac{-||\bm{x}_{i}-\bm{x}_{j}||_{2}^{2}}% {2\varepsilon}\bigg{)},italic_K ( bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) = roman_exp ( divide start_ARG - | | bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | | start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 2 italic_ε end_ARG ) , (3)

where ε𝜀\varepsilonitalic_ε is a positive hyperparameter that controls the rate of the kernel’s decay. The metric ||||||\cdot||| | ⋅ | | in our case was chosen as the 2superscript2\ell^{2}roman_ℓ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT norm but different metrics are also possible.

To discover a low-dimensional manifold, regardless of the sampling density, the following normalization is required

𝐊~=𝐏1𝐊𝐏1~𝐊superscript𝐏1superscript𝐊𝐏1\mathbf{\tilde{K}}=\mathbf{P}^{-1}\mathbf{K}\mathbf{P}^{-1}over~ start_ARG bold_K end_ARG = bold_P start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_KP start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT (4)

where Pii=j=1nKjjsubscript𝑃𝑖𝑖superscriptsubscript𝑗1𝑛subscript𝐾𝑗𝑗P_{ii}=\sum_{j=1}^{n}K_{jj}italic_P start_POSTSUBSCRIPT italic_i italic_i end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_K start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT. A second normalization of 𝐊~~𝐊\mathbf{\tilde{K}}over~ start_ARG bold_K end_ARG recovers a row-stochastic, Markovian matrix

𝐌=𝐃1𝐊~𝐌superscript𝐃1~𝐊\mathbf{M}=\mathbf{D}^{-1}\mathbf{\tilde{K}}bold_M = bold_D start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT over~ start_ARG bold_K end_ARG (5)

where 𝐃𝐃\mathbf{D}bold_D is a diagonal matrix defined as Dii=j=1nK~ijsubscript𝐷𝑖𝑖superscriptsubscript𝑗1𝑛subscript~𝐾𝑖𝑗D_{ii}=\sum_{j=1}^{n}\tilde{K}_{ij}italic_D start_POSTSUBSCRIPT italic_i italic_i end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT over~ start_ARG italic_K end_ARG start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT.

The entries of matrix 𝐌𝐌\mathbf{M}bold_M can be seen as probabilities of jumping from one point to the other. The eigendecomposition of 𝐌𝐌\mathbf{M}bold_M,

𝐌ϕi=λiϕi,𝐌subscriptbold-italic-ϕ𝑖subscript𝜆𝑖subscriptbold-italic-ϕ𝑖\mathbf{M}\bm{\phi}_{i}=\lambda_{i}\bm{\phi}_{i},bold_M bold_italic_ϕ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT bold_italic_ϕ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , (6)

provides a set of eigenvectors ϕisubscriptbold-italic-ϕ𝑖\bm{\phi}_{i}bold_italic_ϕ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and corresponding eigenvalues λisubscript𝜆𝑖\lambda_{i}italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. To obtain a more parsimonious representation of the original data set 𝐗𝐗\mathbf{X}bold_X proper selection of the eigenvectors is needed. If the intrinsic dimension of the data is small, this selection can be achieved by visual inspection of the non-harmonic eigenvectors (eigenvectors that span independent directions) [43]. Alternatively, the local-linear regression algorithm proposed by Dsilva et al. [23] can be used for selecting the non-harmonic eigenvectors. If, the number of independent non-harmonic eigenvectors is smaller than the dimension m𝑚mitalic_m of the data 𝐗𝐗\mathbf{X}bold_X, then dimensionality reduction has been achieved. In our work, the Python library datafold [24] was used for the Diffusion Maps and the local-linear regression algorithms.

A.2.1 Nystöm Extension

The Nyström Extension formula provides a numerical approximation of eigenfunctions of the form [44]

abM(𝒙j,𝒙i)ϕi(𝒙i)𝑑𝒙i=λϕ(𝒙j).superscriptsubscript𝑎𝑏𝑀subscript𝒙𝑗subscript𝒙𝑖subscriptitalic-ϕ𝑖subscript𝒙𝑖differential-dsubscript𝒙𝑖𝜆italic-ϕsubscript𝒙𝑗\int_{a}^{b}M(\bm{x}_{j},\bm{x}_{i})\phi_{i}(\bm{x}_{i})d\bm{x}_{i}=\lambda% \phi({\bm{x}_{j}}).∫ start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_b end_POSTSUPERSCRIPT italic_M ( bold_italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) italic_ϕ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) italic_d bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_λ italic_ϕ ( bold_italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) . (7)

In our work, Nyström extension is utilized when new out-of-sample data points are given, e.g., 𝒙new𝐗subscript𝒙𝑛𝑒𝑤𝐗\bm{x}_{new}\notin\mathbf{X}bold_italic_x start_POSTSUBSCRIPT italic_n italic_e italic_w end_POSTSUBSCRIPT ∉ bold_X. To generate the Diffusion Maps coordinates ϕnewsubscriptbold-italic-ϕ𝑛𝑒𝑤\bm{\phi}_{new}bold_italic_ϕ start_POSTSUBSCRIPT italic_n italic_e italic_w end_POSTSUBSCRIPT Nyström extension uses an interpolation scheme based on the kernel computations and normalizations applied during the dimensionality reduction step, discussed in the previous Section. The Nyström extension formula reads

ϕi(𝒙new)=1λij=1n𝐌~(𝒙new,𝒙j)ϕi(𝒙j),subscriptitalic-ϕ𝑖subscript𝒙𝑛𝑒𝑤1subscript𝜆𝑖superscriptsubscript𝑗1𝑛~𝐌subscript𝒙𝑛𝑒𝑤subscript𝒙𝑗subscriptitalic-ϕ𝑖subscript𝒙𝑗\phi_{i}(\bm{x}_{new})=\frac{1}{\lambda_{i}}\sum_{j=1}^{n}\tilde{\mathbf{M}}(% \bm{x}_{new},\bm{x}_{j})\phi_{i}(\bm{x}_{j}),italic_ϕ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_n italic_e italic_w end_POSTSUBSCRIPT ) = divide start_ARG 1 end_ARG start_ARG italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT over~ start_ARG bold_M end_ARG ( bold_italic_x start_POSTSUBSCRIPT italic_n italic_e italic_w end_POSTSUBSCRIPT , bold_italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) italic_ϕ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) , (8)

where ϕi(xnew)subscriptitalic-ϕ𝑖subscript𝑥𝑛𝑒𝑤\phi_{i}(x_{new})italic_ϕ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_n italic_e italic_w end_POSTSUBSCRIPT ) denotes the estimated i𝑖iitalic_i-th eigenvector for the data point ϕnewsubscriptbold-italic-ϕ𝑛𝑒𝑤\bm{\phi}_{new}bold_italic_ϕ start_POSTSUBSCRIPT italic_n italic_e italic_w end_POSTSUBSCRIPT, λisubscript𝜆𝑖\lambda_{i}italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT denotes the corresponding eigenvalue, ϕi(𝒙j)subscriptitalic-ϕ𝑖subscript𝒙𝑗\phi_{i}(\bm{x}_{j})italic_ϕ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) denotes the j𝑗jitalic_j-th component of the i𝑖iitalic_i-th eigenvector and 𝐌~(,)~𝐌\tilde{\mathbf{M}}(\cdot,\cdot)over~ start_ARG bold_M end_ARG ( ⋅ , ⋅ ) denotes the kernel function used to determine the similarity of 𝒙newsubscript𝒙𝑛𝑒𝑤\bm{x}_{new}bold_italic_x start_POSTSUBSCRIPT italic_n italic_e italic_w end_POSTSUBSCRIPT to all the points in 𝐗𝐗\mathbf{X}bold_X.

A.2.2 Diffusion Maps on ABM data

In this section, we provide details on how the Diffusion Maps algorithm was computed on histograms and moments from the ABM. In the first case, for each snapshot of the ABM we constructed a histogram as an approximation of the agents’ density. Each histogram contains 40404040 equidistant bins defined in the range [4,4]44[-4,4][ - 4 , 4 ]. This range ensures that all agents in the collected (training) data lie in between. Note, that the method is insensitive to the selected number of bins. To reduce the computational cost of the Diffusion Maps, we subsampled the training data uniformly [24] which resulted in about 3,50035003,5003 , 500 data points when Diffusion Maps applied for a single parameter and about 19,0001900019,00019 , 000 data points for multiple values of the parameter. The hyperparameter ε𝜀\varepsilonitalic_ε was selected as the square of the median of the pairwise distances multiplied by a constant c𝑐citalic_c, where c=0.03𝑐0.03c=0.03italic_c = 0.03 for a single value of the parameter σ𝜎\sigmaitalic_σ and c=20𝑐20c=20italic_c = 20 for multiple values of the parameter σ𝜎\sigmaitalic_σ. In the second case, Diffusion Maps was computed on the sampled moments, M1,M2,M3,M4subscript𝑀1subscript𝑀2subscript𝑀3subscript𝑀4M_{1},M_{2},M_{3},M_{4}italic_M start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_M start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_M start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_M start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT. Again, to alleviate the computational cost we subsampled the data [24], which results in N11,000similar-to𝑁11000N\sim 11,000italic_N ∼ 11 , 000. The hyperparameter ε𝜀\varepsilonitalic_ε was also selected for this case by computing the median of the pairwise distances multiplied with c=10𝑐10c=10italic_c = 10.

A.2.3 Diffusion Maps: Additional Results

In this section, we provide additional results for the Diffusion Maps computation for σ=1𝜎1\sigma=1italic_σ = 1 on the histograms and the Diffusion Maps computations on the moments M1,M2,M3,M4subscript𝑀1subscript𝑀2subscript𝑀3subscript𝑀4M_{1},M_{2},M_{3},M_{4}italic_M start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_M start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_M start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_M start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT. We showed in the main text that the Diffusion Maps coordinate ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT is one-to-one with the first moment M1subscript𝑀1M_{1}italic_M start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT. Here we show that M3subscript𝑀3M_{3}italic_M start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT is also one-to-one with ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT (Figure 6) and that the even moments M2subscript𝑀2M_{2}italic_M start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and M4subscript𝑀4M_{4}italic_M start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT can be seen as functions of ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT (Figures 6,6).

To strengthen, the argument that the manifold in this case is one-dimensional we show in Figure 6 the eigenvectors ψ2ψ9subscript𝜓2subscript𝜓9\psi_{2}-\psi_{9}italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT - italic_ψ start_POSTSUBSCRIPT 9 end_POSTSUBSCRIPT plotted against the first non-trivial eigenvector ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT. From Figure 6 it appears that the eigenvectors ψ2ψ9subscript𝜓2subscript𝜓9\psi_{2}-\psi_{9}italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT - italic_ψ start_POSTSUBSCRIPT 9 end_POSTSUBSCRIPT are harmonics of ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT. This suggests that the manifold for a fixed value of σ𝜎\sigmaitalic_σ is one-dimensional.

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 6: (a-c) The Diffusion Maps coordinate ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT is plotted against the computed moments M2subscript𝑀2M_{2}italic_M start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, M3subscript𝑀3M_{3}italic_M start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT and M4subscript𝑀4M_{4}italic_M start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT respectively. (d) The eigenvector ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT is plotted against the eigenvectors ψ2ψ9subscript𝜓2subscript𝜓9\psi_{2}-\psi_{9}italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT - italic_ψ start_POSTSUBSCRIPT 9 end_POSTSUBSCRIPT. This supports our argument that the eigenvectors ψ2ψ9subscript𝜓2subscript𝜓9\psi_{2}-\psi_{9}italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT - italic_ψ start_POSTSUBSCRIPT 9 end_POSTSUBSCRIPT are harmonics of ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT.
Refer to caption
Refer to caption
Refer to caption
Figure 7: (a-b) The non-harmonic coordinates ϕ1subscriptitalic-ϕ1\phi_{1}italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and ϕ2subscriptitalic-ϕ2\phi_{2}italic_ϕ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT colored with σ𝜎\sigmaitalic_σ and M1subscript𝑀1M_{1}italic_M start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT respectively. (c) The residual rksubscript𝑟𝑘r_{k}italic_r start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT indicates ϕ1subscriptitalic-ϕ1\phi_{1}italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and ϕ2subscriptitalic-ϕ2\phi_{2}italic_ϕ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT are the two non-harmonic coordinates.

The Diffusion Maps algorithm applied using the computed first four moments reveals a two-dimensional manifold embedded in four dimensions. The first two eigenvectors in this case ϕ1subscriptitalic-ϕ1\phi_{1}italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and ϕ2subscriptitalic-ϕ2\phi_{2}italic_ϕ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, shown in Figures 7 and 7 are the non-harmonic, eigenvectors that parametrize the data. This is corroborated by the larger residuals rksubscript𝑟𝑘r_{k}italic_r start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT of the first two eigenvectors shown in Figure 7. Also in this case, the parameter σ𝜎\sigmaitalic_σ and the first moment M1subscript𝑀1M_{1}italic_M start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT appear visually as functions of the Diffusion Maps coordinates (Figures 7 and 7).This pair of coordinates, obtained by Diffusion Maps on moments, could have been used for the CAE’s training instead of the Diffusion Maps coordinates (ψ1,ψ2subscript𝜓1subscript𝜓2\psi_{1},\psi_{2}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT)obtained from the computations on the histograms; but we omit this for brevity.

A.3 Y-shaped conformal autoencoder

The Y-shaped conformal autoencoder was initially presented in [17]. In this work, the Y-shaped conformal autoencoder consists of the three connected (sub)networks

Encoder : (ψ1,ψ2)(ν1,ν2)maps-toEncoder : subscript𝜓1subscript𝜓2subscript𝜈1subscript𝜈2\displaystyle\text{Encoder : }(\psi_{1},\psi_{2})\mapsto(\nu_{1},\nu_{2})Encoder : ( italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ↦ ( italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) (9)
Decoder : (ν1,ν2)(ψ^1,ψ^2)maps-toDecoder : subscript𝜈1subscript𝜈2subscript^𝜓1subscript^𝜓2\displaystyle\text{Decoder : }(\nu_{1},\nu_{2})\mapsto(\hat{\psi}_{1},\hat{% \psi}_{2})Decoder : ( italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ↦ ( over^ start_ARG italic_ψ end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , over^ start_ARG italic_ψ end_ARG start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) (10)
Estimator : ν1σ^.maps-toEstimator : subscript𝜈1^𝜎\displaystyle\text{Estimator : }\nu_{1}\mapsto\hat{\sigma}.Estimator : italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ↦ over^ start_ARG italic_σ end_ARG . (11)

The Encoder receives as inputs the two Diffusion Maps coordinates ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, ψ2subscript𝜓2\psi_{2}italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and maps them to the latent variables ν1,ν2subscript𝜈1subscript𝜈2\nu_{1},\nu_{2}italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. The Decoder aims to reconstruct the Diffusion Maps coordinates from the latent variables ν1subscript𝜈1\nu_{1}italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, ν2subscript𝜈2\nu_{2}italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. The Estimator has as input the latent coordinate ν1subscript𝜈1\nu_{1}italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and aims to learn a map from ν1subscript𝜈1\nu_{1}italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT to the parameter σ𝜎\sigmaitalic_σ.

The loss function used to train the Y-shaped conformal autoencoder consists of three parts: (a) The loss function of the Encoder-Decoder (autoencoder) aesubscriptae\mathcal{L}_{\text{ae}}caligraphic_L start_POSTSUBSCRIPT ae end_POSTSUBSCRIPT that aims to reconstruct the input itself (b) the loss function of the Estimator, est.subscriptest.\mathcal{L}_{\text{est.}}caligraphic_L start_POSTSUBSCRIPT est. end_POSTSUBSCRIPT, that aims to reproduce the parameter σ𝜎\sigmaitalic_σ given ν1subscript𝜈1\nu_{1}italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and (c) the loss function for imposing the conformality constrain, con.subscriptcon.\mathcal{L}_{\text{con.}}caligraphic_L start_POSTSUBSCRIPT con. end_POSTSUBSCRIPT, between ν1subscript𝜈1\nu_{1}italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and ν2subscript𝜈2\nu_{2}italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT,

ν1,ν2=0subscript𝜈1subscript𝜈20\langle\nabla\nu_{1},\nabla\nu_{2}\rangle=0⟨ ∇ italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , ∇ italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⟩ = 0 (12)

where the gradient \nabla is in terms of the Diffusion Maps coordinates (ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT,ψ2subscript𝜓2\psi_{2}italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT) of the input and ,\langle\cdot,\cdot\rangle⟨ ⋅ , ⋅ ⟩ denotes the inner product between the two vectors. The gradients were computed by using the automatic differentiation of Pytorch [45]. In practice, instead of using equation 12, one can minimize the angle between the vectors

cosθ=ν1ν2ν1ν2cos𝜃subscript𝜈1subscript𝜈2normsubscript𝜈1normsubscript𝜈2\text{cos}\theta=\frac{\nabla\nu_{1}\cdot\nabla\nu_{2}}{||\nabla\nu_{1}||||% \nabla\nu_{2}||}cos italic_θ = divide start_ARG ∇ italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ ∇ italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG start_ARG | | ∇ italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT | | | | ∇ italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT | | end_ARG (13)

which stabilizes the training of the network. We describe details for training the network, specifics about the architecture used, and the choice of hyperparameters in the next section.

A.3.1 Hyperparameter selection and training procedure

The implementation of the Y-shaped conformal autoencoder was done with the Pytorch Python library [45].

Each (sub)network (Encoder, Decoder, Estimator) in the architecture of the Y-shaped conformal autoencoder consists of five fully connected layers. The first four hidden layers have 20 neurons and tanh(t) activation functions and the fifth has no activation function (linear activation) and its size depends on the size of the desired output. The ADAM optimizer was chosen for training the overall network. We chose mini-batches of size 32323232 to train the network. The learning rate was selected as η=0.001𝜂0.001\eta=0.001italic_η = 0.001 and 500500500500 epochs. Training the network for a larger number of epochs leads to overfitting.

To train the network and test its generalization capability we used about 19,0001900019,00019 , 000 data points. We split the data into train||||test||||validation as 80808080:10101010:10101010. We then rescaled the training data by using the MinMaxScaler Python preprocessing scheme from sklearn. We applied the same transformation for the validation and test sets. During the training of the network, we used only the training set to perform backpropagation and the validation set to get insight into the model’s performance. The network did not see the test set during training.

The optimization process we performed was heuristic: for a fixed mini-batch two updates (backpropagation steps) were performed. The first step updates the weights of the Encoder-Decoder and the second step the weights of the Estimator-Encoder. Altering this training protocol is possible. In Algorithm 1 below we provide a more detailed description of the network’s training.

Upon training of the neural network, the estimated MSE for the autoencoder’s reconstruction, was ae=1.36e08subscriptae1.36𝑒08\mathcal{L}_{\text{ae}}=1.36e-08caligraphic_L start_POSTSUBSCRIPT ae end_POSTSUBSCRIPT = 1.36 italic_e - 08 on the train set and ae=1.37e08subscriptae1.37𝑒08\mathcal{L}_{\text{ae}}=1.37e-08caligraphic_L start_POSTSUBSCRIPT ae end_POSTSUBSCRIPT = 1.37 italic_e - 08 on the test set. The MSE for the Estimator was est.=1.2e03subscriptest.1.2𝑒03\mathcal{L}_{\text{est.}}=1.2e-03caligraphic_L start_POSTSUBSCRIPT est. end_POSTSUBSCRIPT = 1.2 italic_e - 03 for the train set and est.=1.2e03subscriptest.1.2𝑒03\mathcal{L}_{\text{est.}}=1.2e-03caligraphic_L start_POSTSUBSCRIPT est. end_POSTSUBSCRIPT = 1.2 italic_e - 03 for the test. The average value of the cosθcos𝜃\text{cos}\thetacos italic_θ, on the train set was con.=2.72e05subscriptcon.2.72𝑒05\mathcal{L}_{\text{con.}}=2.72e-05caligraphic_L start_POSTSUBSCRIPT con. end_POSTSUBSCRIPT = 2.72 italic_e - 05 and on the test set con.=2.63e05subscriptcon.2.63𝑒05\mathcal{L}_{\text{con.}}=2.63e-05caligraphic_L start_POSTSUBSCRIPT con. end_POSTSUBSCRIPT = 2.63 italic_e - 05.

Input: Diffusion Maps coordinates ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT,ψ2subscript𝜓2\psi_{2}italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and parameter values σ𝜎\sigmaitalic_σ.
Output: The weights of: (i) Encoder (θencodersubscript𝜃encoder\theta_{\text{encoder}}italic_θ start_POSTSUBSCRIPT encoder end_POSTSUBSCRIPT),
(ii) Decoder (θdecodersubscript𝜃decoder\theta_{\text{decoder}}italic_θ start_POSTSUBSCRIPT decoder end_POSTSUBSCRIPT), (iii) Estimator (θestimatorsubscript𝜃estimator\theta_{\text{estimator}}italic_θ start_POSTSUBSCRIPT estimator end_POSTSUBSCRIPT).
For i=1,2,,T𝑖12𝑇i=1,2,...,Titalic_i = 1 , 2 , … , italic_T
  1. 1.

    Predict:

    (ν1,ν2)subscript𝜈1subscript𝜈2\displaystyle(\nu_{1},\nu_{2})( italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) =Encoder(ψ1,ψ2)absentEncodersubscript𝜓1subscript𝜓2\displaystyle=\text{Encoder}(\psi_{1},\psi_{2})= Encoder ( italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT )
    (ψ^1,ψ^2)subscript^𝜓1subscript^𝜓2\displaystyle(\hat{\psi}_{1},\hat{\psi}_{2})( over^ start_ARG italic_ψ end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , over^ start_ARG italic_ψ end_ARG start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) =Decoder(ν1,ν2)absentDecodersubscript𝜈1subscript𝜈2\displaystyle=\text{Decoder}(\nu_{1},\nu_{2})= Decoder ( italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT )
  2. 2.

    Compute Autoencoder (Encoder-Decoder) and Conformality Losses:

    1subscript1\displaystyle\mathcal{L}_{1}caligraphic_L start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT =ae+con.absentsubscriptaesubscriptcon.\displaystyle=\mathcal{L}_{\text{ae}}+\mathcal{L}_{\text{con.}}= caligraphic_L start_POSTSUBSCRIPT ae end_POSTSUBSCRIPT + caligraphic_L start_POSTSUBSCRIPT con. end_POSTSUBSCRIPT
    =MSE(𝝍^,𝝍)+αMSE(cosθ,0)absentMSE^𝝍𝝍𝛼MSEcos𝜃0\displaystyle=\text{MSE}(\hat{\bm{\psi}},\bm{\psi})+\alpha\text{MSE}(\text{cos% }\theta,0)= MSE ( over^ start_ARG bold_italic_ψ end_ARG , bold_italic_ψ ) + italic_α MSE ( cos italic_θ , 0 )
  3. 3.

    Backpropagation step - update weights (illustration with gradient descent):

    θencoder=ηθencoder1limit-fromsubscript𝜃encoder𝜂subscript𝜃encodersubscript1\displaystyle\theta_{\text{encoder}}-=\eta{\theta_{\text{encoder}}}\mathcal{L}% _{1}italic_θ start_POSTSUBSCRIPT encoder end_POSTSUBSCRIPT - = italic_η italic_θ start_POSTSUBSCRIPT encoder end_POSTSUBSCRIPT caligraphic_L start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT
    θdecoder=ηθdecoder1limit-fromsubscript𝜃decoder𝜂subscript𝜃decodersubscript1\displaystyle\theta_{\text{decoder}}-=\eta{\theta_{\text{decoder}}}\mathcal{L}% _{1}italic_θ start_POSTSUBSCRIPT decoder end_POSTSUBSCRIPT - = italic_η italic_θ start_POSTSUBSCRIPT decoder end_POSTSUBSCRIPT caligraphic_L start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT
  4. 4.

    Predict:

    (ν1,ν2)=Encoder(ψ1,ψ2)subscript𝜈1subscript𝜈2Encodersubscript𝜓1subscript𝜓2\displaystyle(\nu_{1},\nu_{2})=\text{Encoder}(\psi_{1},\psi_{2})( italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) = Encoder ( italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT )
    σ^=Estimator(ν1)^𝜎Estimatorsubscript𝜈1\displaystyle\hat{\sigma}=\text{Estimator}(\nu_{1})over^ start_ARG italic_σ end_ARG = Estimator ( italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT )
  5. 5.

    Compute Estimator Loss:

    est.=MSE(σ^,σ)subscriptest.MSE^𝜎𝜎\displaystyle\mathcal{L}_{\text{est.}}=\text{MSE}(\hat{\sigma},\sigma)caligraphic_L start_POSTSUBSCRIPT est. end_POSTSUBSCRIPT = MSE ( over^ start_ARG italic_σ end_ARG , italic_σ )
  6. 6.

    Backpropagation step - update weights (illustration with gradient descent)

    θestimator=ηθestimatorest.limit-fromsubscript𝜃estimator𝜂subscript𝜃estimatorsubscriptest.\displaystyle\theta_{\text{estimator}}-=\eta{\theta_{\text{estimator}}}% \mathcal{L}_{\text{est.}}italic_θ start_POSTSUBSCRIPT estimator end_POSTSUBSCRIPT - = italic_η italic_θ start_POSTSUBSCRIPT estimator end_POSTSUBSCRIPT caligraphic_L start_POSTSUBSCRIPT est. end_POSTSUBSCRIPT
    θencoder=ηθencoderest.limit-fromsubscript𝜃encoder𝜂subscript𝜃encodersubscriptest.\displaystyle\theta_{\text{encoder}}-=\eta{\theta_{\text{encoder}}}\mathcal{L}% _{\text{est.}}italic_θ start_POSTSUBSCRIPT encoder end_POSTSUBSCRIPT - = italic_η italic_θ start_POSTSUBSCRIPT encoder end_POSTSUBSCRIPT caligraphic_L start_POSTSUBSCRIPT est. end_POSTSUBSCRIPT
Algorithm 1 The algorithm illustrates a full iteration during training of the Y-shaped conformal autoencoder. We set the scale parameter to α=10𝛼10\alpha=10italic_α = 10. The learning rate is denoted as η𝜂\etaitalic_η.

A.3.2 Y-shaped conformal autoencoder: Additional Results

In this section, we provide additional results regarding the Y-shaped conformal autoencoder described in the main text. The ability of the Estimator to predict the parameter σ𝜎\sigmaitalic_σ from ν1subscript𝜈1\nu_{1}italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT is shown in Figure 8 for train (black dots) and test (red dots) points. We also illustrate, in Figures 8, 8, the reconstruction of the autoencoder for train and test points.

Refer to caption
Refer to caption
Refer to caption
Figure 8: (a) The true values of the parameter σ𝜎\sigmaitalic_σ are plotted against the reconstructed by the Estimator for train (black) and test (red) points. The blue dashed line indicates y=x𝑦𝑥y=xitalic_y = italic_x. (b-c) The true values of the Diffusion Maps coordinates ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and ψ2subscript𝜓2\psi_{2}italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT are plotted against the reconstructed ψ^1subscript^𝜓1\hat{\psi}_{1}over^ start_ARG italic_ψ end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, ψ^2subscript^𝜓2\hat{\psi}_{2}over^ start_ARG italic_ψ end_ARG start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT coordinates by the autoencoder.

A.4 Forward Euler neural network

In this section, we describe how we identified the right-hand side of an ordinary differential equation directly from data. Let ν2(t)subscript𝜈2𝑡\nu_{2}(t)italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t ) be a state variable whose dynamics are governed by a σ𝜎\sigmaitalic_σ-dependent ODE given by the general form

ν˙2(t)=f(ν2(t);σ)subscript˙𝜈2𝑡𝑓subscript𝜈2𝑡𝜎\dot{\nu}_{2}(t)=f(\nu_{2}(t);\sigma)over˙ start_ARG italic_ν end_ARG start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t ) = italic_f ( italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t ) ; italic_σ ) (14)

Our goal is to construct a neural network architecture inspired by numerical integrators of ODEs to estimate the right-hand-side f(ν2(t);σ)𝑓subscript𝜈2𝑡𝜎f(\nu_{2}(t);\sigma)italic_f ( italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t ) ; italic_σ ). To this end, we constructed a forward Euler residual neural network depicted in Figure 4. To train this network, we do not require long trajectories but only snapshots of the form 𝒟={ν2(t+h),ν2(t),σ,h}𝒟subscript𝜈2𝑡subscript𝜈2𝑡𝜎\mathcal{D}=\{\nu_{2}(t+h),\nu_{2}(t),\sigma,h\}caligraphic_D = { italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t + italic_h ) , italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t ) , italic_σ , italic_h } where ν2(t)subscript𝜈2𝑡\nu_{2}(t)italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t ) is the state variable at time t𝑡titalic_t, ν2(t+h)subscript𝜈2𝑡\nu_{2}(t+h)italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t + italic_h ) is the state variable after a small time-step hhitalic_h. Given sampled data in the form of 𝒟𝒟\mathcal{D}caligraphic_D we wish to approximate f𝑓fitalic_f by using a neural network, with weights denoted as θ𝜃\thetaitalic_θ.

To formulate the loss used to train the network fθsubscript𝑓𝜃f_{\theta}italic_f start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT we remind the reader that the forward Euler approximates the evolution of an ODE, by a small positive step hhitalic_h, as

ν2(t+h)=ν2(t)+hf(ν2;σ).subscript𝜈2𝑡subscript𝜈2𝑡𝑓subscript𝜈2𝜎\nu_{2}(t+h)=\nu_{2}(t)+hf(\nu_{2};\sigma).italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t + italic_h ) = italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t ) + italic_h italic_f ( italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ; italic_σ ) . (15)

In our case, without having access to f𝑓fitalic_f but only data in the form 𝒟𝒟\mathcal{D}caligraphic_D we wish to approximate the right-hand-side with the neural network fθsubscript𝑓𝜃f_{\theta}italic_f start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT. This is achieved by performing for each pair of inputs (ν2(t),σsubscript𝜈2𝑡𝜎\nu_{2}(t),\sigmaitalic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t ) , italic_σ) one integration step of size hhitalic_h, estimating the evolved dynamics ν2^(t+h)^subscript𝜈2𝑡\hat{\nu_{2}}(t+h)over^ start_ARG italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG ( italic_t + italic_h ) and minimizing the loss

(θ|ν2(t),ν2(t+h),h,σ)=ν^2(t+h)ν2(t+h)2.conditional𝜃subscript𝜈2𝑡subscript𝜈2𝑡𝜎superscriptnormsubscript^𝜈2𝑡subscript𝜈2𝑡2\mathcal{L}(\theta|\nu_{2}(t),\nu_{2}(t+h),h,\sigma)=||\hat{\nu}_{2}(t+h)-\nu_% {2}(t+h)||^{2}.caligraphic_L ( italic_θ | italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t ) , italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t + italic_h ) , italic_h , italic_σ ) = | | over^ start_ARG italic_ν end_ARG start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t + italic_h ) - italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t + italic_h ) | | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . (16)

For our computations, the time step hhitalic_h was kept constant but the overall approach can be easily extended to handle also varying time steps hhitalic_h.

A.4.1 Hyperparameter selection and training procedure

The implementation of the forward Euler neural networks was done with the Tensorflow/Keras Python libraries [46].

To train the forward Euler neural network we need to ensure our data is in the form of snapshots 𝒟𝒟\mathcal{D}caligraphic_D. To achieve that we used the Nyström extension formula (see Section A.2.1) to all the available sampled trajectories and obtained the corresponding trajectories in ψ1ψ2subscript𝜓1subscript𝜓2\psi_{1}\psi_{2}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. We then evaluated the Encoder to get trajectories in terms of ν2subscript𝜈2\nu_{2}italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. These two steps provided us with about 600,000600000600,000600 , 000 snapshots. We then split the data into train||||test||||validation as 80::::10::::10. We then centered and whitened the data (based on the mean and variance of the training set) and applied the same transformation to the validation and test set.

The architecture consisted of two hidden layers with 10101010 neurons each. The first hidden layer had a tanh(t) activation function and the second hidden layer had a linear activation function. We used ADAM to optimize this network. The mini-batch size was set to 32323232, the number of total epochs to 100100100100, and the learning rate to η=0.001𝜂0.001\eta=0.001italic_η = 0.001. The learning curves for the training and validation are shown in Figure 9. The ability of the network to fit the training set and generalize is shown in 9. Upon training of the network, the MSE on the train and test sets were 2.63e032.63𝑒032.63e-032.63 italic_e - 03 and 2.79e052.79𝑒052.79e-052.79 italic_e - 05 respectively. This neural network was used for the computations reported in III.3. We did not record the MSE for all the 5000500050005000 networks used to check the robustness of our approach in identifying the critical transition.

A.4.2 Euler neural network: Additional Results

Refer to caption
Refer to caption
Figure 9: (a) The learning curves for the train and validation sets. (b) The true values of the state variable ν2(t+h)subscript𝜈2𝑡\nu_{2}(t+h)italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t + italic_h ) plotted against the predicted by the Euler neural network ν^2(t+h)subscript^𝜈2𝑡\hat{\nu}_{2}(t+h)over^ start_ARG italic_ν end_ARG start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t + italic_h ) for the train (black points) and test (red points).

In this section, we provide some additional results for the forward Euler neural network. The learning curves for the training and validation are shown in Figure 9. The ability of the network to fit the train set and generalize is shown in Figure 9. Note that the normalized value of the state variable ν2subscript𝜈2\nu_{2}italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT is shown in Figure 4 and that the test points in this include values of the parameter σ𝜎\sigmaitalic_σ that are also in the training set.

A.5 Separation of time scales

In this Section, we discuss the separation of time scales for the reduced order models based on the equations for the moments proposed in [9]. As shown in Figure 10 the slowest eigenvalue λ1subscript𝜆1\lambda_{1}italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT is at least one order of magnitude smaller than the second slowest λ2subscript𝜆2\lambda_{2}italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. This suggests that the dynamics after a short transient are slaved in the direction of this slowest eigenvalue and are effectively one-dimensional.

Refer to caption
Figure 10: The eigenvalues of the Jacobian computed across a range of parameter values σ𝜎\sigmaitalic_σ for the moments’ equations with six ODEs presented in [9]. The slowest eigenvalue (upper line) λ1subscript𝜆1\lambda_{1}italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT crosses the real axis at σ=1.89𝜎1.89\sigma=1.89italic_σ = 1.89 where the bifurcation occurs. The black square indicates the value of σ𝜎\sigmaitalic_σ where the (slowest) eigenvalue crosses zero.

References

  • Naldi et al. [2010] G. Naldi, L. Pareschi, and G. Toscani, Mathematical modeling of collective behavior in socio-economic and life sciences (Springer Science & Business Media, 2010).
  • Pareschi and Toscani [2013] L. Pareschi and G. Toscani, Interacting multiagent systems: kinetic equations and Monte Carlo methods (OUP Oxford, 2013).
  • Helfmann et al. [2021a] L. Helfmann, J. Heitzig, P. Koltai, J. Kurths, and C. Schütte, Statistical analysis of tipping pathways in agent-based models, The European Physical Journal Special Topics 230, 3249 (2021a).
  • Helfmann et al. [2021b] L. Helfmann, D. C.N., A. Djurdjevac, S. Winkelmann, and C. Schütte, From interacting agents to density-based modeling with stochastic PDEs, Commun. Appl. Math. Comput. Sci. 16, 1 (2021b).
  • Chayes and Panferov [2010] L. Chayes and V. Panferov, The McKean-Vlasov equation in finite volume, J. Stat. Phys. 138, 351 (2010).
  • Carrillo et al. [2020] J. A. Carrillo, R. S. Gvalani, G. A. Pavliotis, and A. Schlichting, Long-time behaviour and phase transitions for the Mckean-Vlasov equation on the torus, Arch. Ration. Mech. Anal. 235, 635 (2020).
  • Gross et al. [2006] T. Gross, C. J. D. D’Lima, and B. Blasius, Epidemic dynamics on an adaptive network, Physical review letters 96, 208701 (2006).
  • Winkelmann et al. [2021] S. Winkelmann, J. Zonker, C. Schütte, and N. D. Conrad, Mathematical modeling of spatio-temporal population dynamics and application to epidemic spreading, Mathematical biosciences 336, 108619 (2021).
  • Zagli et al. [2023] N. Zagli, G. A. Pavliotis, V. Lucarini, and A. Alecio, Dimension reduction of noisy interacting systems, Physical Review Research 5, 013078 (2023).
  • Liu et al. [2014] P. Liu, H. R. Safford, I. D. Couzin, and I. G. Kevrekidis, Coarse-grained variables for particle-based models: diffusion maps and animal swarming simulations, Computational Particle Mechanics 1, 425 (2014).
  • Liu et al. [2015] P. Liu, C. Siettos, C. W. Gear, and I. Kevrekidis, Equation-free model reduction in agent-based computations: Coarse-grained bifurcation and variable-free rare event analysis, Mathematical Modelling of Natural Phenomena 10, 71 (2015).
  • Fabiani et al. [2023] G. Fabiani, N. Evangelou, T. Cui, J. M. Bello-Rivas, C. P. Martin-Linares, C. Siettos, and I. G. Kevrekidis, Tasks makyth models: Machine learning assisted surrogates for tipping points, arXiv preprint arXiv:2309.14334  (2023).
  • Martzel and Aslangul [2001] N. Martzel and C. Aslangul, Mean-field treatment of the many-body Fokker-Planck equation, J. Phys. A 34, 11225 (2001).
  • Dawson [1983] D. A. Dawson, Critical dynamics and fluctuations for a mean-field model of cooperative behavior, J. Statist. Phys. 31, 29 (1983).
  • Acebrón et al. [2005] J. A. Acebrón, L. L. Bonilla, C. J. Pérez V., F. Ritort, and R. Spigler, The kuramoto model: A simple paradigm for synchronization phenomena, Rev. Mod. Phys. 77, 137 (2005).
  • Coifman and Lafon [2006] R. R. Coifman and S. Lafon, Diffusion maps, Applied and computational harmonic analysis 21, 5 (2006).
  • Evangelou et al. [2022] N. Evangelou, N. J. Wichrowski, G. A. Kevrekidis, F. Dietrich, M. Kooshkbaghi, S. McFann, and I. G. Kevrekidis, On the parameter combinations that matter and on those that do not: data-driven studies of parameter (non) identifiability, PNAS nexus 1, pgac154 (2022).
  • Rico-Martinez et al. [1992] R. Rico-Martinez, K. Krischer, I. Kevrekidis, M. Kube, and J. Hudson, Discrete-vs. continuous-time nonlinear signal processing of cu electrodissolution data, Chemical Engineering Communications 118, 25 (1992).
  • González-García et al. [1998] R. González-García, R. Rico-Martìnez, and I. G. Kevrekidis, Identification of distributed parameter systems: A neural net based approach, Computers & chemical engineering 22, S965 (1998).
  • Dietrich et al. [2021] F. Dietrich, A. Makeev, G. Kevrekidis, N. Evangelou, T. Bertalan, S. Reich, and I. G. Kevrekidis, Learning effective stochastic differential equations from microscopic simulations: Combining stochastic numerics and deep learning, arXiv preprint arXiv:2106.09004  (2021).
  • Evangelou et al. [2023a] N. Evangelou, F. Dietrich, J. M. Bello-Rivas, A. J. Yeh, R. S. Hendley, M. A. Bevan, and I. G. Kevrekidis, Learning effective sdes from brownian dynamic simulations of colloidal particles, Molecular Systems Design & Engineering  (2023a).
  • Van den Broeck et al. [1994] C. Van den Broeck, J. M. R. Parrondo, J. Armero, and A. Hernández-Machado, Mean field model for spatially extended systems in the presence of multiplicative noise, Phys. Rev. E 49, 2639 (1994).
  • Dsilva et al. [2018] C. J. Dsilva, R. Talmon, R. R. Coifman, and I. G. Kevrekidis, Parsimonious representation of nonlinear dynamical systems through manifold learning: A chemotaxis case study, Applied and Computational Harmonic Analysis 44, 759 (2018).
  • Lehmberg et al. [2020] D. Lehmberg, F. Dietrich, G. Köster, and H.-J. Bungartz, Datafold: Data-driven models for point clouds and time series on manifolds, Journal of Open Source Software 5, 2283 (2020).
  • Olver [2015] P. J. Olver, Modern developments in the theory and applications of moving frames, London Math. Soc. Impact150 Stories 1, 14 (2015).
  • Mattheakis et al. [2019] M. Mattheakis, P. Protopapas, D. Sondak, M. Di Giovanni, and E. Kaxiras, Physical symmetries embedded in neural networks, arXiv preprint arXiv:1904.08991  (2019).
  • Yarotsky [2022] D. Yarotsky, Universal approximations of invariant maps by neural networks, Constructive Approximation 55, 407 (2022).
  • Blum-Smith and Villar [2022] B. Blum-Smith and S. Villar, Equivariant maps from invariant functions, arXiv preprint arXiv:2209.14991  (2022).
  • Villar et al. [2022] S. Villar, W. Yao, D. W. Hogg, B. Blum-Smith, and B. Dumitrascu, Dimensionless machine learning: Imposing exact units equivariance, arXiv preprint arXiv:2204.00887  (2022).
  • Olver et al. [2023] P. J. Olver, M. Sabzevari, and F. Valiquette, Normal forms, moving frames, and differential invariants for nondegenerate hypersurfaces in c 2, The Journal of Geometric Analysis 33, 192 (2023).
  • Jin et al. [2020] P. Jin, Z. Zhang, A. Zhu, Y. Tang, and G. E. Karniadakis, Sympnets: Intrinsic structure-preserving symplectic networks for identifying hamiltonian systems, Neural Networks 132, 166 (2020).
  • Alet et al. [2021] F. Alet, D. Doblar, A. Zhou, J. Tenenbaum, K. Kawaguchi, and C. Finn, Noether networks: meta-learning useful conserved quantities, Advances in Neural Information Processing Systems 34, 16384 (2021).
  • Burby et al. [2020] J. W. Burby, Q. Tang, and R. Maulik, Computing Poincaré maps using physics-informed deep learning, Tech. Rep. (Los Alamos National Lab.(LANL), Los Alamos, NM (United States), 2020).
  • Wang et al. [2017] C. Wang, Q. Li, W. E, and B. Chazelle, Noisy Hegselmann-Krause systems: phase transition and the 2R2𝑅2R2 italic_R-conjecture, J. Stat. Phys. 166, 1209 (2017).
  • Lee et al. [2023] S. Lee, Y. M. Psarellis, C. I. Siettos, and I. G. Kevrekidis, Learning black-and gray-box chemotactic pdes/closures from agent based monte carlo simulation data, Journal of Mathematical Biology 87, 15 (2023).
  • Psarellis et al. [2022] Y. M. Psarellis, S. Lee, T. Bhattacharjee, S. S. Datta, J. M. Bello-Rivas, and I. G. Kevrekidis, Data-driven discovery of chemotactic migration of bacteria via machine learning, arXiv preprint arXiv:2208.11853  (2022).
  • Siettos [2010] C. I. Siettos, System level numerical analysis of a monte carlo simulation of the e. coli chemotaxis, arXiv preprint arXiv:1011.1467  (2010).
  • Setayeshgar et al. [2005] S. Setayeshgar, C. W. Gear, H. G. Othmer, and I. G. Kevrekidis, Application of coarse integration to bacterial chemotaxis, Multiscale Modeling & Simulation 4, 307 (2005).
  • Armaou et al. [2005] A. Armaou, I. G. Kevrekidis, and C. Theodoropoulos, Equation-free gaptooth-based controller design for distributed complex/multiscale processes, Computers & chemical engineering 29, 731 (2005).
  • Galaris et al. [2022] E. Galaris, G. Fabiani, I. Gallos, I. Kevrekidis, and C. Siettos, Numerical bifurcation analysis of pdes from lattice boltzmann model simulations: a parsimonious machine learning approach, Journal of Scientific Computing 92, 34 (2022).
  • Pavliotis and Zanoni [2022] G. A. Pavliotis and A. Zanoni, A method of moments estimator for interacting particle systems and their mean field limit (2022), arXiv:2212.00403 [math.NA] .
  • Cornalba and Fischer [2023] F. Cornalba and J. Fischer, The dean-kawasaki equation and the structure of density fluctuations in systems of diffusing particles (2023), arXiv:2109.06500 [math.AP] .
  • Evangelou et al. [2023b] N. Evangelou, F. Dietrich, E. Chiavazzo, D. Lehmberg, M. Meila, and I. G. Kevrekidis, Double diffusion maps and their latent harmonics for scientific computations in latent space, Journal of Computational Physics 485, 112072 (2023b).
  • Fowlkes et al. [2004] C. Fowlkes, S. Belongie, F. Chung, and J. Malik, Spectral grouping using the nystrom method, IEEE transactions on pattern analysis and machine intelligence 26, 214 (2004).
  • Paszke et al. [2019] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, Pytorch: An imperative style, high-performance deep learning library, in Advances in Neural Information Processing Systems 32 (Curran Associates, Inc., 2019) pp. 8024–8035.
  • Abadi et al. [2015] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, TensorFlow: Large-scale machine learning on heterogeneous systems (2015), software available from tensorflow.org.