Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

HTML conversions sometimes display errors due to content that did not convert correctly from the source. This paper uses the following packages that are not yet supported by the HTML conversion tool. Feedback on these issues are not necessary; they are known and are being worked on.

  • failed: layouts
  • failed: tikzscale

Authors: achieve the best HTML results from your LaTeX submissions by following these best practices.

License: CC BY 4.0
arXiv:2402.16433v1 [physics.comp-ph] 26 Feb 2024

Data-Driven Acceleration of Multi-Physics Simulations

Stefan Meinecke meinecke@tu-berlin.de Institut für Theoretische Physik, Technische Universität Berlin, Hardenbergstr. 36, 10623 Berlin, Germany    Malte Selig Institut für Theoretische Physik, Technische Universität Berlin, Hardenbergstr. 36, 10623 Berlin, Germany    Felix Köster Institut für Theoretische Physik, Technische Universität Berlin, Hardenbergstr. 36, 10623 Berlin, Germany    Andreas Knorr Institut für Theoretische Physik, Technische Universität Berlin, Hardenbergstr. 36, 10623 Berlin, Germany    Kathy Lüdge kathy.luedge@tu-ilmenau.de Technische Universität Ilmenau, Institut für Physik, Weimarer Straße 25, 98693 Ilmenau, Germany
Abstract

Multi-physics simulations play a crucial role in understanding complex systems. However, their computational demands are often prohibitive due to high dimensionality and complex interactions, such that actual calculations often rely on approximations. To address this, we introduce a data-driven approach to approximate interactions among degrees of freedom of no direct interest and thus significantly reduce computational costs. Focusing on a semiconductor laser as a case study, we demonstrate the superiority of this method over traditional analytical approximations in both accuracy and efficiency. Our approach streamlines simulations, offering promise for complex multi-physics systems, especially for scenarios requiring a large number of individual simulations.

Machine learning, regression, dimensionality reduction, reduced-order model, solid-state physics, semiconductor laser, electron-phonon dynamics, non-equilibrium physics, simulation

I Introduction

Multi-physics simulations have become a powerful tool for analyzing and understanding complex physical systems. These simulations involve the integration of multiple physical models that describe the behavior of different phenomena, such as heat transfer, fluid dynamics, and electromagnetic fields. Laser simulations are a prime example of multi-physics simulation, since they are used to model the behavior of lasers in various applications, such as material processing [1], medical treatments [2, 3, 4], and communication systems [5, 6, 7]. However, performing these simulations can be computationally intensive and time-consuming [8, 9, 10, 11].

The computational costs are typically driven by a large number of degrees of freedom, which are subject to non-trivial, i.e., nonlinear mutual interactions [11]. In many cases, however, one is only interested in the dynamics of only a few of them. Due to their interdependence, one nonetheless needs to solve the complete system, to obtain the dynamics of interest.

Traditionally, analytical approximations are applied to parts of the system, to reduce the complexity of the problem and achieve reasonable computational costs. As an example, we consider a semiconductor laser, where the photon intensity in the optical cavity interacts with a coupled electron-phonon gas of the gain medium (see Fig. 1). In the literature, the momentum resolved electron and phonon dynamics are typically reduced by fixing the temperature of the phonons (bath approximation) and treating the electrons in the thermal approximation. Hence, their dynamics are calculated in the relaxation-time approximation for the effective electron temperature[12]. Such approximations compromise the model’s predictive power, but have, nonetheless, provided many insights into multi-physics phenomena so far [8, 9, 11, 13].

The successful application of the machine-learning paradigm to technological tasks such as computer vision and natural-language processing [14, 15, 16, 17, 18, 19, 20, 21] has motivated researchers to utilize data-driven approaches for scientific discovery [22, 23, 24, 25, 26, 27, 28, 29]. Their application to multi-physics simulation models intends to either improve their accuracy or reduce their computational costs [30, 31]. Successful implementations can be used for efficient parameter studies, multi-objective optimization [32, 33], and real-time model based control [34]. The different approaches can be distinguished into two classes: The data-driven model can either be designed to entirely emulate the multi-physics system [35, 36, 37, 38, 33, 39], or to integrate with system equations [40, 41, 42, 34]. Our contribution belongs to the latter class.

In this manuscript, we propose to accelerate the simulation of complex multi-physics systems by approximating the interaction terms among the degrees of freedom, which are not of direct interest, via a data-driven model. Degrees of freedom, which do not directly couple to the observables of interest can even be dropped with their effect implicitly kept in the data-driven model.

Naturally, this approach requires sufficient training data and a suitable data-driven model architecture. Both critically determine the feasibility and success of our approach. Given a new problem, an overall improvement is only achieved, if two criteria are met: Firstly, the combined efforts of generating the training data; implementing, training, and testing the data-driven model; and performing the simulations must be computationally less demanding than the direct integration of the system. Secondly, the data-driven approach must provide better accuracy than analytical approximations. Hence, one must be thoughtful and deliberate with the choice and design of the data-driven model and the generation of the training data.

We demonstrate that our data-driven approach to accelerating multi-physics simulations can outperform a traditional analytic approximation both in terms of accuracy and computational costs on a chosen toy model. The advantage of the data-driven approach becomes especially pronounced if many simulations are required.










photon intensity electron distribution phonon distribution observables of interest variables with direct coupling variables with indirect coupling semiconductor material
Figure 1: Sketch of the considered multi-physics system setup. If only the photon intensity is of interest, the degrees of freedom among the semiconductor crystal can be distinguished between direct and indirect coupling.

To demonstrate our approach, we consider a semiconductor laser. Its dynamics can be separated into two different physical domains, which are represented by the optical resonator with the photon intensity I𝐼Iitalic_I and the semiconductor crystal with the electron and phonon degrees of freedom [8, 43, 44]. Accurate quantitative simulations have been proven difficult, because they involve the self-consistent treatment of both domains, which strongly interact via the stimulated emission/absorption process [8, 43].

For our purpose, we are only interested in the photon intensity at the lasing mode. Following our data-driven approach, we therefore construct an approximation model, which only keeps dynamical variables for the photon intensity and the electron states. The phonon states, i.e., the remaining degrees of freedom of the semiconductor crystal, are dropped since they only indirectly couple to the photon intensity. All the internal interactions of the electron-phonon system are captured by a data-driven model and are then incorporated into the dynamical electron equations. Figure 1 visualizes and summarizes our subdivision of the system’s degrees of freedom and indicates their couplings.

II Methods

To model the semiconductor laser, we devise a toy model, which builds upon our previous work on electron-phonon dynamics [45], by coupling a photon intensity equation to the electrons via a stimulated emission term [8, 46]. The dynamics are described by the following equations of motion for the photon intensity I𝐼Iitalic_I and the electron occupations fksubscript𝑓𝑘f_{k}italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT:

ddtI=dd𝑡𝐼absent\displaystyle\frac{\mathrm{d}}{\mathrm{d}t}I=divide start_ARG roman_d end_ARG start_ARG roman_d italic_t end_ARG italic_I = Iτpht+kG(k)D(k)[2fk1]I,𝐼subscript𝜏phtsubscript𝑘𝐺𝑘𝐷𝑘delimited-[]2subscript𝑓𝑘1𝐼\displaystyle-\frac{I}{\tau_{\mathrm{pht}}}+\sum_{k}G(k)D(k)\left[2f_{k}-1% \right]I,- divide start_ARG italic_I end_ARG start_ARG italic_τ start_POSTSUBSCRIPT roman_pht end_POSTSUBSCRIPT end_ARG + ∑ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_G ( italic_k ) italic_D ( italic_k ) [ 2 italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - 1 ] italic_I , (1)
ddtfk=dd𝑡subscript𝑓𝑘absent\displaystyle\frac{\mathrm{d}}{\mathrm{d}t}f_{k}=divide start_ARG roman_d end_ARG start_ARG roman_d italic_t end_ARG italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = G(k)[2fk1]I+tfk|col.𝐺𝑘delimited-[]2subscript𝑓𝑘1𝐼evaluated-atsubscript𝑡subscript𝑓𝑘col\displaystyle-G(k)\left[2f_{k}-1\right]I+\partial_{t}f_{k}|_{\mathrm{col}}.- italic_G ( italic_k ) [ 2 italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - 1 ] italic_I + ∂ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | start_POSTSUBSCRIPT roman_col end_POSTSUBSCRIPT . (2)

The first term in Eq. 1 describes the photon lifetime τphtsubscript𝜏pht\tau_{\mathrm{pht}}italic_τ start_POSTSUBSCRIPT roman_pht end_POSTSUBSCRIPT and the second term the interaction with the electrons via stimulated emission, where G(k)𝐺𝑘G(k)italic_G ( italic_k ) denotes the k𝑘kitalic_k-dependent gain function, D(k)𝐷𝑘D(k)italic_D ( italic_k ) the number of states at discretized k𝑘kitalic_k, and [2fk1]delimited-[]2subscript𝑓𝑘1[2f_{k}-1][ 2 italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - 1 ] the inversion. Note that the sum runs over the modulus k𝑘kitalic_k of the two-dimensional momentum 𝐤𝐤\mathbf{k}bold_k, which yields D(k)=kΔk/2π𝐷𝑘𝑘Δ𝑘2𝜋D(k)=k\Delta k/2\piitalic_D ( italic_k ) = italic_k roman_Δ italic_k / 2 italic_π with the discretization step ΔkΔ𝑘\Delta kroman_Δ italic_k. The gain function is given by

G(k)=Glinπγphsech(2ϵkϵphγph)𝐺𝑘subscript𝐺lin𝜋subscript𝛾phsech2subscriptitalic-ϵ𝑘subscriptitalic-ϵphsubscript𝛾ph\displaystyle G(k)=\frac{G_{\mathrm{lin}}}{\pi\gamma_{\mathrm{ph}}}% \operatorname{sech}{\left(\frac{2\epsilon_{k}-\epsilon_{\mathrm{ph}}}{\gamma_{% \mathrm{ph}}}\right)}italic_G ( italic_k ) = divide start_ARG italic_G start_POSTSUBSCRIPT roman_lin end_POSTSUBSCRIPT end_ARG start_ARG italic_π italic_γ start_POSTSUBSCRIPT roman_ph end_POSTSUBSCRIPT end_ARG roman_sech ( divide start_ARG 2 italic_ϵ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_ϵ start_POSTSUBSCRIPT roman_ph end_POSTSUBSCRIPT end_ARG start_ARG italic_γ start_POSTSUBSCRIPT roman_ph end_POSTSUBSCRIPT end_ARG ) (3)

with the linear gain coefficient Glinsubscript𝐺linG_{\mathrm{lin}}italic_G start_POSTSUBSCRIPT roman_lin end_POSTSUBSCRIPT, the homogeneous linewidth of the transitions γphsubscript𝛾ph\gamma_{\mathrm{ph}}italic_γ start_POSTSUBSCRIPT roman_ph end_POSTSUBSCRIPT, the single particle kinetic energy ϵksubscriptitalic-ϵ𝑘\epsilon_{k}italic_ϵ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, and the lasing-mode photon energy ϵphsubscriptitalic-ϵph\epsilon_{\mathrm{ph}}italic_ϵ start_POSTSUBSCRIPT roman_ph end_POSTSUBSCRIPT with respect to the bandgap. The factor of two accounts for the symmetric valance band. The homogeneous linewidth is modeled by a hyperbolic secant to ensure properly decaying tails [8]. The first term in Eq. 2 describes the interaction with the photon intensity via stimulated emission and the second term describes the internal processes of the semiconductor material, where tfk|colevaluated-atsubscript𝑡subscript𝑓𝑘col\partial_{t}f_{k}|_{\mathrm{col}}∂ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | start_POSTSUBSCRIPT roman_col end_POSTSUBSCRIPT summarizes all considered collision integrals.

Table 1: Laser parameters
Glinsubscript𝐺linG_{\mathrm{lin}}italic_G start_POSTSUBSCRIPT roman_lin end_POSTSUBSCRIPT 0.04 nm22{}^{2}start_FLOATSUPERSCRIPT 2 end_FLOATSUPERSCRIPTeV/fs τphtsubscript𝜏pht\tau_{\mathrm{pht}}italic_τ start_POSTSUBSCRIPT roman_pht end_POSTSUBSCRIPT 200 fs
ϵphsubscriptitalic-ϵph\epsilon_{\mathrm{ph}}italic_ϵ start_POSTSUBSCRIPT roman_ph end_POSTSUBSCRIPT 0.035 eV γphsubscript𝛾ph\gamma_{\mathrm{ph}}italic_γ start_POSTSUBSCRIPT roman_ph end_POSTSUBSCRIPT 0.01 eV

We choose laser parameters, which are inspired by a nanolaser [13, 47] and are summarized in Table 1. The characteristically large optical confinement factor and short photon lifetime yield photon intensity dynamics on a time scale similar to the electron dynamics [13]. Hence, this setup relies on the proper modeling of the latter and likely highlights any shortcomings.

Our microscopic model for the collision term includes the interaction with two optical and two acoustic phonon branches and omits electron-electron interaction. Hence, the microscopic evaluation of the collision term complements the laser model with dynamical equations for the phonon occupation numbers nqαsuperscriptsubscript𝑛𝑞𝛼n_{q}^{\alpha}italic_n start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT, where q𝑞qitalic_q denotes the crystal momentum and α𝛼\alphaitalic_α the branch. A detailed description is presented in Appendix A. The microscopic evaluation of the collision term

tfk|colmicro{fk,nqα}evaluated-atsubscript𝑡subscript𝑓𝑘colmicrosubscript𝑓𝑘superscriptsubscript𝑛𝑞𝛼\displaystyle\partial_{t}f_{k}|_{\mathrm{col}}^{\mathrm{micro}}\left\{f_{k},n_% {q}^{\alpha}\right\}∂ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | start_POSTSUBSCRIPT roman_col end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_micro end_POSTSUPERSCRIPT { italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_n start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT } (4)

is typically expensive, which in turn drives the total computational costs of the simulation. Hence, one utilizes approximations for the collision term, to achieve reasonable computation times.

With our data-driven approach, we replace the microscopic evaluation of the collision term with a trained data-driven model

tfk|coldata{fk(t),fk(tΔt),,fk(t(1)Δt)}evaluated-atsubscript𝑡subscript𝑓𝑘coldatasubscript𝑓𝑘𝑡subscript𝑓𝑘𝑡Δ𝑡subscript𝑓𝑘𝑡1Δ𝑡\displaystyle\partial_{t}f_{k}|_{\mathrm{col}}^{\mathrm{data}}\left\{f_{k}(t),% f_{k}(t-\Delta t),\dots,f_{k}(t-(\ell-1)\Delta t)\right\}∂ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | start_POSTSUBSCRIPT roman_col end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_data end_POSTSUPERSCRIPT { italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_t ) , italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_t - roman_Δ italic_t ) , … , italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_t - ( roman_ℓ - 1 ) roman_Δ italic_t ) } (5)

where, instead of the phonon occupation numbers nqαsuperscriptsubscript𝑛𝑞𝛼n_{q}^{\alpha}italic_n start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT, a delay embedding of the electron occupations fksubscript𝑓𝑘f_{k}italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT with the past \ellroman_ℓ states is implemented. This yields delay-differential equations and can be thought of as a Takens embedding [48]. For our purpose, we compose a delay-embedded regressive reduced-order model (derrom) via the package [45], which we built and maintain ourselves. In short, derrom first projects the past \ellroman_ℓ system states into a reduced dimensionality (order) latent space, which is constructed from the first r𝑟ritalic_r left singular vectors of the training data. This step is designed to retain the dominant patterns of the input data and remove redundant and irrelevant information. Next, the latent space features are normalized to appropriate magnitudes and thereafter nonlinearly transformed, to yield derrom’s feature vector. The nonlinear transformation is constructed via L𝐿Litalic_L hyperbolic tangent activation functions with random weights and biases. Lastly, the regression step is taken via a linear map 𝐖𝐖\mathbf{W}bold_W. Both the dimensionality reduction and the regression step are trained with the supervised learning paradigm, where we minimize the Frobenius norm of the residual error between the training data and the corresponding reconstruction/prediction. More details on the delay-embedded regressive reduced-order model are presented in Appendix B.

To assess the utility of our proposed data-driven approach, we implement a two-temperature approximation (tta) for the collision term [12]. This approximation assumes a steady state distribution for electrons and phonons at all times. Hence, the electron and phonon occupations are then described by effective electron and phonon temperature instead of a fully momentum resolved distribution function [12] Consequently, the electron dynamics can then be described as an exponential relaxation towards a Fermi-Dirac distribution with the phonon temperature Tphsubscript𝑇phT_{\mathrm{ph}}italic_T start_POSTSUBSCRIPT roman_ph end_POSTSUBSCRIPT. The phonon temperature Tphsubscript𝑇phT_{\mathrm{ph}}italic_T start_POSTSUBSCRIPT roman_ph end_POSTSUBSCRIPT itself is included as another dynamical quantity, whose relaxation is driven by the difference to the quasi-equilibrium temperature of the electron distribution. The two-temperature approximation

tfk|coltta{fk,Tph}evaluated-atsubscript𝑡subscript𝑓𝑘colttasubscript𝑓𝑘subscript𝑇ph\displaystyle\partial_{t}f_{k}|_{\mathrm{col}}^{\mathrm{tta}}\left\{f_{k},T_{% \mathrm{ph}}\right\}∂ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | start_POSTSUBSCRIPT roman_col end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_tta end_POSTSUPERSCRIPT { italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_T start_POSTSUBSCRIPT roman_ph end_POSTSUBSCRIPT } (6)

thus reduces the computational costs, but has the major caveat that it is only valid for electron distributions, which are sufficiently close to an equilibrium state. Such relaxation-time approximations are, nonetheless, often used for the simulation of laser dynamics [8, 9, 10, 49, 50, 13] due to a lack of viable alternatives. A detailed description of the two-temperature approximation and an example relaxation are presented in Appendix C.

III Results

III.1 Test Scenario Setup

To study the applicability of the data-driven approach, we design a test scenario, in which we are interested in the laser’s response to being optically pumped by a short pulse far above the band edge. Specifically, we already know that the laser emits a pulse if the pump energy is sufficient and we want to study how the pulse peak power and the pulse position (the delay after the pump pulse) depend on the power, spectral position, and spectral width of the pump pulse. The pump pulse is implemented as an initial electron state with a Gaussian distribution, where the maximum reflects the power, the mean the center frequency, and the standard deviation the spectral width of the pump laser.

Refer to caption
Figure 2: Demonstration of the laser dynamics and approximation methods for representative initial conditions. (a) Photon intensity as obtained from the microscopic model (solid blue), data-driven model (dashed orange), and the two-temperature approximation (dashed-dotted green) normalized to the median pulse maximum of the training/testing data set. (b) Color-coded electron occupation as function of the single-particle energy and time. The inset shows the distribution at t=2.0ps𝑡2.0pst=2.0\,\mathrm{ps}italic_t = 2.0 roman_ps. (c1) and (c2) color-coded electron occupation errors of the data-driven model and the two-temperature approximation, respectively.

An example trajectory is presented in Fig. 2, where the blue line in (a) shows the normalized photon intensity and (b) shows the conduction-band electron distribution as a function of the single-particle energy. The additional plots and panels regarding the different approximations will be discussed later on. After an initial lag of 1.2psabsent1.2ps\approx 1.2\,\mathrm{ps}≈ 1.2 roman_ps, the photon intensity I𝐼Iitalic_I rises to form a pulse which peaks around 2.0psabsent2.0ps\approx 2.0\,\mathrm{ps}≈ 2.0 roman_ps and then quickly decays again. Much of the observed photon intensity dynamics can be explained by the electron dynamics. Initially, the electrons, which have been excited by the pump with a center energy of 0.160eV0.160eV0.160\,\mathrm{eV}0.160 roman_eV, decay towards their quasi-Fermi distribution via the electron-phonon interaction. This process supplies the gain to the lasing mode, which is located at 0.0175eV0.0175eV0.0175\,\mathrm{eV}0.0175 roman_eV and thus causes some of the observed lag. Once the laser has turned on, the photon intensity clearly burns a spectral hole into the electron distribution. Moreover, a periodic pattern appears with a period of 30meV30meV30\,\mathrm{meV}30 roman_meV, which corresponds to the energy of the optical phonons. Hence, the refilling of the initial spectral hole via the interaction with optical phonons causes characteristic secondary spectral holes in the electron distribution. The generated pattern is highlighted in the inset in Fig. 2 (b). Once the available gain is used up, the laser pulse quickly dies and the electron distribution approaches a Fermi distribution. From that example, we can deduce that the details of the electron dynamics, i.e., the collision term, strongly determine the temporal position and maximum value of the photon intensity pulse. Therefore, we score candidate approximations of the collision term with respect to their ability to achieve small errors of those two quantities (details in Appendix D).

III.2 Training-Data Generation and Training

Since, we want to evaluate both the reduction of the simulation time and the accuracy of our data-driven approach, we set ourselves the goal of simulating 1000 trajectories of 10ps10ps10\,\mathrm{ps}10 roman_ps length with varying pumping conditions. For that purpose, we construct 1000 initial conditions, where we draw the maximum, mean, and standard deviation of the Gaussian initial conditions from the uniform distributions 𝒰max(0.5,0.99)subscript𝒰max0.50.99\mathcal{U}_{\mathrm{max}}(0.5,0.99)caligraphic_U start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT ( 0.5 , 0.99 ), 𝒰mean(0.1eV,0.2eV)subscript𝒰mean0.1eV0.2eV\mathcal{U}_{\mathrm{mean}}(0.1\,\mathrm{eV},0.2\,\mathrm{eV})caligraphic_U start_POSTSUBSCRIPT roman_mean end_POSTSUBSCRIPT ( 0.1 roman_eV , 0.2 roman_eV ), and 𝒰std(0.04eV,0.06eV)subscript𝒰std0.04eV0.06eV\mathcal{U}_{\mathrm{std}}(0.04\,\mathrm{eV},0.06\,\mathrm{eV})caligraphic_U start_POSTSUBSCRIPT roman_std end_POSTSUBSCRIPT ( 0.04 roman_eV , 0.06 roman_eV ), respectively.

To operate the data-driven approach, we first require sufficient training data. Since, we a priori don’t know the required number of training trajectories, we implement the following strategy: We first set a quantitative accuracy target, which is to be evaluated via a 10-fold cross-validation procedure on the training data (see Appendix D). For our purpose, we define the accuracy measure to be the sum of the 90ths percentiles of the normalized absolute pulse maximum errors ΔImaxΔsubscript𝐼max\Delta I_{\mathrm{max}}roman_Δ italic_I start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT and the normalized absolute pulse maximum position errors ΔtImaxΔsubscript𝑡subscript𝐼max\Delta t_{I_{\mathrm{max}}}roman_Δ italic_t start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT end_POSTSUBSCRIPT, i.e.,

A=P90{|ΔImax|Imaxmd}+P90{|ΔtImax|tImaxmd}𝐴subscript𝑃90Δsubscript𝐼maxsubscriptdelimited-⟨⟩subscript𝐼maxmdsubscript𝑃90Δsubscript𝑡subscript𝐼maxsubscriptdelimited-⟨⟩subscript𝑡subscript𝐼maxmd\displaystyle A=P_{90}\left\{\frac{|\Delta I_{\mathrm{max}}|}{\langle I_{% \mathrm{max}}\rangle_{\mathrm{md}}}\right\}+P_{90}\left\{\frac{|\Delta t_{I_{% \mathrm{max}}}|}{\langle t_{I_{\mathrm{max}}}\rangle_{\mathrm{md}}}\right\}italic_A = italic_P start_POSTSUBSCRIPT 90 end_POSTSUBSCRIPT { divide start_ARG | roman_Δ italic_I start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT | end_ARG start_ARG ⟨ italic_I start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT ⟩ start_POSTSUBSCRIPT roman_md end_POSTSUBSCRIPT end_ARG } + italic_P start_POSTSUBSCRIPT 90 end_POSTSUBSCRIPT { divide start_ARG | roman_Δ italic_t start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT end_POSTSUBSCRIPT | end_ARG start_ARG ⟨ italic_t start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⟩ start_POSTSUBSCRIPT roman_md end_POSTSUBSCRIPT end_ARG } (7)

and require it to be smaller than 0.20.20.20.2. Note that we specifically use robust statistics to mitigate the effect of outliers.

In the next step, we start simulating trajectories using the microscopic model in quasi-logarithmic steps, i.e., 10,20,50,100,10205010010,20,50,100,...10 , 20 , 50 , 100 , …. After each step, we set out to obtain an optimal data-driven model via a heuristic hyperparameter optimization. Based on our previous experience with the electron-phonon system [45], we keep a fixed delay embedding of =22\ell=2roman_ℓ = 2 and a fixed number of nonlinear nodes L=1000𝐿1000L=1000italic_L = 1000 and optimize the number of reduced dimensions drsubscript𝑑𝑟d_{r}italic_d start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT and the regularization strength α𝛼\alphaitalic_α via a simple grid search. This procedure can be assigned to the class of exploration-labeling-training (ELT) algorithms [40, 51], whereat the exploration is performed via the randomly drawn initial conditions. Once such a model meets the accuracy target, it can be used for the further simulation of the remaining trajectories.

Refer to caption
Figure 3: Simulation of 1000 trajectories. Cross-validated training set error of the pulse maximum (a) and the pulse maximum position (b) as function of the number of training trajectories n𝑛nitalic_n for the best performing model. Only the modulus of the deviations is considered and normalization is performed with respect to the median pulse maximum and median pulse maximum position. The whiskers denote the 10th and 90th percentile. (c) Simulation time as a function of the number of trajectories. Blue denotes the microscopic model, orange denotes the data-driven model with n=50𝑛50n=50italic_n = 50 (solid) and n=20𝑛20n=20italic_n = 20 (dashed) training trajectories, and green denotes the two-temperature approximation.

The accuracy results of this strategy are presented in Fig. 3. Panels (a) and (b) show the training set cross-validated errors for the normalized absolute pulse maximum error and the normalized absolute pulse maximum position error as a function of the number training trajectories n𝑛nitalic_n. In all cases, the best performing model is shown. In both plots, the horizontal black bar denotes the median, the box the interquartile range, and the whiskers the 10th and 90th percentile. Hence, the sum of the top whiskers represents the accuracy metric Eq. 7. For an increasing number of training trajectories, the observed errors decrease, i.e., all indicators shift to smaller values. We find that the accuracy target Eq. 7 is met for n=50𝑛50n=50italic_n = 50 training trajectories with a value of An=500.16subscript𝐴𝑛500.16A_{n=50}\approx 0.16italic_A start_POSTSUBSCRIPT italic_n = 50 end_POSTSUBSCRIPT ≈ 0.16. Thus, we can now use the obtained data-driven model for the further simulation of trajectories. Before moving on, however, we would like to stress, that choosing a suitable accuracy target is the human operator’s responsibility. Our choice is motivated by achieving a competitive performance with respect to accuracy and overall simulation time. For comparison, the two-temperature approximation only achieves an accuracy of Atta1.5subscript𝐴tta1.5A_{\mathrm{tta}}\approx 1.5italic_A start_POSTSUBSCRIPT roman_tta end_POSTSUBSCRIPT ≈ 1.5 for the considered 50 training trajectories.

III.3 Simulation-Time Evaluation

For the evaluation of the computation time, we simulate all 1000 trajectories using the microscopic model, the data-driven model with the outlined strategy, and the two-temperature approximation. The setup of the latter is described in Appendix C. Details on the implementation are given in Appendix E. Figure 3 (c) plots the total simulation time as a function of the number of trajectories. For the microscopic model (blue line), the simulation time increases linearly and amounts to 3.7habsent3.7h\approx 3.7\,\mathrm{h}≈ 3.7 roman_h for all trajectories. This corresponds to an average of 13.3sabsent13.3s\approx 13.3\,\mathrm{s}≈ 13.3 roman_s per trajectory. Similarly, the time for the two-temperature approximation (green solid line) also increases linearly, but only amounts to 0.73habsent0.73h\approx 0.73\,\mathrm{h}≈ 0.73 roman_h for all trajectories, which corresponds to 2.5sabsent2.5s\approx 2.5\,\mathrm{s}≈ 2.5 roman_s per trajectory.

The data-driven approach (orange solid line), on the other hand, shows a distinctive two-stage behavior, which is due to the initial simulation of training data and model hyperparameter optimization. However, once a sufficient model is obtained, it only takes 0.58sabsent0.58s\approx 0.58\,\mathrm{s}≈ 0.58 roman_s on average to simulate a trajectory. The total simulation time for all 1000 trajectories then amounts to 0.68habsent0.68h\approx 0.68\,\mathrm{h}≈ 0.68 roman_h. The crossing point with the microscopic model is found at n=147𝑛147n=147italic_n = 147 trajectories and with the two-temperature approximation at n=900𝑛900n=900italic_n = 900 trajectories.

At this point, we also want to highlight that the position of crossing points is determined by the initial costs of obtaining the data-driven model. E.g., if we set the accuracy target Eq. 7 to 0.250.250.250.25 instead of of 0.20.20.20.2, n=20𝑛20n=20italic_n = 20 training trajectories are sufficient. The corresponding simulation time is shown via the orange dashed line. For that case, the time to obtain all 1000 trajectories now only amounts to 0.36habsent0.36h\approx 0.36\,\mathrm{h}≈ 0.36 roman_h and the crossing point with the microscopic model is found at n=57𝑛57n=57italic_n = 57 and with the two-temperature approximation at n=305𝑛305n=305italic_n = 305.

In the limit of large numbers of trajectories, i.e., if the initial costs of obtaining the data-driven model can be neglected, the data-driven approach offers an acceleration factor of 20absent20\approx 20≈ 20 compared to the microscopic model. The two-temperature approximation, on the other hand, only offers a factor of 4absent4\approx 4≈ 4. While the latter is deceptively easy to write down, its implementation requires to fit a Fermi-function to the electron distribution in each step, which drives the computational costs. The data-driven model, on the other side, only performs a few vector and matrix operations, and one nonlinear transformation, which turn out to be relatively inexpensive. Thus, the data-driven approach also exhibits a performance advantage over the analytic approximation.

III.4 Accuracy Evaluation

Refer to caption
Figure 4: Histograms of the pulse maximum error (a), the pulse position error (b) and the electron distribution error (c). Orange denotes the data-driven model and green the two-temperature approximation. The former two are normalized by the median of the pulse maximum and pulse maximum position. The latter amounts to tImaxmd=1.93subscriptdelimited-⟨⟩subscript𝑡subscript𝐼maxmd1.93\langle t_{I_{\mathrm{max}}}\rangle_{\mathrm{md}}=1.93⟨ italic_t start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⟩ start_POSTSUBSCRIPT roman_md end_POSTSUBSCRIPT = 1.93 ns.

Having benchmarked the simulation time performances, we now continue with the detailed evaluation of the simulation accuracy. Implicitly, we have already done that via the training error in Fig. 3 and the accuracy target Eq. 7. At this point, however, we want to strictly separate the training and testing data. For that purpose, we use the optimal data-driven model trained with n=50𝑛50n=50italic_n = 50 trajectories and compare its approximation of the 950 trajectories of our test scenario to the trajectories computed with the two-temperature approximation with respect to ground truth produced by the microscopic model.

To start off with, we would first like to exemplify both approximation methods via the representative trajectory show in Fig. 2. In panel (a), the data-driven model is denoted by the orange dashed line and the two-temperature approximation by the green dash-dotted line. We find that the data-driven model beautifully tracks the microscopic model with the pulse maximum only slightly to small and slightly shifted to later times. The two-temperature approximation, on the other hand, only gets the rough qualitative feature right: The pulse maximum is underestimated by more than a factor of two, the pulse is shifted by 0.5psabsent0.5ps\approx 0.5\,\mathrm{ps}≈ 0.5 roman_ps to later times, and the trailing edge decays much slower.

The apparent different behavior of the two approximations can be explained by their respective electron distribution dynamics, which are presented in Fig. 2 (c) in terms of the occupation error fϵkapprx(t)fϵkmicro(t)superscriptsubscript𝑓subscriptitalic-ϵ𝑘apprx𝑡superscriptsubscript𝑓subscriptitalic-ϵ𝑘micro𝑡f_{\epsilon_{k}}^{\mathrm{apprx}}(t)-f_{\epsilon_{k}}^{\mathrm{micro}}(t)italic_f start_POSTSUBSCRIPT italic_ϵ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_apprx end_POSTSUPERSCRIPT ( italic_t ) - italic_f start_POSTSUBSCRIPT italic_ϵ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_micro end_POSTSUPERSCRIPT ( italic_t ) with respect to the microscopic simulation shown in (b). The occupation errors produced by the data-driven model Fig. 2 (c1) are hardly visible on the common color scale, since they are about one magnitude smaller than those of the two-temperature approximation Fig. 2 (c2). Moreover, the two-temperature approximation misses important qualitative features of the microscopic model. Firstly, the initial relaxation is to slow and thus produces the blue area in the bottom left of the figure. Secondly, it does not exhibit the characteristic periodic structure produced by interaction with the optical phonons. Lastly, the refilling of the spectral hole, which is caused by the lasing process, also occurs to slowly and thus leaves the pronounced red region at energies above the lasing mode (as identified by the white horizontal stripe for t>2ps𝑡2pst>2\,\mathrm{ps}italic_t > 2 roman_ps).

Shortcomings of the two-temperature approximation are to be expected, since the appearing electron distributions are far away from a quasi equilibrium. However, it is important to note, that the quantitative impact onto the photon intensity dynamics cannot be known a priori. Hence, a responsible utilization of the two-temperature approximation should include a quantitative estimation of the potential errors.

To provide a broad picture of the approximation accuracy of the 950 test trajectories, Fig. 4 shows histograms of the normalized pulse maximum error (a), the normalized pulse maximum position error (b), and the electron distribution rms error (c) for both the data-driven model (orange) and the two-temperature approximation (green).

Taking a look at the pulse maximum error first, we find that the data-driven model produces deviations that are almost centered above zero and hardly exceed ±0.2plus-or-minus0.2\pm 0.2± 0.2. The two-temperature approximation, on the other hand, exclusively underestimates the true maximum and generates normalized errors up to 1.251.25-1.25- 1.25 with a median of 0.6absent0.6\approx-0.6≈ - 0.6.

Moving on to the pulse maximum position error, we find that the data-driven model produces errors, which are approximately centered above zero and do not exceed ±0.1plus-or-minus0.1\pm 0.1± 0.1. This time, the two-temperature approximation exclusively overestimates the pulse maximum positions and thus produces normalized errors with a median of 0.17absent0.17\approx 0.17≈ 0.17 and outliers up to 0.720.720.720.72.

We can therefore conclude that the two-temperature approximation systematically underestimates the pulse maximum and overestimates the pulse position, while the data-driven model rather suffers from (much smaller) random errors. We attribute the behavior of the two-temperature approximation to the insufficient description of the electron dynamics for distributions for away from a quasi-equilibrium, as discussed previously.

Hence, we lastly take a look at the electron distribution rms error, which by definition is positive. The data-driven model exhibits a median error of 0.015absent0.015\approx 0.015≈ 0.015 and the two-temperature approximation a median error of 0.066absent0.066\approx 0.066≈ 0.066. Note, that the rms error is not specific to the relevant physical features of the electron distribution, e.g., the gain at the lasing mode, and is therefore not an optimal predictor for the pulse accuracy. Nonetheless, it indicates the much better performance of the data-driven method over the two-temperature approximation.

With those results, we conclude that the example trajectories presented in Fig. 2 are representative both of the data-driven approach and the two-temperature approximation. Both the example and the statistics demonstrate the much better accuracy of the data-driven approach when compared to the two-temperature approximation. This can also be expressed in terms of the accuracy metric Eq. 7, which yields a value of Atta1.38subscript𝐴tta1.38A_{\mathrm{tta}}\approx 1.38italic_A start_POSTSUBSCRIPT roman_tta end_POSTSUBSCRIPT ≈ 1.38 for the two-temperature approximation and a value of An=500.16subscript𝐴𝑛500.16A_{n=50}\approx 0.16italic_A start_POSTSUBSCRIPT italic_n = 50 end_POSTSUBSCRIPT ≈ 0.16 for selected data-driven model on the 950 test trajectories.

IV Discussion and Conclusions

In this work, we have proposed a data-driven approach to accelerate multi-physics simulations and exemplarily demonstrated its application to a semiconductor laser model. The primary goal was to reduce computational costs without compromising accuracy compared to traditional analytical approximations. In our test scenario, the results establish a substantial advantage of the data-driven method over the two-temperature approximation in both computational efficiency and accuracy.

While demonstrating promising results, this study also has certain limitations. Naturally, the quantitative results regarding the accuracy and computation time are specific to the considered system, the mathematical model, the test scenario, and the implementation. However, we believe that the presented case study does not represent a best case scenario for data-driven approaches and that the potential performance gains may be much greater in other setups. This is because our toy semiconductor laser model only considers electron-phonon interactions in the collision term and no electron-electron interactions. The latter, however, are numerically more expensive to evaluate [8] and their inclusion would therefore give further advantage to the data-driven approach. This would become especially relevant if a finer discretization was required due to the scaling behavior of the microscopic evaluation (quadratic) and the data-driven model (linear) with the number discretization points.

Moreover, we showed that the data-driven approach only becomes feasible if a certain minimum number of trajectories is required due to the training costs of the data-driven model. However, it is easy to think of cases, where one would need to simulate many more than the 1000 trajectories of our test scenario. For a parameter study, a three-dimensional grid with 1000 samples only corresponds to ten sample values per parameter. Adding further parameters or increasing the sample density then quickly drives up the number of samples. If one where to include stochastic effects, e.g., spontaneous emission to our laser, each parameter combination would require multiple trajectory realizations to obtain proper statistics. Hence, the initial training costs can become negligible in many potential cases.

Both the microscopic evaluation and the data-driven model can benefit from a more efficient implementation. This may include the exact mathematical formulation, the simulation code, and in the case of the data-driven approach, the structure of the model and its training. The latter thus offers further options for a more efficient implementation and thus performance improvements. It thereby also puts the burden of doing so on the human operator. This stresses the importance of the careful construction and training of the data-driven model. Nonetheless, we are convinced that this task is manageable at present, since our case study already demonstrated excellent results without extensive model design experimentation and optimization. We expect future research to further establish practical guidelines for the successful design and training of data-driven approximation models.

One promising future research path is the integration of physical knowledge into the data-driven model. This could be done via the structure of the model itself, a combination with a knowledge-based approximation, or the training procedure [41]. In our considered case of the semiconductor laser, the proper collision term only redistributes charge-carriers and thus induces no net change of the charge-carrier density. At present, this physical knowledge is not included in the model and therefore small deviations can be observed. A straight-forward option to fixing this, could be the penalization of such deviations during the training via the cost function.

In conclusion, the data-driven approach presented in this study holds significant promise for accelerating multi-physics simulations in various fields, including but not limited to semiconductor lasers. Its superior performance in computational efficiency and accuracy, as compared to traditional approximations, may open avenues for further exploration and application in diverse scientific domains and facilitate rapid prototyping and optimization of complex systems.

V Acknowledgements

We acknowledge fruitful discussions with Dominik Christiansen (TU Berlin).
This work was funded by the Deutsche Forschungsgemeinschaft (DFG) through Project SE 3098/1, Project No. 432266622 (M.S.), SFB 951, Project No. 182087777 (A.K.).

VI Author contributions

S.M. and M.S. initiated and conceptualized the work. M.S. and A.K. worked on the microscopic derivation and the numerical implementation of electron-phonon scattering and parameter search. S.M. implemented the laser model and the data-driven model; performed the simulations and numerical experiments; and drafted the manuscript. All authors discussed and edited the manuscript.

VII Data Availability

The data generated in this work can be generated by running the publicly available code as described in the code availability statement.

VIII Code Availability

The simulation code and the regression code is available on GitHub under MIT license (https://github.com/stmeinecke/derrom)

Appendix A Microscopic Electron-Phonon Equations

We have described the coupled electron-phonon dynamics in our previous publication [45] and only present a brief summary here. The equations of motion for the electron occupations f𝐤subscript𝑓𝐤f_{\mathbf{k}}italic_f start_POSTSUBSCRIPT bold_k end_POSTSUBSCRIPT and phonon occupations n𝐪αsuperscriptsubscript𝑛𝐪𝛼n_{\mathbf{q}}^{\alpha}italic_n start_POSTSUBSCRIPT bold_q end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT are derived from a Hamiltonian via the Heisenberg equation. The electron dispersion is treated in the parabolic approximation, the acoustic phonon dispersion in the Debye approximation, and the optical phonon dispersion in the Einstein approximation, with parameters from ab-initio calculations[52, 53]. The occurring hierarchy problem is addressed using a correlation expansion and a second-order Born-Markov approximation[54]. The resulting coupled electron-phonon Boltzmann scattering equations then read

tf𝐤|colmicroevaluated-atsubscript𝑡subscript𝑓𝐤colmicro\displaystyle\partial_{t}f_{\mathbf{k}}|_{\mathrm{col}}^{\mathrm{micro}}∂ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT bold_k end_POSTSUBSCRIPT | start_POSTSUBSCRIPT roman_col end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_micro end_POSTSUPERSCRIPT =2π𝐤,α,±|g𝐤𝐤|2(12±12+n𝐤𝐤α)(1f𝐤)f𝐤δ(ϵ𝐤ϵ𝐤±ω𝐤𝐤α)absent2𝜋Planck-constant-over-2-pisubscriptsuperscript𝐤𝛼plus-or-minussuperscriptsubscript𝑔𝐤superscript𝐤2plus-or-minus1212superscriptsubscript𝑛𝐤superscript𝐤𝛼1subscript𝑓𝐤subscript𝑓superscript𝐤𝛿plus-or-minussubscriptitalic-ϵ𝐤subscriptitalic-ϵsuperscript𝐤Planck-constant-over-2-pisubscriptsuperscript𝜔𝛼𝐤superscript𝐤\displaystyle=\frac{2\pi}{\hbar}\sum_{\mathbf{k^{\prime}},\alpha,\pm}|g_{% \mathbf{k}-\mathbf{k^{\prime}}}|^{2}\left(\frac{1}{2}\pm\frac{1}{2}+n_{\mathbf% {\mathbf{k}-\mathbf{k^{\prime}}}}^{\alpha}\right)\left(1-f_{\mathbf{k}}\right)% f_{\mathbf{k^{\prime}}}\delta(\epsilon_{\mathbf{k}}-\epsilon_{\mathbf{k^{% \prime}}}\pm\hbar\omega^{\alpha}_{\mathbf{k-k^{\prime}}})= divide start_ARG 2 italic_π end_ARG start_ARG roman_ℏ end_ARG ∑ start_POSTSUBSCRIPT bold_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_α , ± end_POSTSUBSCRIPT | italic_g start_POSTSUBSCRIPT bold_k - bold_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( divide start_ARG 1 end_ARG start_ARG 2 end_ARG ± divide start_ARG 1 end_ARG start_ARG 2 end_ARG + italic_n start_POSTSUBSCRIPT bold_k - bold_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT ) ( 1 - italic_f start_POSTSUBSCRIPT bold_k end_POSTSUBSCRIPT ) italic_f start_POSTSUBSCRIPT bold_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_δ ( italic_ϵ start_POSTSUBSCRIPT bold_k end_POSTSUBSCRIPT - italic_ϵ start_POSTSUBSCRIPT bold_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ± roman_ℏ italic_ω start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_k - bold_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT )
2π𝐤,α,±|g𝐤𝐤|2(12±12+n𝐤𝐤α)(1f𝐤)f𝐤δ(ϵ𝐤ϵ𝐤ω𝐤𝐤α)2𝜋Planck-constant-over-2-pisubscriptsuperscript𝐤𝛼plus-or-minussuperscriptsubscript𝑔𝐤superscript𝐤2plus-or-minus1212superscriptsubscript𝑛𝐤superscript𝐤𝛼1subscript𝑓superscript𝐤subscript𝑓𝐤𝛿minus-or-plussubscriptitalic-ϵ𝐤subscriptitalic-ϵsuperscript𝐤Planck-constant-over-2-pisubscriptsuperscript𝜔𝛼𝐤superscript𝐤\displaystyle-\frac{2\pi}{\hbar}\sum_{\mathbf{k^{\prime}},\alpha,\pm}|g_{% \mathbf{k}-\mathbf{k^{\prime}}}|^{2}\left(\frac{1}{2}\pm\frac{1}{2}+n_{\mathbf% {\mathbf{k}-\mathbf{k^{\prime}}}}^{\alpha}\right)\left(1-f_{\mathbf{k^{\prime}% }}\right)f_{\mathbf{k}}\delta(\epsilon_{\mathbf{k}}-\epsilon_{\mathbf{k^{% \prime}}}\mp\hbar\omega^{\alpha}_{\mathbf{k-k^{\prime}}})- divide start_ARG 2 italic_π end_ARG start_ARG roman_ℏ end_ARG ∑ start_POSTSUBSCRIPT bold_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_α , ± end_POSTSUBSCRIPT | italic_g start_POSTSUBSCRIPT bold_k - bold_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( divide start_ARG 1 end_ARG start_ARG 2 end_ARG ± divide start_ARG 1 end_ARG start_ARG 2 end_ARG + italic_n start_POSTSUBSCRIPT bold_k - bold_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT ) ( 1 - italic_f start_POSTSUBSCRIPT bold_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) italic_f start_POSTSUBSCRIPT bold_k end_POSTSUBSCRIPT italic_δ ( italic_ϵ start_POSTSUBSCRIPT bold_k end_POSTSUBSCRIPT - italic_ϵ start_POSTSUBSCRIPT bold_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ∓ roman_ℏ italic_ω start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_k - bold_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) (8)
tn𝐪αsubscript𝑡superscriptsubscript𝑛𝐪𝛼\displaystyle\partial_{t}n_{\mathbf{q}}^{\alpha}∂ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT bold_q end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT =2π|g𝐪α|2𝐤(1f𝐤)f𝐤+𝐪(1+n𝐪)δ(ϵ𝐤ϵ𝐤+𝐪+ω𝐪α)absent2𝜋Planck-constant-over-2-pisuperscriptsuperscriptsubscript𝑔𝐪𝛼2subscript𝐤1subscript𝑓𝐤subscript𝑓𝐤𝐪1subscript𝑛𝐪𝛿subscriptitalic-ϵ𝐤subscriptitalic-ϵ𝐤𝐪Planck-constant-over-2-pisubscriptsuperscript𝜔𝛼𝐪\displaystyle=\frac{2\pi}{\hbar}|g_{\mathbf{q}}^{\alpha}|^{2}\sum_{\mathbf{k}}% \left(1-f_{\mathbf{k}}\right)f_{\mathbf{k+q}}\left(1+n_{\mathbf{q}}\right)% \delta(\epsilon_{\mathbf{k}}-\epsilon_{\mathbf{k+q}}+\hbar\omega^{\alpha}_{% \mathbf{q}})= divide start_ARG 2 italic_π end_ARG start_ARG roman_ℏ end_ARG | italic_g start_POSTSUBSCRIPT bold_q end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT bold_k end_POSTSUBSCRIPT ( 1 - italic_f start_POSTSUBSCRIPT bold_k end_POSTSUBSCRIPT ) italic_f start_POSTSUBSCRIPT bold_k + bold_q end_POSTSUBSCRIPT ( 1 + italic_n start_POSTSUBSCRIPT bold_q end_POSTSUBSCRIPT ) italic_δ ( italic_ϵ start_POSTSUBSCRIPT bold_k end_POSTSUBSCRIPT - italic_ϵ start_POSTSUBSCRIPT bold_k + bold_q end_POSTSUBSCRIPT + roman_ℏ italic_ω start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_q end_POSTSUBSCRIPT )
2π|g𝐪α|2𝐤(1f𝐤+𝐪)f𝐤n𝐪δ(ϵ𝐤ϵ𝐤+𝐪ω𝐪α).2𝜋Planck-constant-over-2-pisuperscriptsuperscriptsubscript𝑔𝐪𝛼2subscript𝐤1subscript𝑓𝐤𝐪subscript𝑓𝐤subscript𝑛𝐪𝛿subscriptitalic-ϵ𝐤subscriptitalic-ϵ𝐤𝐪Planck-constant-over-2-pisubscriptsuperscript𝜔𝛼𝐪\displaystyle-\frac{2\pi}{\hbar}|g_{\mathbf{q}}^{\alpha}|^{2}\sum_{\mathbf{k}}% \left(1-f_{\mathbf{k+q}}\right)f_{\mathbf{k}}n_{\mathbf{q}}\delta(\epsilon_{% \mathbf{k}}-\epsilon_{\mathbf{k+q}}-\hbar\omega^{\alpha}_{\mathbf{q}}).- divide start_ARG 2 italic_π end_ARG start_ARG roman_ℏ end_ARG | italic_g start_POSTSUBSCRIPT bold_q end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT bold_k end_POSTSUBSCRIPT ( 1 - italic_f start_POSTSUBSCRIPT bold_k + bold_q end_POSTSUBSCRIPT ) italic_f start_POSTSUBSCRIPT bold_k end_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT bold_q end_POSTSUBSCRIPT italic_δ ( italic_ϵ start_POSTSUBSCRIPT bold_k end_POSTSUBSCRIPT - italic_ϵ start_POSTSUBSCRIPT bold_k + bold_q end_POSTSUBSCRIPT - roman_ℏ italic_ω start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_q end_POSTSUBSCRIPT ) . (9)

The appearing electron coupling elements g𝐤subscript𝑔𝐤g_{\mathbf{k}}italic_g start_POSTSUBSCRIPT bold_k end_POSTSUBSCRIPT are treated in the effective deformation potential approximation 53

gqi=2ρΩiAVq,superscriptsubscript𝑔𝑞𝑖Planck-constant-over-2-pi2𝜌superscriptΩ𝑖𝐴subscript𝑉𝑞g_{q}^{i}=\sqrt{\frac{\hbar}{2\rho\Omega^{i}A}}V_{q},italic_g start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT = square-root start_ARG divide start_ARG roman_ℏ end_ARG start_ARG 2 italic_ρ roman_Ω start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT italic_A end_ARG end_ARG italic_V start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT , (10)

with the effective mass density of the unit cell ρ𝜌\rhoitalic_ρ and the semiconductor area A𝐴Aitalic_A. The effective deformation potential reads Vq=D1qsubscript𝑉𝑞subscript𝐷1𝑞V_{q}=D_{1}qitalic_V start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT = italic_D start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_q for acoustic phonon coupling, and Vq=D0subscript𝑉𝑞subscript𝐷0V_{q}=D_{0}italic_V start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT = italic_D start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT for optical phonons. We evaluate the system for two-dimensional MoSe2 [53, 55, 56, 57, 58] and take the parameters from reference 55 and list them in Table 2.

Table 2: Material parameters.
mesubscript𝑚𝑒m_{e}italic_m start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT/m0subscript𝑚0m_{0}italic_m start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT 0.5 52
ω¯osuperscript¯𝜔𝑜\bar{\omega}^{o}over¯ start_ARG italic_ω end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT/meV 36 55
cLAsubscript𝑐𝐿𝐴c_{LA}italic_c start_POSTSUBSCRIPT italic_L italic_A end_POSTSUBSCRIPT/(nm/fs) 4.1 55
D1asubscriptsuperscript𝐷𝑎1D^{a}_{1}italic_D start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT/eV 3.4 55
D0osubscriptsuperscript𝐷𝑜0D^{o}_{0}italic_D start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT/eV nm11{}^{-1}start_FLOATSUPERSCRIPT - 1 end_FLOATSUPERSCRIPT 52 55

Appendix B Delay Embedded Regressive Reduced Order Model












𝐬n+1subscript𝐬𝑛1\mathbf{s}_{n-\ell+1}bold_s start_POSTSUBSCRIPT italic_n - roman_ℓ + 1 end_POSTSUBSCRIPT\cdots𝐬n1subscript𝐬𝑛1\mathbf{s}_{n-1}bold_s start_POSTSUBSCRIPT italic_n - 1 end_POSTSUBSCRIPT𝐬nsubscript𝐬𝑛\mathbf{s}_{n}bold_s start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT𝐫n+1subscript𝐫𝑛1\mathbf{r}_{n-\ell+1}bold_r start_POSTSUBSCRIPT italic_n - roman_ℓ + 1 end_POSTSUBSCRIPT\cdots𝐫n1subscript𝐫𝑛1\mathbf{r}_{n-1}bold_r start_POSTSUBSCRIPT italic_n - 1 end_POSTSUBSCRIPT𝐫nsubscript𝐫𝑛\mathbf{r}_{n}bold_r start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT𝐫~n+1subscript~𝐫𝑛1\mathbf{\widetilde{r}}_{n-\ell+1}over~ start_ARG bold_r end_ARG start_POSTSUBSCRIPT italic_n - roman_ℓ + 1 end_POSTSUBSCRIPT\cdots𝐫~n1subscript~𝐫𝑛1\mathbf{\widetilde{r}}_{n-1}over~ start_ARG bold_r end_ARG start_POSTSUBSCRIPT italic_n - 1 end_POSTSUBSCRIPT𝐫~nsubscript~𝐫𝑛\mathbf{\widetilde{r}}_{n}over~ start_ARG bold_r end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT𝐟(𝐫~n,,𝐫~n+1)=𝐟n𝐟subscript~𝐫𝑛subscript~𝐫𝑛1subscript𝐟𝑛\mathbf{f}(\mathbf{\widetilde{r}}_{n},\dots,\mathbf{\widetilde{r}}_{n-\ell+1})% =\mathbf{f}_{n}bold_f ( over~ start_ARG bold_r end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , … , over~ start_ARG bold_r end_ARG start_POSTSUBSCRIPT italic_n - roman_ℓ + 1 end_POSTSUBSCRIPT ) = bold_f start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT𝐩nT=𝐟nT𝐖superscriptsubscript𝐩𝑛𝑇superscriptsubscript𝐟𝑛𝑇𝐖\mathbf{p}_{n}^{T}=\mathbf{f}_{n}^{T}\mathbf{W}bold_p start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT = bold_f start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_Wsystem stateptreduced stateptscaled reduced stateptfeature vectorptpredictiontime
Figure 5: Sketch of the delay-embedded regressive reduced order model (derrom). The past \ellroman_ℓ system states 𝐬nsubscript𝐬𝑛\mathbf{s}_{n}bold_s start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT are reduced in their dimensionality 𝐫nsubscript𝐫𝑛\mathbf{r}_{n}bold_r start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT, scaled 𝐫~nsubscript~𝐫𝑛\mathbf{\widetilde{r}}_{n}over~ start_ARG bold_r end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT, concatenated and then nonlinearly transformed to yield the feature vector 𝐟nsubscript𝐟𝑛\mathbf{f}_{n}bold_f start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT. The prediction 𝐩nsubscript𝐩𝑛\mathbf{p}_{n}bold_p start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT is computed via a linear map with the trained weights 𝐖𝐖\mathbf{W}bold_W.

In this section, we present and discuss the delay-embedded regressive reduced order model (derrom) in detail. The processing and flow of information, coming from the past \ellroman_ℓ system states, can be separated into multiple individual steps and is illustrated in Fig. 5. In this figure, time moves to the right and information flows to the bottom. In the first step, the past \ellroman_ℓ (truncated) system states 𝐬nsubscript𝐬𝑛\mathbf{s}_{n}bold_s start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT are collected (orange boxes). If they are truncated, only a subset of the system state variables is considered, e.g., only the electron occupation numbers enter the data-driven model in this work. In the next step, each system state is projected into a reduced order latent space 𝐫nsubscript𝐫𝑛\mathbf{r}_{n}bold_r start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT (green boxes). Next, a scaling of each feature, i.e., element in the reduced states, produces the scaled reduced system states 𝐫~nsubscript~𝐫𝑛\mathbf{\widetilde{r}}_{n}over~ start_ARG bold_r end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT (blue boxes). Their concatenation (light blue box) represents the linear feature vector of the model. The nonlinear feature vector 𝐟nsubscript𝐟𝑛\mathbf{f}_{n}bold_f start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT (purple box) is then generated via the nonlinear transformation 𝐟()𝐟\mathbf{f}(\cdot)bold_f ( ⋅ ). Note that the feature scaling is highly relevant, because it affects the nonlinear response of 𝐟𝐟\mathbf{f}bold_f. Lastly, a linear regression step is carried out via the matrix 𝐖𝐖\mathbf{W}bold_W to produce the prediction 𝐩𝐩\mathbf{p}bold_p.

For the dimensionality reduction stage, we use the left singular vectors of the singular value decomposition (SVD) [59, 60]

𝐒T=𝐔𝚺𝐕T,superscript𝐒𝑇𝐔𝚺superscript𝐕𝑇\displaystyle\mathbf{S}^{T}=\mathbf{U}\mathbf{\Sigma}\mathbf{V}^{T},bold_S start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT = bold_U bold_Σ bold_V start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT , (11)

which has proven most efficient in our previous work on the electron-phonon system [45]. The SVD has a rich history in science and engineering [59, 60] and is known across different disciplines as the Karhunen–Loève transform (KLT) [61, 62], empirical orthogonal functions [63], proper orthogonal decomposition (POD) [64], and canonical correlation analysis [65]. The 𝐔𝐔\mathbf{U}bold_U and 𝐕𝐕\mathbf{V}bold_V matrices are unitary and contain the left and right singular vectors of 𝐒Tsuperscript𝐒𝑇\mathbf{S}^{T}bold_S start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT. 𝚺𝚺\mathbf{\Sigma}bold_Σ is a diagonal matrix with the singular values in descending order. Their magnitude naturally organizes the left and right singular vectors according to their share in the reconstruction of the data matrix 𝐒Tsuperscript𝐒𝑇\mathbf{S}^{T}bold_S start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT. Specifically, the Eckart-Young theorem [66, 60] states that the best rank-r𝑟ritalic_r approximation of a matrix with respect to the Frobenius norm can be achieved via the truncation over the leading r𝑟ritalic_r singular values of the SVD. To reduce a system state 𝐬nsubscript𝐬𝑛\mathbf{s}_{n}bold_s start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT, we compute the SVD of the training data to obtain the basis 𝐔𝐔\mathbf{U}bold_U. We then use the truncated matrix 𝐔r=(𝐮1,,𝐮r)subscript𝐔𝑟subscript𝐮1subscript𝐮𝑟\mathbf{U}_{r}=(\mathbf{u}_{1},\dots,\mathbf{u}_{r})bold_U start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT = ( bold_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , bold_u start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ) to project 𝐬nsubscript𝐬𝑛\mathbf{s}_{n}bold_s start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT onto the first r𝑟ritalic_r left singular vectors 𝐮msubscript𝐮𝑚\mathbf{u}_{m}bold_u start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT to obtain the reduced state

𝐫n=(r1,,rr)T=𝐔rT𝐬n.subscript𝐫𝑛superscriptsubscript𝑟1subscript𝑟𝑟𝑇superscriptsubscript𝐔𝑟𝑇subscript𝐬𝑛\displaystyle\mathbf{r}_{n}=(r_{1},\dots,r_{r})^{T}=\mathbf{U}_{r}^{T}\mathbf{% s}_{n}.bold_r start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = ( italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_r start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT = bold_U start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_s start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT . (12)

Note, that this dimensionality reduction scheme is entirely data driven. We thus expect its performance to depend on the quality of the training data. To obtain a basis, that is well suited for the task, the training data should thus be representative of the relevant system state subspace.

In the feature scaling stage, we standardize the individual features by subtracting their mean μmsubscript𝜇𝑚\mu_{m}italic_μ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT and scaling them to unit variance

r~nm=rnmμmσm,subscript~𝑟𝑛𝑚subscript𝑟𝑛𝑚subscript𝜇𝑚subscript𝜎𝑚\displaystyle\widetilde{r}_{nm}=\frac{r_{nm}-\mu_{m}}{\sigma_{m}},over~ start_ARG italic_r end_ARG start_POSTSUBSCRIPT italic_n italic_m end_POSTSUBSCRIPT = divide start_ARG italic_r start_POSTSUBSCRIPT italic_n italic_m end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_ARG start_ARG italic_σ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_ARG , (13)

where σmsubscript𝜎𝑚\sigma_{m}italic_σ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT denotes the standard deviation of the m𝑚mitalic_m-th feature in the training data.

The delay embedding is achieved by concatenating the past \ellroman_ℓ system states 𝐫~~𝐫\widetilde{\mathbf{r}}over~ start_ARG bold_r end_ARG

𝐟nlin=𝐫~n𝐫~n1𝐫~n+1,superscriptsubscript𝐟𝑛lindirect-sumsubscript~𝐫𝑛subscript~𝐫𝑛1subscript~𝐫𝑛1\displaystyle\mathbf{f}_{n}^{\mathrm{lin}}=\widetilde{\mathbf{r}}_{n}\oplus% \widetilde{\mathbf{r}}_{n-1}\oplus\dots\oplus\widetilde{\mathbf{r}}_{n-\ell+1},bold_f start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_lin end_POSTSUPERSCRIPT = over~ start_ARG bold_r end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ⊕ over~ start_ARG bold_r end_ARG start_POSTSUBSCRIPT italic_n - 1 end_POSTSUBSCRIPT ⊕ ⋯ ⊕ over~ start_ARG bold_r end_ARG start_POSTSUBSCRIPT italic_n - roman_ℓ + 1 end_POSTSUBSCRIPT , (14)

where direct-sum\oplus denotes the concatenation operator. The result represents the linear feature vector 𝐟nlinsuperscriptsubscript𝐟𝑛lin\mathbf{f}_{n}^{\mathrm{lin}}bold_f start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_lin end_POSTSUPERSCRIPT of the data-driven model. For a reduced dimensionality drsubscript𝑑𝑟d_{r}italic_d start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT of the state vectors 𝐫~nsubscript~𝐫𝑛\widetilde{\mathbf{r}}_{n}over~ start_ARG bold_r end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT, the dimension of the linear feature vector 𝐟nlinsuperscriptsubscript𝐟𝑛lin\mathbf{f}_{n}^{\mathrm{lin}}bold_f start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_lin end_POSTSUPERSCRIPT thus becomes drsubscript𝑑𝑟d_{r}\ellitalic_d start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT roman_ℓ.

For the nonlinear transformation, we construct the feature vector according to

𝐟n=1𝐟nlin𝐟nl(𝐟nlin),subscript𝐟𝑛direct-sum1superscriptsubscript𝐟𝑛linsuperscript𝐟nlsuperscriptsubscript𝐟𝑛lin\displaystyle\mathbf{f}_{n}=1\oplus\mathbf{f}_{n}^{\mathrm{lin}}\oplus\mathbf{% f}^{\mathrm{nl}}(\mathbf{f}_{n}^{\mathrm{lin}}),bold_f start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = 1 ⊕ bold_f start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_lin end_POSTSUPERSCRIPT ⊕ bold_f start_POSTSUPERSCRIPT roman_nl end_POSTSUPERSCRIPT ( bold_f start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_lin end_POSTSUPERSCRIPT ) , (15)

where we add a bias term, denoted by the one, and explicitly keep the linear feature vector. The nonlinear features are represented by 𝐟nl(𝐟nlin)superscript𝐟nlsuperscriptsubscript𝐟𝑛lin\mathbf{f}^{\mathrm{nl}}(\mathbf{f}_{n}^{\mathrm{lin}})bold_f start_POSTSUPERSCRIPT roman_nl end_POSTSUPERSCRIPT ( bold_f start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_lin end_POSTSUPERSCRIPT ). In our case, the nonlinear transformation is given by

𝐟nl(𝐟nlin)=ϕ(𝐖nl𝐟nlin+𝜷),superscript𝐟nlsuperscriptsubscript𝐟𝑛linitalic-ϕsubscript𝐖nlsuperscriptsubscript𝐟𝑛lin𝜷\displaystyle\mathbf{f}^{\mathrm{nl}}(\mathbf{f}_{n}^{\mathrm{lin}})=\phi\left% (\mathbf{W}_{\mathrm{nl}}\mathbf{f}_{n}^{\mathrm{lin}}+\boldsymbol{\beta}% \right),bold_f start_POSTSUPERSCRIPT roman_nl end_POSTSUPERSCRIPT ( bold_f start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_lin end_POSTSUPERSCRIPT ) = italic_ϕ ( bold_W start_POSTSUBSCRIPT roman_nl end_POSTSUBSCRIPT bold_f start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_lin end_POSTSUPERSCRIPT + bold_italic_β ) , (16)

where a nonlinear activation function ϕitalic-ϕ\phiitalic_ϕ acts on each element of the input vector, which is generated by the weight matrix 𝐖nlL×dfsubscript𝐖nlsuperscript𝐿subscript𝑑𝑓\mathbf{W}_{\mathrm{nl}}\in\mathbb{R}^{L\times d_{f}}bold_W start_POSTSUBSCRIPT roman_nl end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_L × italic_d start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT end_POSTSUPERSCRIPT, the input (linear feature) vector 𝐟nlinsuperscriptsubscript𝐟𝑛lin\mathbf{f}_{n}^{\mathrm{lin}}bold_f start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_lin end_POSTSUPERSCRIPT, and the bias vector 𝜷L×df𝜷superscript𝐿subscript𝑑𝑓\boldsymbol{\beta}\in\mathbb{R}^{L\times d_{f}}bold_italic_β ∈ blackboard_R start_POSTSUPERSCRIPT italic_L × italic_d start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT end_POSTSUPERSCRIPT. This nonlinear transformation can be represented by a fully connected feed-forward network with one hidden layer, to which a nonlinear activation function is applied. The matrix 𝐖nlsubscript𝐖nl\mathbf{W}_{\mathrm{nl}}bold_W start_POSTSUBSCRIPT roman_nl end_POSTSUBSCRIPT defines the number of nonlinear nodes L𝐿Litalic_L in the hidden layer via its shape L×dfsuperscript𝐿subscript𝑑𝑓\mathbb{R}^{L\times d_{f}}blackboard_R start_POSTSUPERSCRIPT italic_L × italic_d start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT end_POSTSUPERSCRIPT. The weights wmnnlsuperscriptsubscript𝑤𝑚𝑛nlw_{mn}^{\mathrm{nl}}italic_w start_POSTSUBSCRIPT italic_m italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_nl end_POSTSUPERSCRIPT are drawn from a normal distribution 𝒩𝐖(0,df1)superscript𝒩𝐖0superscriptsubscript𝑑𝑓1\mathcal{N}^{\mathbf{W}}(0,d_{f}^{-1})caligraphic_N start_POSTSUPERSCRIPT bold_W end_POSTSUPERSCRIPT ( 0 , italic_d start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) with zero mean and the variance given be the inverse feature vector dimension dfsubscript𝑑𝑓d_{f}italic_d start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT. Given the standardization of the features, the scaling of the variance ensures that the expected magnitude of the inputs unl=mwlmnlfnmlinsubscript𝑢𝑛𝑙subscript𝑚superscriptsubscript𝑤𝑙𝑚nlsubscriptsuperscript𝑓lin𝑛𝑚u_{nl}=\sum_{m}w_{lm}^{\mathrm{nl}}f^{\mathrm{lin}}_{nm}italic_u start_POSTSUBSCRIPT italic_n italic_l end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_l italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_nl end_POSTSUPERSCRIPT italic_f start_POSTSUPERSCRIPT roman_lin end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n italic_m end_POSTSUBSCRIPT is of the order 1absent1\approx 1≈ 1. This design has been shown to best harness the tanh\tanhroman_tanh nonlinaerity and produce optimal regression results [67]. With the same argument, the biases βmsubscript𝛽𝑚\beta_{m}italic_β start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT are drawn from the uniform distribution 𝒰β(1.0,1.0)superscript𝒰𝛽1.01.0\mathcal{U}^{\mathbf{\beta}}(-1.0,1.0)caligraphic_U start_POSTSUPERSCRIPT italic_β end_POSTSUPERSCRIPT ( - 1.0 , 1.0 ). If one does not optimize the parameters of the transformation, i.e., 𝐖nlsubscript𝐖nl\mathbf{W}_{\mathrm{nl}}bold_W start_POSTSUBSCRIPT roman_nl end_POSTSUBSCRIPT and 𝜷𝜷\boldsymbol{\beta}bold_italic_β, in the training of the model, this approach is also known as an extreme learning machine (ELM) [68, 69, 70, 71, 72].

Lastly, the regression step is performed via the linear map

𝐩nT=𝐟nT𝐖,superscriptsubscript𝐩𝑛𝑇superscriptsubscript𝐟𝑛𝑇𝐖\displaystyle\mathbf{p}_{n}^{T}=\mathbf{f}_{n}^{T}\mathbf{W},bold_p start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT = bold_f start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_W , (17)

where 𝐖df×dp𝐖superscriptsubscript𝑑𝑓subscript𝑑𝑝\mathbf{W}\in\mathbb{R}^{d_{f}\times d_{p}}bold_W ∈ blackboard_R start_POSTSUPERSCRIPT italic_d start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT × italic_d start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_POSTSUPERSCRIPT denotes the regression weight matrix with the dimensions of the feature vector dfsubscript𝑑𝑓d_{f}italic_d start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT and the prediction dpsubscript𝑑𝑝d_{p}italic_d start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT.

The model is trained using the supervised learning paradigm, i.e., the model parameters are tuned to minimize a loss function of some labeled training data and its corresponding model output. The training of our proposed model includes the dimensionality reduction stage and the regression stage. For the sake of training simplicity, we train the dimensionality reduction stage and the regression weights separately. I.e., we compute the SVD for the dimensionality reduction stage and then optimize the regression weights 𝐖𝐖\mathbf{W}bold_W for a given reduced dimensionality drsubscript𝑑𝑟d_{r}italic_d start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT.

To facilitate the model training, we first sample M𝑀Mitalic_M trajectories at NMsubscript𝑁𝑀N_{M}italic_N start_POSTSUBSCRIPT italic_M end_POSTSUBSCRIPT discrete times and build the data matrix

𝐒=(𝐬11,,𝐬nm,,𝐬NMM)T,𝐒superscriptsuperscriptsubscript𝐬11superscriptsubscript𝐬𝑛𝑚superscriptsubscript𝐬subscript𝑁𝑀𝑀𝑇\displaystyle\mathbf{S}=(\mathbf{s}_{1}^{1},\dots,\mathbf{s}_{n}^{m},\dots,% \mathbf{s}_{N_{M}}^{M})^{T},bold_S = ( bold_s start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , … , bold_s start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT , … , bold_s start_POSTSUBSCRIPT italic_N start_POSTSUBSCRIPT italic_M end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT , (18)

where the rows correspond to the system state vectors 𝐬nmsuperscriptsubscript𝐬𝑛𝑚\mathbf{s}_{n}^{m}bold_s start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT of the m𝑚mitalic_m-th trajectory at the time tnmsuperscriptsubscript𝑡𝑛𝑚t_{n}^{m}italic_t start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT. We then proceed to construct the feature matrix

𝐅=(𝐟11,,𝐟nm,,𝐟NMM)T,𝐅superscriptsuperscriptsubscript𝐟11superscriptsubscript𝐟𝑛𝑚superscriptsubscript𝐟subscript𝑁𝑀𝑀𝑇\displaystyle\mathbf{F}=(\mathbf{f}_{1}^{1},\dots,\mathbf{f}_{n}^{m},\dots,% \mathbf{f}_{N_{M}}^{M})^{T},bold_F = ( bold_f start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , … , bold_f start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT , … , bold_f start_POSTSUBSCRIPT italic_N start_POSTSUBSCRIPT italic_M end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT , (19)

where the feature vectors 𝐟nmsuperscriptsubscript𝐟𝑛𝑚\mathbf{f}_{n}^{m}bold_f start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT are computed according to Eq. 15 from the system states. For delay embeddings (>11\ell>1roman_ℓ > 1), the trajectories 𝐬nmsuperscriptsubscript𝐬𝑛𝑚\mathbf{s}_{n}^{m}bold_s start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT are padded with the trajectories first state 𝐬1msuperscriptsubscript𝐬1𝑚\mathbf{s}_{1}^{m}bold_s start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT to calculate the first feature vectors 𝐟n<msubscriptsuperscript𝐟𝑚𝑛\mathbf{f}^{m}_{n<\ell}bold_f start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n < roman_ℓ end_POSTSUBSCRIPT. Lastly, we construct the target matrix

𝐓=(𝐭11,,𝐭nm,,𝐭NMM)T,𝐓superscriptsuperscriptsubscript𝐭11superscriptsubscript𝐭𝑛𝑚superscriptsubscript𝐭subscript𝑁𝑀𝑀𝑇\displaystyle\mathbf{T}=(\mathbf{t}_{1}^{1},\dots,\mathbf{t}_{n}^{m},\dots,% \mathbf{t}_{N_{M}}^{M})^{T},bold_T = ( bold_t start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , … , bold_t start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT , … , bold_t start_POSTSUBSCRIPT italic_N start_POSTSUBSCRIPT italic_M end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT , (20)

where the vector 𝐭nmsuperscriptsubscript𝐭𝑛𝑚\mathbf{t}_{n}^{m}bold_t start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT is to be predicted based on the feature vector 𝐟nmsuperscriptsubscript𝐟𝑛𝑚\mathbf{f}_{n}^{m}bold_f start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT. In this work, it represents the k𝑘kitalic_k-resolved collision terms in Eq. 2. The regression weights 𝐖𝐖\mathbf{W}bold_W are then determined by solving the least-squares problem

argmin𝐖[𝐅𝐖𝐓F2+α𝐖F2],subscriptargmin𝐖superscriptsubscriptdelimited-∥∥𝐅𝐖𝐓F2𝛼superscriptsubscriptdelimited-∥∥𝐖F2\displaystyle\operatorname*{arg\,min}_{\mathbf{W}}\left[\left\lVert\mathbf{F}% \mathbf{W}-\mathbf{T}\right\rVert_{\mathrm{F}}^{2}+\alpha\left\lVert\mathbf{W}% \right\rVert_{\mathrm{F}}^{2}\right],start_OPERATOR roman_arg roman_min end_OPERATOR start_POSTSUBSCRIPT bold_W end_POSTSUBSCRIPT [ ∥ bold_FW - bold_T ∥ start_POSTSUBSCRIPT roman_F end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_α ∥ bold_W ∥ start_POSTSUBSCRIPT roman_F end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] , (21)

where Fsubscriptdelimited-∥∥F\left\lVert\cdot\right\rVert_{\mathrm{F}}∥ ⋅ ∥ start_POSTSUBSCRIPT roman_F end_POSTSUBSCRIPT is the Frobenius norm and α𝛼\alphaitalic_α the Tikhonov regularization (ridge) parameter [73, 74, 60]. A nonzero α𝛼\alphaitalic_α penalizes large weights wnmsubscript𝑤𝑛𝑚w_{nm}italic_w start_POSTSUBSCRIPT italic_n italic_m end_POSTSUBSCRIPT and thereby counteracts overfitting while also ensuring that a solution exists. The solution to this problem reads

𝐖=(𝐅T𝐅+α𝐈)1𝐅T𝐓,𝐖superscriptsuperscript𝐅𝑇𝐅𝛼𝐈1superscript𝐅𝑇𝐓\displaystyle\mathbf{W}=\left(\mathbf{F}^{T}\mathbf{F}+\alpha\mathbf{I}\right)% ^{-1}\mathbf{F}^{T}\mathbf{T},bold_W = ( bold_F start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F + italic_α bold_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_F start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_T , (22)

where ()1superscript1(\cdot)^{-1}( ⋅ ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT, depending on the rank of (𝐅T𝐅+α𝐈)superscript𝐅𝑇𝐅𝛼𝐈\left(\mathbf{F}^{T}\mathbf{F}+\alpha\mathbf{I}\right)( bold_F start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_F + italic_α bold_I ), either denotes the inverse or the Moore–Penrose pseudo inverse.

Apart from the model parameters, which are optimized in the training process, four hyperparameters remain: the delay embedding depth \ellroman_ℓ, the reduced system state dimension drsubscript𝑑𝑟d_{r}italic_d start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT, the number of nonlinear nodes L𝐿Litalic_L, and the regularization parameter α𝛼\alphaitalic_α. To obtain a well performing model, those must be set to appropriate values, either by systematic hyperparameter optimization schemes or based on educated guesses.

Appendix C Two-Temperature Approximation

The two-temperature approximation improves upon a relaxation-time approximation with a fixed phonon-bath temperature by introducing a dynamical phonon temperature Tphsubscript𝑇phT_{\mathrm{ph}}italic_T start_POSTSUBSCRIPT roman_ph end_POSTSUBSCRIPT, which represents the combined quasi-equilibrium temperature of all phonon branches. Hence, the heating or cooling of the semiconductor lattice via the electron-phonon interaction is taken into account, albeit within a relaxation-time approximation: The dynamics of the phonon temperature are driven by its difference to the quasi-equilibrium temperature of the electrons. This modification causes the electrons to relax towards a Fermi distribution with the common equilibrium temperature of both the electron and the phonon system. In the considered setup, the common equilibrium temperature depends upon the initial non-equilibrium electron distribution, which is generated by the pump pulse. The respective equations of motion for the electron occupation numbers and the phonon temperature then read

tfk|coltta=evaluated-atsubscript𝑡subscript𝑓𝑘colttaabsent\displaystyle\partial_{t}f_{k}|_{\mathrm{col}}^{\mathrm{tta}}=∂ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | start_POSTSUBSCRIPT roman_col end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_tta end_POSTSUPERSCRIPT = 1τel[F(ϵk,μ,Tph)fk],1subscript𝜏eldelimited-[]𝐹subscriptitalic-ϵ𝑘𝜇subscript𝑇phsubscript𝑓𝑘\displaystyle\frac{1}{\tau_{\mathrm{el}}}\left[F(\epsilon_{k},\mu,T_{\mathrm{% ph}})-f_{k}\right],divide start_ARG 1 end_ARG start_ARG italic_τ start_POSTSUBSCRIPT roman_el end_POSTSUBSCRIPT end_ARG [ italic_F ( italic_ϵ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_μ , italic_T start_POSTSUBSCRIPT roman_ph end_POSTSUBSCRIPT ) - italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ] , (23)
ddtTph=dd𝑡subscript𝑇phabsent\displaystyle\frac{\mathrm{d}}{\mathrm{d}t}T_{\mathrm{ph}}=divide start_ARG roman_d end_ARG start_ARG roman_d italic_t end_ARG italic_T start_POSTSUBSCRIPT roman_ph end_POSTSUBSCRIPT = 1τph[TelTph],1subscript𝜏phdelimited-[]subscript𝑇elsubscript𝑇ph\displaystyle\frac{1}{\tau_{\mathrm{ph}}}\left[T_{\mathrm{el}}-T_{\mathrm{ph}}% \right],divide start_ARG 1 end_ARG start_ARG italic_τ start_POSTSUBSCRIPT roman_ph end_POSTSUBSCRIPT end_ARG [ italic_T start_POSTSUBSCRIPT roman_el end_POSTSUBSCRIPT - italic_T start_POSTSUBSCRIPT roman_ph end_POSTSUBSCRIPT ] , (24)

where F()𝐹F(\cdot)italic_F ( ⋅ ) denotes the Fermi distribution, τelsubscript𝜏el\tau_{\mathrm{el}}italic_τ start_POSTSUBSCRIPT roman_el end_POSTSUBSCRIPT and τphsubscript𝜏ph\tau_{\mathrm{ph}}italic_τ start_POSTSUBSCRIPT roman_ph end_POSTSUBSCRIPT the relaxation time-constants for the electron distribution and the phonon temperature, and Telsubscript𝑇elT_{\mathrm{el}}italic_T start_POSTSUBSCRIPT roman_el end_POSTSUBSCRIPT the quasi-equilibrium temperature of the electrons.

To evaluate the right-hand-side of the equations 23 and 24, both the chemical potential μ𝜇\muitalic_μ and the quasi-equilibrium temperature Telsubscript𝑇elT_{\mathrm{el}}italic_T start_POSTSUBSCRIPT roman_el end_POSTSUBSCRIPT must be calculated dynamically. Telsubscript𝑇elT_{\mathrm{el}}italic_T start_POSTSUBSCRIPT roman_el end_POSTSUBSCRIPT is uniquely determined by the electron density and the total electron energy density of the current distribution {fk}subscript𝑓𝑘\{f_{k}\}{ italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } and μ𝜇\muitalic_μ is uniquely determined by the electron density (for a given Tphsubscript𝑇phT_{\mathrm{ph}}italic_T start_POSTSUBSCRIPT roman_ph end_POSTSUBSCRIPT). Hence, both can be obtained by solving the optimization problem of matching the density and total energy density of the quasi-Fermi distribution to the current electron distribution. The optimizations are solved iteratively via Newton’s method. Note that this process requires multiple evaluations of the Fermi function, which represents the largest contribution to the computational costs of this approximation.

Refer to caption
Figure 6: Temperature relaxation of the coupled electron-phonon system. The electron system is plotted in red and the phonon system in blue. Solid lines indicate the microscopic model and dashed lines the two-temperature approximation with the time constants τel=500fssubscript𝜏el500fs\tau_{\mathrm{el}}=500\,\mathrm{fs}italic_τ start_POSTSUBSCRIPT roman_el end_POSTSUBSCRIPT = 500 roman_fs and τlat=4000fssubscript𝜏lat4000fs\tau_{\mathrm{lat}}=4000\,\mathrm{fs}italic_τ start_POSTSUBSCRIPT roman_lat end_POSTSUBSCRIPT = 4000 roman_fs.

To setup the two-temperature approximation, one must also determine the time constants τelsubscript𝜏el\tau_{\mathrm{el}}italic_τ start_POSTSUBSCRIPT roman_el end_POSTSUBSCRIPT and τphsubscript𝜏ph\tau_{\mathrm{ph}}italic_τ start_POSTSUBSCRIPT roman_ph end_POSTSUBSCRIPT. Since, we have the microscopic model at hand, we treat the time constants as fit parameters, which are to be determined from a sample trajectory. For that purpose, we simulate the electron-phonon system without the photon intensity. The phonons are initialized with a T=300K𝑇300KT=300\,\mathrm{K}italic_T = 300 roman_K Bose-Einstein distribution and the electrons with an out-of-equilibrium Gaussian distribution. The electron-phonon interactions then drives the coupled system towards an equilibrium state, which is characterized by a common temperature.

Figure 6 plots the quasi-equilibrium temperature of the electrons (red) and the phonons (blue) with solid lines as a function of time. These temperatures are obtained by fitting Fermi/Bose distributions to the electron and phonon distributions as described above. We compute the combined phonon temperature Tphsubscript𝑇phT_{\mathrm{ph}}italic_T start_POSTSUBSCRIPT roman_ph end_POSTSUBSCRIPT as the mean of the two optical branches. The acoustic branches are neglected, because their much smaller excitation energies lead to temperature dynamics much slower than the considered 10ps10ps10\,\mathrm{ps}10 roman_ps. For this time interval, we thereby obtain a better approximation for collision term in the electron equation. For the given parameters, the phonons absorbs energy from the hot electrons and both relax towards the common equilibrium temperature of Teq429Ksubscript𝑇eq429KT_{\mathrm{eq}}\approx 429\,\mathrm{K}italic_T start_POSTSUBSCRIPT roman_eq end_POSTSUBSCRIPT ≈ 429 roman_K.

To fit the time constants, we only consider the trajectory for t>1ps𝑡1pst>1\,\mathrm{ps}italic_t > 1 roman_ps, where the system is somewhat close to its equilibrium. τelsubscript𝜏el\tau_{\mathrm{el}}italic_τ start_POSTSUBSCRIPT roman_el end_POSTSUBSCRIPT is obtained by fitting an exponential function to the relaxation of the total electron energy density. τphsubscript𝜏ph\tau_{\mathrm{ph}}italic_τ start_POSTSUBSCRIPT roman_ph end_POSTSUBSCRIPT is obtained by numerically evaluating the derivative ddtTph𝑑𝑑𝑡subscript𝑇ph\tfrac{d}{dt}T_{\mathrm{ph}}divide start_ARG italic_d end_ARG start_ARG italic_d italic_t end_ARG italic_T start_POSTSUBSCRIPT roman_ph end_POSTSUBSCRIPT from data and treating Eq. 24 as an regression equation. This procedure yields τel424fssubscript𝜏el424fs\tau_{\mathrm{el}}\approx 424\,\mathrm{fs}italic_τ start_POSTSUBSCRIPT roman_el end_POSTSUBSCRIPT ≈ 424 roman_fs and τph4333fssubscript𝜏ph4333fs\tau_{\mathrm{ph}}\approx 4333\,\mathrm{fs}italic_τ start_POSTSUBSCRIPT roman_ph end_POSTSUBSCRIPT ≈ 4333 roman_fs. However, if we use those values and propagate the same initial conditions with the two-temperature approximation, we underestimate the final equilibrium temperature with Teq408Ksubscript𝑇eq408KT_{\mathrm{eq}}\approx 408\,\mathrm{K}italic_T start_POSTSUBSCRIPT roman_eq end_POSTSUBSCRIPT ≈ 408 roman_K. This discrepancy is due to the nonlinearities of the far-out-of equilibrium states at the early stages of the relaxation. To account for that, the manually change time-constants to τel=500fssubscript𝜏el500fs\tau_{\mathrm{el}}=500\,\mathrm{fs}italic_τ start_POSTSUBSCRIPT roman_el end_POSTSUBSCRIPT = 500 roman_fs and τlat=4000fssubscript𝜏lat4000fs\tau_{\mathrm{lat}}=4000\,\mathrm{fs}italic_τ start_POSTSUBSCRIPT roman_lat end_POSTSUBSCRIPT = 4000 roman_fs to obtain the proper equilibrium temperature. The resulting temperatures for the electrons (red) and the phonons (blue), as obtained from the two-temperature approximation simulation, are plotted in Fig. 6 with dashed lines. Besides the final equilibrium temperature, we find a good agreement for the phonon temperature and a decent agreement for the electron temperature. Deviations occur in the initial stages of the relaxation, where the system is still far from its equilibrium.

Appendix D Benchmarking

To evaluate the quality of a given approximation (indicated by a hat, sampled at discrete tnsubscript𝑡𝑛t_{n}italic_t start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT), we use the following three metrics: The pulse maximum error of the photon intensity I𝐼Iitalic_I

ΔImax=maxnInmaxnI^n,Δsubscript𝐼maxsubscript𝑛subscript𝐼𝑛subscript𝑛subscript^𝐼𝑛\displaystyle\Delta I_{\mathrm{max}}=\max_{n}I_{n}-\max_{n}\hat{I}_{n},roman_Δ italic_I start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT = roman_max start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - roman_max start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT over^ start_ARG italic_I end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , (25)

the position error of maximum photon intensity Imaxsubscript𝐼maxI_{\mathrm{max}}italic_I start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT

ΔtImax=argmaxtIargmaxtI^,Δsubscript𝑡subscript𝐼maxsubscriptargmax𝑡𝐼subscriptargmax𝑡^𝐼\displaystyle\Delta t_{I_{\mathrm{max}}}=\operatorname*{arg\,max}_{t}I-% \operatorname*{arg\,max}_{t}\hat{I},roman_Δ italic_t start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT end_POSTSUBSCRIPT = start_OPERATOR roman_arg roman_max end_OPERATOR start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_I - start_OPERATOR roman_arg roman_max end_OPERATOR start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT over^ start_ARG italic_I end_ARG , (26)

and the root-mean-squared (rms) error of the electron distribution f𝑓fitalic_f

ϵrms=1Nn,k(f^nkfnk)2.subscriptitalic-ϵrms1𝑁subscript𝑛𝑘superscriptsubscript^𝑓𝑛𝑘subscript𝑓𝑛𝑘2\displaystyle\epsilon_{\mathrm{rms}}=\sqrt{\frac{1}{N}\sum_{n,k}\left(\hat{f}_% {nk}-f_{nk}\right)^{2}}.italic_ϵ start_POSTSUBSCRIPT roman_rms end_POSTSUBSCRIPT = square-root start_ARG divide start_ARG 1 end_ARG start_ARG italic_N end_ARG ∑ start_POSTSUBSCRIPT italic_n , italic_k end_POSTSUBSCRIPT ( over^ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_n italic_k end_POSTSUBSCRIPT - italic_f start_POSTSUBSCRIPT italic_n italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG . (27)

To benchmark a given data-driven model, we use a k-fold cross validation scheme, where we split the training data set (in terms of trajectories) into k equally sized folds. For each fold, we train the model on all the other folds and then score the trajectories contained in the selected fold. This way, we do not mix training and testing data but, nonetheless, obtain a score for each trajectory in the data set. Using the individual error scores, we can then compute the desired statistics, e.g., the median score.

Appendix E Implementation and Simulation Details

For the numerical integration of the laser model and its approximations, we chose a discretization of the electron momentum k𝑘kitalic_k with kmax=2.5nm1subscript𝑘max2.5superscriptnm1k_{\mathrm{max}}=2.5\,\mathrm{nm}^{-1}italic_k start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT = 2.5 roman_nm start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT and 100 equidistant points. For the microscopic model, the crystal momentum q𝑞qitalic_q follows the same discretization. This choice produces a sufficiently smooth gain curve (as constructed from the inhomogeneously broadened gain medium) and minimal energy losses in the electron-phonon interaction while remaining computationally tractable.

Both the microscopic model and the two-temperature approximation are then composed of coupled systems of ordinary differential equations (ODEs). To integrate them, we use SciPy’s initial value problem solver with the explicit Dormand-Prince adaptive step-size Runge-Kutta method of order 5(4). We note, that increasing the relative and absolute error tolerances does hardly speed up the simulation, but rather destabilizes the integration. The data-driven approximation, on the other hand, yields a system of delay-differential equations (DDEs) via its delay embedding. Those are outside the scope of the typical ODE solver package. Hence, we implement an explicit second-order Heun method with a constant step-size. This approach has the advantage, that it only evaluates the delayed variables at integer multiples of the step size and does not require any history-array interpolation routine. For our system, this method converges with a step-size h=5fs5fsh=5\,\mathrm{fs}italic_h = 5 roman_fs, which corresponds to half of our sampling interval δt=10fs𝛿𝑡10fs\delta t=10\,\mathrm{fs}italic_δ italic_t = 10 roman_fs. For a larger output sampling interval, we would advise to implement a constant step-size explicit fourth-order Runge-Kutta method with third-order Hermite interpolation for the history array.

The simulation code for all models in completely implemented in Python and publicly available as stated above. All simulation were performed on a single CPU, namely a state of the art Intel(R) Core(TM) i9-10900. The utilized NumPy and SciPy libraries are allowed utilize all CPU cores.

References

  • Gu et al. [2021] D. Gu, X. Shi, R. Poprawe, D. L. Bourell, R. Setchi, and J. Zhu, Material-structure-performance integrated laser-metal additive manufacturing, Science 372, eabg1487 (2021).
  • Loesel et al. [1996] F. H. Loesel, M. H. Niemz, J. F. Bille, and T. Juhasz, Laser-induced optical breakdown on hard and soft tissues and its dependence on the pulse duration: experiment and model, IEEE J. Quantum Electron. 32, 1717 (1996).
  • Juhasz et al. [1999] T. Juhasz, F. H. Loesel, R. M. Kurtz, C. Horvath, J. F. Bille, and G. Mourou, Corneal refractive surgery with femtosecond lasers, IEEE J. Sel. Top. Quantum Electron. 5, 902 (1999).
  • Nagy et al. [2009] Z. Nagy, A. Takacs, T. Filkorn, and M. Sarayba, Initial clinical evaluation of an intraocular femtosecond laser in cataract surgery, J. Refract. Surg. 25, 1053 (2009).
  • Knox [2000] W. H. Knox, Ultrafast technology in telecommunications, IEEE J. Sel. Top. Quantum Electron. 6, 1273 (2000).
  • Kuntz et al. [2007] M. Kuntz, G. Fiol, M. Lämmlin, C. Meuer, and D. Bimberg, High-speed mode-locked quantum-dot lasers and optical amplifiers, Proc. IEEE 95, 1767 (2007).
  • Rafailov et al. [2011] E. U. Rafailov, M. A. Cataluna, and E. A. Avrutin, Physics and Devices (WILEY-VCH, Weinheim, 2011).
  • Chow and Koch [1999] W. W. Chow and S. W. Koch, Physics of the Gain Materials, 1st ed. (Springer-Verlag Berlin Heidelberg, Berlin, 1999).
  • Lingnau et al. [2013] B. Lingnau, W. W. Chow, E. Schöll, and K. Lüdge, Feedback and injection locking instabilities in quantum-dot lasers: a microscopically based bifurcation analysis, New J. Phys. 15, 093031 (2013).
  • Kilen et al. [2014] I. Kilen, J. Hader, J. V. Moloney, and S. W. Koch, Ultrafast nonequilibrium carrier dynamics in semiconductor laser mode locking, Optica 1, 192 (2014).
  • Strogatz [2014] S. H. Strogatz, Nonlinear Dynamics and Chaos (CRC Press, 2014).
  • Czycholl [2008] G. Czycholl, Theoretische Festkörperphysik (Springer Verlag, 2008).
  • Thurn et al. [2023] A. Thurn, J. Bissinger, S. Meinecke, P. Schmiedeke, S. S. Oh, W. W. Chow, K. Lüdge, G. Koblmüller, and J. J. Finley, Self-induced ultrafast electron-hole plasma temperature oscillations in nanowire lasers, Phys. Rev. Appl. 20, 034045 (2023).
  • Khurana et al. [2023] D. Khurana, A. Koli, K. Khatter, and S. Singh, Natural language processing: State of the art, current trends and challenges, Multimedia tools and applications 82, 3713 (2023).
  • Bertolini et al. [2021] M. Bertolini, D. Mezzogori, M. Neroni, and F. Zammori, Machine learning for industrial applications: A comprehensive literature review, Expert Systems with Applications 175, 114820 (2021).
  • Sarker [2021] I. H. Sarker, Machine learning: Algorithms, real-world applications and research directions, SN Computer Science 2, 1 (2021).
  • Zhang et al. [2019a] X. Zhang, X. Ming, Z. Liu, D. Yin, Z. Chen, and Y. Chang, A reference framework and overall planning of industrial artificial intelligence (i-ai) for new application scenarios, The International Journal of Advanced Manufacturing Technology 101, 2367 (2019a).
  • Wang and Deng [2021] M. Wang and W. Deng, Deep face recognition: A survey, Neurocomputing 429, 215 (2021).
  • Lee et al. [2021] K. Lee, M. Eo, E. Jung, Y. Yoon, and W. Rhee, Short-term traffic prediction with deep neural networks: A survey, IEEE Access 9, 54739 (2021).
  • De La Torre et al. [2021] R. De La Torre, C. G. Corlu, J. Faulin, B. S. Onggo, and A. A. Juan, Simulation, optimization, and machine learning in sustainable transportation systems: Models and applications, Sustainability 13, 1551 (2021).
  • Zhang et al. [2019b] S. Zhang, L. Yao, A. Sun, and Y. Tay, Deep learning based recommender system: A survey and new perspectives, ACM Computing Surveys (CSUR) 52, 1 (2019b).
  • Radovic et al. [2018] A. Radovic, M. Williams, D. Rousseau, M. Kagan, D. Bonacorsi, A. Himmel, A. Aurisano, K. Terao, and T. Wongjirad, Machine learning at the energy and intensity frontiers of particle physics, Nature 560, 41 (2018).
  • Brunton et al. [2020] S. L. Brunton, B. R. Noack, and P. Koumoutsakos, Machine learning for fluid mechanics, Annual Review of Fluid Mechanics 52, 477 (2020).
  • Abbas [2020] O. Abbas, Science in the age of machine learning, Nat. Rev. Phys. 2, 342 (2020).
  • Baldi and Brunak [2001] P. Baldi and S. Brunak, Bioinformatics: the machine learning approach (MIT press, 2001).
  • May [2021] M. May, Eight ways machine learning is assisting medicine, Nature Medicine 27, 2 (2021).
  • Chen et al. [2021] Z. Chen, N. Andrejevic, N. C. Drucker, T. Nguyen, R. P. Xian, T. Smidt, Y. Wang, R. Ernstorfer, D. A. Tennant, M. Chan, et al., Machine learning on neutron and x-ray scattering and spectroscopies, Chemical Physics Reviews 2, 031301 (2021).
  • Schmidt et al. [2019] J. Schmidt, M. R. Marques, S. Botti, and M. A. Marques, Recent advances and applications of machine learning in solid-state materials science, npj Computational Materials 5, 1 (2019).
  • Bedolla et al. [2020] E. Bedolla, L. C. Padierna, and R. Castaneda-Priego, Machine learning for condensed matter physics, Journal of Physics: Condensed Matter 33, 053001 (2020).
  • Karniadakis et al. [2021] G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang, Physics-informed machine learning, Nature Reviews Physics 3, 422 (2021).
  • Peng et al. [2021] G. C. Peng, M. Alber, A. Buganza Tepole, W. R. Cannon, S. De, S. Dura-Bernal, K. Garikipati, G. Karniadakis, W. W. Lytton, P. Perdikaris, et al., Multiscale modeling meets machine learning: What can we learn?, Archives of Computational Methods in Engineering 28, 1017 (2021).
  • Wang et al. [2023] J. Wang, H. Jiang, G. Chen, H. Wang, L. Lu, J. Liu, and L. Xing, Integration of multi-physics and machine learning-based surrogate modelling approaches for multi-objective optimization of deformed gdl of pem fuel cells, Energy and AI 14, 100261 (2023).
  • Edelen et al. [2020] A. Edelen, N. Neveu, M. Frey, Y. Huber, C. Mayes, and A. Adelmann, Machine learning for orders of magnitude speedup in multiobjective optimization of particle accelerator systems, Physical Review Accelerators and Beams 23, 044601 (2020).
  • Xu et al. [2020] H. Xu, J. Ma, P. Tan, B. Chen, Z. Wu, Y. Zhang, H. Wang, J. Xuan, and M. Ni, Towards online optimisation of solid oxide fuel cell performance: Combining deep learning with multi-physics simulation, Energy and AI 1, 100003 (2020).
  • Hennigh et al. [2021] O. Hennigh, S. Narasimhan, M. A. Nabian, A. Subramaniam, K. Tangsali, Z. Fang, M. Rietmann, W. Byeon, and S. Choudhry, Nvidia simnet: An ai-accelerated multi-physics simulation framework, in International conference on computational science (Springer, 2021) pp. 447–461.
  • Kasim et al. [2021] M. F. Kasim, D. Watson-Parris, L. Deaconu, S. Oliver, P. Hatfield, D. H. Froula, G. Gregori, M. Jarvis, S. Khatiwala, J. Korenaga, et al., Building high accuracy emulators for scientific simulations with deep neural architecture search, Machine Learning: Science and Technology 3, 015013 (2021).
  • Ladickỳ et al. [2015] L. Ladickỳ, S. Jeong, B. Solenthaler, M. Pollefeys, and M. Gross, Data-driven fluid simulations using regression forests, ACM Transactions on Graphics (TOG) 34, 1 (2015).
  • Lu et al. [2021] H. Lu, D. Ermakova, H. M. Wainwright, L. Zheng, and D. M. Tartakovsky, Data-informed emulators for multi-physics simulations, Journal of Machine Learning for Modeling and Computing 2 (2021).
  • Bi et al. [2023] K. Bi, L. Xie, H. Zhang, X. Chen, X. Gu, and Q. Tian, Accurate medium-range global weather forecasting with 3d neural networks, Nature 619, 533 (2023).
  • Han et al. [2020] J. Han, L. Zhang, et al., Integrating machine learning with physics-based modeling, arXiv preprint arXiv:2006.02619  (2020).
  • Willard et al. [2020] J. Willard, X. Jia, S. Xu, M. Steinbach, and V. Kumar, Integrating physics-based modeling with machine learning: A survey, arXiv preprint arXiv:2003.04919 1, 1 (2020).
  • Kochkov et al. [2021] D. Kochkov, J. A. Smith, A. Alieva, Q. Wang, M. P. Brenner, and S. Hoyer, Machine learning–accelerated computational fluid dynamics, Proceedings of the National Academy of Sciences 118, e2101784118 (2021).
  • Haug and Koch [2004] H. Haug and S. W. Koch, Quantum Theory of the Optical and Electronic Properties of Semiconductors, 2nd ed. (World Scientific, Singapore, 2004).
  • Coldren et al. [2012] L. A. Coldren, S. W. Corzine, and M. L. Masanovic, Diode Lasers and Photonic Integrated Circuits, 2nd ed., Wiley series in microwave and optical enginieering (Wiley & Sons, 2012).
  • Meinecke et al. [2023] S. Meinecke, F. Köster, D. Christiansen, K. Lüdge, A. Knorr, and M. Selig, Data-driven forecasting of nonequilibrium solid-state dynamics, Phys. Rev. B 107, 184306 (2023).
  • Lingnau [2015] B. Lingnau, Nonlinear and Nonequilibrium Dynamics of Quantum-Dot Optoelectronic Devices, Springer Theses (Springer International Publishing, Switzerland, 2015).
  • Roos et al. [2021] A. Roos, S. Meinecke, and K. Lüdge, Stabilizing nanolasers via polarization lifetime tuning, Sci. Rep. 11, 18558 (2021).
  • Takens [1981] F. Takens, Detecting strange attractors in turbulence, in Dynamical Systems and Turbulence, Warwick 1980, edited by D. G. Rand and L. S. Young (Springer Berlin Heidelberg, Berlin, Heidelberg, 1981) pp. 366–381.
  • Meinecke et al. [2019] S. Meinecke, L. Drzewietzki, C. Weber, B. Lingnau, S. Breuer, and K. Lüdge, Ultra-short pulse generation in a three section tapered passively mode-locked quantum-dot semiconductor laser, Sci. Rep. 9, 1783 (2019).
  • Meinecke [2022] S. Meinecke, Spatio-Temporal Modeling and Device Optimization of Passively Mode-Locked Semiconductor Lasers, Springer Theses (Springer, Cham, 2022).
  • Zhang et al. [2018] L. Zhang, H. Wang, et al., Reinforced dynamics for enhanced sampling in large atomic and molecular systems, The Journal of chemical physics 148 (2018).
  • Kormanyos et al. [2015] A. Kormanyos, G. Burkard, M. Gmitra, J. Fabian, V. Zolyomi, N. D. Drummond, and V. Falko, k p theory for two-dimensional transition metal dichalcogenide semiconductors, 2D Materials 2, 022001 (2015).
  • Li et al. [2013] X. Li, J. T. Mullen, Z. Jin, K. M. Borysenko, M. Buongiorno Nardelli, and K. W. Kim, Intrinsic electrical transport properties of monolayer silicene and MoS2subscriptMoS2\mathrm{MoS}_{2}roman_MoS start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT from first principles, Phys. Rev. B 87, 115418 (2013).
  • Butscher et al. [2007] S. Butscher, F. Milde, M. Hirtschulz, E. Malić, and A. Knorr, Hot electron relaxation and phonon dynamics in graphene, Applied Physics Letters 91, 203103 (2007).
  • Jin et al. [2014] Z. Jin, X. Li, J. T. Mullen, and K. W. Kim, Intrinsic transport properties of electrons and holes in monolayer transition-metal dichalcogenides, Phys. Rev. B 90, 045422 (2014).
  • Steinhoff et al. [2014] A. Steinhoff, M. Rösner, F. Jahnke, T. O. Wehling, and C. Gies, Influence of excited carriers on the optical and electronic properties of mos2, Nano Letters 14, 3743 (2014).
  • Perea-Causin et al. [2019] R. Perea-Causin, S. Brem, R. Rosati, R. Jago, M. Kulig, J. D. Ziegler, J. Zipfel, A. Chernikov, and E. Malic, Exciton propagation and halo formation in two-dimensional materials, Nano letters 19, 7317 (2019).
  • Selig et al. [2019] M. Selig, F. Katsch, R. Schmidt, S. M. de Vasconcellos, R. Bratschitsch, E. Malic, and A. Knorr, Ultrafast dynamics in monolayer transition metal dichalcogenides: Interplay of dark excitons, phonons, and intervalley exchange, Physical Review Research 1, 022007 (2019).
  • Golub and Van Loan [2013] G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins Studies in the Mathematical Sciences (Johns Hopkins University Press, 2013).
  • Brunton and Kutz [2022] S. L. Brunton and J. N. Kutz, Data-driven science and engineering: Machine learning, dynamical systems, and control, 2nd ed. (Cambridge University Press, Cambridge, 2022).
  • Karhunen [1947] K. Karhunen, Under lineare methoden in der wahr scheinlichkeitsrechnung, Annales Academiae Scientiarun Fennicae Series A1: Mathematia Physica 47 (1947).
  • Loeve [2017] M. Loeve, Probability theory (Courier Dover Publications, 2017).
  • Lorenz [1956] E. N. Lorenz, Empirical orthogonal functions and statistical weather prediction, Vol. 1 (Massachusetts Institute of Technology, Department of Meteorology Cambridge, 1956).
  • Lumley [1967] J. L. Lumley, The structure of inhomogeneous turbulent flows, Atmospheric turbulence and radio wave propagation , 166 (1967).
  • Cherry [1996] S. Cherry, Singular value decomposition analysis and canonical correlation analysis, Journal of Climate 9, 2003 (1996).
  • Eckart and Young [1936] C. Eckart and G. Young, The approximation of one matrix by another of lower rank, Psychometrika 1, 211 (1936).
  • Akusok et al. [2015] A. Akusok, K. M. Börk, Y. Miche, and A. Lendasse, High-performance extreme learning machines: A complete toolbox for big data applications, IEEE Access 3, 1011 (2015).
  • Huang et al. [2004] G. B. Huang, Q. Y. Zhu, and C. K. Siew, Extreme learning machine: a new learning scheme of feedforward neural networks, IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541) 2, 985 (2004).
  • Huang et al. [2006a] G. B. Huang, L. Chen, and C. K. Siew, Universal approximation using incremental constructive feedforward networks with random hidden nodes, IEEE Trans. Neural Networks 17, 879 (2006a).
  • Huang et al. [2011] G. B. Huang, H. Zhou, X. Ding, and R. Zhang, Extreme learning machine for regression and multiclass classification, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 42, 513 (2011).
  • Huang et al. [2006b] G. B. Huang, Q. Y. Zhu, and C. K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70, 489 (2006b).
  • Pyle et al. [2021] R. Pyle, N. Jovanovic, D. Subramanian, K. V. Palem, and A. B. Patel, Domain-driven models yield better predictions at lower cost than reservoir computers in lorenz systems, Philos. Trans. Royal Soc. A 379, 20200246 (2021).
  • Vogel [2002] C. R. Vogel, Computational Methods for Inverse Problems (Society for Industrial and Applied Mathematics, 2002).
  • Press et al. [2007] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vettering, Numerical Recipes (3rd ed.) (Cambridge University Press, Cambridge, 2007).