Data-Driven Acceleration of Multi-Physics Simulations

Stefan Meinecke meinecke@tu-berlin.de Institut für Theoretische Physik, Technische Universität Berlin, Hardenbergstr. 36, 10623 Berlin, Germany Malte Selig Institut für Theoretische Physik, Technische Universität Berlin, Hardenbergstr. 36, 10623 Berlin, Germany Felix Köster Institut für Theoretische Physik, Technische Universität Berlin, Hardenbergstr. 36, 10623 Berlin, Germany Andreas Knorr Institut für Theoretische Physik, Technische Universität Berlin, Hardenbergstr. 36, 10623 Berlin, Germany Kathy Lüdge kathy.luedge@tu-ilmenau.de Technische Universität Ilmenau, Institut für Physik, Weimarer Straße 25, 98693 Ilmenau, Germany

Abstract

Multi-physics simulations play a crucial role in understanding complex systems. However, their computational demands are often prohibitive due to high dimensionality and complex interactions, such that actual calculations often rely on approximations. To address this, we introduce a data-driven approach to approximate interactions among degrees of freedom of no direct interest and thus significantly reduce computational costs. Focusing on a semiconductor laser as a case study, we demonstrate the superiority of this method over traditional analytical approximations in both accuracy and efficiency. Our approach streamlines simulations, offering promise for complex multi-physics systems, especially for scenarios requiring a large number of individual simulations.

Machine learning, regression, dimensionality reduction, reduced-order model, solid-state physics, semiconductor laser, electron-phonon dynamics, non-equilibrium physics, simulation

I Introduction

Multi-physics simulations have become a powerful tool for analyzing and understanding complex physical systems. These simulations involve the integration of multiple physical models that describe the behavior of different phenomena, such as heat transfer, fluid dynamics, and electromagnetic fields. Laser simulations are a prime example of multi-physics simulation, since they are used to model the behavior of lasers in various applications, such as material processing [1], medical treatments [2, 3, 4], and communication systems [5, 6, 7]. However, performing these simulations can be computationally intensive and time-consuming [8, 9, 10, 11].

The computational costs are typically driven by a large number of degrees of freedom, which are subject to non-trivial, i.e., nonlinear mutual interactions [11]. In many cases, however, one is only interested in the dynamics of only a few of them. Due to their interdependence, one nonetheless needs to solve the complete system, to obtain the dynamics of interest.

Traditionally, analytical approximations are applied to parts of the system, to reduce the complexity of the problem and achieve reasonable computational costs. As an example, we consider a semiconductor laser, where the photon intensity in the optical cavity interacts with a coupled electron-phonon gas of the gain medium (see Fig. 1). In the literature, the momentum resolved electron and phonon dynamics are typically reduced by fixing the temperature of the phonons (bath approximation) and treating the electrons in the thermal approximation. Hence, their dynamics are calculated in the relaxation-time approximation for the effective electron temperature[12]. Such approximations compromise the model’s predictive power, but have, nonetheless, provided many insights into multi-physics phenomena so far [8, 9, 11, 13].

The successful application of the machine-learning paradigm to technological tasks such as computer vision and natural-language processing [14, 15, 16, 17, 18, 19, 20, 21] has motivated researchers to utilize data-driven approaches for scientific discovery [22, 23, 24, 25, 26, 27, 28, 29]. Their application to multi-physics simulation models intends to either improve their accuracy or reduce their computational costs [30, 31]. Successful implementations can be used for efficient parameter studies, multi-objective optimization [32, 33], and real-time model based control [34]. The different approaches can be distinguished into two classes: The data-driven model can either be designed to entirely emulate the multi-physics system [35, 36, 37, 38, 33, 39], or to integrate with system equations [40, 41, 42, 34]. Our contribution belongs to the latter class.

In this manuscript, we propose to accelerate the simulation of complex multi-physics systems by approximating the interaction terms among the degrees of freedom, which are not of direct interest, via a data-driven model. Degrees of freedom, which do not directly couple to the observables of interest can even be dropped with their effect implicitly kept in the data-driven model.

Naturally, this approach requires sufficient training data and a suitable data-driven model architecture. Both critically determine the feasibility and success of our approach. Given a new problem, an overall improvement is only achieved, if two criteria are met: Firstly, the combined efforts of generating the training data; implementing, training, and testing the data-driven model; and performing the simulations must be computationally less demanding than the direct integration of the system. Secondly, the data-driven approach must provide better accuracy than analytical approximations. Hence, one must be thoughtful and deliberate with the choice and design of the data-driven model and the generation of the training data.

We demonstrate that our data-driven approach to accelerating multi-physics simulations can outperform a traditional analytic approximation both in terms of accuracy and computational costs on a chosen toy model. The advantage of the data-driven approach becomes especially pronounced if many simulations are required.

Figure 1: Sketch of the considered multi-physics system setup. If only the photon intensity is of interest, the degrees of freedom among the semiconductor crystal can be distinguished between direct and indirect coupling.

To demonstrate our approach, we consider a semiconductor laser. Its dynamics can be separated into two different physical domains, which are represented by the optical resonator with the photon intensity $I$ and the semiconductor crystal with the electron and phonon degrees of freedom [8, 43, 44]. Accurate quantitative simulations have been proven difficult, because they involve the self-consistent treatment of both domains, which strongly interact via the stimulated emission/absorption process [8, 43].

For our purpose, we are only interested in the photon intensity at the lasing mode. Following our data-driven approach, we therefore construct an approximation model, which only keeps dynamical variables for the photon intensity and the electron states. The phonon states, i.e., the remaining degrees of freedom of the semiconductor crystal, are dropped since they only indirectly couple to the photon intensity. All the internal interactions of the electron-phonon system are captured by a data-driven model and are then incorporated into the dynamical electron equations. Figure 1 visualizes and summarizes our subdivision of the system’s degrees of freedom and indicates their couplings.

II Methods

To model the semiconductor laser, we devise a toy model, which builds upon our previous work on electron-phonon dynamics [45], by coupling a photon intensity equation to the electrons via a stimulated emission term [8, 46]. The dynamics are described by the following equations of motion for the photon intensity $I$ and the electron occupations $f_{k}$ :

	$\displaystyle\frac{\mathrm{d}}{\mathrm{d}t}I=$	$\displaystyle-\frac{I}{\tau_{\mathrm{pht}}}+\sum_{k}G(k)D(k)\left[2f_{k}-1% \right]I,$		(1)
	$\displaystyle\frac{\mathrm{d}}{\mathrm{d}t}f_{k}=$	$\displaystyle-G(k)\left[2f_{k}-1\right]I+\partial_{t}f_{k}\|_{\mathrm{col}}.$		(2)

The first term in Eq. 1 describes the photon lifetime $\tau_{\mathrm{pht}}$ and the second term the interaction with the electrons via stimulated emission, where $G(k)$ denotes the $k$ -dependent gain function, $D(k)$ the number of states at discretized $k$ , and $[2f_{k}-1]$ the inversion. Note that the sum runs over the modulus $k$ of the two-dimensional momentum $\mathbf{k}$ , which yields $D(k)=k\Delta k/2\pi$ with the discretization step $\Delta k$ . The gain function is given by

\displaystyle G(k)=\frac{G_{\mathrm{lin}}}{\pi\gamma_{\mathrm{ph}}}% \operatorname{sech}{\left(\frac{2\epsilon_{k}-\epsilon_{\mathrm{ph}}}{\gamma_{% \mathrm{ph}}}\right)}

(3)

with the linear gain coefficient $G_{\mathrm{lin}}$ , the homogeneous linewidth of the transitions $\gamma_{\mathrm{ph}}$ , the single particle kinetic energy $\epsilon_{k}$ , and the lasing-mode photon energy $\epsilon_{\mathrm{ph}}$ with respect to the bandgap. The factor of two accounts for the symmetric valance band. The homogeneous linewidth is modeled by a hyperbolic secant to ensure properly decaying tails [8]. The first term in Eq. 2 describes the interaction with the photon intensity via stimulated emission and the second term describes the internal processes of the semiconductor material, where $\partial_{t}f_{k}|_{\mathrm{col}}$ summarizes all considered collision integrals.

Table 1: Laser parameters

$G_{\mathrm{lin}}$	0.04 nm ${}^{2}$ eV/fs	$\tau_{\mathrm{pht}}$	200 fs
$\epsilon_{\mathrm{ph}}$	0.035 eV	$\gamma_{\mathrm{ph}}$	0.01 eV

We choose laser parameters, which are inspired by a nanolaser [13, 47] and are summarized in Table 1. The characteristically large optical confinement factor and short photon lifetime yield photon intensity dynamics on a time scale similar to the electron dynamics [13]. Hence, this setup relies on the proper modeling of the latter and likely highlights any shortcomings.

Our microscopic model for the collision term includes the interaction with two optical and two acoustic phonon branches and omits electron-electron interaction. Hence, the microscopic evaluation of the collision term complements the laser model with dynamical equations for the phonon occupation numbers $n_{q}^{\alpha}$ , where $q$ denotes the crystal momentum and $\alpha$ the branch. A detailed description is presented in Appendix A. The microscopic evaluation of the collision term

\displaystyle\partial_{t}f_{k}|_{\mathrm{col}}^{\mathrm{micro}}\left\{f_{k},n_% {q}^{\alpha}\right\}

(4)

is typically expensive, which in turn drives the total computational costs of the simulation. Hence, one utilizes approximations for the collision term, to achieve reasonable computation times.

With our data-driven approach, we replace the microscopic evaluation of the collision term with a trained data-driven model

\displaystyle\partial_{t}f_{k}|_{\mathrm{col}}^{\mathrm{data}}\left\{f_{k}(t),% f_{k}(t-\Delta t),\dots,f_{k}(t-(\ell-1)\Delta t)\right\}

(5)

where, instead of the phonon occupation numbers $n_{q}^{\alpha}$ , a delay embedding of the electron occupations $f_{k}$ with the past $\ell$ states is implemented. This yields delay-differential equations and can be thought of as a Takens embedding [48]. For our purpose, we compose a delay-embedded regressive reduced-order model (derrom) via the package [45], which we built and maintain ourselves. In short, derrom first projects the past $\ell$ system states into a reduced dimensionality (order) latent space, which is constructed from the first $r$ left singular vectors of the training data. This step is designed to retain the dominant patterns of the input data and remove redundant and irrelevant information. Next, the latent space features are normalized to appropriate magnitudes and thereafter nonlinearly transformed, to yield derrom’s feature vector. The nonlinear transformation is constructed via $L$ hyperbolic tangent activation functions with random weights and biases. Lastly, the regression step is taken via a linear map $\mathbf{W}$ . Both the dimensionality reduction and the regression step are trained with the supervised learning paradigm, where we minimize the Frobenius norm of the residual error between the training data and the corresponding reconstruction/prediction. More details on the delay-embedded regressive reduced-order model are presented in Appendix B.

To assess the utility of our proposed data-driven approach, we implement a two-temperature approximation (tta) for the collision term [12]. This approximation assumes a steady state distribution for electrons and phonons at all times. Hence, the electron and phonon occupations are then described by effective electron and phonon temperature instead of a fully momentum resolved distribution function [12] Consequently, the electron dynamics can then be described as an exponential relaxation towards a Fermi-Dirac distribution with the phonon temperature $T_{\mathrm{ph}}$ . The phonon temperature $T_{\mathrm{ph}}$ itself is included as another dynamical quantity, whose relaxation is driven by the difference to the quasi-equilibrium temperature of the electron distribution. The two-temperature approximation

\displaystyle\partial_{t}f_{k}|_{\mathrm{col}}^{\mathrm{tta}}\left\{f_{k},T_{% \mathrm{ph}}\right\}

(6)

thus reduces the computational costs, but has the major caveat that it is only valid for electron distributions, which are sufficiently close to an equilibrium state. Such relaxation-time approximations are, nonetheless, often used for the simulation of laser dynamics [8, 9, 10, 49, 50, 13] due to a lack of viable alternatives. A detailed description of the two-temperature approximation and an example relaxation are presented in Appendix C.

III Results

III.1 Test Scenario Setup

To study the applicability of the data-driven approach, we design a test scenario, in which we are interested in the laser’s response to being optically pumped by a short pulse far above the band edge. Specifically, we already know that the laser emits a pulse if the pump energy is sufficient and we want to study how the pulse peak power and the pulse position (the delay after the pump pulse) depend on the power, spectral position, and spectral width of the pump pulse. The pump pulse is implemented as an initial electron state with a Gaussian distribution, where the maximum reflects the power, the mean the center frequency, and the standard deviation the spectral width of the pump laser.

Refer to caption — Figure 2: Demonstration of the laser dynamics and approximation methods for representative initial conditions. (a) Photon intensity as obtained from the microscopic model (solid blue), data-driven model (dashed orange), and the two-temperature approximation (dashed-dotted green) normalized to the median pulse maximum of the training/testing data set. (b) Color-coded electron occupation as function of the single-particle energy and time. The inset shows the distribution at $t=2.0\,\mathrm{ps}$ . (c1) and (c2) color-coded electron occupation errors of the data-driven model and the two-temperature approximation, respectively.

An example trajectory is presented in Fig. 2, where the blue line in (a) shows the normalized photon intensity and (b) shows the conduction-band electron distribution as a function of the single-particle energy. The additional plots and panels regarding the different approximations will be discussed later on. After an initial lag of $\approx 1.2\,\mathrm{ps}$ , the photon intensity $I$ rises to form a pulse which peaks around $\approx 2.0\,\mathrm{ps}$ and then quickly decays again. Much of the observed photon intensity dynamics can be explained by the electron dynamics. Initially, the electrons, which have been excited by the pump with a center energy of $0.160\,\mathrm{eV}$ , decay towards their quasi-Fermi distribution via the electron-phonon interaction. This process supplies the gain to the lasing mode, which is located at $0.0175\,\mathrm{eV}$ and thus causes some of the observed lag. Once the laser has turned on, the photon intensity clearly burns a spectral hole into the electron distribution. Moreover, a periodic pattern appears with a period of $30\,\mathrm{meV}$ , which corresponds to the energy of the optical phonons. Hence, the refilling of the initial spectral hole via the interaction with optical phonons causes characteristic secondary spectral holes in the electron distribution. The generated pattern is highlighted in the inset in Fig. 2 (b). Once the available gain is used up, the laser pulse quickly dies and the electron distribution approaches a Fermi distribution. From that example, we can deduce that the details of the electron dynamics, i.e., the collision term, strongly determine the temporal position and maximum value of the photon intensity pulse. Therefore, we score candidate approximations of the collision term with respect to their ability to achieve small errors of those two quantities (details in Appendix D).

III.2 Training-Data Generation and Training

Since, we want to evaluate both the reduction of the simulation time and the accuracy of our data-driven approach, we set ourselves the goal of simulating 1000 trajectories of $10\,\mathrm{ps}$ length with varying pumping conditions. For that purpose, we construct 1000 initial conditions, where we draw the maximum, mean, and standard deviation of the Gaussian initial conditions from the uniform distributions $\mathcal{U}_{\mathrm{max}}(0.5,0.99)$ , $\mathcal{U}_{\mathrm{mean}}(0.1\,\mathrm{eV},0.2\,\mathrm{eV})$ , and $\mathcal{U}_{\mathrm{std}}(0.04\,\mathrm{eV},0.06\,\mathrm{eV})$ , respectively.

To operate the data-driven approach, we first require sufficient training data. Since, we a priori don’t know the required number of training trajectories, we implement the following strategy: We first set a quantitative accuracy target, which is to be evaluated via a 10-fold cross-validation procedure on the training data (see Appendix D). For our purpose, we define the accuracy measure to be the sum of the 90ths percentiles of the normalized absolute pulse maximum errors $\Delta I_{\mathrm{max}}$ and the normalized absolute pulse maximum position errors $\Delta t_{I_{\mathrm{max}}}$ , i.e.,

\displaystyle A=P_{90}\left\{\frac{|\Delta I_{\mathrm{max}}|}{\langle I_{% \mathrm{max}}\rangle_{\mathrm{md}}}\right\}+P_{90}\left\{\frac{|\Delta t_{I_{% \mathrm{max}}}|}{\langle t_{I_{\mathrm{max}}}\rangle_{\mathrm{md}}}\right\}

(7)

and require it to be smaller than $0.2$ . Note that we specifically use robust statistics to mitigate the effect of outliers.

In the next step, we start simulating trajectories using the microscopic model in quasi-logarithmic steps, i.e., $10,20,50,100,...$ . After each step, we set out to obtain an optimal data-driven model via a heuristic hyperparameter optimization. Based on our previous experience with the electron-phonon system [45], we keep a fixed delay embedding of $\ell=2$ and a fixed number of nonlinear nodes $L=1000$ and optimize the number of reduced dimensions $d_{r}$ and the regularization strength $\alpha$ via a simple grid search. This procedure can be assigned to the class of exploration-labeling-training (ELT) algorithms [40, 51], whereat the exploration is performed via the randomly drawn initial conditions. Once such a model meets the accuracy target, it can be used for the further simulation of the remaining trajectories.

The accuracy results of this strategy are presented in Fig. 3. Panels (a) and (b) show the training set cross-validated errors for the normalized absolute pulse maximum error and the normalized absolute pulse maximum position error as a function of the number training trajectories $n$ . In all cases, the best performing model is shown. In both plots, the horizontal black bar denotes the median, the box the interquartile range, and the whiskers the 10th and 90th percentile. Hence, the sum of the top whiskers represents the accuracy metric Eq. 7. For an increasing number of training trajectories, the observed errors decrease, i.e., all indicators shift to smaller values. We find that the accuracy target Eq. 7 is met for $n=50$ training trajectories with a value of $A_{n=50}\approx 0.16$ . Thus, we can now use the obtained data-driven model for the further simulation of trajectories. Before moving on, however, we would like to stress, that choosing a suitable accuracy target is the human operator’s responsibility. Our choice is motivated by achieving a competitive performance with respect to accuracy and overall simulation time. For comparison, the two-temperature approximation only achieves an accuracy of $A_{\mathrm{tta}}\approx 1.5$ for the considered 50 training trajectories.

III.3 Simulation-Time Evaluation

For the evaluation of the computation time, we simulate all 1000 trajectories using the microscopic model, the data-driven model with the outlined strategy, and the two-temperature approximation. The setup of the latter is described in Appendix C. Details on the implementation are given in Appendix E. Figure 3 (c) plots the total simulation time as a function of the number of trajectories. For the microscopic model (blue line), the simulation time increases linearly and amounts to $\approx 3.7\,\mathrm{h}$ for all trajectories. This corresponds to an average of $\approx 13.3\,\mathrm{s}$ per trajectory. Similarly, the time for the two-temperature approximation (green solid line) also increases linearly, but only amounts to $\approx 0.73\,\mathrm{h}$ for all trajectories, which corresponds to $\approx 2.5\,\mathrm{s}$ per trajectory.

The data-driven approach (orange solid line), on the other hand, shows a distinctive two-stage behavior, which is due to the initial simulation of training data and model hyperparameter optimization. However, once a sufficient model is obtained, it only takes $\approx 0.58\,\mathrm{s}$ on average to simulate a trajectory. The total simulation time for all 1000 trajectories then amounts to $\approx 0.68\,\mathrm{h}$ . The crossing point with the microscopic model is found at $n=147$ trajectories and with the two-temperature approximation at $n=900$ trajectories.

At this point, we also want to highlight that the position of crossing points is determined by the initial costs of obtaining the data-driven model. E.g., if we set the accuracy target Eq. 7 to $0.25$ instead of of $0.2$ , $n=20$ training trajectories are sufficient. The corresponding simulation time is shown via the orange dashed line. For that case, the time to obtain all 1000 trajectories now only amounts to $\approx 0.36\,\mathrm{h}$ and the crossing point with the microscopic model is found at $n=57$ and with the two-temperature approximation at $n=305$ .

In the limit of large numbers of trajectories, i.e., if the initial costs of obtaining the data-driven model can be neglected, the data-driven approach offers an acceleration factor of $\approx 20$ compared to the microscopic model. The two-temperature approximation, on the other hand, only offers a factor of $\approx 4$ . While the latter is deceptively easy to write down, its implementation requires to fit a Fermi-function to the electron distribution in each step, which drives the computational costs. The data-driven model, on the other side, only performs a few vector and matrix operations, and one nonlinear transformation, which turn out to be relatively inexpensive. Thus, the data-driven approach also exhibits a performance advantage over the analytic approximation.

III.4 Accuracy Evaluation

Having benchmarked the simulation time performances, we now continue with the detailed evaluation of the simulation accuracy. Implicitly, we have already done that via the training error in Fig. 3 and the accuracy target Eq. 7. At this point, however, we want to strictly separate the training and testing data. For that purpose, we use the optimal data-driven model trained with $n=50$ trajectories and compare its approximation of the 950 trajectories of our test scenario to the trajectories computed with the two-temperature approximation with respect to ground truth produced by the microscopic model.

To start off with, we would first like to exemplify both approximation methods via the representative trajectory show in Fig. 2. In panel (a), the data-driven model is denoted by the orange dashed line and the two-temperature approximation by the green dash-dotted line. We find that the data-driven model beautifully tracks the microscopic model with the pulse maximum only slightly to small and slightly shifted to later times. The two-temperature approximation, on the other hand, only gets the rough qualitative feature right: The pulse maximum is underestimated by more than a factor of two, the pulse is shifted by $\approx 0.5\,\mathrm{ps}$ to later times, and the trailing edge decays much slower.

The apparent different behavior of the two approximations can be explained by their respective electron distribution dynamics, which are presented in Fig. 2 (c) in terms of the occupation error $f_{\epsilon_{k}}^{\mathrm{apprx}}(t)-f_{\epsilon_{k}}^{\mathrm{micro}}(t)$ with respect to the microscopic simulation shown in (b). The occupation errors produced by the data-driven model Fig. 2 (c1) are hardly visible on the common color scale, since they are about one magnitude smaller than those of the two-temperature approximation Fig. 2 (c2). Moreover, the two-temperature approximation misses important qualitative features of the microscopic model. Firstly, the initial relaxation is to slow and thus produces the blue area in the bottom left of the figure. Secondly, it does not exhibit the characteristic periodic structure produced by interaction with the optical phonons. Lastly, the refilling of the spectral hole, which is caused by the lasing process, also occurs to slowly and thus leaves the pronounced red region at energies above the lasing mode (as identified by the white horizontal stripe for $t>2\,\mathrm{ps}$ ).

Shortcomings of the two-temperature approximation are to be expected, since the appearing electron distributions are far away from a quasi equilibrium. However, it is important to note, that the quantitative impact onto the photon intensity dynamics cannot be known a priori. Hence, a responsible utilization of the two-temperature approximation should include a quantitative estimation of the potential errors.

To provide a broad picture of the approximation accuracy of the 950 test trajectories, Fig. 4 shows histograms of the normalized pulse maximum error (a), the normalized pulse maximum position error (b), and the electron distribution rms error (c) for both the data-driven model (orange) and the two-temperature approximation (green).

Taking a look at the pulse maximum error first, we find that the data-driven model produces deviations that are almost centered above zero and hardly exceed $\pm 0.2$ . The two-temperature approximation, on the other hand, exclusively underestimates the true maximum and generates normalized errors up to $-1.25$ with a median of $\approx-0.6$ .

Moving on to the pulse maximum position error, we find that the data-driven model produces errors, which are approximately centered above zero and do not exceed $\pm 0.1$ . This time, the two-temperature approximation exclusively overestimates the pulse maximum positions and thus produces normalized errors with a median of $\approx 0.17$ and outliers up to $0.72$ .

We can therefore conclude that the two-temperature approximation systematically underestimates the pulse maximum and overestimates the pulse position, while the data-driven model rather suffers from (much smaller) random errors. We attribute the behavior of the two-temperature approximation to the insufficient description of the electron dynamics for distributions for away from a quasi-equilibrium, as discussed previously.

Hence, we lastly take a look at the electron distribution rms error, which by definition is positive. The data-driven model exhibits a median error of $\approx 0.015$ and the two-temperature approximation a median error of $\approx 0.066$ . Note, that the rms error is not specific to the relevant physical features of the electron distribution, e.g., the gain at the lasing mode, and is therefore not an optimal predictor for the pulse accuracy. Nonetheless, it indicates the much better performance of the data-driven method over the two-temperature approximation.

With those results, we conclude that the example trajectories presented in Fig. 2 are representative both of the data-driven approach and the two-temperature approximation. Both the example and the statistics demonstrate the much better accuracy of the data-driven approach when compared to the two-temperature approximation. This can also be expressed in terms of the accuracy metric Eq. 7, which yields a value of $A_{\mathrm{tta}}\approx 1.38$ for the two-temperature approximation and a value of $A_{n=50}\approx 0.16$ for selected data-driven model on the 950 test trajectories.

IV Discussion and Conclusions

In this work, we have proposed a data-driven approach to accelerate multi-physics simulations and exemplarily demonstrated its application to a semiconductor laser model. The primary goal was to reduce computational costs without compromising accuracy compared to traditional analytical approximations. In our test scenario, the results establish a substantial advantage of the data-driven method over the two-temperature approximation in both computational efficiency and accuracy.

While demonstrating promising results, this study also has certain limitations. Naturally, the quantitative results regarding the accuracy and computation time are specific to the considered system, the mathematical model, the test scenario, and the implementation. However, we believe that the presented case study does not represent a best case scenario for data-driven approaches and that the potential performance gains may be much greater in other setups. This is because our toy semiconductor laser model only considers electron-phonon interactions in the collision term and no electron-electron interactions. The latter, however, are numerically more expensive to evaluate [8] and their inclusion would therefore give further advantage to the data-driven approach. This would become especially relevant if a finer discretization was required due to the scaling behavior of the microscopic evaluation (quadratic) and the data-driven model (linear) with the number discretization points.

Moreover, we showed that the data-driven approach only becomes feasible if a certain minimum number of trajectories is required due to the training costs of the data-driven model. However, it is easy to think of cases, where one would need to simulate many more than the 1000 trajectories of our test scenario. For a parameter study, a three-dimensional grid with 1000 samples only corresponds to ten sample values per parameter. Adding further parameters or increasing the sample density then quickly drives up the number of samples. If one where to include stochastic effects, e.g., spontaneous emission to our laser, each parameter combination would require multiple trajectory realizations to obtain proper statistics. Hence, the initial training costs can become negligible in many potential cases.

Both the microscopic evaluation and the data-driven model can benefit from a more efficient implementation. This may include the exact mathematical formulation, the simulation code, and in the case of the data-driven approach, the structure of the model and its training. The latter thus offers further options for a more efficient implementation and thus performance improvements. It thereby also puts the burden of doing so on the human operator. This stresses the importance of the careful construction and training of the data-driven model. Nonetheless, we are convinced that this task is manageable at present, since our case study already demonstrated excellent results without extensive model design experimentation and optimization. We expect future research to further establish practical guidelines for the successful design and training of data-driven approximation models.

One promising future research path is the integration of physical knowledge into the data-driven model. This could be done via the structure of the model itself, a combination with a knowledge-based approximation, or the training procedure [41]. In our considered case of the semiconductor laser, the proper collision term only redistributes charge-carriers and thus induces no net change of the charge-carrier density. At present, this physical knowledge is not included in the model and therefore small deviations can be observed. A straight-forward option to fixing this, could be the penalization of such deviations during the training via the cost function.

In conclusion, the data-driven approach presented in this study holds significant promise for accelerating multi-physics simulations in various fields, including but not limited to semiconductor lasers. Its superior performance in computational efficiency and accuracy, as compared to traditional approximations, may open avenues for further exploration and application in diverse scientific domains and facilitate rapid prototyping and optimization of complex systems.

V Acknowledgements

We acknowledge fruitful discussions with Dominik Christiansen (TU Berlin).
This work was funded by the Deutsche Forschungsgemeinschaft (DFG) through Project SE 3098/1, Project No. 432266622 (M.S.), SFB 951, Project No. 182087777 (A.K.).

VI Author contributions

S.M. and M.S. initiated and conceptualized the work. M.S. and A.K. worked on the microscopic derivation and the numerical implementation of electron-phonon scattering and parameter search. S.M. implemented the laser model and the data-driven model; performed the simulations and numerical experiments; and drafted the manuscript. All authors discussed and edited the manuscript.

VII Data Availability

The data generated in this work can be generated by running the publicly available code as described in the code availability statement.

VIII Code Availability

The simulation code and the regression code is available on GitHub under MIT license (https://github.com/stmeinecke/derrom)

Appendix A Microscopic Electron-Phonon Equations

We have described the coupled electron-phonon dynamics in our previous publication [45] and only present a brief summary here. The equations of motion for the electron occupations $f_{\mathbf{k}}$ and phonon occupations $n_{\mathbf{q}}^{\alpha}$ are derived from a Hamiltonian via the Heisenberg equation. The electron dispersion is treated in the parabolic approximation, the acoustic phonon dispersion in the Debye approximation, and the optical phonon dispersion in the Einstein approximation, with parameters from ab-initio calculations[52, 53]. The occurring hierarchy problem is addressed using a correlation expansion and a second-order Born-Markov approximation[54]. The resulting coupled electron-phonon Boltzmann scattering equations then read

$\displaystyle\partial_{t}f_{\mathbf{k}}\|_{\mathrm{col}}^{\mathrm{micro}}$	$\displaystyle=\frac{2\pi}{\hbar}\sum_{\mathbf{k^{\prime}},\alpha,\pm}\|g_{% \mathbf{k}-\mathbf{k^{\prime}}}\|^{2}\left(\frac{1}{2}\pm\frac{1}{2}+n_{\mathbf% {\mathbf{k}-\mathbf{k^{\prime}}}}^{\alpha}\right)\left(1-f_{\mathbf{k}}\right)% f_{\mathbf{k^{\prime}}}\delta(\epsilon_{\mathbf{k}}-\epsilon_{\mathbf{k^{% \prime}}}\pm\hbar\omega^{\alpha}_{\mathbf{k-k^{\prime}}})$
	$\displaystyle-\frac{2\pi}{\hbar}\sum_{\mathbf{k^{\prime}},\alpha,\pm}\|g_{% \mathbf{k}-\mathbf{k^{\prime}}}\|^{2}\left(\frac{1}{2}\pm\frac{1}{2}+n_{\mathbf% {\mathbf{k}-\mathbf{k^{\prime}}}}^{\alpha}\right)\left(1-f_{\mathbf{k^{\prime}% }}\right)f_{\mathbf{k}}\delta(\epsilon_{\mathbf{k}}-\epsilon_{\mathbf{k^{% \prime}}}\mp\hbar\omega^{\alpha}_{\mathbf{k-k^{\prime}}})$	(8)
$\displaystyle\partial_{t}n_{\mathbf{q}}^{\alpha}$	$\displaystyle=\frac{2\pi}{\hbar}\|g_{\mathbf{q}}^{\alpha}\|^{2}\sum_{\mathbf{k}}% \left(1-f_{\mathbf{k}}\right)f_{\mathbf{k+q}}\left(1+n_{\mathbf{q}}\right)% \delta(\epsilon_{\mathbf{k}}-\epsilon_{\mathbf{k+q}}+\hbar\omega^{\alpha}_{% \mathbf{q}})$
	$\displaystyle-\frac{2\pi}{\hbar}\|g_{\mathbf{q}}^{\alpha}\|^{2}\sum_{\mathbf{k}}% \left(1-f_{\mathbf{k+q}}\right)f_{\mathbf{k}}n_{\mathbf{q}}\delta(\epsilon_{% \mathbf{k}}-\epsilon_{\mathbf{k+q}}-\hbar\omega^{\alpha}_{\mathbf{q}}).$	(9)

The appearing electron coupling elements $g_{\mathbf{k}}$ are treated in the effective deformation potential approximation 53

g_{q}^{i}=\sqrt{\frac{\hbar}{2\rho\Omega^{i}A}}V_{q},

(10)

with the effective mass density of the unit cell $\rho$ and the semiconductor area $A$ . The effective deformation potential reads $V_{q}=D_{1}q$ for acoustic phonon coupling, and $V_{q}=D_{0}$ for optical phonons. We evaluate the system for two-dimensional MoSe2 [53, 55, 56, 57, 58] and take the parameters from reference 55 and list them in Table 2.

Table 2: Material parameters.

$m_{e}$ / $m_{0}$	0.5	52
$\bar{\omega}^{o}$ /meV	36	55
$c_{LA}$ /(nm/fs)	4.1	55
$D^{a}_{1}$ /eV	3.4	55
$D^{o}_{0}$ /eV nm ${}^{-1}$	52	55

Appendix B Delay Embedded Regressive Reduced Order Model

Figure 5: Sketch of the delay-embedded regressive reduced order model (derrom). The past

\ell

system states

\mathbf{s}_{n}

are reduced in their dimensionality

\mathbf{r}_{n}

, scaled

\mathbf{\widetilde{r}}_{n}

, concatenated and then nonlinearly transformed to yield the feature vector

\mathbf{f}_{n}

. The prediction

\mathbf{p}_{n}

is computed via a linear map with the trained weights

\mathbf{W}

In this section, we present and discuss the delay-embedded regressive reduced order model (derrom) in detail. The processing and flow of information, coming from the past $\ell$ system states, can be separated into multiple individual steps and is illustrated in Fig. 5. In this figure, time moves to the right and information flows to the bottom. In the first step, the past $\ell$ (truncated) system states $\mathbf{s}_{n}$ are collected (orange boxes). If they are truncated, only a subset of the system state variables is considered, e.g., only the electron occupation numbers enter the data-driven model in this work. In the next step, each system state is projected into a reduced order latent space $\mathbf{r}_{n}$ (green boxes). Next, a scaling of each feature, i.e., element in the reduced states, produces the scaled reduced system states $\mathbf{\widetilde{r}}_{n}$ (blue boxes). Their concatenation (light blue box) represents the linear feature vector of the model. The nonlinear feature vector $\mathbf{f}_{n}$ (purple box) is then generated via the nonlinear transformation $\mathbf{f}(\cdot)$ . Note that the feature scaling is highly relevant, because it affects the nonlinear response of $\mathbf{f}$ . Lastly, a linear regression step is carried out via the matrix $\mathbf{W}$ to produce the prediction $\mathbf{p}$ .

For the dimensionality reduction stage, we use the left singular vectors of the singular value decomposition (SVD) [59, 60]

\displaystyle\mathbf{S}^{T}=\mathbf{U}\mathbf{\Sigma}\mathbf{V}^{T},

(11)

which has proven most efficient in our previous work on the electron-phonon system [45]. The SVD has a rich history in science and engineering [59, 60] and is known across different disciplines as the Karhunen–Loève transform (KLT) [61, 62], empirical orthogonal functions [63], proper orthogonal decomposition (POD) [64], and canonical correlation analysis [65]. The $\mathbf{U}$ and $\mathbf{V}$ matrices are unitary and contain the left and right singular vectors of $\mathbf{S}^{T}$ . $\mathbf{\Sigma}$ is a diagonal matrix with the singular values in descending order. Their magnitude naturally organizes the left and right singular vectors according to their share in the reconstruction of the data matrix $\mathbf{S}^{T}$ . Specifically, the Eckart-Young theorem [66, 60] states that the best rank- $r$ approximation of a matrix with respect to the Frobenius norm can be achieved via the truncation over the leading $r$ singular values of the SVD. To reduce a system state $\mathbf{s}_{n}$ , we compute the SVD of the training data to obtain the basis $\mathbf{U}$ . We then use the truncated matrix $\mathbf{U}_{r}=(\mathbf{u}_{1},\dots,\mathbf{u}_{r})$ to project $\mathbf{s}_{n}$ onto the first $r$ left singular vectors $\mathbf{u}_{m}$ to obtain the reduced state

\displaystyle\mathbf{r}_{n}=(r_{1},\dots,r_{r})^{T}=\mathbf{U}_{r}^{T}\mathbf{% s}_{n}.

(12)

Note, that this dimensionality reduction scheme is entirely data driven. We thus expect its performance to depend on the quality of the training data. To obtain a basis, that is well suited for the task, the training data should thus be representative of the relevant system state subspace.

In the feature scaling stage, we standardize the individual features by subtracting their mean $\mu_{m}$ and scaling them to unit variance

\displaystyle\widetilde{r}_{nm}=\frac{r_{nm}-\mu_{m}}{\sigma_{m}},

(13)

where $\sigma_{m}$ denotes the standard deviation of the $m$ -th feature in the training data.

The delay embedding is achieved by concatenating the past $\ell$ system states $\widetilde{\mathbf{r}}$

\displaystyle\mathbf{f}_{n}^{\mathrm{lin}}=\widetilde{\mathbf{r}}_{n}\oplus% \widetilde{\mathbf{r}}_{n-1}\oplus\dots\oplus\widetilde{\mathbf{r}}_{n-\ell+1},

(14)

where $\oplus$ denotes the concatenation operator. The result represents the linear feature vector $\mathbf{f}_{n}^{\mathrm{lin}}$ of the data-driven model. For a reduced dimensionality $d_{r}$ of the state vectors $\widetilde{\mathbf{r}}_{n}$ , the dimension of the linear feature vector $\mathbf{f}_{n}^{\mathrm{lin}}$ thus becomes $d_{r}\ell$ .

For the nonlinear transformation, we construct the feature vector according to

\displaystyle\mathbf{f}_{n}=1\oplus\mathbf{f}_{n}^{\mathrm{lin}}\oplus\mathbf{% f}^{\mathrm{nl}}(\mathbf{f}_{n}^{\mathrm{lin}}),

(15)

where we add a bias term, denoted by the one, and explicitly keep the linear feature vector. The nonlinear features are represented by $\mathbf{f}^{\mathrm{nl}}(\mathbf{f}_{n}^{\mathrm{lin}})$ . In our case, the nonlinear transformation is given by

\displaystyle\mathbf{f}^{\mathrm{nl}}(\mathbf{f}_{n}^{\mathrm{lin}})=\phi\left% (\mathbf{W}_{\mathrm{nl}}\mathbf{f}_{n}^{\mathrm{lin}}+\boldsymbol{\beta}% \right),

(16)

where a nonlinear activation function $\phi$ acts on each element of the input vector, which is generated by the weight matrix $\mathbf{W}_{\mathrm{nl}}\in\mathbb{R}^{L\times d_{f}}$ , the input (linear feature) vector $\mathbf{f}_{n}^{\mathrm{lin}}$ , and the bias vector $\boldsymbol{\beta}\in\mathbb{R}^{L\times d_{f}}$ . This nonlinear transformation can be represented by a fully connected feed-forward network with one hidden layer, to which a nonlinear activation function is applied. The matrix $\mathbf{W}_{\mathrm{nl}}$ defines the number of nonlinear nodes $L$ in the hidden layer via its shape $\mathbb{R}^{L\times d_{f}}$ . The weights $w_{mn}^{\mathrm{nl}}$ are drawn from a normal distribution $\mathcal{N}^{\mathbf{W}}(0,d_{f}^{-1})$ with zero mean and the variance given be the inverse feature vector dimension $d_{f}$ . Given the standardization of the features, the scaling of the variance ensures that the expected magnitude of the inputs $u_{nl}=\sum_{m}w_{lm}^{\mathrm{nl}}f^{\mathrm{lin}}_{nm}$ is of the order $\approx 1$ . This design has been shown to best harness the $\tanh$ nonlinaerity and produce optimal regression results [67]. With the same argument, the biases $\beta_{m}$ are drawn from the uniform distribution $\mathcal{U}^{\mathbf{\beta}}(-1.0,1.0)$ . If one does not optimize the parameters of the transformation, i.e., $\mathbf{W}_{\mathrm{nl}}$ and $\boldsymbol{\beta}$ , in the training of the model, this approach is also known as an extreme learning machine (ELM) [68, 69, 70, 71, 72].

Lastly, the regression step is performed via the linear map

\displaystyle\mathbf{p}_{n}^{T}=\mathbf{f}_{n}^{T}\mathbf{W},

(17)

where $\mathbf{W}\in\mathbb{R}^{d_{f}\times d_{p}}$ denotes the regression weight matrix with the dimensions of the feature vector $d_{f}$ and the prediction $d_{p}$ .

The model is trained using the supervised learning paradigm, i.e., the model parameters are tuned to minimize a loss function of some labeled training data and its corresponding model output. The training of our proposed model includes the dimensionality reduction stage and the regression stage. For the sake of training simplicity, we train the dimensionality reduction stage and the regression weights separately. I.e., we compute the SVD for the dimensionality reduction stage and then optimize the regression weights $\mathbf{W}$ for a given reduced dimensionality $d_{r}$ .

To facilitate the model training, we first sample $M$ trajectories at $N_{M}$ discrete times and build the data matrix

\displaystyle\mathbf{S}=(\mathbf{s}_{1}^{1},\dots,\mathbf{s}_{n}^{m},\dots,% \mathbf{s}_{N_{M}}^{M})^{T},

(18)

where the rows correspond to the system state vectors $\mathbf{s}_{n}^{m}$ of the $m$ -th trajectory at the time $t_{n}^{m}$ . We then proceed to construct the feature matrix

\displaystyle\mathbf{F}=(\mathbf{f}_{1}^{1},\dots,\mathbf{f}_{n}^{m},\dots,% \mathbf{f}_{N_{M}}^{M})^{T},

(19)

where the feature vectors $\mathbf{f}_{n}^{m}$ are computed according to Eq. 15 from the system states. For delay embeddings ( $\ell>1$ ), the trajectories $\mathbf{s}_{n}^{m}$ are padded with the trajectories first state $\mathbf{s}_{1}^{m}$ to calculate the first feature vectors $\mathbf{f}^{m}_{n<\ell}$ . Lastly, we construct the target matrix

\displaystyle\mathbf{T}=(\mathbf{t}_{1}^{1},\dots,\mathbf{t}_{n}^{m},\dots,% \mathbf{t}_{N_{M}}^{M})^{T},

(20)

where the vector $\mathbf{t}_{n}^{m}$ is to be predicted based on the feature vector $\mathbf{f}_{n}^{m}$ . In this work, it represents the $k$ -resolved collision terms in Eq. 2. The regression weights $\mathbf{W}$ are then determined by solving the least-squares problem

\displaystyle\operatorname*{arg\,min}_{\mathbf{W}}\left[\left\lVert\mathbf{F}% \mathbf{W}-\mathbf{T}\right\rVert_{\mathrm{F}}^{2}+\alpha\left\lVert\mathbf{W}% \right\rVert_{\mathrm{F}}^{2}\right],

(21)

where $\left\lVert\cdot\right\rVert_{\mathrm{F}}$ is the Frobenius norm and $\alpha$ the Tikhonov regularization (ridge) parameter [73, 74, 60]. A nonzero $\alpha$ penalizes large weights $w_{nm}$ and thereby counteracts overfitting while also ensuring that a solution exists. The solution to this problem reads

\displaystyle\mathbf{W}=\left(\mathbf{F}^{T}\mathbf{F}+\alpha\mathbf{I}\right)% ^{-1}\mathbf{F}^{T}\mathbf{T},

(22)

where $(\cdot)^{-1}$ , depending on the rank of $\left(\mathbf{F}^{T}\mathbf{F}+\alpha\mathbf{I}\right)$ , either denotes the inverse or the Moore–Penrose pseudo inverse.

Apart from the model parameters, which are optimized in the training process, four hyperparameters remain: the delay embedding depth $\ell$ , the reduced system state dimension $d_{r}$ , the number of nonlinear nodes $L$ , and the regularization parameter $\alpha$ . To obtain a well performing model, those must be set to appropriate values, either by systematic hyperparameter optimization schemes or based on educated guesses.

Appendix C Two-Temperature Approximation

The two-temperature approximation improves upon a relaxation-time approximation with a fixed phonon-bath temperature by introducing a dynamical phonon temperature $T_{\mathrm{ph}}$ , which represents the combined quasi-equilibrium temperature of all phonon branches. Hence, the heating or cooling of the semiconductor lattice via the electron-phonon interaction is taken into account, albeit within a relaxation-time approximation: The dynamics of the phonon temperature are driven by its difference to the quasi-equilibrium temperature of the electrons. This modification causes the electrons to relax towards a Fermi distribution with the common equilibrium temperature of both the electron and the phonon system. In the considered setup, the common equilibrium temperature depends upon the initial non-equilibrium electron distribution, which is generated by the pump pulse. The respective equations of motion for the electron occupation numbers and the phonon temperature then read

	$\displaystyle\partial_{t}f_{k}\|_{\mathrm{col}}^{\mathrm{tta}}=$	$\displaystyle\frac{1}{\tau_{\mathrm{el}}}\left[F(\epsilon_{k},\mu,T_{\mathrm{% ph}})-f_{k}\right],$		(23)
	$\displaystyle\frac{\mathrm{d}}{\mathrm{d}t}T_{\mathrm{ph}}=$	$\displaystyle\frac{1}{\tau_{\mathrm{ph}}}\left[T_{\mathrm{el}}-T_{\mathrm{ph}}% \right],$		(24)

where $F(\cdot)$ denotes the Fermi distribution, $\tau_{\mathrm{el}}$ and $\tau_{\mathrm{ph}}$ the relaxation time-constants for the electron distribution and the phonon temperature, and $T_{\mathrm{el}}$ the quasi-equilibrium temperature of the electrons.

To evaluate the right-hand-side of the equations 23 and 24, both the chemical potential $\mu$ and the quasi-equilibrium temperature $T_{\mathrm{el}}$ must be calculated dynamically. $T_{\mathrm{el}}$ is uniquely determined by the electron density and the total electron energy density of the current distribution $\{f_{k}\}$ and $\mu$ is uniquely determined by the electron density (for a given $T_{\mathrm{ph}}$ ). Hence, both can be obtained by solving the optimization problem of matching the density and total energy density of the quasi-Fermi distribution to the current electron distribution. The optimizations are solved iteratively via Newton’s method. Note that this process requires multiple evaluations of the Fermi function, which represents the largest contribution to the computational costs of this approximation.

To setup the two-temperature approximation, one must also determine the time constants $\tau_{\mathrm{el}}$ and $\tau_{\mathrm{ph}}$ . Since, we have the microscopic model at hand, we treat the time constants as fit parameters, which are to be determined from a sample trajectory. For that purpose, we simulate the electron-phonon system without the photon intensity. The phonons are initialized with a $T=300\,\mathrm{K}$ Bose-Einstein distribution and the electrons with an out-of-equilibrium Gaussian distribution. The electron-phonon interactions then drives the coupled system towards an equilibrium state, which is characterized by a common temperature.

Figure 6 plots the quasi-equilibrium temperature of the electrons (red) and the phonons (blue) with solid lines as a function of time. These temperatures are obtained by fitting Fermi/Bose distributions to the electron and phonon distributions as described above. We compute the combined phonon temperature $T_{\mathrm{ph}}$ as the mean of the two optical branches. The acoustic branches are neglected, because their much smaller excitation energies lead to temperature dynamics much slower than the considered $10\,\mathrm{ps}$ . For this time interval, we thereby obtain a better approximation for collision term in the electron equation. For the given parameters, the phonons absorbs energy from the hot electrons and both relax towards the common equilibrium temperature of $T_{\mathrm{eq}}\approx 429\,\mathrm{K}$ .

To fit the time constants, we only consider the trajectory for $t>1\,\mathrm{ps}$ , where the system is somewhat close to its equilibrium. $\tau_{\mathrm{el}}$ is obtained by fitting an exponential function to the relaxation of the total electron energy density. $\tau_{\mathrm{ph}}$ is obtained by numerically evaluating the derivative $\tfrac{d}{dt}T_{\mathrm{ph}}$ from data and treating Eq. 24 as an regression equation. This procedure yields $\tau_{\mathrm{el}}\approx 424\,\mathrm{fs}$ and $\tau_{\mathrm{ph}}\approx 4333\,\mathrm{fs}$ . However, if we use those values and propagate the same initial conditions with the two-temperature approximation, we underestimate the final equilibrium temperature with $T_{\mathrm{eq}}\approx 408\,\mathrm{K}$ . This discrepancy is due to the nonlinearities of the far-out-of equilibrium states at the early stages of the relaxation. To account for that, the manually change time-constants to $\tau_{\mathrm{el}}=500\,\mathrm{fs}$ and $\tau_{\mathrm{lat}}=4000\,\mathrm{fs}$ to obtain the proper equilibrium temperature. The resulting temperatures for the electrons (red) and the phonons (blue), as obtained from the two-temperature approximation simulation, are plotted in Fig. 6 with dashed lines. Besides the final equilibrium temperature, we find a good agreement for the phonon temperature and a decent agreement for the electron temperature. Deviations occur in the initial stages of the relaxation, where the system is still far from its equilibrium.

Appendix D Benchmarking

To evaluate the quality of a given approximation (indicated by a hat, sampled at discrete $t_{n}$ ), we use the following three metrics: The pulse maximum error of the photon intensity $I$

\displaystyle\Delta I_{\mathrm{max}}=\max_{n}I_{n}-\max_{n}\hat{I}_{n},

(25)

the position error of maximum photon intensity $I_{\mathrm{max}}$

\displaystyle\Delta t_{I_{\mathrm{max}}}=\operatorname*{arg\,max}_{t}I-% \operatorname*{arg\,max}_{t}\hat{I},

(26)

and the root-mean-squared (rms) error of the electron distribution $f$

\displaystyle\epsilon_{\mathrm{rms}}=\sqrt{\frac{1}{N}\sum_{n,k}\left(\hat{f}_% {nk}-f_{nk}\right)^{2}}.

(27)

To benchmark a given data-driven model, we use a k-fold cross validation scheme, where we split the training data set (in terms of trajectories) into k equally sized folds. For each fold, we train the model on all the other folds and then score the trajectories contained in the selected fold. This way, we do not mix training and testing data but, nonetheless, obtain a score for each trajectory in the data set. Using the individual error scores, we can then compute the desired statistics, e.g., the median score.

Appendix E Implementation and Simulation Details

For the numerical integration of the laser model and its approximations, we chose a discretization of the electron momentum $k$ with $k_{\mathrm{max}}=2.5\,\mathrm{nm}^{-1}$ and 100 equidistant points. For the microscopic model, the crystal momentum $q$ follows the same discretization. This choice produces a sufficiently smooth gain curve (as constructed from the inhomogeneously broadened gain medium) and minimal energy losses in the electron-phonon interaction while remaining computationally tractable.

Both the microscopic model and the two-temperature approximation are then composed of coupled systems of ordinary differential equations (ODEs). To integrate them, we use SciPy’s initial value problem solver with the explicit Dormand-Prince adaptive step-size Runge-Kutta method of order 5(4). We note, that increasing the relative and absolute error tolerances does hardly speed up the simulation, but rather destabilizes the integration. The data-driven approximation, on the other hand, yields a system of delay-differential equations (DDEs) via its delay embedding. Those are outside the scope of the typical ODE solver package. Hence, we implement an explicit second-order Heun method with a constant step-size. This approach has the advantage, that it only evaluates the delayed variables at integer multiples of the step size and does not require any history-array interpolation routine. For our system, this method converges with a step-size $h=5\,\mathrm{fs}$ , which corresponds to half of our sampling interval $\delta t=10\,\mathrm{fs}$ . For a larger output sampling interval, we would advise to implement a constant step-size explicit fourth-order Runge-Kutta method with third-order Hermite interpolation for the history array.

The simulation code for all models in completely implemented in Python and publicly available as stated above. All simulation were performed on a single CPU, namely a state of the art Intel(R) Core(TM) i9-10900. The utilized NumPy and SciPy libraries are allowed utilize all CPU cores.

References

Gu et al. [2021] D. Gu, X. Shi, R. Poprawe, D. L. Bourell, R. Setchi, and J. Zhu, Material-structure-performance integrated laser-metal additive manufacturing, Science 372, eabg1487 (2021).
Loesel et al. [1996] F. H. Loesel, M. H. Niemz, J. F. Bille, and T. Juhasz, Laser-induced optical breakdown on hard and soft tissues and its dependence on the pulse duration: experiment and model, IEEE J. Quantum Electron. 32, 1717 (1996).
Juhasz et al. [1999] T. Juhasz, F. H. Loesel, R. M. Kurtz, C. Horvath, J. F. Bille, and G. Mourou, Corneal refractive surgery with femtosecond lasers, IEEE J. Sel. Top. Quantum Electron. 5, 902 (1999).
Nagy et al. [2009] Z. Nagy, A. Takacs, T. Filkorn, and M. Sarayba, Initial clinical evaluation of an intraocular femtosecond laser in cataract surgery, J. Refract. Surg. 25, 1053 (2009).
Knox [2000] W. H. Knox, Ultrafast technology in telecommunications, IEEE J. Sel. Top. Quantum Electron. 6, 1273 (2000).
Kuntz et al. [2007] M. Kuntz, G. Fiol, M. Lämmlin, C. Meuer, and D. Bimberg, High-speed mode-locked quantum-dot lasers and optical amplifiers, Proc. IEEE 95, 1767 (2007).
Rafailov et al. [2011] E. U. Rafailov, M. A. Cataluna, and E. A. Avrutin, Physics and Devices (WILEY-VCH, Weinheim, 2011).
Chow and Koch [1999] W. W. Chow and S. W. Koch, Physics of the Gain Materials, 1st ed. (Springer-Verlag Berlin Heidelberg, Berlin, 1999).
Lingnau et al. [2013] B. Lingnau, W. W. Chow, E. Schöll, and K. Lüdge, Feedback and injection locking instabilities in quantum-dot lasers: a microscopically based bifurcation analysis, New J. Phys. 15, 093031 (2013).
Kilen et al. [2014] I. Kilen, J. Hader, J. V. Moloney, and S. W. Koch, Ultrafast nonequilibrium carrier dynamics in semiconductor laser mode locking, Optica 1, 192 (2014).
Strogatz [2014] S. H. Strogatz, Nonlinear Dynamics and Chaos (CRC Press, 2014).
Czycholl [2008] G. Czycholl, Theoretische Festkörperphysik (Springer Verlag, 2008).
Thurn et al. [2023] A. Thurn, J. Bissinger, S. Meinecke, P. Schmiedeke, S. S. Oh, W. W. Chow, K. Lüdge, G. Koblmüller, and J. J. Finley, Self-induced ultrafast electron-hole plasma temperature oscillations in nanowire lasers, Phys. Rev. Appl. 20, 034045 (2023).
Khurana et al. [2023] D. Khurana, A. Koli, K. Khatter, and S. Singh, Natural language processing: State of the art, current trends and challenges, Multimedia tools and applications 82, 3713 (2023).
Bertolini et al. [2021] M. Bertolini, D. Mezzogori, M. Neroni, and F. Zammori, Machine learning for industrial applications: A comprehensive literature review, Expert Systems with Applications 175, 114820 (2021).
Sarker [2021] I. H. Sarker, Machine learning: Algorithms, real-world applications and research directions, SN Computer Science 2, 1 (2021).
Zhang et al. [2019a] X. Zhang, X. Ming, Z. Liu, D. Yin, Z. Chen, and Y. Chang, A reference framework and overall planning of industrial artificial intelligence (i-ai) for new application scenarios, The International Journal of Advanced Manufacturing Technology 101, 2367 (2019a).
Wang and Deng [2021] M. Wang and W. Deng, Deep face recognition: A survey, Neurocomputing 429, 215 (2021).
Lee et al. [2021] K. Lee, M. Eo, E. Jung, Y. Yoon, and W. Rhee, Short-term traffic prediction with deep neural networks: A survey, IEEE Access 9, 54739 (2021).
De La Torre et al. [2021] R. De La Torre, C. G. Corlu, J. Faulin, B. S. Onggo, and A. A. Juan, Simulation, optimization, and machine learning in sustainable transportation systems: Models and applications, Sustainability 13, 1551 (2021).
Zhang et al. [2019b] S. Zhang, L. Yao, A. Sun, and Y. Tay, Deep learning based recommender system: A survey and new perspectives, ACM Computing Surveys (CSUR) 52, 1 (2019b).
Radovic et al. [2018] A. Radovic, M. Williams, D. Rousseau, M. Kagan, D. Bonacorsi, A. Himmel, A. Aurisano, K. Terao, and T. Wongjirad, Machine learning at the energy and intensity frontiers of particle physics, Nature 560, 41 (2018).
Brunton et al. [2020] S. L. Brunton, B. R. Noack, and P. Koumoutsakos, Machine learning for fluid mechanics, Annual Review of Fluid Mechanics 52, 477 (2020).
Abbas [2020] O. Abbas, Science in the age of machine learning, Nat. Rev. Phys. 2, 342 (2020).
Baldi and Brunak [2001] P. Baldi and S. Brunak, Bioinformatics: the machine learning approach (MIT press, 2001).
May [2021] M. May, Eight ways machine learning is assisting medicine, Nature Medicine 27, 2 (2021).
Chen et al. [2021] Z. Chen, N. Andrejevic, N. C. Drucker, T. Nguyen, R. P. Xian, T. Smidt, Y. Wang, R. Ernstorfer, D. A. Tennant, M. Chan, et al., Machine learning on neutron and x-ray scattering and spectroscopies, Chemical Physics Reviews 2, 031301 (2021).
Schmidt et al. [2019] J. Schmidt, M. R. Marques, S. Botti, and M. A. Marques, Recent advances and applications of machine learning in solid-state materials science, npj Computational Materials 5, 1 (2019).
Bedolla et al. [2020] E. Bedolla, L. C. Padierna, and R. Castaneda-Priego, Machine learning for condensed matter physics, Journal of Physics: Condensed Matter 33, 053001 (2020).
Karniadakis et al. [2021] G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang, Physics-informed machine learning, Nature Reviews Physics 3, 422 (2021).
Peng et al. [2021] G. C. Peng, M. Alber, A. Buganza Tepole, W. R. Cannon, S. De, S. Dura-Bernal, K. Garikipati, G. Karniadakis, W. W. Lytton, P. Perdikaris, et al., Multiscale modeling meets machine learning: What can we learn?, Archives of Computational Methods in Engineering 28, 1017 (2021).
Wang et al. [2023] J. Wang, H. Jiang, G. Chen, H. Wang, L. Lu, J. Liu, and L. Xing, Integration of multi-physics and machine learning-based surrogate modelling approaches for multi-objective optimization of deformed gdl of pem fuel cells, Energy and AI 14, 100261 (2023).
Edelen et al. [2020] A. Edelen, N. Neveu, M. Frey, Y. Huber, C. Mayes, and A. Adelmann, Machine learning for orders of magnitude speedup in multiobjective optimization of particle accelerator systems, Physical Review Accelerators and Beams 23, 044601 (2020).
Xu et al. [2020] H. Xu, J. Ma, P. Tan, B. Chen, Z. Wu, Y. Zhang, H. Wang, J. Xuan, and M. Ni, Towards online optimisation of solid oxide fuel cell performance: Combining deep learning with multi-physics simulation, Energy and AI 1, 100003 (2020).
Hennigh et al. [2021] O. Hennigh, S. Narasimhan, M. A. Nabian, A. Subramaniam, K. Tangsali, Z. Fang, M. Rietmann, W. Byeon, and S. Choudhry, Nvidia simnet: An ai-accelerated multi-physics simulation framework, in International conference on computational science (Springer, 2021) pp. 447–461.
Kasim et al. [2021] M. F. Kasim, D. Watson-Parris, L. Deaconu, S. Oliver, P. Hatfield, D. H. Froula, G. Gregori, M. Jarvis, S. Khatiwala, J. Korenaga, et al., Building high accuracy emulators for scientific simulations with deep neural architecture search, Machine Learning: Science and Technology 3, 015013 (2021).
Ladickỳ et al. [2015] L. Ladickỳ, S. Jeong, B. Solenthaler, M. Pollefeys, and M. Gross, Data-driven fluid simulations using regression forests, ACM Transactions on Graphics (TOG) 34, 1 (2015).
Lu et al. [2021] H. Lu, D. Ermakova, H. M. Wainwright, L. Zheng, and D. M. Tartakovsky, Data-informed emulators for multi-physics simulations, Journal of Machine Learning for Modeling and Computing 2 (2021).
Bi et al. [2023] K. Bi, L. Xie, H. Zhang, X. Chen, X. Gu, and Q. Tian, Accurate medium-range global weather forecasting with 3d neural networks, Nature 619, 533 (2023).
Han et al. [2020] J. Han, L. Zhang, et al., Integrating machine learning with physics-based modeling, arXiv preprint arXiv:2006.02619 (2020).
Willard et al. [2020] J. Willard, X. Jia, S. Xu, M. Steinbach, and V. Kumar, Integrating physics-based modeling with machine learning: A survey, arXiv preprint arXiv:2003.04919 1, 1 (2020).
Kochkov et al. [2021] D. Kochkov, J. A. Smith, A. Alieva, Q. Wang, M. P. Brenner, and S. Hoyer, Machine learning–accelerated computational fluid dynamics, Proceedings of the National Academy of Sciences 118, e2101784118 (2021).
Haug and Koch [2004] H. Haug and S. W. Koch, Quantum Theory of the Optical and Electronic Properties of Semiconductors, 2nd ed. (World Scientific, Singapore, 2004).
Coldren et al. [2012] L. A. Coldren, S. W. Corzine, and M. L. Masanovic, Diode Lasers and Photonic Integrated Circuits, 2nd ed., Wiley series in microwave and optical enginieering (Wiley & Sons, 2012).
Meinecke et al. [2023] S. Meinecke, F. Köster, D. Christiansen, K. Lüdge, A. Knorr, and M. Selig, Data-driven forecasting of nonequilibrium solid-state dynamics, Phys. Rev. B 107, 184306 (2023).
Lingnau [2015] B. Lingnau, Nonlinear and Nonequilibrium Dynamics of Quantum-Dot Optoelectronic Devices, Springer Theses (Springer International Publishing, Switzerland, 2015).
Roos et al. [2021] A. Roos, S. Meinecke, and K. Lüdge, Stabilizing nanolasers via polarization lifetime tuning, Sci. Rep. 11, 18558 (2021).
Takens [1981] F. Takens, Detecting strange attractors in turbulence, in Dynamical Systems and Turbulence, Warwick 1980, edited by D. G. Rand and L. S. Young (Springer Berlin Heidelberg, Berlin, Heidelberg, 1981) pp. 366–381.
Meinecke et al. [2019] S. Meinecke, L. Drzewietzki, C. Weber, B. Lingnau, S. Breuer, and K. Lüdge, Ultra-short pulse generation in a three section tapered passively mode-locked quantum-dot semiconductor laser, Sci. Rep. 9, 1783 (2019).
Meinecke [2022] S. Meinecke, Spatio-Temporal Modeling and Device Optimization of Passively Mode-Locked Semiconductor Lasers, Springer Theses (Springer, Cham, 2022).
Zhang et al. [2018] L. Zhang, H. Wang, et al., Reinforced dynamics for enhanced sampling in large atomic and molecular systems, The Journal of chemical physics 148 (2018).
Kormanyos et al. [2015] A. Kormanyos, G. Burkard, M. Gmitra, J. Fabian, V. Zolyomi, N. D. Drummond, and V. Falko, k p theory for two-dimensional transition metal dichalcogenide semiconductors, 2D Materials 2, 022001 (2015).
Li et al. [2013] X. Li, J. T. Mullen, Z. Jin, K. M. Borysenko, M. Buongiorno Nardelli, and K. W. Kim, Intrinsic electrical transport properties of monolayer silicene and $\mathrm{MoS}_{2}$ from first principles, Phys. Rev. B 87, 115418 (2013).
Butscher et al. [2007] S. Butscher, F. Milde, M. Hirtschulz, E. Malić, and A. Knorr, Hot electron relaxation and phonon dynamics in graphene, Applied Physics Letters 91, 203103 (2007).
Jin et al. [2014] Z. Jin, X. Li, J. T. Mullen, and K. W. Kim, Intrinsic transport properties of electrons and holes in monolayer transition-metal dichalcogenides, Phys. Rev. B 90, 045422 (2014).
Steinhoff et al. [2014] A. Steinhoff, M. Rösner, F. Jahnke, T. O. Wehling, and C. Gies, Influence of excited carriers on the optical and electronic properties of mos2, Nano Letters 14, 3743 (2014).
Perea-Causin et al. [2019] R. Perea-Causin, S. Brem, R. Rosati, R. Jago, M. Kulig, J. D. Ziegler, J. Zipfel, A. Chernikov, and E. Malic, Exciton propagation and halo formation in two-dimensional materials, Nano letters 19, 7317 (2019).
Selig et al. [2019] M. Selig, F. Katsch, R. Schmidt, S. M. de Vasconcellos, R. Bratschitsch, E. Malic, and A. Knorr, Ultrafast dynamics in monolayer transition metal dichalcogenides: Interplay of dark excitons, phonons, and intervalley exchange, Physical Review Research 1, 022007 (2019).
Golub and Van Loan [2013] G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins Studies in the Mathematical Sciences (Johns Hopkins University Press, 2013).
Brunton and Kutz [2022] S. L. Brunton and J. N. Kutz, Data-driven science and engineering: Machine learning, dynamical systems, and control, 2nd ed. (Cambridge University Press, Cambridge, 2022).
Karhunen [1947] K. Karhunen, Under lineare methoden in der wahr scheinlichkeitsrechnung, Annales Academiae Scientiarun Fennicae Series A1: Mathematia Physica 47 (1947).
Loeve [2017] M. Loeve, Probability theory (Courier Dover Publications, 2017).
Lorenz [1956] E. N. Lorenz, Empirical orthogonal functions and statistical weather prediction, Vol. 1 (Massachusetts Institute of Technology, Department of Meteorology Cambridge, 1956).
Lumley [1967] J. L. Lumley, The structure of inhomogeneous turbulent flows, Atmospheric turbulence and radio wave propagation , 166 (1967).
Cherry [1996] S. Cherry, Singular value decomposition analysis and canonical correlation analysis, Journal of Climate 9, 2003 (1996).
Eckart and Young [1936] C. Eckart and G. Young, The approximation of one matrix by another of lower rank, Psychometrika 1, 211 (1936).
Akusok et al. [2015] A. Akusok, K. M. Börk, Y. Miche, and A. Lendasse, High-performance extreme learning machines: A complete toolbox for big data applications, IEEE Access 3, 1011 (2015).
Huang et al. [2004] G. B. Huang, Q. Y. Zhu, and C. K. Siew, Extreme learning machine: a new learning scheme of feedforward neural networks, IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541) 2, 985 (2004).
Huang et al. [2006a] G. B. Huang, L. Chen, and C. K. Siew, Universal approximation using incremental constructive feedforward networks with random hidden nodes, IEEE Trans. Neural Networks 17, 879 (2006a).
Huang et al. [2011] G. B. Huang, H. Zhou, X. Ding, and R. Zhang, Extreme learning machine for regression and multiclass classification, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 42, 513 (2011).
Huang et al. [2006b] G. B. Huang, Q. Y. Zhu, and C. K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70, 489 (2006b).
Pyle et al. [2021] R. Pyle, N. Jovanovic, D. Subramanian, K. V. Palem, and A. B. Patel, Domain-driven models yield better predictions at lower cost than reservoir computers in lorenz systems, Philos. Trans. Royal Soc. A 379, 20200246 (2021).
Vogel [2002] C. R. Vogel, Computational Methods for Inverse Problems (Society for Industrial and Applied Mathematics, 2002).
Press et al. [2007] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vettering, Numerical Recipes (3rd ed.) (Cambridge University Press, Cambridge, 2007).

$\displaystyle\partial_{t}f_{\mathbf{k}}\|_{\mathrm{col}}^{\mathrm{micro}}$	$\displaystyle=\frac{2\pi}{\hbar}\sum_{\mathbf{k^{\prime}},\alpha,\pm}\|g_{% \mathbf{k}-\mathbf{k^{\prime}}}\|^{2}\left(\frac{1}{2}\pm\frac{1}{2}+n_{\mathbf% {\mathbf{k}-\mathbf{k^{\prime}}}}^{\alpha}\right)\left(1-f_{\mathbf{k}}\right)% f_{\mathbf{k^{\prime}}}\delta(\epsilon_{\mathbf{k}}-\epsilon_{\mathbf{k^{% \prime}}}\pm\hbar\omega^{\alpha}_{\mathbf{k-k^{\prime}}})$
	$\displaystyle-\frac{2\pi}{\hbar}\sum_{\mathbf{k^{\prime}},\alpha,\pm}\|g_{% \mathbf{k}-\mathbf{k^{\prime}}}\|^{2}\left(\frac{1}{2}\pm\frac{1}{2}+n_{\mathbf% {\mathbf{k}-\mathbf{k^{\prime}}}}^{\alpha}\right)\left(1-f_{\mathbf{k^{\prime}% }}\right)f_{\mathbf{k}}\delta(\epsilon_{\mathbf{k}}-\epsilon_{\mathbf{k^{% \prime}}}\mp\hbar\omega^{\alpha}_{\mathbf{k-k^{\prime}}})$	(8)
$\displaystyle\partial_{t}n_{\mathbf{q}}^{\alpha}$	$\displaystyle=\frac{2\pi}{\hbar}\|g_{\mathbf{q}}^{\alpha}\|^{2}\sum_{\mathbf{k}}% \left(1-f_{\mathbf{k}}\right)f_{\mathbf{k+q}}\left(1+n_{\mathbf{q}}\right)% \delta(\epsilon_{\mathbf{k}}-\epsilon_{\mathbf{k+q}}+\hbar\omega^{\alpha}_{% \mathbf{q}})$
	$\displaystyle-\frac{2\pi}{\hbar}\|g_{\mathbf{q}}^{\alpha}\|^{2}\sum_{\mathbf{k}}% \left(1-f_{\mathbf{k+q}}\right)f_{\mathbf{k}}n_{\mathbf{q}}\delta(\epsilon_{% \mathbf{k}}-\epsilon_{\mathbf{k+q}}-\hbar\omega^{\alpha}_{\mathbf{q}}).$	(9)