Neural networks for the approximation of Euler’s elastica

Elena Celledoni (elena.celledoni@ntnu.no), Ergys Çokaj (ergys.cokaj@ntnu.no), Andrea Leone (andrea.leone@ntnu.no), Sigrid Leyendecker (sigrid.leyendecker@fau.de), Davide Murari (davide.murari@ntnu.no), Brynjulf Owren (brynjulf.owren@ntnu.no), Rodrigo T. Sato Martín de Almagro (rodrigo.t.sato@fau.de), Martina Stavole (martina.stavole@fau.de)

[1] Department of Mathematical Sciences, Norwegian University of Science and Technology, Alfred Getz’ vei 1, 7034 Trondheim, Norway

[2] Institute of Applied Dynamics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Immerwahrstrasse 1, 91058 Erlangen, Germany

Abstract

Euler’s elastica is a classical model of flexible slender structures relevant in many industrial applications. Static equilibrium equations can be derived via a variational principle. The accurate approximation of solutions to this problem can be challenging due to nonlinearity and constraints. We here present two neural network-based approaches for simulating Euler’s elastica. Starting from a data set of solutions of the discretised static equilibria, we train the neural networks to produce solutions for unseen boundary conditions. We present a discrete approach learning discrete solutions from the discrete data. We then consider a continuous approach using the same training data set but learning continuous solutions to the problem. We present numerical evidence that the proposed neural networks can effectively approximate configurations of the planar Euler’s elastica for a range of different boundary conditions.

keywords:

planar Euler’s elastica, supervised learning, neural networks, geometric mechanics, variational problem.

1 Introduction

Modelling of mechanical systems is relevant in various branches of engineering. Typically, it leads to the formulation of variational problems and differential equations, whose solutions are approximated with numerical techniques. The efficient solution of linear and nonlinear systems resulting from the discretisation of mechanical problems has been a persistent challenge of applied mathematics. While classical solvers are characterised by a well-established and mature body of literature [1, 2, 3, 4, 5, 6, 7], the past decade has witnessed a surge in the use of novel machine learning-assisted techniques [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]. These approaches aim at enhancing solution methods by leveraging the wealth of available data and known physical principles. The use of deep learning techniques to improve the performance of traditional numerical algorithms in terms of efficiency, accuracy, and computational scalability [9] is also becoming increasingly popular in computational mechanics (see, e.g., [25]). Examples include a wide range of problems that require the approximation of functions, as well as efficient reduced-order modelling [26] and more specific numerical tasks such as optimising the quadrature rule for computing the finite element stiffness matrix [27] or investigating data-driven numerical frameworks for the bifurcation analysis of partial differential equations [28, 29]. This recent literature is evidence that neural networks can be used successfully as surrogate models for the solution operators of various differential equations.

In the context of ordinary and partial differential equations, two main trends can be identified. The first one aims at providing a machine learning-based approximation to the discrete solutions of differential problems on a specific space-time grid, for example, by solving linear or nonlinear systems efficiently and accelerating convergence of iterative schemes [20, 19, 15, 14, 13]. The second one provides instead solutions to the differential problem as continuous (and differentiable) functions of the temporal and spatial variables. Depending on the context, conditions on such approximate solutions are provided by the differential problem itself, the initial values and boundary conditions, and the available data. The idea of providing approximate solutions as functions defined on the space-time domain and parametrised as neural networks was proposed in the nineties [30] and was recently revived in the framework of Physics-Informed Neural Networks in [10]. Since then, such an approach has attracted much interest and developed in many directions [8, 21, 31].

In this work, we use neural networks to approximate the configurations of highly flexible slender structures modelled as beams. Such models are of great interest in industrial applications like cable car ropes, diverse types of wires or endoscopes [32, 33, 34, 35]. Notwithstanding their ingenious and simple mathematical formulation, slender structure models can accurately reproduce complex mechanical behaviour and for this reason their numerical discretisation is often challenging. Furthermore, the use of $3$-dimensional models requires high computational time. Because slender deformable structures have one dimension (length) that is orders of magnitude larger than the others (cross-section), it is possible to reduce the complexity of the problem from a $3$-dimensional elastic continuum to a $1$-dimensional beam. A beam is modelled as a centerline curve, $\mathbf{q}:[0,L]\to\mathbb{R}^{n},s\mapsto\mathbf{q}(s)$, with $n=2$ or $n=3$, along which a rigid cross-section $\Sigma(s)$ is attached. The main model assumption is that the diameter of $\Sigma(s)$ is small compared with the undeformed length $L$. The complexity of the model depends on factors such as the dimension of the problem, the translational and rotational degrees of freedom (DOF) at each node of the beam, and the type of analysis, i.e., static or dynamic. Among the numerous beam models documented in the literature, we choose to approach the challenge of approximating beam deformations using a simple yet widely employed model, i.e., the $2$-dimensional Euler’s elastica [36]. The cross-section $\Sigma(s)$ is assumed to have unchanged geometrical and material properties, and to be orthogonal to the centerline $\mathbf{q}(s)$. The latter is an inextensible curve that solves a bending energy minimisation problem [37, 38, 39] for given boundary conditions.

Although the 2-dimensional Euler’s elastica is relatively simple compared to more comprehensive models, it can robustly represent interesting real-world phenomena. For instance, the elastica model appropriately captures the high bending deformations of flexible endoscopes, complex medical devices, during surgeries [33]. The approximation of the elastica through neural networks can help predict the deformed configuration of the beam for endoscopy simulations, particularly when the beam encounters constraints in confined spaces.

When approximating static equilibria of Euler’s elastica via neural networks, a key issue is to enforce the inextensibility of the curve (unit-norm tangents) as well as the boundary conditions. Two main approaches can be found in the literature [31, 21, 40]. One is the weak imposition of constraints and boundary conditions, which adds appropriate extra terms to the loss function. The other is a strong imposition strategy, which consists in shaping the network architecture so that the constraints are satisfied by construction. We show examples of both approaches in Sections 4 and 5.
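As a minimal illustration of the two strategies for the unit-norm constraint on the tangents, the following plain-Python sketch contrasts a penalty term with an angle-based parametrisation. The function names, the form of the penalty, and the sample values are assumptions made for this example and are not taken from the architectures discussed later.

```python
import math

def inextensibility_penalty(tangents):
    """Weak imposition: mean squared violation of ||q'||^2 - 1 = 0,
    to be added (with some weight) to the training loss."""
    return sum((tx * tx + ty * ty - 1.0) ** 2 for tx, ty in tangents) / len(tangents)

def tangent_from_angle(theta):
    """Strong imposition: a tangent built from an angle has unit norm by construction."""
    return (math.cos(theta), math.sin(theta))

# Hypothetical raw network outputs: tangents that are only approximately unit vectors.
raw_tangents = [(1.02, 0.0), (0.70, 0.70), (0.0, 0.97)]
penalty = inextensibility_penalty(raw_tangents)

# Hypothetical predicted angles: the resulting tangents satisfy the constraint
# up to floating-point rounding, so the penalty vanishes.
angle_tangents = [tangent_from_angle(t) for t in (0.0, 0.8, 1.5)]
penalty_strong = inextensibility_penalty(angle_tangents)
```

In the weak variant the constraint is only approximately satisfied after training, whereas in the strong variant no penalty term is needed for it.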

The paper is organised as follows. In Section 2, we present the mathematical model of the planar Euler’s elastica, including its continuous and discrete equilibrium equations. We describe the approach used to generate the data sets for the numerical experiments. In Section 3, we introduce some basic theory and notation for neural networks that we shall use in the succeeding sections. Starting from general theory, we specialise to the task of approximating configurations of Euler’s elastica. In Section 4, we introduce the discrete approach, which aims to approximate precomputed numerical discretisations of Euler’s elastica. This represents the natural approach to approximate the discrete solution trajectories with a parametric method. We discuss some drawbacks associated with this approach and then propose an alternative approximation strategy in Section 5 that leverages the fact that we are approximating a continuous curve on a spatial grid. The continuous approach consists in computing an arc length parametrisation of the beam configuration. We provide insights into two additional networks and analyse how the test accuracy changes with varying constraints, such as boundary conditions or tangent vector norms. Data and codes for the numerical experiments are available in the GitHub repository associated with the paper (https://github.com/ergyscokaj/LearningEulersElastica).

Main contributions: This paper presents advancements in the approximation of beam static configurations using neural networks. These advancements include: (i) a detailed experimental analysis of approximating numerical discretisations of Euler’s elastica configurations through what we call the discrete network, (ii) the identification and discussion of the limitations associated with this discrete approach, and (iii) the introduction of a new parametrisation strategy, called the continuous network, to address some of these drawbacks.

Nomenclature
$\mathcal{L}$	continuous Lagrangian function
$\mathcal{S}$	continuous action functional
$\mathcal{L}_{d}$	discrete Lagrangian function
$\mathcal{S}_{d}$	discrete action functional
$\mathbf{q}$	configuration of the beam
$\mathbf{q}^{\prime}$	first spatial derivative of $\mathbf{q}$
$\theta$	tangential angle
$s$	arc length parameter
$\kappa$	curvature
$L$	length of the undeformed beam
$EI$	bending stiffness, with $E$ the elastic modulus and $I$ the second moment of area
$\hat{\mathbf{q}}$	numerical approximation of $\mathbf{q}$
$N+1$	number of discretisation nodes, with $N$ the number of intervals
$h$	space step (length of each interval)
$q_{\boldsymbol{\rho}}^{\textrm{d}}$	discrete neural network
$q_{\boldsymbol{\rho}}^{\textrm{c}}$	continuous neural network approximating the solution curve $\boldsymbol{q}(s)$
$\theta_{\boldsymbol{\rho}}^{\textrm{c}}$	continuous neural network approximating the angular function $\theta(s)$
$\boldsymbol{\rho}$	parameters of the neural network
$\ell$	number of layers in the neural network
$\sigma$	activation function
$M$	number of training data
$B$	size of one training batch
MSE	mean squared error
MLP	multi layer perceptron
MULT	multiplicative neural network
$\mathcal{D}$	differential operator
$\mathcal{I}$	quadrature operator

Table 1: List of abbreviations and notations.

2 Euler’s elastica model

We consider an inextensible beam model in which the cross-section $\Sigma(s)$ is assumed to be constant along the arc length $s$ and perpendicular to the centerline $\mathbf{q}(s)$ , which means that no shear deformation can occur. Thus, the deformation of the centerline is a pure bending problem, precisely Euler’s elastica curve. In the following, we assume $\mathbf{q}\in C^{2}([0,L],\mathbb{R}^{2})$ , i.e., the curve is planar and twice continuously differentiable with length $L$ . If $s$ denotes the arc length parameter, then $\|\mathbf{q}^{\prime}(s)\|=1$ , where ${}^{\prime}=\frac{d}{ds}$ , for all $s\in[0,L]$ . The elastica problem consists in minimising the following Euler-Bernoulli energy functional

\int_{0}^{L}\kappa(s)^{2}ds,

where $\kappa(s)$ denotes the curvature of $\mathbf{q}(s)$ [38]. Under arc length parametrisation, $\kappa(s)=\|\mathbf{q}^{\prime\prime}(s)\|$.

We can reformulate this problem as a constrained Lagrangian problem as follows. Consider the second-order Lagrangian $\mathcal{L}:T^{(2)}Q\rightarrow\mathbb{R}$ , where $T^{(2)}Q$ denotes the second-order tangent bundle [41] of the configuration manifold $Q$ , which in this case is $\mathbb{R}^{2}$ :

\mathcal{L}\left(\mathbf{q},\mathbf{q}^{\prime},\mathbf{q}^{\prime\prime}\right)=\frac{1}{2}EI\left\|\mathbf{q}^{\prime\prime}\right\|^{2}\,.

(1)

Here, with a slight abuse of notation, ${}^{\prime}$ denotes a spatial derivative, but we do not initially assume arc length parametrisation. The parameter $EI$ is the bending stiffness, which governs the response of the elastica under bending. It combines a material property and a geometric property: $E$ is the Young’s modulus and $I$ is the second moment of area of the cross-section $\Sigma$. For simplicity, these parameters are assumed to be constant along the length of the beam.
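As a quick sanity check of the Lagrangian in Equation (1), consider a circular arc of radius $R$ parametrised by arc length, for which $\|\mathbf{q}^{\prime\prime}\|=1/R$ and hence the bending energy over $[0,L]$ equals $EI\,L/(2R^{2})$. The sketch below evaluates this integral numerically in plain Python. The radius $R=2$ is a hypothetical value chosen for illustration, while $EI=10$ and $L=3.3$ match the values used for data generation in Section 2.2.

```python
import math

EI, L, R = 10.0, 3.3, 2.0   # R is an illustrative choice

def second_derivative(s):
    """q''(s) for the arc q(s) = R (sin(s/R), 1 - cos(s/R)), which has ||q'(s)|| = 1."""
    return (-math.sin(s / R) / R, math.cos(s / R) / R)

def lagrangian(s):
    """Equation (1): 1/2 * EI * ||q''||^2 evaluated along the arc."""
    ddx, ddy = second_derivative(s)
    return 0.5 * EI * (ddx * ddx + ddy * ddy)

# Composite trapezoidal rule on N equal intervals.
N = 50
h = L / N
energy = h * (0.5 * lagrangian(0.0)
              + sum(lagrangian(k * h) for k in range(1, N))
              + 0.5 * lagrangian(L))

exact = EI * L / (2.0 * R ** 2)   # closed-form value for constant curvature 1/R
```

Since the curvature is constant along the arc, the quadrature reproduces the closed-form value up to rounding.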

In order to recover the solutions of the elastica, the Lagrangian in Equation (1) must be supplemented with the constraint equation

\Phi(\mathbf{q},\mathbf{q}^{\prime})=\|\mathbf{q}^{\prime}\|^{2}-1=0.

(2)

This imposes arc length parametrisation of the curve $\mathbf{q}(s)$ and leads to the augmented Lagrangian $\widetilde{\mathcal{L}}:T^{(2)}Q\times\mathbb{R}\rightarrow\mathbb{R}$

\widetilde{\mathcal{L}}\left(\mathbf{q},\mathbf{q}^{\prime},\mathbf{q}^{\prime\prime},\Lambda\right)=\mathcal{L}\left(\mathbf{q},\mathbf{q}^{\prime},\mathbf{q}^{\prime\prime}\right)+\Lambda\Phi(\mathbf{q},\mathbf{q}^{\prime}),

(3)

where $\Lambda(s)$ is a Lagrange multiplier, see [39]. The Lagrangian function coincides with the total elastic energy over solutions of the corresponding Euler-Lagrange equations. The internal bending moment is directly related to the curvature $\kappa(s)$ .

The continuous action functional $\mathcal{S}$ is defined as:

\mathcal{S}[\mathbf{q}]=\int_{0}^{L}\widetilde{\mathcal{L}}\left(\mathbf{q},\mathbf{q}^{\prime},\mathbf{q}^{\prime\prime},\Lambda\right)\,ds.

(4)

Applying Hamilton’s principle of stationary action, $\delta\mathcal{S}=0$ , yields the Euler-Lagrange equations

\begin{aligned}\frac{d^{2}}{ds^{2}}\left(\frac{\partial\mathcal{L}}{\partial\mathbf{q}^{\prime\prime}}\right)-\frac{d}{ds}\left(\frac{\partial\mathcal{L}}{\partial\mathbf{q}^{\prime}}\right)+\frac{\partial\mathcal{L}}{\partial\mathbf{q}}&=\frac{d}{ds}\left(\frac{\partial\Phi}{\partial\mathbf{q}^{\prime}}\Lambda\right)-\frac{\partial\Phi}{\partial\mathbf{q}}\Lambda,\\ \|\mathbf{q}^{\prime}\|^{2}-1&=0,\end{aligned}

(5)

which need to be satisfied together with the boundary conditions on positions and tangents, i.e., $(\mathbf{q}(0),\mathbf{q}^{\prime}(0))=(\mathbf{q}_{0},\mathbf{q}^{\prime}_{0})$ and $(\mathbf{q}(L),\mathbf{q}^{\prime}(L))=(\mathbf{q}_{N},\mathbf{q}^{\prime}_{N})$ .
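For the specific Lagrangian (1) and constraint (2), the derivatives appearing in Equations (5) can be written out explicitly. The following short computation is a worked example under the assumption of constant $EI$ stated above:

```latex
\begin{aligned}
\frac{\partial\mathcal{L}}{\partial\mathbf{q}^{\prime\prime}} &= EI\,\mathbf{q}^{\prime\prime}, &
\frac{\partial\mathcal{L}}{\partial\mathbf{q}^{\prime}} &= \frac{\partial\mathcal{L}}{\partial\mathbf{q}} = 0, &
\frac{\partial\Phi}{\partial\mathbf{q}^{\prime}} &= 2\,\mathbf{q}^{\prime}, &
\frac{\partial\Phi}{\partial\mathbf{q}} &= 0,
\end{aligned}
\qquad\Longrightarrow\qquad
\begin{aligned}
EI\,\mathbf{q}^{\prime\prime\prime\prime} &= 2\,\frac{d}{ds}\left(\Lambda\,\mathbf{q}^{\prime}\right),\\
\|\mathbf{q}^{\prime}\|^{2}-1 &= 0.
\end{aligned}
```

In this form, the equilibrium equations are a fourth-order system for $\mathbf{q}$ coupled with the algebraic inextensibility constraint through the multiplier $\Lambda$.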

2.1 Space discretisation of the elastica

The continuous augmented Lagrangian $\widetilde{\mathcal{L}}$ in Equation (3) and the action integral $\mathcal{S}$ in Equation (4) are discretised over the beam length $L$ using a constant step size $h=L/N$, with $N+1$ the number of resulting equidistant nodes $0=s_{0}<s_{1}<\ldots<s_{N-1}<s_{N}=L$. In second-order systems, the discrete Lagrangian is a function $\widetilde{\mathcal{L}}_{d}:TQ\times TQ\times\mathbb{R}\times\mathbb{R}\rightarrow\mathbb{R}$. In this study, we use a discretisation of the Lagrangian function proposed in [42], based on the trapezoidal rule:

\begin{split}&\widetilde{\mathcal{L}}_{d}\left(\mathbf{q}_{k},\mathbf{q}^{\prime}_{k},\mathbf{q}_{k+1},\mathbf{q}^{\prime}_{k+1},\Lambda_{k},\Lambda_{k+1}\right)\\ &=\frac{h}{2}\left[\widetilde{\mathcal{L}}(\mathbf{q}_{k},\mathbf{q}^{\prime}_{k},\left(\mathbf{q}^{\prime\prime}_{k}\right)^{-},\Lambda_{k})+\widetilde{\mathcal{L}}(\mathbf{q}_{k+1},\mathbf{q}^{\prime}_{k+1},\left(\mathbf{q}^{\prime\prime}_{k+1}\right)^{+},\Lambda_{k+1})\right],\end{split}

where $\mathbf{q}_{k}$ , $\mathbf{q}^{\prime}_{k}$ , and $\Lambda_{k}$ are approximations of $\mathbf{q}(s_{k})$ , $\mathbf{q}^{\prime}(s_{k})$ , and $\Lambda(s_{k})$ , and the curvature on the interval $[s_{k},s_{k+1}]$ is approximated in terms of lower order derivatives as follows

\begin{aligned}\mathbf{q}^{\prime\prime}(s_{k})\approx(\mathbf{q}^{\prime\prime}_{k})^{-}&=\frac{\left(-2\mathbf{q}^{\prime}_{k+1}-4\mathbf{q}^{\prime}_{k}\right)h+6(\mathbf{q}_{k+1}-\mathbf{q}_{k})}{h^{2}}\,,\\ \mathbf{q}^{\prime\prime}(s_{k+1})\approx(\mathbf{q}^{\prime\prime}_{k+1})^{+}&=\frac{\left(4\mathbf{q}^{\prime}_{k+1}+2\mathbf{q}^{\prime}_{k}\right)h-6(\mathbf{q}_{k+1}-\mathbf{q}_{k})}{h^{2}}\,.\end{aligned}

This amounts to a piece-wise linear and discontinuous approximation of the curvature on $[0,L]$ .
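The two formulas above coincide with the endpoint second derivatives of the cubic Hermite interpolant through $(\mathbf{q}_{k},\mathbf{q}^{\prime}_{k})$ and $(\mathbf{q}_{k+1},\mathbf{q}^{\prime}_{k+1})$, so they are exact whenever a component of the curve is a cubic polynomial on the interval. The sketch below checks this on the scalar test function $q(s)=s^{3}$; the function name and the test values are chosen only for illustration.

```python
def second_derivatives(q_k, dq_k, q_k1, dq_k1, h):
    """Endpoint second derivatives on [s_k, s_k + h] from nodal values and slopes."""
    minus = ((-2.0 * dq_k1 - 4.0 * dq_k) * h + 6.0 * (q_k1 - q_k)) / h ** 2
    plus = ((4.0 * dq_k1 + 2.0 * dq_k) * h - 6.0 * (q_k1 - q_k)) / h ** 2
    return minus, plus

# Scalar test function (one component of a curve):
# q(s) = s^3, q'(s) = 3 s^2, q''(s) = 6 s.
s_k, h = 0.5, 0.1
s_k1 = s_k + h
minus, plus = second_derivatives(s_k ** 3, 3 * s_k ** 2, s_k1 ** 3, 3 * s_k1 ** 2, h)

# Exact values for comparison: q''(s_k) = 6 * s_k and q''(s_k1) = 6 * s_k1.
err_minus = abs(minus - 6.0 * s_k)
err_plus = abs(plus - 6.0 * s_k1)
```

For a general smooth curve, a Taylor expansion shows that the error of each formula is of order $h^{2}$.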

The action integral in Equation (4) along the exact solution $\mathbf{q}$ with boundary conditions $\left(\mathbf{q}_{0},\mathbf{q}^{\prime}_{0}\right)$ and $\left(\mathbf{q}_{N},\mathbf{q}^{\prime}_{N}\right)$ is approximated by

\mathcal{S}_{d}=\sum_{k=0}^{N-1}\widetilde{\mathcal{L}}_{d}\left(\mathbf{q}_{k},\mathbf{q}^{\prime}_{k},\mathbf{q}_{k+1},\mathbf{q}^{\prime}_{k+1},\Lambda_{k},\Lambda_{k+1}\right).

(6)
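To make the discrete action (6) concrete, the following plain-Python sketch evaluates it for a curve given by nodal positions, tangents, and multipliers. It is an illustration under stated assumptions, not the code from the paper's repository: the function names are hypothetical, and the test input is a circular arc of hypothetical radius $R=2$, for which the exact bending energy $EI\,L/(2R^{2})$ is known.

```python
import math

def second_derivatives(q_k, dq_k, q_k1, dq_k1, h):
    """Component-wise (q''_k)^- and (q''_{k+1})^+ on one interval."""
    minus = tuple(((-2.0 * b1 - 4.0 * b0) * h + 6.0 * (a1 - a0)) / h ** 2
                  for a0, b0, a1, b1 in zip(q_k, dq_k, q_k1, dq_k1))
    plus = tuple(((4.0 * b1 + 2.0 * b0) * h - 6.0 * (a1 - a0)) / h ** 2
                 for a0, b0, a1, b1 in zip(q_k, dq_k, q_k1, dq_k1))
    return minus, plus

def augmented_lagrangian(dq, ddq, lam, EI):
    """Equation (3): 1/2 EI ||q''||^2 + Lambda (||q'||^2 - 1)."""
    return 0.5 * EI * sum(c * c for c in ddq) + lam * (sum(c * c for c in dq) - 1.0)

def discrete_action(q, dq, lam, h, EI):
    """Equation (6): sum over intervals of the trapezoidal discrete Lagrangian."""
    total = 0.0
    for k in range(len(q) - 1):
        minus, plus = second_derivatives(q[k], dq[k], q[k + 1], dq[k + 1], h)
        total += 0.5 * h * (augmented_lagrangian(dq[k], minus, lam[k], EI)
                            + augmented_lagrangian(dq[k + 1], plus, lam[k + 1], EI))
    return total

# Illustrative input: a circular arc of (hypothetical) radius R sampled at N + 1 nodes.
EI, L, N, R = 10.0, 3.3, 50, 2.0
h = L / N
q = [(R * math.sin(k * h / R), R * (1.0 - math.cos(k * h / R))) for k in range(N + 1)]
dq = [(math.cos(k * h / R), math.sin(k * h / R)) for k in range(N + 1)]
lam = [0.0] * (N + 1)   # the multipliers play no role here because ||q'_k|| = 1

S_d = discrete_action(q, dq, lam, h, EI)
exact_energy = EI * L / (2.0 * R ** 2)
```

On this input the discrete action agrees with the exact energy up to the $O(h^{2})$ error of the curvature approximation.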

The discrete variational principle $\delta\mathcal{S}_{d}=0$ leads to the following discrete Euler-Lagrange equations:

\begin{aligned}D_{3}\widetilde{\mathcal{L}}_{d}\left(\mathbf{q}_{k-1},\mathbf{q}^{\prime}_{k-1},\mathbf{q}_{k},\mathbf{q}^{\prime}_{k},\Lambda_{k-1},\Lambda_{k}\right)+D_{1}\widetilde{\mathcal{L}}_{d}\left(\mathbf{q}_{k},\mathbf{q}^{\prime}_{k},\mathbf{q}_{k+1},\mathbf{q}^{\prime}_{k+1},\Lambda_{k},\Lambda_{k+1}\right)&=0,\\ D_{4}\widetilde{\mathcal{L}}_{d}\left(\mathbf{q}_{k-1},\mathbf{q}^{\prime}_{k-1},\mathbf{q}_{k},\mathbf{q}^{\prime}_{k},\Lambda_{k-1},\Lambda_{k}\right)+D_{2}\widetilde{\mathcal{L}}_{d}\left(\mathbf{q}_{k},\mathbf{q}^{\prime}_{k},\mathbf{q}_{k+1},\mathbf{q}^{\prime}_{k+1},\Lambda_{k},\Lambda_{k+1}\right)&=0,\\ D_{6}\widetilde{\mathcal{L}}_{d}\left(\mathbf{q}_{k-1},\mathbf{q}^{\prime}_{k-1},\mathbf{q}_{k},\mathbf{q}^{\prime}_{k},\Lambda_{k-1},\Lambda_{k}\right)+D_{5}\widetilde{\mathcal{L}}_{d}\left(\mathbf{q}_{k},\mathbf{q}^{\prime}_{k},\mathbf{q}_{k+1},\mathbf{q}^{\prime}_{k+1},\Lambda_{k},\Lambda_{k+1}\right)&=0,\end{aligned}

(7)

for $k=1,\dots,N-1$ , which approximate the equilibrium equations of the beam in Equations (5) and can be solved together with the boundary conditions. Here, $D_{i}$ for $i=1,\dots,6$ denotes the differentiation with respect to the $i$ -th argument.

2.2 Data generation

The elastica was one of the first examples displaying elastic instability and bifurcation phenomena [43, 44]. Elastic instability implies that small perturbations of the boundary conditions might lead to large changes in the beam configuration, which results in unstable equilibria. Under certain boundary conditions, bifurcations can appear, leading to a multiplicity of solutions [38]. In particular, this means that the numerical problem may display history-dependence and converge to solutions that do not minimise the bending energy. In order to generate a physically meaningful data set, avoiding unstable and non-unique solutions is essential. Thus, in addition to the minimisation of the discrete action $\mathcal{S}_{d}$ in Equation (6), we ensure the fulfilment of the discrete Euler-Lagrange equations (7), which can be seen as necessary conditions for the stationarity of the discrete action. We exclude from the data set numerical solutions computed with boundary conditions for which the minimisation of Equation (6) and an accurate solution of Equations (7) cannot be achieved simultaneously.

In particular, we consider a curve of length $L=3.3$ and bending stiffness $EI=10$ , divided into $N=50$ intervals. We fix the endpoints $\mathbf{q}_{0}=(0,0)$ , $\mathbf{q}_{N}=(3,0)$ . The units of measurement are deliberately omitted as they have no impact on the results of this work. We impose boundary conditions on the tangents in the following two variants:

1. The angles of the tangents with respect to the $x$-axis at the boundary, $\theta_{0}$ and $\theta_{N}$, are prescribed in the range $[0,2\pi]$ in a mirror-symmetric fashion, i.e., $\theta_{N}=\pi-\theta_{0}$. Hereafter, we refer to this case as both-ends.

2. The angle of the left tangent is kept fixed at $\theta_{0}=0$, while the angle of the right tangent, $\theta_{N}$, varies in the range $[0,2\pi]$. We refer to this case as right-end.
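The two variants of boundary conditions can be assembled into input vectors $(\mathbf{q}_{0},\mathbf{q}^{\prime}_{0},\mathbf{q}_{N},\mathbf{q}^{\prime}_{N})\in\mathbb{R}^{8}$ as in the sketch below. The helper names are hypothetical, and sampling the angles uniformly at random is an assumption of this example; the text only states that the angles lie in $[0,2\pi]$.

```python
import math
import random

Q0, QN = (0.0, 0.0), (3.0, 0.0)   # fixed endpoints from the text

def unit_tangent(theta):
    """Unit tangent forming the angle theta with the x-axis."""
    return (math.cos(theta), math.sin(theta))

def both_ends_bc(theta0):
    """Case 1 (both-ends): theta_N = pi - theta_0."""
    return Q0 + unit_tangent(theta0) + QN + unit_tangent(math.pi - theta0)

def right_end_bc(thetaN):
    """Case 2 (right-end): theta_0 = 0, theta_N free."""
    return Q0 + unit_tangent(0.0) + QN + unit_tangent(thetaN)

rng = random.Random(0)   # seed fixed only to make the sketch reproducible
both_ends = [both_ends_bc(rng.uniform(0.0, 2.0 * math.pi)) for _ in range(5)]
right_end = [right_end_bc(rng.uniform(0.0, 2.0 * math.pi)) for _ in range(5)]
```

In the both-ends case, the mirror symmetry shows up as $\mathbf{q}^{\prime}_{N}=(-\cos\theta_{0},\sin\theta_{0})$.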

Based on these parameters and boundary values, and using cubic splines as initial guesses, we generate a data set of $2000$ trajectories ($1000$ trajectories for each case) by minimising the discrete action in Equation (6) with the trust-constr solver of the optimize.minimize procedure provided in SciPy [45]. We check the resulting solutions by using them as initial guesses for the optimize.root method of SciPy, solving the discrete Euler-Lagrange equations (7).

The learning problem we consider relies on numerically generated solution curves. This choice allows us to work with data points that are quantifiably close to the analytical solution of Euler’s elastica. Consequently, showing that the neural networks we propose can accurately approximate these curves translates into their ability to approximate the analytical solution accurately. The motivation of this strategy is not to improve on the numerical solver we use, but to use its accuracy to train a model able to extrapolate to unseen boundary conditions and generate their solution curves more efficiently than the numerical method itself. The chosen supervised learning setting is independent of the fact that we use numerical solutions as data. Indeed, if one had another reliable approximation of the analytical solution, for example, based on realistic measurements, those could also be used or combined with numerically generated trajectories. Using numerical solutions as data is not an inherent limitation of the proposed procedure but a choice we make to quotient out the issue of not having reliable input data. Furthermore, we mainly focus on the development of neural networks able to approximate such input data with high accuracy.

3 Approximation with neural networks

We start by providing a concise overview of neural networks, which also serves to define the notation used in Sections 4 and 5. We refer to [46, 31, 47] and references therein for a more extensive introduction. A neural network is a parametric function $f_{\boldsymbol{\rho}}:\mathcal{I}\rightarrow\mathcal{O}$ with parameters $\boldsymbol{\rho}\in\Psi$ given as a composition of multiple transformations,

f_{\boldsymbol{\rho}}:=f_{\ell}\circ\dots\circ f_{j}\circ\dots\circ f_{1},

(8)

where each $f_{j}$ represents the $j$ -th layer of the network, with $j=1,\dots,\ell$ , and $\ell$ is the number of layers. For example, multi-layer perceptrons (MLPs) have each layer $f_{j}$ defined as

f_{j}^{\mathrm{MLP}}(\mathbf{x})=\sigma(\mathbf{A}_{j}\mathbf{x}+\mathbf{b}_{j})\in\mathbb{R}^{n_{j}},

(9)

where $n_{j}$ is the dimension of the output of the $j$-th layer, $\mathbf{x}\in\mathbb{R}^{n_{j-1}}$, and $\mathbf{A}_{j}\in\mathbb{R}^{n_{j}\times n_{j-1}}$, $\mathbf{b}_{j}\in\mathbb{R}^{n_{j}}$ are the parameters of the $j$-th layer, i.e., $\boldsymbol{\rho}=\{\mathbf{A}_{j},\mathbf{b}_{j}\}_{j=1}^{\ell}$. The activation function $\sigma$ is a continuous nonlinear scalar function, which acts component-wise on vectors. The architecture of the neural network is prescribed by the layers $f_{j}$ in Equation (8) and determines the space of functions $\mathcal{F}=\{f_{\boldsymbol{\rho}}:\mathcal{I}\rightarrow\mathcal{O},\,\,\boldsymbol{\rho}\in\Psi\}$ that can be represented. The weights $\boldsymbol{\rho}$ are chosen such that $f_{\boldsymbol{\rho}}$ approximates a map of interest $f:\mathcal{I}\rightarrow\mathcal{O}$ accurately enough. Usually, this choice follows from minimising a purposely designed loss function $\rm{Loss}(\boldsymbol{\rho})$.
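The composition in Equations (8) and (9) can be written out in a few lines. The sketch below uses plain Python to stay free of dependencies (the experiments in this paper rely on PyTorch); the initialisation, the hidden width of 16, and the function names are assumptions of this example. It follows Equation (9) literally and applies $\sigma$ in every layer, whereas in practice the last layer is often taken to be affine.

```python
import math
import random

def init_mlp(sizes, seed=0):
    """Random parameters rho = {(A_j, b_j)} for layer widths sizes = [n_0, ..., n_l]."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        # A_j has shape n_j x n_{j-1}; the scaling is an illustrative choice.
        A = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
             for _ in range(n_out)]
        b = [0.0] * n_out
        params.append((A, b))
    return params

def mlp_forward(params, x, sigma=math.tanh):
    """Composition f_l o ... o f_1 with f_j(x) = sigma(A_j x + b_j), as in (8)-(9)."""
    for A, b in params:
        x = [sigma(sum(a * xi for a, xi in zip(row, x)) + bj) for row, bj in zip(A, b)]
    return x

# 8 boundary-condition inputs, two hidden layers of (hypothetical) width 16,
# and 4 (N - 1) outputs for N = 50 intervals.
N = 50
params = init_mlp([8, 16, 16, 4 * (N - 1)])
y = mlp_forward(params, [0.0, 0.0, 1.0, 0.0, 3.0, 0.0, -1.0, 0.0])
```

Here the number of layers is $\ell=3$, and the parameter set $\boldsymbol{\rho}$ is the list of pairs `(A, b)`.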

In supervised learning, we are given a data set $\Omega=\{\mathbf{x}^{i},\mathbf{y}^{i}\}_{i=1}^{M}$ consisting of $M$ pairs $(\mathbf{x}^{i},\mathbf{y}^{i}=f\left(\mathbf{x}^{i}\right))$ . The loss function measures the distance between the network predictions $f_{\boldsymbol{\rho}}\left(\mathbf{x}^{i}\right)$ and the desired outputs $\mathbf{y}^{i}$ in some appropriate norm $\|\cdot\|$ ,

\textrm{Loss}(\boldsymbol{\rho})=\frac{1}{M}\sum_{i=1}^{M}\left\|f_{\boldsymbol{\rho}}\left(\mathbf{x}^{i}\right)-\mathbf{y}^{i}\right\|^{2}.

The training of the network is the process of minimising $\rm{Loss}(\boldsymbol{\rho})$ with respect to $\boldsymbol{\rho}$ and it is usually done with gradient descent (GD):

\boldsymbol{\rho}^{(k)}\mapsto\boldsymbol{\rho}^{(k)}-\eta\nabla\textrm{Loss}\left(\boldsymbol{\rho}^{(k)}\right)=:\boldsymbol{\rho}^{(k+1)}.

The scalar value $\eta$ is known as the learning rate. The iteration process is often implemented using subsets of data $\mathcal{B}\subset\Omega$ of cardinality $B=|\mathcal{B}|$ (batches). In this paper we use an accelerated version of GD known as Adam [48].
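As a toy illustration of the gradient descent iteration, the sketch below fits the one-parameter model $f_{\rho}(x)=\rho x$ to data generated by $f(x)=2x$ with the MSE loss. The model, the data, and the learning rate are hypothetical choices for this example; it uses plain full-batch GD rather than Adam.

```python
def loss_and_grad(rho, xs, ys):
    """MSE loss of the one-parameter model f_rho(x) = rho * x and its derivative."""
    M = len(xs)
    loss = sum((rho * x - y) ** 2 for x, y in zip(xs, ys)) / M
    grad = sum(2.0 * x * (rho * x - y) for x, y in zip(xs, ys)) / M
    return loss, grad

xs = [1.0, 2.0, 3.0]
ys = [2.0 * x for x in xs]    # data generated by the map f(x) = 2 x

rho, eta = 0.0, 0.05          # initial guess and learning rate
for _ in range(100):
    loss, grad = loss_and_grad(rho, xs, ys)
    rho = rho - eta * grad    # rho^(k+1) = rho^(k) - eta * grad Loss(rho^(k))

final_loss, _ = loss_and_grad(rho, xs, ys)
```

For this quadratic loss each step contracts the error $\rho-2$ by a fixed factor smaller than one, so the iteration converges to $\rho=2$.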

During training, we evaluate the model’s prediction accuracy using inputs in a validation set. This helps to prevent overfitting on the training data and may serve as a stopping criterion if the training loss diminishes but the validation error rises. Once the training is complete, we assess the model’s accuracy in predicting the correct output for new inputs included in a test set composed of boundary conditions outside the training and validation sets. In the following, we measure the accuracy on the training, validation, and test data using the mean squared error of the difference between the predicted trajectories and the true ones.

We now turn to the task of approximating the static equilibria of the planar elastica introduced in Section 2, i.e., approximating a family of curves $\{\mathbf{q}^{i}:[0,L]\to\mathbb{R}^{2}\}$ determined by boundary conditions,

\{\mathbf{q}^{i}(0)=\mathbf{q}^{i}_{0},\;\mathbf{q}^{i}(L)=\mathbf{q}^{i}_{N},\;(\mathbf{q}^{i})^{\prime}(0)=(\mathbf{q}^{i}_{0})^{\prime},\;(\mathbf{q}^{i})^{\prime}(L)=(\mathbf{q}^{i}_{N})^{\prime}\},

(10)

where $\left(\mathbf{q}^{i}_{0},\mathbf{q}^{i}_{N},(\mathbf{q}^{i}_{0})^{\prime},(\mathbf{q}^{i}_{N})^{\prime}\right)\in\mathbb{R}^{8}$. To tackle this problem, we require a set of evaluations $\{\mathbf{q}^{i}_{k},(\mathbf{q}^{i}_{k})^{\prime}\}$ on the nodes $s_{k}\in[0,L]$ of a discretisation. More precisely, in our setting, the data set includes numerical approximations $\hat{\mathbf{q}}$ of the solution $\mathbf{q}(s)$ and of its spatial derivative $\mathbf{q}^{\prime}(s)$ at the $N-1$ internal nodes $s_{k}=kh$, $k=1,\dots,N-1$, of the interval $[0,L]$, for $M$ pairs of boundary conditions, as described in Section 2.2.

4 The discrete network

The discretisation of Euler’s elastica presented in Section 2.1 provides discrete solutions on a set of nodes along the curve. These solutions can sometimes be hard to obtain since a global optimisation problem needs to be solved, and the number of nodes can be large. This motivates using neural networks to learn the approximate solution on the internal nodes for a given set of boundary conditions. The data set $\Omega$ consists of $M$ precomputed discrete solutions

\Omega=\left\{(\mathbf{x}^{i},\mathbf{y}^{i})\right\}_{i=1}^{M},

where

\mathbf{x}^{i}=\left(\mathbf{q}^{i}_{0},(\mathbf{q}^{i}_{0})^{\prime},\mathbf{q}^{i}_{N},(\mathbf{q}^{i}_{N})^{\prime}\right)\in\mathbb{R}^{8}

are the input boundary conditions and

\mathbf{y}^{i}=(\hat{\mathbf{q}}_{1}^{i},(\hat{\mathbf{q}}^{i}_{1})^{\prime},\ldots,\hat{\mathbf{q}}_{N-1}^{i},(\hat{\mathbf{q}}^{i}_{N-1})^{\prime})\in\mathbb{R}^{4(N-1)}

are the computed solutions at the internal nodes that serve as output data for the network’s training.
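The split of one discrete solution into an input vector $\mathbf{x}^{i}$ and a target vector $\mathbf{y}^{i}$ can be sketched as follows. The function name is hypothetical, and the sample curve is a placeholder straight segment used only to check the dimensions; it is not a solution from the data set.

```python
def make_pair(q, dq):
    """Split one discrete solution (nodes 0..N) into network input x and target y."""
    N = len(q) - 1
    # Input: boundary values (q_0, q'_0, q_N, q'_N).
    x = tuple(q[0]) + tuple(dq[0]) + tuple(q[N]) + tuple(dq[N])
    # Target: (q_k, q'_k) at the internal nodes k = 1, ..., N - 1, node by node.
    y = tuple(v for k in range(1, N) for v in tuple(q[k]) + tuple(dq[k]))
    return x, y

# Placeholder curve with N = 50 intervals (a straight segment, for shape checks only).
N = 50
q = [(3.0 * k / N, 0.0) for k in range(N + 1)]
dq = [(1.0, 0.0)] * (N + 1)
x, y = make_pair(q, dq)
```

With this ordering, every block of four consecutive entries of $\mathbf{y}^{i}$ belongs to one internal node.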

For any symmetric positive definite matrix $W$ , we define the weighted norm $\|\mathbf{x}\|_{W}^{2}=\mathbf{x}^{\top}W\mathbf{x}$ . The weighted MSE loss

\mathrm{Loss}(\boldsymbol{\rho})=\frac{1}{4M(N-1)}\sum_{i=1}^{M}\left\|q_{\boldsymbol{\rho}}^{\textrm{d}}\left(\mathbf{x}^{i}\right)-\mathbf{y}^{i}\right\|_{W}^{2}

(11)

will be used to learn the input-to-output map $q_{\boldsymbol{\rho}}^{\textrm{d}}:\mathbb{R}^{8}\to\mathbb{R}^{4(N-1)}$ , where the superscript $\mathrm{d}$ stands for discrete. One should be aware that there is a numerical error in $\mathbf{y}^{i}$ compared to the exact solution and the size of this error will pose a limit to the accuracy of the neural network approximation.

4.1 Numerical experiments

This section provides experimental support to the proposed learning framework using the machine learning library PyTorch [49]. The experiments of this section are run on a CPU machine. We perform a series of experiments varying some hyperparameters in the training procedure. We fix the batch size $B$ to 32 and use the Adam optimiser [48] for the training with learning rate $10^{-3}$ and weight decay set to $0$ . In (11) we use the weight matrix

W=I+\gamma G^{\top}G,

where $G=S^{4}-I$ with $S$ the forward shift operator on vectors of $\mathbb{R}^{4(N-1)}$. This choice of $G$ allows us to compute differences between corresponding entries of the vector it acts on that are associated with neighbouring nodes. We determine the number of epochs for training both the discrete and continuous networks based on experimental evidence. We fix a number that is high enough to achieve qualitatively accurate predictions and to ensure that both training and validation losses start to plateau a few epochs before the set maximum.

We consider a multi-layer perceptron with the hyperbolic tangent as an activation function, and we vary the number of layers and the number of hidden nodes in each layer. We also test different values of the parameter $\gamma$ in the weight matrix $W$. We rely on the software framework Optuna [50], which employs Bayesian optimisation methods to automate and efficiently conduct the search for the combination that yields the best result. We collect in Table 4 the hyperparameters with the corresponding ranges and in Table 5 the selected values.

The resulting training error on the both-ends data set is $1.140\cdot 10^{-7}$, the validation error is $2.151\cdot 10^{-7}$, and the test error is $4.009\cdot 10^{-7}$. Figure 1 compares test trajectories for $\mathbf{q}$ and $\mathbf{q}^{\prime}$. We remark that, as already clear from the low values of the training and test errors, the network can accurately replicate the behaviour of the training and test data. Furthermore, we have zero errors at the end nodes since the network is trained only on the internal nodes and the boundary values are appended to the predicted solution in a post-processing phase. On the other hand, since this discrete approach does not relate the components as evaluations of a smooth curve, there is no regular behaviour in the error.
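The weighted norm in (11) with $W=I+\gamma G^{\top}G$ satisfies $\|\mathbf{r}\|_{W}^{2}=\|\mathbf{r}\|^{2}+\gamma\|G\mathbf{r}\|^{2}$, so it can be evaluated without forming $W$. The sketch below does this in plain Python. It assumes that $S$ is the non-periodic forward shift with zero padding, $(S\mathbf{r})_{j}=r_{j+1}$, which is one possible reading of the definition above; the residual values are made up.

```python
def weighted_sq_norm(r, gamma, stride=4):
    """||r||_W^2 with W = I + gamma * G^T G and G = S^stride - I, without forming W.

    Assumption: S is the non-periodic forward shift, (S r)_j = r_{j+1}, padded
    with zeros at the end, so (G r)_j = r_{j+stride} - r_j, and (G r)_j = -r_j
    for the last `stride` entries.
    """
    n = len(r)
    Gr = [(r[j + stride] if j + stride < n else 0.0) - r[j] for j in range(n)]
    return sum(v * v for v in r) + gamma * sum(v * v for v in Gr)

# Residual over two nodes with 4 entries each (q_x, q_y, q'_x, q'_y); made-up values.
r = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
value = weighted_sq_norm(r, gamma=0.1)
```

With a stride of 4, each entry is compared with the entry of the same type at the next node, so the extra term penalises residuals that vary strongly between neighbouring nodes.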

Figure 1: Comparison over test trajectories for $\mathbf{q}$ and $\mathbf{q}^{\prime}$ for the discrete network $q_{\boldsymbol{\rho}}^{\textrm{d}}$ tested on the both-ends data set with 80% - 10% - 10% splitting into training, validation, and test sets. The mean squared error on the test set equals $4.009\cdot 10^{-7}$. For presentation purposes, only 10 randomly selected trajectories are considered in the first two plots.

As an additional evaluation of the deep learning framework’s behaviour, we conduct experiments to assess how the learning process performs when the number of training data varies, i.e., with different splittings of the data set into training, validation, and test sets. We report the results in Table 2 and summarise the corresponding hyperparameters in Table 5 of the Appendix.

Data set splitting (training - validation - test)	Training accuracy	Validation accuracy	Test accuracy
10% - 10% - 10%	$2.331\cdot 10^{-5}$	$3.874\cdot 10^{-5}$	$8.545\cdot 10^{-4}$
20% - 10% - 10%	$1.852\cdot 10^{-6}$	$1.327\cdot 10^{-6}$	$1.361\cdot 10^{-4}$
40% - 10% - 10%	$4.802\cdot 10^{-7}$	$4.793\cdot 10^{-7}$	$1.295\cdot 10^{-6}$
80% - 10% - 10%	$1.140\cdot 10^{-7}$	$2.151\cdot 10^{-7}$	$4.009\cdot 10^{-7}$

Table 2: Behaviour of the discrete network $q_{\boldsymbol{\rho}}^{\textrm{d}}$ tested on the both-ends data set with fewer training data points. The size of the training set varies, while that of the validation and the test sets is fixed. The last row corresponds to the results in Figure 1.
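A minimal sketch of how splittings with a varying training set and fixed validation and test sets could be generated. It assumes a random shuffle with a fixed seed, so that the validation and test sets stay identical while the training set grows. The function name `split_indices`, the seed, and the data set size of 1000 trajectories are hypothetical; the text does not state how the trajectories are assigned to each set.

```python
import random


def split_indices(num_trajectories, train_fraction,
                  val_fraction=0.1, test_fraction=0.1, seed=0):
    """Return disjoint lists of trajectory indices (train, val, test)."""
    indices = list(range(num_trajectories))
    random.Random(seed).shuffle(indices)  # same seed -> same permutation
    n_test = round(test_fraction * num_trajectories)
    n_val = round(val_fraction * num_trajectories)
    n_train = round(train_fraction * num_trajectories)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:n_test + n_val + n_train]
    return train, val, test


# The four splittings of Table 2, on a hypothetical set of 1000 trajectories.
splits = {frac: split_indices(1000, frac) for frac in (0.1, 0.2, 0.4, 0.8)}
```

With this construction the smaller training sets are subsets of the larger ones, which is one possible design choice and not necessarily the one used for the reported results.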

We also report results obtained by merging the both-ends and the right-end trajectories, with $80\%$-$10\%$-$10\%$ splitting of the whole new data set into training, validation, and test sets. The results are shown in Figure 2 and are obtained with 3 layers, 616 hidden nodes, and $\gamma=7.323\cdot 10^{-3}$. The resulting training, validation, and test errors are, respectively, $9.893\cdot 10^{-8}$, $1.126\cdot 10^{-7}$, and $7.854\cdot 10^{-8}$.

5 The continuous network

The approach described in the previous section shows accurate results, given a large enough number of beam discretisations with a fixed number of nodes $N+1$, equally distributed in $[0,L]$. It seems reasonable to expect the parametric model’s approximation quality to improve when the number of discretisation nodes increases. However, in this approach, the dimension of the predicted vector grows with $N$, and hence minimising the loss function (11) becomes more difficult. In addition, the fact that the discrete network approach depends on the spatial discretisation of the training data restricts the output dimension to a specific number of nodes. Consequently, there would be two main options to assess the solution at different locations: training the network once more, or interpolating the previously obtained approximation. These limitations make such a discrete approach less appealing and suggest that having a neural network that is a smooth function of the arc length coordinate $s$ can be beneficial. This modelling assumption would also be compatible with different discretisations of the curve and would not suffer from the curse of dimensionality if more nodes were added. In this setting, the discrete node $s_{k}$ at which an approximation of the solution is available is included in the input data together with the boundary conditions. As a result, we work with the following data set

$$\Omega=\left\{\left(s_{k},\,\mathbf{x}^{i}\right),\;\mathbf{y}_{k}^{i}\right\}_{k=0,\dots,N}^{i=1,\dots,M},$$

where, as in the previous section,

$$\mathbf{x}^{i}=\left(\mathbf{q}^{i}_{0},\,(\mathbf{q}^{i}_{0})^{\prime},\,\mathbf{q}^{i}_{N},\,(\mathbf{q}^{i}_{N})^{\prime}\right)\in\mathbb{R}^{8},$$

and

$$\mathbf{y}_{k}^{i}=\left(\hat{\mathbf{q}}_{k}^{i},\,(\hat{\mathbf{q}}_{k}^{i})^{\prime}\right).$$

Here $\hat{\mathbf{q}}_{k}^{i}$ is the numerical solution $\hat{\mathbf{q}}$ at the node $s_{k}$, satisfying the $i$-th boundary conditions in Equation (10). Let us introduce the neural network

$$q_{\boldsymbol{\rho}}^{\mathrm{c}}:\mathbb{R}^{8}\to\mathcal{C}^{\infty}\left([0,L],\mathbb{R}^{2}\right),$$

and the differential operator

$$\mathcal{D}:\mathcal{C}^{\infty}\left([0,L],\mathbb{R}^{2}\right)\to\mathcal{C}^{\infty}\left([0,L],\mathbb{R}^{2}\right),\quad\mathcal{D}\left(q_{\boldsymbol{\rho}}^{\mathrm{c}}\left(\mathbf{x}^{i}\right)\right)(s_{k})=\frac{d}{ds}\left(q_{\boldsymbol{\rho}}^{\mathrm{c}}(\mathbf{x}^{i})\right)(s)\Big|_{s=s_{k}},$$

so that we can define

$$y_{\boldsymbol{\rho}}\left(\mathbf{x}^{i}\right)(s_{k}):=\left(q_{\boldsymbol{\rho}}^{\mathrm{c}}\left(\mathbf{x}^{i}\right)(s_{k}),\,\mathcal{D}\left(q_{\boldsymbol{\rho}}^{\mathrm{c}}\left(\mathbf{x}^{i}\right)\right)(s_{k})\right).$$

To train the network $q_{\boldsymbol{\rho}}^{\mathrm{c}}$ , we define the loss function

$$\begin{split}\textrm{Loss}(\boldsymbol{\rho})=\frac{1}{4M(N+1)}\sum_{i=1}^{M}\sum_{k=0}^{N}\Big(&\left\|y_{\boldsymbol{\rho}}\left(\mathbf{x}^{i}\right)(s_{k})-\mathbf{y}_{k}^{i}\right\|^{2}_{2}\\ &+\gamma\left(\left\|\pi_{\mathcal{D}}\left(y_{\boldsymbol{\rho}}\left(\mathbf{x}^{i}\right)(s_{k})\right)\right\|_{2}^{2}-1\right)^{2}\Big),\end{split} \tag{12}$$

where $\pi_{\mathcal{D}}:\mathbb{R}^{4}\to\mathbb{R}^{2}$ is the projection on the second component $\mathcal{D}(q_{\boldsymbol{\rho}}^{\mathrm{c}}(\mathbf{x}^{i}))(s_{k})$, and $\gamma\geq 0$ weighs the violation of the unit-norm constraint on the tangents. The map $q_{\boldsymbol{\rho}}^{\textrm{c}}$ is now a neural network that associates each set of boundary conditions $\mathbf{x}^{i}$ with a smooth curve $q_{\boldsymbol{\rho}}^{\textrm{c}}\left(\mathbf{x}^{i}\right):[0,L]\to\mathbb{R}^{2}$ that can be evaluated at every point $s\in[0,L]$. We denote this network with the superscript $\mathrm{c}$ since this curve is, in particular, continuous. The outputs $q_{\boldsymbol{\rho}}^{\textrm{c}}\left(\mathbf{x}^{i}\right)(s)\in\mathbb{R}^{2}$ are approximations of the configuration of the beam at $s\in[0,L]$.
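The loss in Equation (12) can be sketched in plain Python as follows. The sketch assumes that the prediction $y_{\boldsymbol{\rho}}(\mathbf{x}^{i})(s_{k})$ is available as a function `predict(x, s)` returning four numbers (position and tangent); in practice the derivative $\mathcal{D}$ would typically be obtained by automatic differentiation of the network. The toy straight-beam data, the values $L=1$ and $N=10$, and all function names are illustrative assumptions, not the authors' implementation.

```python
def loss(predict, dataset, gamma):
    """Evaluate Equation (12).

    predict(x, s) -> (q_x, q_y, dq_x, dq_y), i.e. y_rho(x)(s) in R^4.
    dataset: list of trajectories (x, nodes, targets), where targets[k]
             is the reference vector y_k in R^4 at the node nodes[k].
    """
    total, count = 0.0, 0
    for x, nodes, targets in dataset:
        for s_k, y_k in zip(nodes, targets):
            y_pred = predict(x, s_k)
            data_term = sum((p - t) ** 2 for p, t in zip(y_pred, y_k))
            # pi_D selects the tangent components of the prediction.
            tangent_norm_sq = y_pred[2] ** 2 + y_pred[3] ** 2
            total += data_term + gamma * (tangent_norm_sq - 1.0) ** 2
            count += 1  # count ends up equal to M * (N + 1)
    return total / (4 * count)


# Toy data: a single straight beam q(s) = (s, 0) with unit tangent (1, 0).
length, num_intervals = 1.0, 10
nodes = [length * k / num_intervals for k in range(num_intervals + 1)]
x = (0.0, 0.0, 1.0, 0.0, length, 0.0, 1.0, 0.0)  # (q_0, q_0', q_N, q_N')
targets = [(s, 0.0, 1.0, 0.0) for s in nodes]
dataset = [(x, nodes, targets)]


def exact(x, s):
    return (s, 0.0, 1.0, 0.0)


def stretched(x, s):
    # Violates the unit-norm constraint: the tangent has norm 1.1.
    return (1.1 * s, 0.0, 1.1, 0.0)


loss_exact = loss(exact, dataset, gamma=1e-2)
loss_without_penalty = loss(stretched, dataset, gamma=0.0)
loss_with_penalty = loss(stretched, dataset, gamma=1e-2)
```

The stretched prediction is penalised twice when $\gamma>0$: once through the mismatch with the data and once through the violation of the unit-norm constraint.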

We point out that, contrary to the discrete case, we learn approximations of $\mathbf{q}(s)$ also at the end nodes, i.e., at $s=0$ and $s=L$. This is because we do not impose the boundary conditions by construction. Even though there are multiple approaches to embed them into the network architecture, the one we tried in our experiments made the optimisation problem too complex; thus, we only impose the boundary conditions weakly in the loss function.

Another strategy is to compute the angles $\theta_{k}$ between the tangents $(\hat{\mathbf{q}}_{k})^{\prime}$ and the $x$ -axis and to use them as training data. To this end, we define the neural network

$$\theta_{\boldsymbol{\rho}}^{\textrm{c}}:\mathbb{R}^{8}\to\mathcal{C}^{\infty}\left([0,L],\mathbb{R}\right)$$

as $\theta_{\boldsymbol{\rho}}^{\textrm{c}}=\hat{\theta}_{\boldsymbol{\rho}}^{\textrm{c}}\circ\pi$, where

$$\hat{\theta}_{\boldsymbol{\rho}}^{\textrm{c}}:\mathbb{R}^{2}\to\mathcal{C}^{\infty}\left([0,L],\mathbb{R}\right) \tag{13}$$

is a neural network, and the function $\pi:\mathbb{R}^{8}\to\mathbb{R}^{2}$ extracts the tangential angles from the boundary conditions, i.e., $\pi\left(\mathbf{x}^{i}\right)=\left(\theta_{0}^{i},\theta_{N}^{i}\right)$. Such a network should approximate the angular function $\theta:[0,L]\to\mathbb{R}$, so that

$$\tau_{\boldsymbol{\rho}}^{\mathrm{c}}\left(\mathbf{x}^{i}\right)(s):=\left(\cos\left(\theta_{\boldsymbol{\rho}}^{\textrm{c}}\left(\mathbf{x}^{i}\right)(s)\right),\sin\left(\theta_{\boldsymbol{\rho}}^{\textrm{c}}\left(\mathbf{x}^{i}\right)(s)\right)\right)\in\mathbb{R}^{2} \tag{14}$$

gets close to the tangent vector $\mathbf{q}^{\prime}(s)$ . As a result, the constraint on the unit norm of the tangents is satisfied by construction, and the inextensibility of the elastica is guaranteed. The curve

$$\mathbf{q}(s)=\mathbf{q}_{0}+\int_{0}^{s}\mathbf{q}^{\prime}(\bar{s})\,\mathrm{d}\bar{s}$$

can then be approximated through the reconstruction formula

$$q_{\boldsymbol{\rho}}^{\textrm{c}}\left(\mathbf{x}^{i}\right)(s)=\mathbf{q}_{0}+\mathcal{I}\left(\tau_{\boldsymbol{\rho}}^{\mathrm{c}}\left(\mathbf{x}^{i}\right)\right)(s), \tag{15}$$

where the operator $\mathcal{I}:\mathcal{C}^{\infty}\left([0,L],\mathbb{R}^{2}\right)\to\mathcal{C}^{\infty}\left([0,L],\mathbb{R}^{2}\right)$ is such that

$$\mathcal{I}\left(\tau_{\boldsymbol{\rho}}^{\mathrm{c}}\left(\mathbf{x}^{i}\right)\right)(s)\approx\int_{0}^{s}\tau_{\boldsymbol{\rho}}^{\mathrm{c}}\left(\mathbf{x}^{i}\right)(\bar{s})\,\mathrm{d}\bar{s}.$$

In the numerical experiments, $\mathcal{I}$ is based on the $3$-point Gaussian quadrature formula applied to a partition of the interval $[0,L]$, see [7, Chapter 9]. As done previously, we define the vector

$$y_{\boldsymbol{\rho}}\left(\mathbf{x}^{i}\right)(s_{k}):=\left(q_{\boldsymbol{\rho}}^{\mathrm{c}}\left(\mathbf{x}^{i}\right)(s_{k}),\,\tau_{\boldsymbol{\rho}}^{\mathrm{c}}\left(\mathbf{x}^{i}\right)(s_{k})\right), \tag{16}$$

with components defined as in Equations (14) and (15). This allows us to train the network $\theta_{\boldsymbol{\rho}}^{\mathrm{c}}$ by minimising the same loss function as in Equation (12), where this time $y_{\boldsymbol{\rho}}$ is given by Equation (16). Furthermore, since by construction this case satisfies $\left\|\pi_{\mathcal{D}}\left(y_{\boldsymbol{\rho}}(\mathbf{x}^{i})(s)\right)\right\|_{2}=\left\|\tau_{\boldsymbol{\rho}}^{\mathrm{c}}\left(\mathbf{x}^{i}\right)(s)\right\|_{2}\equiv 1$, we set $\gamma=0$.

We present numerical experiments for the two proposed continuous networks $q_{\boldsymbol{\rho}}^{\textrm{c}}$ and $\theta_{\boldsymbol{\rho}}^{\textrm{c}}$. In the latter case, by neural network architecture we refer to $\hat{\theta}_{\boldsymbol{\rho}}^{\mathrm{c}}$ rather than $\theta_{\boldsymbol{\rho}}^{\mathrm{c}}$ in what follows. We analyse $q_{\boldsymbol{\rho}}^{\textrm{c}}$ more thoroughly in Section 5.1, mirroring most of the discrete case experiments. In Section 5.2, we study how the results are affected when we impose the arc length parametrisation and enforce the boundary conditions to be exactly satisfied by the network $\theta_{\boldsymbol{\rho}}^{\textrm{c}}$.
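The reconstruction in Equation (15) can be illustrated with a short sketch. It assumes a composite rule based on the $3$-point Gauss-Legendre formula on uniform sub-intervals of $[0,s]$; the number of sub-intervals, the function names, and the test angle $\theta(s)=s$ (a unit circular arc, for which the integral is known in closed form) are choices made here for illustration, since the text only states that a partition of $[0,L]$ is used.

```python
import math

# 3-point Gauss-Legendre nodes and weights on the reference interval [-1, 1].
GAUSS_NODES = (-math.sqrt(3.0 / 5.0), 0.0, math.sqrt(3.0 / 5.0))
GAUSS_WEIGHTS = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)


def tangent(theta, s):
    """Unit tangent tau(s) = (cos(theta(s)), sin(theta(s))), as in Eq. (14)."""
    angle = theta(s)
    return (math.cos(angle), math.sin(angle))


def reconstruct(theta, q0, s, num_panels=10):
    """Approximate q(s) = q0 + int_0^s tau(sbar) dsbar, as in Eq. (15)."""
    qx, qy = q0
    h = s / num_panels  # width of each sub-interval (uniform, an assumption)
    for panel in range(num_panels):
        mid = (panel + 0.5) * h
        for node, weight in zip(GAUSS_NODES, GAUSS_WEIGHTS):
            tx, ty = tangent(theta, mid + 0.5 * h * node)
            qx += 0.5 * h * weight * tx
            qy += 0.5 * h * weight * ty
    return (qx, qy)


# theta(s) = s gives q(s) = (sin(s), 1 - cos(s)) when q0 = (0, 0).
q_arc = reconstruct(lambda s: s, (0.0, 0.0), 1.0)
# theta(s) = 0 gives a straight beam starting at q0.
q_straight = reconstruct(lambda s: 0.0, (1.0, 2.0), 0.5)
```

Because the tangent is built from a single angle, its norm equals one at every quadrature point, so only the quadrature error affects the reconstructed curve.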

5.1 Numerical experiments with $q_{\boldsymbol{\rho}}^{\textrm{c}}$

As for the case of the discrete network, we perform an in-depth investigation of this learning setting. In this case, the experiments are run on a GPU-P100 machine. For this continuous setup, the standard MLP architecture does not provide accurate results even after a hyperparameter optimisation routine. Since the simple MLP architecture does not seem flexible enough to capture the complexity of the elastica solution in this continuous framework, we move to a different architecture, which we call MULT because of the multiplicative interactions it contains. This network has demonstrated superior performance to standard fully connected neural networks in the context of operator learning, see e.g. [51]. Details on this architecture can be found in Appendix A. We fix the learning rate $\eta$ to $5\cdot 10^{-3}$ and only vary the number of layers and of hidden nodes in the training procedure, with the range of options reported in Appendix B, Table 6. In this case, we define the loss as in Equation (12), with $\gamma=10^{-2}$. The weight decay is systematically set to $0$.

For the both-ends data set, this leads to a training error equal to $3.554\cdot 10^{-6}$, a validation error equal to $4.779\cdot 10^{-6}$, and a test error equal to $4.354\cdot 10^{-6}$. Figure 3 shows the comparison over test trajectories for $\mathbf{q}$ and $\mathbf{q}^{\prime}$. As we can see in the plot showing the mean error over the trajectories, the error at the end nodes is nonzero, since we are not imposing boundary conditions by construction. This is in contrast to the corresponding plot for the discrete network in Figure 1.

Also in this case, we examine the behaviour of the learning process with different splittings of the data set into training and test sets. We display the results in Table 3 and summarise the corresponding hyperparameters in Appendix B, Table 7.

Data set splitting (training - validation - test)	Training error	Validation error	Test error
10% - 10% - 10%	$2.146\cdot 10^{-4}$	$1.252\cdot 10^{-3}$	$8.811\cdot 10^{-4}$
20% - 10% - 10%	$4.187\cdot 10^{-5}$	$4.239\cdot 10^{-5}$	$6.279\cdot 10^{-5}$
40% - 10% - 10%	$7.037\cdot 10^{-6}$	$8.357\cdot 10^{-6}$	$8.434\cdot 10^{-6}$
80% - 10% - 10%	$3.554\cdot 10^{-6}$	$4.779\cdot 10^{-6}$	$4.354\cdot 10^{-6}$

Table 3: Behaviour of the continuous network $q_{\boldsymbol{\rho}}^{\textrm{c}}$

5.2 Numerical experiments with $\theta_{\boldsymbol{\rho}}^{\textrm{c}}$

Here we consider a neural network approximation of the angle $\theta(s)$ that parametrises the tangent vector $\mathbf{q}^{\prime}(s)=(\cos(\theta(s)),\sin(\theta(s)))$ . By design, the approximation $\tau_{\boldsymbol{\rho}}^{\mathrm{c}}$ of the tangent vector $\mathbf{q}^{\prime}$ satisfies the constraint $\|\tau_{\boldsymbol{\rho}}^{\mathrm{c}}(\mathbf{x}^{i})(s)\|_{2}=1$ for every $s\in[0,L]$ and $\mathbf{x}^{i}\in\mathbb{R}^{8}$ . We also analyse how the neural network approximation behaves when the boundary conditions $\tau_{\boldsymbol{\rho}}^{\mathrm{c}}(\mathbf{x}^{i})(0)=\mathbf{q}^{\prime}(0)$ and $\tau_{\boldsymbol{\rho}}^{\mathrm{c}}(\mathbf{x}^{i})(L)=\mathbf{q}^{\prime}(L)$ are imposed by construction. To do so, we model the parametric function $\hat{\theta}_{\boldsymbol{\rho}}^{\textrm{c}}$ , defined in Equation (13), in one of the two following ways:

$$\hat{\theta}_{\boldsymbol{\rho}}^{\textrm{c}}(\mathbf{x}^{i})(s)=f_{\boldsymbol{\rho}}(s,\theta_{0}^{i},\theta_{N}^{i}), \tag{17}$$

$$\begin{split}\hat{\theta}_{\boldsymbol{\rho}}^{\textrm{c}}(\mathbf{x}^{i})(s)&=f_{\boldsymbol{\rho}}(s,\theta_{0}^{i},\theta_{N}^{i})+(\theta_{0}^{i}-f_{\boldsymbol{\rho}}(0,\theta_{0}^{i},\theta_{N}^{i}))e^{-100s^{2}}\\ &+(\theta_{N}^{i}-f_{\boldsymbol{\rho}}(L,\theta_{0}^{i},\theta_{N}^{i}))e^{-100(s-L)^{2}},\end{split} \tag{18}$$

where $f_{\boldsymbol{\rho}}:\mathbb{R}^{3}\to\mathbb{R}$ is any neural network, and we recall that $\pi(\mathbf{x}^{i})=(\theta_{0}^{i},\theta_{N}^{i})$. We remark that, in the case of the parametrisation in Equation (18), one gets $\theta_{\boldsymbol{\rho}}^{\textrm{c}}(\mathbf{x}^{i})(0)=\theta_{0}^{i}$ and $\theta_{\boldsymbol{\rho}}^{\textrm{c}}(\mathbf{x}^{i})(L)=\theta_{N}^{i}$ up to machine precision, due to the fast decay of the Gaussian function.

As in the previous sections, we collect the hyperparameter and architecture options with the respective ranges of choices in Table 8. We report the results obtained without imposing the boundary conditions in Figure 4 and those obtained by imposing them in Figure 5, in both cases using the both-ends data set with $80\%$-$10\%$-$10\%$ splitting into training, validation, and test sets. The results shown in the two figures correspond, respectively, to training errors of $6.288\cdot 10^{-6}$ and $5.301\cdot 10^{-6}$, validation errors of $5.874\cdot 10^{-6}$ and $5.065\cdot 10^{-6}$, and test errors of $5.089\cdot 10^{-6}$ and $4.385\cdot 10^{-6}$. The best-performing hyperparameter combinations can be found in Table 9.
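The effect of the Gaussian correction terms in Equation (18) can be checked numerically with a small sketch. Here `f_rho` is an arbitrary smooth stand-in for the neural network, and the beam length $L=1$ as well as the angle values are hypothetical choices made only for this illustration.

```python
import math

LENGTH = 1.0  # hypothetical beam length L


def f_rho(s, theta_0, theta_N):
    """Arbitrary smooth stand-in for the network f_rho (not a trained model)."""
    return math.sin(3.0 * s) + 0.5 * theta_0 - 0.2 * theta_N * s


def theta_hat(s, theta_0, theta_N, length=LENGTH):
    """Angle parametrisation with boundary corrections, as in Eq. (18)."""
    left = (theta_0 - f_rho(0.0, theta_0, theta_N)) * math.exp(-100.0 * s ** 2)
    right = (theta_N - f_rho(length, theta_0, theta_N)) * math.exp(
        -100.0 * (s - length) ** 2
    )
    return f_rho(s, theta_0, theta_N) + left + right


theta_0, theta_N = 0.3, -1.2
left_value = theta_hat(0.0, theta_0, theta_N)
right_value = theta_hat(LENGTH, theta_0, theta_N)
mid_value = theta_hat(0.5 * LENGTH, theta_0, theta_N)
```

At $s=0$ the only term spoiling the identity $\hat{\theta}(0)=\theta_{0}$ is the right correction scaled by $e^{-100L^{2}}$, which is about $3.7\cdot 10^{-44}$ for the assumed $L=1$; the same holds at $s=L$ by symmetry. Away from the end points both corrections are negligible, so the output is essentially that of `f_rho`.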

6 Discussion

The results in Figures 4 and 5 are comparable, especially looking at the mean error plots. This suggests that the imposition of the boundary conditions, in the proposed way, is not limiting the expressivity of the considered network. Thus, given the boundary value nature of our problem, these figures advocate the enforcement of the boundary conditions on the network $\theta_{\boldsymbol{\rho}}^{\textrm{c}}$ . However, due to the chosen reconstruction procedure in Equation (15) for the variable $\mathbf{q}$ , we are able to impose the boundary conditions on $\mathbf{q}$ only on the left node. Other more symmetric reconstruction procedures can be adopted, but the proposed one has provided better experimental results.

Comparing the results related to $q_{\boldsymbol{\rho}}^{\textrm{c}}$ with those of $\theta_{\boldsymbol{\rho}}^{\textrm{c}}$, we notice similar performances in terms of training and test errors. In both cases, these errors are one order of magnitude larger than the corresponding training and test errors of the discrete network $q_{\boldsymbol{\rho}}^{\textrm{d}}$. Thus, as a result of our experiments, we can conclude that

• if the accuracy and the efficient evaluation of the model at the discrete nodes are of interest, the discrete network is the best option;
• for a more flexible model, not restricted to the discrete nodes, the continuous network is a better choice; among the two proposed modelling strategies, working with $q_{\boldsymbol{\rho}}^{\textrm{c}}$ is more suitable for an easy parametrisation of both $\mathbf{q}$ and $\mathbf{q}^{\prime}$, while $\theta_{\boldsymbol{\rho}}^{\textrm{c}}$ is more suitable to impose geometrical structure and constraints.

The total accuracy error of a neural network model can be defined by splitting it into three components: approximation error, optimisation error, and generalisation error (see e.g. [17]). To achieve excellent agreement between predicted and reference trajectories, it is crucial to select the appropriate architecture and fine-tune the model hyperparameters. Our results demonstrate that we can construct a network that is expressive enough to provide a small approximation error and with very good generalisation capability.

Lastly, we have compared the time cost of the neural network prediction against the traditional approach with numerical solvers, as described in Section 2.2. The discrete and the continuous approaches outperformed the traditional solvers with an average speedup of 105.000 times and 260.000 times, respectively, across the test trajectories. The training time of the continuous network is 1.25 times larger than that of the discrete network. It is important to note that these results are subject to certain limitations, such as the specific choice of the hyperparameters or the machine used to train and test the network. These findings suggest that using neural networks to predict new solutions of the elastica for unseen boundary conditions is much more time-efficient than the classical numerical methods, although it requires intensive offline training.

6.1 Future work

In the methods presented in this paper, the mathematical problem and the neural network model do not interact once the data set is created. To improve the results presented here, one could include Euler’s elastica model directly into the training process. This could be done either by directly imposing in the loss function that $\mathbf{q}(s)$ satisfies the differential equations (5), or by adding the constrained action integral from Equation (4) to the loss function that is minimised, see e.g. [30, 10, 12, 11].

There are many promising directions to follow up on this work. One is to consider 3D versions of Euler’s elastica, another is to look at the dynamical problem, and finally one may examine industrial applications, as mentioned in the introduction.

CRediT author statement.

Elena Celledoni: Conceptualisation, Validation, Writing - Review & Editing, Supervision, Funding acquisition. Ergys Çokaj: Validation, Investigation, Visualisation, Writing - Review & Editing. Andrea Leone: Methodology, Software, Investigation, Writing - Original Draft, Review & Editing. Sigrid Leyendecker: Conceptualisation, Writing - Review & Editing, Supervision, Funding acquisition. Davide Murari: Methodology, Software, Investigation, Writing - Original Draft, Review & Editing. Brynjulf Owren: Conceptualisation, Validation, Writing - Review & Editing, Supervision, Funding acquisition. Rodrigo T. Sato Martín de Almagro: Methodology, Validation, Writing - Review & Editing, Supervision. Martina Stavole: Methodology, Software, Investigation, Writing - Original Draft, Review & Editing.

Acknowledgments. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 860124. This work was partially supported by a grant from the Simons Foundation (DM). This contribution reflects only the authors’ view, and the Research Executive Agency and the European Commission are not responsible for any use that may be made of the information it contains.

Appendix A Architecture for the continuous network

We provide the expression of the forward propagation of the multiplicative network MULT used for the experiments in Section 5:

$$\mathbf{U}=\sigma(\mathbf{W}_{1}\mathbf{x}+\mathbf{b}_{1}),\quad\mathbf{V}=\sigma(\mathbf{W}_{2}\mathbf{x}+\mathbf{b}_{2}) \tag{19}$$
$$\mathbf{H}_{1}=\sigma(\mathbf{W}_{3}\mathbf{x}+\mathbf{b}_{3}) \tag{20}$$
$$\mathbf{Z}_{j}=\sigma(\mathbf{W}^{z}_{j}\mathbf{H}_{j}+\mathbf{b}^{z}_{j}),\;j=1,\dots,\ell \tag{21}$$
$$\mathbf{H}_{j+1}=(1-\mathbf{Z}_{j})\odot\mathbf{U}+\mathbf{Z}_{j}\odot\mathbf{V},\;j=1,\dots,\ell \tag{22}$$
$$f_{\boldsymbol{\rho}}^{\mathrm{MULT}}(\mathbf{x})=\mathbf{W}\mathbf{H}_{\ell+1}+\mathbf{b}, \tag{23}$$

where $\odot$ denotes component-wise multiplication. In this case, $\boldsymbol{\rho}=\left\{\mathbf{W}_{1},\mathbf{b}_{1},\mathbf{W}_{2},\mathbf{b}_{2},\mathbf{W}_{3},\mathbf{b}_{3},\left(\mathbf{W}^{z}_{j},\mathbf{b}^{z}_{j}\right)_{j=1}^{\ell},\mathbf{W},\mathbf{b}\right\}$, and the weight matrices and biases have shapes that allow for the expressions (19)-(23) to be well-defined. This architecture is inspired by neural attention mechanisms and was introduced in [52] to improve the gradient behaviour. A further motivation for our choice of this architecture is experimental, since it has proven effective in solving the task of interest while still having a similar number of parameters to the MLP architecture. Throughout the paper, we refer to this architecture as multiplicative since it includes component-wise multiplications, which help capture multiplicative interactions between the variables.
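The forward propagation in Equations (19)-(23) can be sketched with plain Python lists. The sketch assumes the hyperbolic tangent for $\sigma$, a 9-dimensional input formed by concatenating $s$ with the boundary conditions $\mathbf{x}^{i}\in\mathbb{R}^{8}$, and a random uniform initialisation; these choices, the layer width, and all names are illustrative and are not taken from the authors' code.

```python
import math
import random


def affine(W, b, x):
    """Return W x + b for a list-of-rows matrix W and vectors b, x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def activation(values):
    """Component-wise sigma; tanh is an assumption."""
    return [math.tanh(v) for v in values]


def random_layer(rng, n_out, n_in):
    W = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return W, b


def init_params(in_dim, width, out_dim, num_layers, seed=0):
    rng = random.Random(seed)
    return {
        "enc_u": random_layer(rng, width, in_dim),   # W_1, b_1
        "enc_v": random_layer(rng, width, in_dim),   # W_2, b_2
        "first": random_layer(rng, width, in_dim),   # W_3, b_3
        "gates": [random_layer(rng, width, width) for _ in range(num_layers)],
        "out": random_layer(rng, out_dim, width),    # W, b
    }


def mult_forward(params, x):
    U = activation(affine(*params["enc_u"], x))            # Eq. (19)
    V = activation(affine(*params["enc_v"], x))            # Eq. (19)
    H = activation(affine(*params["first"], x))            # Eq. (20)
    for Wz, bz in params["gates"]:
        Z = activation(affine(Wz, bz, H))                  # Eq. (21)
        H = [(1.0 - z) * u + z * v for z, u, v in zip(Z, U, V)]  # Eq. (22)
    return affine(*params["out"], H)                       # Eq. (23)


params = init_params(in_dim=9, width=16, out_dim=2, num_layers=6)
x = [0.5] + [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]  # (s, x^i), hypothetical
q = mult_forward(params, x)
```

Each hidden state is a component-wise blend of the two encodings `U` and `V`, with the blending weights `Z` computed from the previous hidden state; this is where the multiplicative interactions enter.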

Appendix B Details on hyperparameter optimisation

We provide here further details on the hyperparameter optimisation strategy with Optuna, relative to the results in Sections 4 and 5. The tables below display the hyperparameters we optimise for in each of the networks, the ranges and the distribution, as well as the selected combinations used to perform the experiments in the paper.

Hyperparameter	Range	Distribution
#layers $\ell$	$\{0,...,10\}$	discrete uniform
#hidden nodes	$[10,1000]\cap\mathbb{N}$	discrete uniform
$\gamma$	$[0,1\cdot 10^{-2}]$	uniform

Table 4: Hyperparameter ranges for the discrete network $q_{\boldsymbol{\rho}}^{\textrm{d}}$. The first column of the table reports the hyperparameters we test. The second describes the set of allowed values for each, while the third specifies how such values are explored through Optuna.

Hyperparameter combination	% of trajectories of the whole data set in the training set
	10%	20%	40%	80%
# layers $\ell$	4	4	4	4
#hidden nodes	950	978	997	985
$\gamma$	$7.044\cdot 10^{-3}$	$6.336\cdot 10^{-3}$	$9.004\cdot 10^{-3}$	$3.853\cdot 10^{-3}$

Table 5: Choice of hyperparameters for the training of the discrete network $q_{\boldsymbol{\rho}}^{\textrm{d}}$ tested on the both-ends data set with different sizes of the training set, with the validation and test sets each containing $10\%$ of trajectories of the data set.

Hyperparameter	Range	Distribution
#layers $\ell$	$\{5,\ldots,10\}$	discrete uniform
#hidden nodes	$[10,250]\cap\mathbb{N}$	discrete uniform

Table 6: Hyperparameter ranges for the continuous network $q_{\boldsymbol{\rho}}^{\textrm{c}}$. The first column of the table reports the hyperparameters we test. The second describes the set of allowed values for each, while the third specifies how such values are explored through Optuna.

Hyperparameter combination	% of trajectories of the whole data set in the training set
	10%	20%	40%	80%
# layers $\ell$	6	7	8	6
#hidden nodes	139	185	181	106

Table 7: Choice of hyperparameters for the training of the continuous network $q_{\boldsymbol{\rho}}^{\textrm{c}}$ tested on the both-ends data set with different sizes of the training set, with the validation and test sets each containing $10\%$ of trajectories of the data set.

Hyperparameter	Range	Distribution
#layers $\ell$	$\{1,\ldots,10\}$	discrete uniform
#hidden nodes	$[50,200]\cap\mathbb{N}$	discrete uniform

Table 8: Hyperparameter ranges for the continuous network $\theta_{\boldsymbol{\rho}}^{\textrm{c}}$. The first column of the table reports the hyperparameters we test. The second describes the set of allowed values for each, while the third specifies how such values are explored through Optuna.

Hyperparameter combination	$\theta_{\boldsymbol{\rho}}^{\textrm{c}}$
	$\theta_{\boldsymbol{\rho}}^{\textrm{c}}$ as in (17)	$\theta_{\boldsymbol{\rho}}^{\textrm{c}}$ as in (18)
# layers $\ell$	8	8
#hidden nodes	93	58

Table 9: Choice of the hyperparameters for the training of the continuous network $\theta_{\boldsymbol{\rho}}^{\textrm{c}}$ tested on the both-ends data set with $80\%$-$10\%$-$10\%$ splitting into training, validation, and test sets. The second column shows the combination of hyperparameters yielding the best result corresponding to Figure 4, while the third column shows that corresponding to Figure 5.

References

\bibcommenthead
[1] Saad, Y.: Iterative Methods for Sparse Linear Systems. SIAM, (2003)
[2] Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, (1999)
Marsden and West [2001] Marsden, J.E., West, M.: Discrete mechanics and variational integrators. Acta numerica 10, 357–514 (2001)
[4] Hairer, E., Nørsett, S.P., Wanner, G.: Solving Ordinary Differential Equations I, Nonstiff Problems, Second revised edition edn. Springer, (1993)
[5] Hairer, E., Wanner, G.: Solving Ordinary Differential Equations II, Stiff and Differential-Algebraic Problems, Second revised edition edn. Springer, (1996)
[6] Brenner, S.C.: The Mathematical Theory of Finite Element Methods. Springer, (2008)
[7] Quarteroni, A., Sacco, R., Saleri, F.: Numerical Mathematics vol. 37. Springer, (2006)
Cuomo et al. [2022] Cuomo, S., Di Cola, V.S., Giampaolo, F., Rozza, G., Raissi, M., Piccialli, F.: Scientific machine learning through physics–informed neural networks: Where we are and what’s next. Journal of Scientific Computing 92(3), 88 (2022)
Brunton and Kutz [2023] Brunton, S.L., Kutz, J.N.: Machine learning for partial differential equations. arXiv:2303.17078 (2023)
Raissi et al. [2019] Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics 378, 686–707 (2019)
Samaniego et al. [2020] Samaniego, E., Anitescu, C., Goswami, S., Nguyen-Thanh, V.M., Guo, H., Hamdia, K., Zhuang, X., Rabczuk, T.: An energy approach to the solution of partial differential equations in computational mechanics via machine learning: Concepts, implementation and applications. Computer Methods in Applied Mechanics and Engineering 362, 112790 (2020)
E and Yu [2018] E, W., Yu, B.: The Deep Ritz Method: A Deep Learning-Based Numerical Algorithm for Solving Variational Problems. Communications in Mathematics and Statistics 6(1), 1–12 (2018)
Gu and Ng [2023] Gu, Y., Ng, M.K.: Deep neural networks for solving large linear systems arising from high-dimensional problems. SIAM Journal on Scientific Computing 45(5), 2356–2381 (2023)
Kadupitiya et al. [2022] Kadupitiya, J., Fox, G.C., Jadhao, V.: Solving Newton’s equations of motion with large timesteps using recurrent neural networks based operators. Machine Learning: Science and Technology 3(2), 025002 (2022)
Liu et al. [2020] Liu, Y., Kutz, J., Brunton, S.: Hierarchical deep learning of multiscale differential equation time-steppers, arxiv. arXiv:2008.09768 (2020)
Mattheakis et al. [2022] Mattheakis, M., Sondak, D., Dogra, A.S., Protopapas, P.: Hamiltonian neural networks for solving equations of motion. Physical Review E 105(6), 065305 (2022)
Lu et al. [2021a] Lu, L., Jin, P., Pang, G., Zhang, Z., Karniadakis, G.E.: Learning nonlinear operators via deeponet based on the universal approximation theorem of operators. Nature machine intelligence 3(3), 218–229 (2021)
Lu et al. [2021b] Lu, L., Meng, X., Mao, Z., Karniadakis, G.E.: Deepxde: A deep learning library for solving differential equations. SIAM review 63(1), 208–228 (2021)
Chevalier et al. [2022] Chevalier, S., Stiasny, J., Chatzivasileiadis, S.: Accelerating dynamical system simulations with contracting and physics-projected neural-newton solvers. In: Learning for Dynamics and Control Conference, PMLR, pp. 803–816 (2022)
Li et al. [2022] Li, Y., Zhou, Z., Ying, S.: Delisa: Deep learning based iteration scheme approximation for solving pdes. Journal of Computational Physics 451, 110884 (2022)
Schiassi et al. [2021] Schiassi, E., Furfaro, R., Leake, C., De Florio, M., Johnston, H., Mortari, D.: Extreme theory of functional connections: A fast physics-informed neural network method for solving ordinary and partial differential equations. Neurocomputing 457, 334–356 (2021)
De Florio et al. [2022] De Florio, M., Schiassi, E., Furfaro, R.: Physics-informed neural networks and functional interpolation for stiff chemical kinetics. Chaos: An Interdisciplinary Journal of Nonlinear Science 32(6) (2022)
Fabiani et al. [2023] Fabiani, G., Galaris, E., Russo, L., Siettos, C.: Parsimonious physics-informed random projection neural networks for initial value problems of ODEs and index-1 DAEs. Chaos: An Interdisciplinary Journal of Nonlinear Science 33(4) (2023)
Mortari et al. [2019] Mortari, D., Johnston, H., Smith, L.: High accuracy least-squares solutions of nonlinear differential equations. Journal of computational and applied mathematics 352, 293–307 (2019)
Loc Vu-Quoc [2023] Loc Vu-Quoc, A.H.: Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics. Computer Modeling in Engineering & Sciences 137(2), 1069–1343 (2023)
Brunton and Kutz [2022] Brunton, S.L., Kutz, J.N.: Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control, 2nd edn. Cambridge University Press, ??? (2022)
[27] Yagawa, G., Oishi, A.: Computational Mechanics with Deep Learning: An Introduction. Springer, (2022)
Fabiani et al. [2021] Fabiani, G., Calabrò, F., Russo, L., Siettos, C.: Numerical solution and bifurcation analysis of nonlinear partial differential equations with extreme learning machines. Journal of Scientific Computing 89(2), 44 (2021)
Galaris et al. [2022] Galaris, E., Fabiani, G., Gallos, I., Kevrekidis, I., Siettos, C.: Numerical bifurcation analysis of PDEs from lattice boltzmann model simulations: a parsimonious machine learning approach. Journal of Scientific Computing 92(2), 34 (2022)
Lagaris et al. [1998] Lagaris, I.E., Likas, A., Fotiadis, D.I.: Artificial neural networks for solving ordinary and partial differential equations. IEEE transactions on neural networks 9(5), 987–1000 (1998)
[31] Kollmannsberger, S., D’Angella, D., Jokeit, M., Herrmann, L.: Deep Learning in Computational Mechanics. Springer, (2021)
Ntarladima et al. [2023] Ntarladima, K., Pieber, M., Gerstmayr, J.: A model for contact and friction between beams under large deformation and sheaves. Nonlinear Dynamics, 1–18 (2023)
Stavole et al. [2022] Stavole, M., Almagro, R.T.S.M., Lohk, M., Leyendecker, S.: Homogenization of the constitutive properties of composite beam cross-sections. In: ECCOMAS Congress 2022-8th European Congress on Computational Methods in Applied Sciences and Engineering (2022)
Manfredo et al. [2023] Manfredo, D., Dörlich, V., Linn, J., Arnold, M.: Data based constitutive modelling of rate independent inelastic effects in composite cables using preisach hysteresis operators. Multibody System Dynamics, 1–16 (2023)
Saadat and Durville [2023] Saadat, M.A., Durville, D.: A mixed stress-strain driven computational homogenization of spiral strands. Computers & Structures 279, 106981 (2023)
Euler [1744] Euler, L.: De Curvis Elastici. Additamentum in Methodus inveniendi lineas curvas maximi minimive proprietate gaudentes, sive solutio problematis isoperimetrici lattissimo sensu accepti, Lausanne (1744)
[37] Love, A.E.H.: A Treatise on the Mathematical Theory of Elasticity. Cambridge University Press, Cambridge (1863 - 1940)
[38] Matsutani, S.: Euler’s elastica and beyond. Journal of Geometry and Symmetry in Physics 17, 45–86 (2010)
[39] Singer, D.A.: Lectures on elastic curves and rods. In: AIP Conference Proceedings, vol. 1002. American Institute of Physics, pp. 3–32 (2008)
[40] Rohrhofer, F.M., Posch, S., Gößnitzer, C., Geiger, B.C.: On the role of fixed points of dynamical systems in training physics-informed neural networks. Transactions on Machine Learning Research (2022)
[41] Colombo, L., Ferraro, S., Diego, D.: Geometric integrators for higher-order variational systems and their application to optimal control. Journal of Nonlinear Science 26, 1615–1650 (2016)
[42] Ferraro, S.J., Diego, D.M., Almagro, R.T.S.M.: Parallel iterative methods for variational integration applied to navigation problems. IFAC-PapersOnLine 54(19), 321–326 (2021)
[43] Timoshenko, S.P., Gere, J.M.: Theory of Elastic Stability. McGraw-Hill Book Company (1961)
[44] Bigoni, D.: Nonlinear Solid Mechanics: Bifurcation Theory and Material Instability. Cambridge University Press (2012)
[45] Virtanen, P., Gommers, R., Oliphant, T.E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S.J., Brett, M., Wilson, J., Millman, K.J., Mayorov, N., Nelson, A.R.J., Jones, E., Kern, R., Larson, E., Carey, C.J., Polat, İ., Feng, Y., Moore, E.W., VanderPlas, J., Laxalde, D., Perktold, J., Cimrman, R., Henriksen, I., Quintero, E.A., Harris, C.R., Archibald, A.M., Ribeiro, A.H., Pedregosa, F., van Mulbregt, P., SciPy 1.0 Contributors: SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods 17, 261–272 (2020)
[46] Higham, C.F., Higham, D.J.: Deep learning: An introduction for applied mathematicians. SIAM Review 61(4), 860–891 (2019)
[47] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
[48] Kingma, D., Ba, J.: Adam: A method for stochastic optimization. In: International Conference on Learning Representations (ICLR), San Diego, CA, USA (2015)
[49] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems 32 (2019)
[50] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 2623–2631 (2019)
[51] Wang, S., Perdikaris, P.: Long-time integration of parametric evolution equations with physics-informed DeepONets. Journal of Computational Physics 475, 111855 (2023)
[52] Wang, S., Teng, Y., Perdikaris, P.: Understanding and mitigating gradient flow pathologies in physics-informed neural networks. SIAM Journal on Scientific Computing 43(5), 3055–3081 (2021)
