Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

[2]\fnmSigrid \surLeyendecker

[1]\orgdivDepartment of Mathematical Sciences, \orgnameNorwegian University of Science and Technology, \orgaddress\streetAlfred Getz’ vei 1, \cityTrondheim, \postcode7034, \countryNorway

2]\orgdivInstitute of Applied Dynamics, \orgnameFriedrich-Alexander-Universität Erlangen-Nürnberg, \orgaddress\streetImmerwahrstrasse 1, \cityErlangen, \postcode91058, \countryGermany

Neural networks for the approximation of Euler’s elastica

\fnmElena \surCelledoni elena.celledoni@ntnu.no    \fnmErgys \surÇokaj ergys.cokaj@ntnu.no    \fnmAndrea \surLeone andrea.leone@ntnu.no    sigrid.leyendecker@fau.de    \fnmDavide \surMurari davide.murari@ntnu.no    \fnmBrynjulf \surOwren brynjulf.owren@ntnu.no    \fnmRodrigo T. \surSato Martín de Almagro rodrigo.t.sato@fau.de    \fnmMartina \surStavole martina.stavole@fau.de * [
Abstract

Euler’s elastica is a classical model of flexible slender structures relevant in many industrial applications. Static equilibrium equations can be derived via a variational principle. The accurate approximation of solutions to this problem can be challenging due to nonlinearity and constraints. We here present two neural network-based approaches for simulating Euler’s elastica. Starting from a data set of solutions of the discretised static equilibria, we train the neural networks to produce solutions for unseen boundary conditions. We present a discrete approach learning discrete solutions from the discrete data. We then consider a continuous approach using the same training data set but learning continuous solutions to the problem. We present numerical evidence that the proposed neural networks can effectively approximate configurations of the planar Euler’s elastica for a range of different boundary conditions.

keywords:
planar Euler’s elastica, supervised learning, neural networks, geometric mechanics, variational problem.

1 Introduction

Modelling of mechanical systems is relevant in various branches of engineering. Typically, it leads to the formulation of variational problems and differential equations, whose solutions are approximated with numerical techniques. The efficient solution of linear and nonlinear systems resulting from the discretisation of mechanical problems has been a persistent challenge of applied mathematics. While classical solvers are characterised by a well-established and mature body of literature [1, 2, 3, 4, 5, 6, 7], the past decade has witnessed a surge in the use of novel machine learning-assisted techniques [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]. These approaches aim at enhancing solution methods by leveraging the wealth of available data and known physical principles. The use of deep learning techniques to improve the performance of traditional numerical algorithms in terms of efficiency, accuracy, and computational scalability [9], is becoming increasingly popular also in computational mechanics (see, e.g. [25]). Examples include a wide range of problems that require the approximation of functions, as well as efficient reduced order modeling [26] or more specific numerical tasks such as optimising the quadrature rule for computing the finite element stiffness matrix [27] or the investigation of data-driven numerical frameworks for the bifurcation analysis of partial differential equations [28, 29]. This recent literature is evidence that neural networks can be used successfully as surrogate models for the solution operators of various differential equations.

In the context of ordinary and partial differential equations, two main trends can be identified. The first one aims at providing a machine learning-based approximation to the discrete solutions of differential problems on a specific space-time grid, for example, by solving linear or nonlinear systems efficiently and accelerating convergence of iterative schemes [20, 19, 15, 14, 13]. The second one provides instead solutions to the differential problem as continuous (and differentiable) functions of the temporal and spatial variables. Depending on the context, conditions on such approximate solutions are provided by the differential problem itself, the initial values and boundary conditions, and the available data. The idea of providing approximate solutions as functions defined on the space-time domain and parametrised as neural networks was proposed in the nineties [30] and was recently revived in the framework of Physics-Informed Neural Networks in [10]. Since then, such an approach has attracted much interest and developed in many directions [8, 21, 31].

In this work, we use neural networks to approximate the configurations of highly flexible slender structures modelled as beams. Such models are of great interest in industrial applications like cable car ropes, diverse types of wires or endoscopes [32, 33, 34, 35]. Notwithstanding their ingenious and simple mathematical formulation, slender structure models can accurately reproduce complex mechanical behaviour and for this reason their numerical discretisation is often challenging. Furthermore, the use of 3-dimensional models requires high computational time. Due to the fact that slender deformable structures have one dimension (length) being orders of magnitude larger than their other dimensions (cross-section), it is possible to reduce the complexity of the problem from a 3333-dimensional elastic continuum to a 1111-dimensional beam. A beam is modelled as a centerline curve, 𝐪:[0,L]n,s𝐪(s):𝐪formulae-sequence0𝐿superscript𝑛maps-to𝑠𝐪𝑠\mathbf{q}:[0,L]\to\mathbb{R}^{n},s\mapsto\mathbf{q}(s)bold_q : [ 0 , italic_L ] → blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT , italic_s ↦ bold_q ( italic_s ), with n=2𝑛2n=2italic_n = 2 or n=3𝑛3n=3italic_n = 3, along which a rigid cross-section Σ(s)Σ𝑠\Sigma(s)roman_Σ ( italic_s ) is attached. The main model assumption is that the diameter of Σ(s)Σ𝑠\Sigma(s)roman_Σ ( italic_s ) is small compared with the undeformed length L𝐿Litalic_L. The complexity of the model depends on factors such as the dimension of the problem, the translational and rotational degrees of freedom (DOF) at each node of the beam, the analysis, i.e., static or dynamic. Exploring the numerous beam models documented in the literature, we choose to approach the challenge of approximating beam deformations using a simple yet widely employed model, i.e., the 2222-dimensional Euler’s elastica [36]. The cross-section Σ(s)Σ𝑠\Sigma(s)roman_Σ ( italic_s ) is assumed to have unchanged geometrical and material properties, and be orthogonal to the centerline 𝐪(s)𝐪𝑠\mathbf{q}(s)bold_q ( italic_s ). The latter is an inextensible curve and solution of a bending energy minimisation problem [37, 38, 39] for given boundary conditions.

Although the 2-dimensional Euler’s elastica is relatively simple compared to more comprehensive models, it can robustly represent interesting real-world phenomena. For instance, the elastica model appropriately captures the high bending deformations of flexible endoscopes, complex medical devices, during surgeries [33]. The approximation of the elastica through neural networks can help predict the deformed configuration of the beam for endoscopy simulations, particularly when the beam encounters constraints in confined spaces.

When approximating static equilibria of Euler’s elastica via neural networks, a key issue is to ensure the inextensibility of the curve (having unit norm tangents) as well as the boundary conditions. Two main approaches can be found in the literature [31, 21, 40]. One is the weak imposition of constraints and boundary conditions adding appropriate extra terms to the loss function. The other is a strong imposition strategy consisting in shaping the network architectures to satisfy the constraints by construction. We show examples of both the approaches in Sections 4 and 5.

The paper is organised as follows. In Section 2, we present the mathematical model of the planar Euler’s elastica, including its continuous and discrete equilibrium equations. We describe the approach used to generate the data sets for the numerical experiments. In Section 3, we introduce some basic theory and notation for neural networks that we shall use in the succeeding sections. Starting from general theory, we specialise in the task of approximating configurations of Euler’s elastica. In Section 4, we introduce the discrete approach, which aims to approximate precomputed numerical discretisations of Euler’s elastica. This represents the natural approach to approximate the discrete solution trajectories with a parametric method. We discuss some drawbacks associated with this approach and then propose an alternative approximation strategy in Section 5, that leverages the fact that we are approximating a continuous curve on a spatial grid. The continuous approach consists in computing an arc length parametrisation of the beam configuration. We provide insights into two additional networks and analyse how the test accuracy changes with varying constraints, such as boundary conditions or tangent vector norms. Data and codes for the numerical experiments are available in the GitHub repository associated to the paper111https://github.com/ergyscokaj/LearningEulersElastica.

Main contributions: This paper presents advancements in the approximation of beam static configurations using neural networks. These advancements include: (i) A detailed experimental analysis of approximating numerical discretisations of Euler’s elastica configurations through what we call discrete network, (ii) Identification and discussion of the limitations associated with this discrete approach, and (iii) Introduction of a new parametrisation strategy called continuous network to address some of these drawbacks.

Nomenclature
\mathcal{L}caligraphic_L continuous Lagrangian function
𝒮𝒮\mathcal{S}caligraphic_S continuous action functional
dsubscript𝑑\mathcal{L}_{d}caligraphic_L start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT discrete Lagrangian function
𝒮dsubscript𝒮𝑑\mathcal{S}_{d}caligraphic_S start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT discrete action functional
𝐪𝐪\mathbf{q}bold_q configuration of the beam
𝐪superscript𝐪\mathbf{q}^{\prime}bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT first spatial derivative of 𝐪𝐪\mathbf{q}bold_q
θ𝜃\thetaitalic_θ tangential angle
s𝑠sitalic_s arc length parameter
κ𝜅\kappaitalic_κ curvature
L𝐿Litalic_L length of the undeformed beam
EI𝐸𝐼EIitalic_E italic_I bending stiffness, with E𝐸Eitalic_E the elastic modulus and I𝐼Iitalic_I the second moment of area
𝐪^^𝐪\hat{\mathbf{q}}over^ start_ARG bold_q end_ARG numerical approximation of 𝐪𝐪\mathbf{q}bold_q
N+1𝑁1N+1italic_N + 1 number of discretisation nodes, with N𝑁Nitalic_N the number of intervals
hhitalic_h space step (length of each interval)
q𝝆dsuperscriptsubscript𝑞𝝆dq_{\boldsymbol{\rho}}^{\textrm{d}}italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT d end_POSTSUPERSCRIPT discrete neural network
q𝝆csuperscriptsubscript𝑞𝝆cq_{\boldsymbol{\rho}}^{\textrm{c}}italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT continuous neural network approximating the solution curve 𝒒(s)𝒒𝑠\boldsymbol{q}(s)bold_italic_q ( italic_s )
θ𝝆csuperscriptsubscript𝜃𝝆c\theta_{\boldsymbol{\rho}}^{\textrm{c}}italic_θ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT continuous neural network approximating the angular function θ(s)𝜃𝑠\theta(s)italic_θ ( italic_s )
𝝆𝝆\boldsymbol{\rho}bold_italic_ρ parameters of the neural network
\ellroman_ℓ number of layers in the neural network
σ𝜎\sigmaitalic_σ activation function
M𝑀Mitalic_M number of training data
B𝐵Bitalic_B size of one training batch
MSE mean squared error
MLP multi layer perceptron
MULT multiplicative neural network
𝒟𝒟\mathcal{D}caligraphic_D differential operator
\mathcal{I}caligraphic_I quadrature operator
Table 1: List of abbreviations and notations.

2 Euler’s elastica model

We consider an inextensible beam model in which the cross-section Σ(s)Σ𝑠\Sigma(s)roman_Σ ( italic_s ) is assumed to be constant along the arc length s𝑠sitalic_s and perpendicular to the centerline 𝐪(s)𝐪𝑠\mathbf{q}(s)bold_q ( italic_s ), which means that no shear deformation can occur. Thus, the deformation of the centerline is a pure bending problem, precisely Euler’s elastica curve. In the following, we assume 𝐪C2([0,L],2)𝐪superscript𝐶20𝐿superscript2\mathbf{q}\in C^{2}([0,L],\mathbb{R}^{2})bold_q ∈ italic_C start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( [ 0 , italic_L ] , blackboard_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ), i.e., the curve is planar and twice continuously differentiable with length L𝐿Litalic_L. If s𝑠sitalic_s denotes the arc length parameter, then 𝐪(s)=1normsuperscript𝐪𝑠1\|\mathbf{q}^{\prime}(s)\|=1∥ bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_s ) ∥ = 1, where =dds{}^{\prime}=\frac{d}{ds}start_FLOATSUPERSCRIPT ′ end_FLOATSUPERSCRIPT = divide start_ARG italic_d end_ARG start_ARG italic_d italic_s end_ARG, for all s[0,L]𝑠0𝐿s\in[0,L]italic_s ∈ [ 0 , italic_L ]. The elastica problem consists in minimising the following Euler-Bernoulli energy functional

0Lκ(s)2𝑑s,superscriptsubscript0𝐿𝜅superscript𝑠2differential-d𝑠\int_{0}^{L}\kappa(s)^{2}ds,∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_κ ( italic_s ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_d italic_s ,

where κ(s)𝜅𝑠\kappa(s)italic_κ ( italic_s ) denotes the curvature of 𝐪(s)𝐪𝑠\mathbf{q}(s)bold_q ( italic_s ), [38]. Given the arc length parametrisation, then κ(s)=𝐪′′(s)𝜅𝑠normsuperscript𝐪′′𝑠\kappa(s)=\|\mathbf{q}^{\prime\prime}(s)\|italic_κ ( italic_s ) = ∥ bold_q start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ( italic_s ) ∥.

We can reformulate this problem as a constrained Lagrangian problem as follows. Consider the second-order Lagrangian :T(2)Q:superscript𝑇2𝑄\mathcal{L}:T^{(2)}Q\rightarrow\mathbb{R}caligraphic_L : italic_T start_POSTSUPERSCRIPT ( 2 ) end_POSTSUPERSCRIPT italic_Q → blackboard_R, where T(2)Qsuperscript𝑇2𝑄T^{(2)}Qitalic_T start_POSTSUPERSCRIPT ( 2 ) end_POSTSUPERSCRIPT italic_Q denotes the second-order tangent bundle [41] of the configuration manifold Q𝑄Qitalic_Q, which in this case is 2superscript2\mathbb{R}^{2}blackboard_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT:

(𝐪,𝐪,𝐪′′)=12EI𝐪′′2.𝐪superscript𝐪superscript𝐪′′12𝐸𝐼superscriptnormsuperscript𝐪′′2\mathcal{L}\left(\mathbf{q},\mathbf{q}^{\prime},\mathbf{q}^{\prime\prime}% \right)=\frac{1}{2}EI\left\|\mathbf{q}^{\prime\prime}\right\|^{2}\,.caligraphic_L ( bold_q , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , bold_q start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ) = divide start_ARG 1 end_ARG start_ARG 2 end_ARG italic_E italic_I ∥ bold_q start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . (1)

Here, abusing the notation, denotes a spatial derivative, but we do not initially assume arc length parametrisation. The parameter EI𝐸𝐼EIitalic_E italic_I is the bending stiffness, which governs the response of the elastica under bending. This mechanical parameter consists of a material and a geometric properties, where E𝐸Eitalic_E is the Young’s modulus and I𝐼Iitalic_I is the second moment of area of the cross-section ΣΣ\Sigmaroman_Σ. For simplicity, these parameters are assumed to be constant along the length of the beam.

In order to recover the solutions of the elastica, the Lagrangian in Equation (1) must be supplemented with the constraint equation

Φ(𝐪,𝐪)=𝐪21=0.Φ𝐪superscript𝐪superscriptnormsuperscript𝐪210\Phi(\mathbf{q},\mathbf{q}^{\prime})=\|\mathbf{q}^{\prime}\|^{2}-1=0.roman_Φ ( bold_q , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) = ∥ bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 1 = 0 . (2)

This imposes arc length parametrisation of the curve 𝐪(s)𝐪𝑠\mathbf{q}(s)bold_q ( italic_s ) and leads to the augmented Lagrangian ~:T(2)Q×:~superscript𝑇2𝑄\widetilde{\mathcal{L}}:T^{(2)}Q\times\mathbb{R}\rightarrow\mathbb{R}over~ start_ARG caligraphic_L end_ARG : italic_T start_POSTSUPERSCRIPT ( 2 ) end_POSTSUPERSCRIPT italic_Q × blackboard_R → blackboard_R

~(𝐪,𝐪,𝐪′′,Λ)=(𝐪,𝐪,𝐪′′)+ΛΦ(𝐪,𝐪),~𝐪superscript𝐪superscript𝐪′′Λ𝐪superscript𝐪superscript𝐪′′ΛΦ𝐪superscript𝐪\widetilde{\mathcal{L}}\left(\mathbf{q},\mathbf{q}^{\prime},\mathbf{q}^{\prime% \prime},\Lambda\right)=\mathcal{L}\left(\mathbf{q},\mathbf{q}^{\prime},\mathbf% {q}^{\prime\prime}\right)+\Lambda\Phi(\mathbf{q},\mathbf{q}^{\prime}),over~ start_ARG caligraphic_L end_ARG ( bold_q , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , bold_q start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT , roman_Λ ) = caligraphic_L ( bold_q , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , bold_q start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ) + roman_Λ roman_Φ ( bold_q , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) , (3)

where Λ(s)Λ𝑠\Lambda(s)roman_Λ ( italic_s ) is a Lagrange multiplier, see [39]. The Lagrangian function coincides with the total elastic energy over solutions of the corresponding Euler-Lagrange equations. The internal bending moment is directly related to the curvature κ(s)𝜅𝑠\kappa(s)italic_κ ( italic_s ).

The continuous action functional 𝒮𝒮\mathcal{S}caligraphic_S is defined as:

𝒮[𝐪]=0L~(𝐪,𝐪,𝐪′′,Λ)𝑑s.𝒮delimited-[]𝐪superscriptsubscript0𝐿~𝐪superscript𝐪superscript𝐪′′Λdifferential-d𝑠\mathcal{S}[\mathbf{q}]=\int_{0}^{L}\widetilde{\mathcal{L}}\left(\mathbf{q},% \mathbf{q}^{\prime},\mathbf{q}^{\prime\prime},\Lambda\right)\,ds.caligraphic_S [ bold_q ] = ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT over~ start_ARG caligraphic_L end_ARG ( bold_q , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , bold_q start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT , roman_Λ ) italic_d italic_s . (4)

Applying Hamilton’s principle of stationary action, δ𝒮=0𝛿𝒮0\delta\mathcal{S}=0italic_δ caligraphic_S = 0, yields the Euler-Lagrange equations

d2ds2(𝐪′′)dds(𝐪)+𝐪superscript𝑑2𝑑superscript𝑠2superscript𝐪′′𝑑𝑑𝑠superscript𝐪𝐪\displaystyle\frac{d^{2}}{ds^{2}}\left(\frac{\partial\mathcal{L}}{\partial% \mathbf{q}^{\prime\prime}}\right)-\frac{d}{ds}\left(\frac{\partial\mathcal{L}}% {\partial\mathbf{q}^{\prime}}\right)+\frac{\partial\mathcal{L}}{\partial% \mathbf{q}}divide start_ARG italic_d start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_d italic_s start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ( divide start_ARG ∂ caligraphic_L end_ARG start_ARG ∂ bold_q start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT end_ARG ) - divide start_ARG italic_d end_ARG start_ARG italic_d italic_s end_ARG ( divide start_ARG ∂ caligraphic_L end_ARG start_ARG ∂ bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_ARG ) + divide start_ARG ∂ caligraphic_L end_ARG start_ARG ∂ bold_q end_ARG =dds(Φ𝐪Λ)Φ𝐪Λ,absent𝑑𝑑𝑠Φsuperscript𝐪ΛΦ𝐪Λ\displaystyle=\frac{d}{ds}\left(\frac{\partial\Phi}{\partial\mathbf{q}^{\prime% }}\Lambda\right)-\frac{\partial\Phi}{\partial\mathbf{q}}\Lambda,= divide start_ARG italic_d end_ARG start_ARG italic_d italic_s end_ARG ( divide start_ARG ∂ roman_Φ end_ARG start_ARG ∂ bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_ARG roman_Λ ) - divide start_ARG ∂ roman_Φ end_ARG start_ARG ∂ bold_q end_ARG roman_Λ , (5)
𝐪21superscriptnormsuperscript𝐪21\displaystyle\|\mathbf{q}^{\prime}\|^{2}-1∥ bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 1 =0,absent0\displaystyle=0,= 0 ,

which need to be satisfied together with the boundary conditions on positions and tangents, i.e., (𝐪(0),𝐪(0))=(𝐪0,𝐪0)𝐪0superscript𝐪0subscript𝐪0subscriptsuperscript𝐪0(\mathbf{q}(0),\mathbf{q}^{\prime}(0))=(\mathbf{q}_{0},\mathbf{q}^{\prime}_{0})( bold_q ( 0 ) , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( 0 ) ) = ( bold_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) and (𝐪(L),𝐪(L))=(𝐪N,𝐪N)𝐪𝐿superscript𝐪𝐿subscript𝐪𝑁subscriptsuperscript𝐪𝑁(\mathbf{q}(L),\mathbf{q}^{\prime}(L))=(\mathbf{q}_{N},\mathbf{q}^{\prime}_{N})( bold_q ( italic_L ) , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_L ) ) = ( bold_q start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ).

2.1 Space discretisation of the elastica

The continuous augmented Lagrangian ~~\widetilde{\mathcal{L}}over~ start_ARG caligraphic_L end_ARG in Equation (3) and the action integral 𝒮𝒮\mathcal{S}caligraphic_S in Equation (4) are discretised over the beam length L𝐿Litalic_L using constant step size h=L/N𝐿𝑁h=L/Nitalic_h = italic_L / italic_N, with N+1𝑁1N+1italic_N + 1 the number of the resulting equidistant nodes 0=s0<s1<<sN1<sN=L0subscript𝑠0subscript𝑠1subscript𝑠𝑁1subscript𝑠𝑁𝐿0=s_{0}<s_{1}<\ldots<s_{N-1}<s_{N}=L0 = italic_s start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT < italic_s start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT < … < italic_s start_POSTSUBSCRIPT italic_N - 1 end_POSTSUBSCRIPT < italic_s start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT = italic_L. In second-order systems, the discrete Lagrangian is a function ~d:TQ×TQ××:subscript~𝑑𝑇𝑄𝑇𝑄\widetilde{\mathcal{L}}_{d}:TQ\times TQ\times\mathbb{R}\times\mathbb{R}% \rightarrow\mathbb{R}over~ start_ARG caligraphic_L end_ARG start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT : italic_T italic_Q × italic_T italic_Q × blackboard_R × blackboard_R → blackboard_R. In this study, we refer to a discretisation of the Lagrangian function proposed in [42] based on the trapezoidal rule:

~d(𝐪k,𝐪k,𝐪k+1,𝐪k+1,Λk,Λk+1)=h2[~(𝐪k,𝐪k,(𝐪k′′),Λk)+~(𝐪k+1,𝐪k+1,(𝐪k+1′′)+,Λk+1)],subscript~𝑑subscript𝐪𝑘subscriptsuperscript𝐪𝑘subscript𝐪𝑘1subscriptsuperscript𝐪𝑘1subscriptΛ𝑘subscriptΛ𝑘12delimited-[]~subscript𝐪𝑘subscriptsuperscript𝐪𝑘superscriptsubscriptsuperscript𝐪′′𝑘subscriptΛ𝑘~subscript𝐪𝑘1subscriptsuperscript𝐪𝑘1superscriptsubscriptsuperscript𝐪′′𝑘1subscriptΛ𝑘1\begin{split}&\widetilde{\mathcal{L}}_{d}\left(\mathbf{q}_{k},\mathbf{q}^{% \prime}_{k},\mathbf{q}_{k+1},\mathbf{q}^{\prime}_{k+1},\Lambda_{k},\Lambda_{k+% 1}\right)\\ &=\frac{h}{2}\left[\widetilde{\mathcal{L}}(\mathbf{q}_{k},\mathbf{q}^{\prime}_% {k},\left(\mathbf{q}^{\prime\prime}_{k}\right)^{-},\Lambda_{k})+\widetilde{% \mathcal{L}}(\mathbf{q}_{k+1},\mathbf{q}^{\prime}_{k+1},\left(\mathbf{q}^{% \prime\prime}_{k+1}\right)^{+},\Lambda_{k+1})\right],\end{split}start_ROW start_CELL end_CELL start_CELL over~ start_ARG caligraphic_L end_ARG start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ( bold_q start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , bold_q start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT , roman_Λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , roman_Λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ) end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = divide start_ARG italic_h end_ARG start_ARG 2 end_ARG [ over~ start_ARG caligraphic_L end_ARG ( bold_q start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , ( bold_q start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT , roman_Λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) + over~ start_ARG caligraphic_L end_ARG ( bold_q start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT , ( bold_q start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT , roman_Λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ) ] , end_CELL end_ROW

where 𝐪ksubscript𝐪𝑘\mathbf{q}_{k}bold_q start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, 𝐪ksubscriptsuperscript𝐪𝑘\mathbf{q}^{\prime}_{k}bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, and ΛksubscriptΛ𝑘\Lambda_{k}roman_Λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT are approximations of 𝐪(sk)𝐪subscript𝑠𝑘\mathbf{q}(s_{k})bold_q ( italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ), 𝐪(sk)superscript𝐪subscript𝑠𝑘\mathbf{q}^{\prime}(s_{k})bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ), and Λ(sk)Λsubscript𝑠𝑘\Lambda(s_{k})roman_Λ ( italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ), and the curvature on the interval [sk,sk+1]subscript𝑠𝑘subscript𝑠𝑘1[s_{k},s_{k+1}][ italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_s start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ] is approximated in terms of lower order derivatives as follows

𝐪′′(sk)(𝐪k′′)=superscript𝐪′′subscript𝑠𝑘superscriptsubscriptsuperscript𝐪′′𝑘absent\displaystyle\mathbf{q}^{\prime\prime}(s_{k})\approx(\mathbf{q}^{\prime\prime}% _{k})^{-}=bold_q start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ( italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ≈ ( bold_q start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT = (2𝐪k+14𝐪k)h+6(𝐪k+1𝐪k)h2,2subscriptsuperscript𝐪𝑘14subscriptsuperscript𝐪𝑘6subscript𝐪𝑘1subscript𝐪𝑘superscript2\displaystyle\,\frac{\left(-2\mathbf{q}^{\prime}_{k+1}-4\mathbf{q}^{\prime}_{k% }\right)h+6(\mathbf{q}_{k+1}-\mathbf{q}_{k})}{h^{2}}\,,divide start_ARG ( - 2 bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - 4 bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) italic_h + 6 ( bold_q start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - bold_q start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) end_ARG start_ARG italic_h start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ,
𝐪′′(sk+1)(𝐪k+1′′)+=superscript𝐪′′subscript𝑠𝑘1superscriptsubscriptsuperscript𝐪′′𝑘1absent\displaystyle\mathbf{q}^{\prime\prime}(s_{k+1})\approx(\mathbf{q}^{\prime% \prime}_{k+1})^{+}=bold_q start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ( italic_s start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ) ≈ ( bold_q start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT = (4𝐪k+1+2𝐪k)h6(𝐪k+1𝐪k)h2.4subscriptsuperscript𝐪𝑘12subscriptsuperscript𝐪𝑘6subscript𝐪𝑘1subscript𝐪𝑘superscript2\displaystyle\,\frac{\left(4\mathbf{q}^{\prime}_{k+1}+2\mathbf{q}^{\prime}_{k}% \right)h-6(\mathbf{q}_{k+1}-\mathbf{q}_{k})}{h^{2}}\,.divide start_ARG ( 4 bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT + 2 bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) italic_h - 6 ( bold_q start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - bold_q start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) end_ARG start_ARG italic_h start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG .

This amounts to a piece-wise linear and discontinuous approximation of the curvature on [0,L]0𝐿[0,L][ 0 , italic_L ].

The action integral in Equation (4) along the exact solution 𝐪𝐪\mathbf{q}bold_q with boundary conditions (𝐪0,𝐪0)subscript𝐪0subscriptsuperscript𝐪0\left(\mathbf{q}_{0},\mathbf{q}^{\prime}_{0}\right)( bold_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) and (𝐪N,𝐪N)subscript𝐪𝑁subscriptsuperscript𝐪𝑁\left(\mathbf{q}_{N},\mathbf{q}^{\prime}_{N}\right)( bold_q start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ) is approximated by

𝒮d=k=0N1~d(𝐪k,𝐪k,𝐪k+1,𝐪k+1,Λk,Λk+1).subscript𝒮𝑑superscriptsubscript𝑘0𝑁1subscript~𝑑subscript𝐪𝑘subscriptsuperscript𝐪𝑘subscript𝐪𝑘1subscriptsuperscript𝐪𝑘1subscriptΛ𝑘subscriptΛ𝑘1\mathcal{S}_{d}=\sum_{k=0}^{N-1}\widetilde{\mathcal{L}}_{d}\left(\mathbf{q}_{k% },\mathbf{q}^{\prime}_{k},\mathbf{q}_{k+1},\mathbf{q}^{\prime}_{k+1},\Lambda_{% k},\Lambda_{k+1}\right).caligraphic_S start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_k = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N - 1 end_POSTSUPERSCRIPT over~ start_ARG caligraphic_L end_ARG start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ( bold_q start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , bold_q start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT , roman_Λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , roman_Λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ) . (6)

The discrete variational principle δ𝒮d=0𝛿subscript𝒮𝑑0\delta\mathcal{S}_{d}=0italic_δ caligraphic_S start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT = 0 leads to the following discrete Euler-Lagrange equations:

D3~d(𝐪k1,𝐪k1,𝐪k,𝐪k,Λk1,Λk)+D1~d(𝐪k,𝐪k,𝐪k+1,𝐪k+1,Λk,Λk+1)subscript𝐷3subscript~𝑑subscript𝐪𝑘1subscriptsuperscript𝐪𝑘1subscript𝐪𝑘subscriptsuperscript𝐪𝑘subscriptΛ𝑘1subscriptΛ𝑘subscript𝐷1subscript~𝑑subscript𝐪𝑘subscriptsuperscript𝐪𝑘subscript𝐪𝑘1subscriptsuperscript𝐪𝑘1subscriptΛ𝑘subscriptΛ𝑘1\displaystyle D_{3}\widetilde{\mathcal{L}}_{d}\left(\mathbf{q}_{k-1},\mathbf{q% }^{\prime}_{k-1},\mathbf{q}_{k},\mathbf{q}^{\prime}_{k},\Lambda_{k-1},\Lambda_% {k}\right)+D_{1}\widetilde{\mathcal{L}}_{d}\left(\mathbf{q}_{k},\mathbf{q}^{% \prime}_{k},\mathbf{q}_{k+1},\mathbf{q}^{\prime}_{k+1},\Lambda_{k},\Lambda_{k+% 1}\right)italic_D start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT over~ start_ARG caligraphic_L end_ARG start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ( bold_q start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT , bold_q start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , roman_Λ start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT , roman_Λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) + italic_D start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT over~ start_ARG caligraphic_L end_ARG start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ( bold_q start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , bold_q start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT , roman_Λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , roman_Λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ) =0,absent0\displaystyle=0,= 0 , (7)
D4~d(𝐪k1,𝐪k1,𝐪k,𝐪k,Λk1,Λk)+D2~d(𝐪k,𝐪k,𝐪k+1,𝐪k+1,Λk,Λk+1)subscript𝐷4subscript~𝑑subscript𝐪𝑘1subscriptsuperscript𝐪𝑘1subscript𝐪𝑘subscriptsuperscript𝐪𝑘subscriptΛ𝑘1subscriptΛ𝑘subscript𝐷2subscript~𝑑subscript𝐪𝑘subscriptsuperscript𝐪𝑘subscript𝐪𝑘1subscriptsuperscript𝐪𝑘1subscriptΛ𝑘subscriptΛ𝑘1\displaystyle D_{4}\widetilde{\mathcal{L}}_{d}\left(\mathbf{q}_{k-1},\mathbf{q% }^{\prime}_{k-1},\mathbf{q}_{k},\mathbf{q}^{\prime}_{k},\Lambda_{k-1},\Lambda_% {k}\right)+D_{2}\widetilde{\mathcal{L}}_{d}\left(\mathbf{q}_{k},\mathbf{q}^{% \prime}_{k},\mathbf{q}_{k+1},\mathbf{q}^{\prime}_{k+1},\Lambda_{k},\Lambda_{k+% 1}\right)italic_D start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT over~ start_ARG caligraphic_L end_ARG start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ( bold_q start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT , bold_q start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , roman_Λ start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT , roman_Λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) + italic_D start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT over~ start_ARG caligraphic_L end_ARG start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ( bold_q start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , bold_q start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT , roman_Λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , roman_Λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ) =0,absent0\displaystyle=0,= 0 ,
D6~d(𝐪k1,𝐪k1,𝐪k,𝐪k,Λk1,Λk)+D5~d(𝐪k,𝐪k,𝐪k+1,𝐪k+1,Λk,Λk+1)subscript𝐷6subscript~𝑑subscript𝐪𝑘1subscriptsuperscript𝐪𝑘1subscript𝐪𝑘subscriptsuperscript𝐪𝑘subscriptΛ𝑘1subscriptΛ𝑘subscript𝐷5subscript~𝑑subscript𝐪𝑘subscriptsuperscript𝐪𝑘subscript𝐪𝑘1subscriptsuperscript𝐪𝑘1subscriptΛ𝑘subscriptΛ𝑘1\displaystyle D_{6}\widetilde{\mathcal{L}}_{d}\left(\mathbf{q}_{k-1},\mathbf{q% }^{\prime}_{k-1},\mathbf{q}_{k},\mathbf{q}^{\prime}_{k},\Lambda_{k-1},\Lambda_% {k}\right)+D_{5}\widetilde{\mathcal{L}}_{d}\left(\mathbf{q}_{k},\mathbf{q}^{% \prime}_{k},\mathbf{q}_{k+1},\mathbf{q}^{\prime}_{k+1},\Lambda_{k},\Lambda_{k+% 1}\right)italic_D start_POSTSUBSCRIPT 6 end_POSTSUBSCRIPT over~ start_ARG caligraphic_L end_ARG start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ( bold_q start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT , bold_q start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , roman_Λ start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT , roman_Λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) + italic_D start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT over~ start_ARG caligraphic_L end_ARG start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ( bold_q start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , bold_q start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT , roman_Λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , roman_Λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ) =0,absent0\displaystyle=0,= 0 ,

for k=1,,N1𝑘1𝑁1k=1,\dots,N-1italic_k = 1 , … , italic_N - 1, which approximate the equilibrium equations of the beam in Equations (5) and can be solved together with the boundary conditions. Here, Disubscript𝐷𝑖D_{i}italic_D start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT for i=1,,6𝑖16i=1,\dots,6italic_i = 1 , … , 6 denotes the differentiation with respect to the i𝑖iitalic_i-th argument.

2.2 Data generation

The elastica was one of the first examples displaying elastic instability and bifurcation phenomena [43, 44]. Elastic instability implies that small perturbations of the boundary conditions might lead to large changes in the beam configuration, which results in unstable equilibria. Under certain boundary conditions, bifurcation can appear leading to a multiplicity of solutions [38]. In particular, this means that the numerical problem may display history-dependence and converge to solutions that do not minimise the bending energy. In order to generate a physically meaningful data set, avoiding unstable and non-unique solutions is essential. Thus, in addition to the minimisation of the discrete action Sdsubscript𝑆𝑑S_{d}italic_S start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT in Equation (6), we ensure the fulfilment of the discrete Euler-Lagrange equations (7), which can be seen as necessary conditions for the stationarity of the discrete action. We exclude from the data set numerical solutions computed with boundary conditions where minimisation of Equation (6) and accurate solution of Equations (7) can not be simultaneously achieved.

In particular, we consider a curve of length L=3.3𝐿3.3L=3.3italic_L = 3.3 and bending stiffness EI=10𝐸𝐼10EI=10italic_E italic_I = 10, divided into N=50𝑁50N=50italic_N = 50 intervals. We fix the endpoints 𝐪0=(0,0)subscript𝐪000\mathbf{q}_{0}=(0,0)bold_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = ( 0 , 0 ), 𝐪N=(3,0)subscript𝐪𝑁30\mathbf{q}_{N}=(3,0)bold_q start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT = ( 3 , 0 ). The units of measurement are deliberately omitted as they have no impact on the results of this work. We impose boundary conditions on the tangents in the following two variants:

  1. 1.

    the angle of the tangents with respect to the x𝑥xitalic_x-axis at the boundary, θ0subscript𝜃0\theta_{0}italic_θ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and θNsubscript𝜃𝑁\theta_{N}italic_θ start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT, is prescribed in the range [0,2π]02𝜋[0,2\pi][ 0 , 2 italic_π ], in a specular symmetric fashion, i.e., θN=πθ0subscript𝜃𝑁𝜋subscript𝜃0\theta_{N}=\pi-\theta_{0}italic_θ start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT = italic_π - italic_θ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT. Hereafter, we refer to this case as both-ends,

  2. 2.

    the angle of the left tangent is left fixed as θ0=0subscript𝜃00\theta_{0}=0italic_θ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = 0 and the angle of the right tangent, θNsubscript𝜃𝑁\theta_{N}italic_θ start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT, varies in the range of [0,2π]02𝜋[0,2\pi][ 0 , 2 italic_π ]. We refer to this case as right-end.

Based on these parameters and boundary values, and using cubic splines as initial guess, we generate a data set of 2000200020002000 trajectories (1000100010001000 trajectories for each case) by minimising the particular action in Equation (6), with the trust-constr solver of the optimize.minimize procedure provided in SciPy [45]. We check the resulting solutions by using them as initial guesses for the optimize.root method of SciPy, solving the discrete Euler-Lagrange equations (7).

The learning problem we consider relies on numerically generated solution curves. This choice allows us to work with data points that are quantifiably close to the analytical solution of Euler’s elastica. Consequently, showing that the neural networks we propose can accurately approximate these curves translates into their ability to approximate the analytical solution accurately. The motivation of this strategy is not to improve on the numerical solver we use, but to use its accuracy to train a model able to extrapolate to unseen boundary conditions and generate their solution curves more efficiently than the numerical method itself. The chosen supervised learning setting is independent of the fact that we use numerical solutions as data. Indeed, if one had another reliable approximation of the analytical solution, for example, based on realistic measurements, those could also be used or combined with numerically generated trajectories. Using numerical solutions as data is not an inherent limitation of the proposed procedure but a choice we make to quotient out the issue of not having reliable input data. Furthermore, we mainly focus on the development of neural networks able to approximate such input data with high accuracy.

3 Approximation with neural networks

We start by providing a concise overview of neural networks, which also serves to define the notation used in Sections 4 and 5. We refer to [46, 31, 47] and references therein for a more extensive introduction. A neural network is a parametric function f𝝆:𝒪:subscript𝑓𝝆𝒪f_{\boldsymbol{\rho}}:\mathcal{I}\rightarrow\mathcal{O}italic_f start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT : caligraphic_I → caligraphic_O with parameters 𝝆Ψ𝝆Ψ\boldsymbol{\rho}\in\Psibold_italic_ρ ∈ roman_Ψ given as a composition of multiple transformations,

f𝝆:=ffjf1,assignsubscript𝑓𝝆subscript𝑓subscript𝑓𝑗subscript𝑓1f_{\boldsymbol{\rho}}:=f_{\ell}\circ\dots\circ f_{j}\circ\dots\circ f_{1},italic_f start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT := italic_f start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT ∘ ⋯ ∘ italic_f start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∘ ⋯ ∘ italic_f start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , (8)

where each fjsubscript𝑓𝑗f_{j}italic_f start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT represents the j𝑗jitalic_j-th layer of the network, with j=1,,𝑗1j=1,\dots,\ellitalic_j = 1 , … , roman_ℓ, and \ellroman_ℓ is the number of layers. For example, multi-layer perceptrons (MLPs) have each layer fjsubscript𝑓𝑗f_{j}italic_f start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT defined as

fjMLP(𝐱)=σ(𝐀j𝐱+𝐛j)nj,superscriptsubscript𝑓𝑗MLP𝐱𝜎subscript𝐀𝑗𝐱subscript𝐛𝑗superscriptsubscript𝑛𝑗f_{j}^{\mathrm{MLP}}(\mathbf{x})=\sigma(\mathbf{A}_{j}\mathbf{x}+\mathbf{b}_{j% })\in\mathbb{R}^{n_{j}},italic_f start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_MLP end_POSTSUPERSCRIPT ( bold_x ) = italic_σ ( bold_A start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT bold_x + bold_b start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) ∈ blackboard_R start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUPERSCRIPT , (9)

where njsubscript𝑛𝑗n_{j}italic_n start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT is the dimension of the output of the j𝑗jitalic_j-th layer, 𝐱nj1𝐱superscriptsubscript𝑛𝑗1\mathbf{x}\in\mathbb{R}^{n_{j-1}}bold_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT italic_j - 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT, and 𝐀jnj×nj1subscript𝐀𝑗superscriptsubscript𝑛𝑗subscript𝑛𝑗1\mathbf{A}_{j}\in\mathbb{R}^{n_{j}\times n_{j-1}}bold_A start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT × italic_n start_POSTSUBSCRIPT italic_j - 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT, 𝐛jnjsubscript𝐛𝑗superscriptsubscript𝑛𝑗\mathbf{b}_{j}\in\mathbb{R}^{n_{j}}bold_b start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUPERSCRIPT are the parameters of the j𝑗jitalic_j-th layer, i.e., 𝝆={𝐀j,𝐛j}j=1𝝆superscriptsubscriptsubscript𝐀𝑗subscript𝐛𝑗𝑗1\boldsymbol{\rho}=\{\mathbf{A}_{j},\mathbf{b}_{j}\}_{j=1}^{\ell}bold_italic_ρ = { bold_A start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , bold_b start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT. The activation function σ𝜎\sigmaitalic_σ is a continuous nonlinear scalar function, which acts component-wise on vectors. The architecture of the neural network is prescribed by the layers fjsubscript𝑓𝑗f_{j}italic_f start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT in Equation (8) and determines the space of functions ={f𝝆:𝒪,𝝆Ψ}conditional-setsubscript𝑓𝝆formulae-sequence𝒪𝝆Ψ\mathcal{F}=\{f_{\boldsymbol{\rho}}:\mathcal{I}\rightarrow\mathcal{O},\,\,% \boldsymbol{\rho}\in\Psi\}caligraphic_F = { italic_f start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT : caligraphic_I → caligraphic_O , bold_italic_ρ ∈ roman_Ψ } that can be represented. The weights 𝝆𝝆\boldsymbol{\rho}bold_italic_ρ are chosen such that f𝝆subscript𝑓𝝆f_{\boldsymbol{\rho}}italic_f start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT approximates accurately enough a map of interest f:𝒪:𝑓𝒪f:\mathcal{I}\rightarrow\mathcal{O}italic_f : caligraphic_I → caligraphic_O. Usually, this choice follows from minimising a purposely designed loss function Loss(𝝆)Loss𝝆\rm{Loss}(\boldsymbol{\rho})roman_Loss ( bold_italic_ρ ).

In supervised learning, we are given a data set Ω={𝐱i,𝐲i}i=1MΩsuperscriptsubscriptsuperscript𝐱𝑖superscript𝐲𝑖𝑖1𝑀\Omega=\{\mathbf{x}^{i},\mathbf{y}^{i}\}_{i=1}^{M}roman_Ω = { bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT , bold_y start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT consisting of M𝑀Mitalic_M pairs (𝐱i,𝐲i=f(𝐱i))superscript𝐱𝑖superscript𝐲𝑖𝑓superscript𝐱𝑖(\mathbf{x}^{i},\mathbf{y}^{i}=f\left(\mathbf{x}^{i}\right))( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT , bold_y start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT = italic_f ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ). The loss function measures the distance between the network predictions f𝝆(𝐱i)subscript𝑓𝝆superscript𝐱𝑖f_{\boldsymbol{\rho}}\left(\mathbf{x}^{i}\right)italic_f start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) and the desired outputs 𝐲isuperscript𝐲𝑖\mathbf{y}^{i}bold_y start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT in some appropriate norm \|\cdot\|∥ ⋅ ∥,

Loss(𝝆)=1Mi=1Mf𝝆(𝐱i)𝐲i2.Loss𝝆1𝑀superscriptsubscript𝑖1𝑀superscriptnormsubscript𝑓𝝆superscript𝐱𝑖superscript𝐲𝑖2\textrm{Loss}(\boldsymbol{\rho})=\frac{1}{M}\sum_{i=1}^{M}\left\|f_{% \boldsymbol{\rho}}\left(\mathbf{x}^{i}\right)-\mathbf{y}^{i}\right\|^{2}.Loss ( bold_italic_ρ ) = divide start_ARG 1 end_ARG start_ARG italic_M end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT ∥ italic_f start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) - bold_y start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT .

The training of the network is the process of minimising Loss(𝝆)Loss𝝆\rm{Loss}(\boldsymbol{\rho})roman_Loss ( bold_italic_ρ ) with respect to 𝝆𝝆\boldsymbol{\rho}bold_italic_ρ and it is usually done with gradient descent (GD):

𝝆(k)𝝆(k)ηLoss(𝝆)(k)=:𝝆(k+1).\boldsymbol{\rho}{{}^{(k)}}\mapsto\boldsymbol{\rho}{{}^{(k)}}-\eta\nabla% \textrm{Loss}\left(\boldsymbol{\rho}{{}^{(k)}}\right)=:{\boldsymbol{\rho}^{(k+% 1)}}.bold_italic_ρ start_FLOATSUPERSCRIPT ( italic_k ) end_FLOATSUPERSCRIPT ↦ bold_italic_ρ start_FLOATSUPERSCRIPT ( italic_k ) end_FLOATSUPERSCRIPT - italic_η ∇ Loss ( bold_italic_ρ start_FLOATSUPERSCRIPT ( italic_k ) end_FLOATSUPERSCRIPT ) = : bold_italic_ρ start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT .

The scalar value η𝜂\etaitalic_η is known as the learning rate. The iteration process is often implemented using subsets of data ΩΩ\mathcal{B}\subset\Omegacaligraphic_B ⊂ roman_Ω of cardinality B=||𝐵B=|\mathcal{B}|italic_B = | caligraphic_B | (batches). In this paper we use an accelerated version of GD known as Adam [48].

During training, we evaluate the model’s prediction accuracy using inputs in a validation set. This helps to prevent overfitting on the training data and may serve as a stopping criterion if the training loss diminishes but the validation error rises. Once the training is complete, we assess the model’s accuracy in predicting the correct output for new inputs included in a test set composed of boundary conditions outside the training and validation sets. In the following, we measure the accuracy on the training, validation, and test data using the mean squared error of the difference between the predicted trajectories and the true ones.

We now turn to the task of approximating the static equilibria of the planar elastica introduced in Section 2, i.e., approximating a family of curves {𝐪i:[0,L]2}conditional-setsuperscript𝐪𝑖maps-to0𝐿superscript2\{\mathbf{q}^{i}:[0,L]\mapsto\mathbb{R}^{2}\}{ bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT : [ 0 , italic_L ] ↦ blackboard_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT } determined by boundary conditions,

{𝐪i(0)=𝐪0i,𝐪i(L)=𝐪Ni,(𝐪i)(0)=(𝐪0i),(𝐪i)(L)=(𝐪Ni)},formulae-sequencesuperscript𝐪𝑖0subscriptsuperscript𝐪𝑖0formulae-sequencesuperscript𝐪𝑖𝐿subscriptsuperscript𝐪𝑖𝑁formulae-sequencesuperscriptsuperscript𝐪𝑖0superscriptsubscriptsuperscript𝐪𝑖0superscriptsuperscript𝐪𝑖𝐿superscriptsubscriptsuperscript𝐪𝑖𝑁\{\mathbf{q}^{i}(0)=\mathbf{q}^{i}_{0},\;\mathbf{q}^{i}(L)=\mathbf{q}^{i}_{N},% \;(\mathbf{q}^{i})^{\prime}(0)=(\mathbf{q}^{i}_{0})^{\prime},\;(\mathbf{q}^{i}% )^{\prime}(L)=(\mathbf{q}^{i}_{N})^{\prime}\},{ bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ( 0 ) = bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ( italic_L ) = bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT , ( bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( 0 ) = ( bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , ( bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_L ) = ( bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT } , (10)

where (𝐪0i,𝐪Ni,(𝐪0i),(𝐪Ni))8subscriptsuperscript𝐪𝑖0subscriptsuperscript𝐪𝑖𝑁superscriptsubscriptsuperscript𝐪𝑖0superscriptsubscriptsuperscript𝐪𝑖𝑁superscript8\left(\mathbf{q}^{i}_{0},\mathbf{q}^{i}_{N},(\mathbf{q}^{i}_{0})^{\prime},(% \mathbf{q}^{i}_{N})^{\prime}\right)\in\mathbb{R}^{8}( bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT , ( bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , ( bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) ∈ blackboard_R start_POSTSUPERSCRIPT 8 end_POSTSUPERSCRIPT. To tackle this problem, we require a set of evaluations {𝐪ki,(𝐪ki)}subscriptsuperscript𝐪𝑖𝑘superscriptsubscriptsuperscript𝐪𝑖𝑘\{\mathbf{q}^{i}_{k},(\mathbf{q}^{i}_{k})^{\prime}\}{ bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , ( bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT } on the nodes sk[0,L]subscript𝑠𝑘0𝐿s_{k}\in[0,L]italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∈ [ 0 , italic_L ] of a discretisation. More precisely, in our setting, the data set includes numerical approximations 𝐪^^𝐪\hat{\mathbf{q}}over^ start_ARG bold_q end_ARG of the solution 𝐪(s)𝐪𝑠\mathbf{q}(s)bold_q ( italic_s ) and its spatial derivative 𝐪(s)superscript𝐪𝑠\mathbf{q}^{\prime}(s)bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_s ) at the N1𝑁1N-1italic_N - 1 discrete locations sk=khLsubscript𝑠𝑘𝑘𝐿s_{k}=\frac{kh}{L}italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = divide start_ARG italic_k italic_h end_ARG start_ARG italic_L end_ARG in the interval [0,L]0𝐿[0,L][ 0 , italic_L ], for M𝑀Mitalic_M pairs of boundary conditions, as described in Section 2.2.

4 The discrete network

The discretisation of Euler’s elastica presented in Section 2.1 provides discrete solutions on a set of nodes along the curve. These solutions can sometimes be hard to obtain since a global optimisation problem needs to be solved, and the number of nodes can be large. This motivates using neural networks to learn the approximate solution on the internal nodes for a given set of boundary conditions. The data set ΩΩ\Omegaroman_Ω consists of M𝑀Mitalic_M precomputed discrete solutions

Ω={(𝐱i,𝐲i)}i=1M,Ωsuperscriptsubscriptsuperscript𝐱𝑖superscript𝐲𝑖𝑖1𝑀\Omega=\left\{(\mathbf{x}^{i},\mathbf{y}^{i})\right\}_{i=1}^{M},roman_Ω = { ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT , bold_y start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT ,

where

𝐱i=(𝐪0i,(𝐪0i),𝐪Ni,(𝐪Ni))8superscript𝐱𝑖subscriptsuperscript𝐪𝑖0superscriptsubscriptsuperscript𝐪𝑖0subscriptsuperscript𝐪𝑖𝑁superscriptsubscriptsuperscript𝐪𝑖𝑁superscript8\mathbf{x}^{i}=\left(\mathbf{q}^{i}_{0},(\mathbf{q}^{i}_{0})^{\prime},\mathbf{% q}^{i}_{N},(\mathbf{q}^{i}_{N})^{\prime}\right)\in\mathbb{R}^{8}bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT = ( bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , ( bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT , ( bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) ∈ blackboard_R start_POSTSUPERSCRIPT 8 end_POSTSUPERSCRIPT

are the input boundary conditions and

𝐲i=(𝐪^1i,(𝐪^1i),,𝐪^N1i,(𝐪^N1i))4(N1)superscript𝐲𝑖superscriptsubscript^𝐪1𝑖superscriptsubscriptsuperscript^𝐪𝑖1superscriptsubscript^𝐪𝑁1𝑖superscriptsubscriptsuperscript^𝐪𝑖𝑁1superscript4𝑁1\mathbf{y}^{i}=(\hat{\mathbf{q}}_{1}^{i},(\hat{\mathbf{q}}^{i}_{1})^{\prime},% \ldots,\hat{\mathbf{q}}_{N-1}^{i},(\hat{\mathbf{q}}^{i}_{N-1})^{\prime})\in% \mathbb{R}^{4(N-1)}bold_y start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT = ( over^ start_ARG bold_q end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT , ( over^ start_ARG bold_q end_ARG start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , … , over^ start_ARG bold_q end_ARG start_POSTSUBSCRIPT italic_N - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT , ( over^ start_ARG bold_q end_ARG start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_N - 1 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) ∈ blackboard_R start_POSTSUPERSCRIPT 4 ( italic_N - 1 ) end_POSTSUPERSCRIPT

are the computed solutions at the internal nodes that serve as output data for the network’s training.

For any symmetric positive definite matrix W𝑊Witalic_W, we define the weighted norm 𝐱W2=𝐱W𝐱superscriptsubscriptnorm𝐱𝑊2superscript𝐱top𝑊𝐱\|\mathbf{x}\|_{W}^{2}=\mathbf{x}^{\top}W\mathbf{x}∥ bold_x ∥ start_POSTSUBSCRIPT italic_W end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = bold_x start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_W bold_x. The weighted MSE loss

Loss(𝝆)=14M(N1)i=1Mq𝝆d(𝐱i)𝐲iW2Loss𝝆14𝑀𝑁1superscriptsubscript𝑖1𝑀superscriptsubscriptnormsuperscriptsubscript𝑞𝝆dsuperscript𝐱𝑖superscript𝐲𝑖𝑊2\mathrm{Loss}(\boldsymbol{\rho})=\frac{1}{4M(N-1)}\sum_{i=1}^{M}\left\|q_{% \boldsymbol{\rho}}^{\textrm{d}}\left(\mathbf{x}^{i}\right)-\mathbf{y}^{i}% \right\|_{W}^{2}roman_Loss ( bold_italic_ρ ) = divide start_ARG 1 end_ARG start_ARG 4 italic_M ( italic_N - 1 ) end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT ∥ italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT d end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) - bold_y start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT italic_W end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT (11)

will be used to learn the input-to-output map q𝝆d:84(N1):superscriptsubscript𝑞𝝆dsuperscript8superscript4𝑁1q_{\boldsymbol{\rho}}^{\textrm{d}}:\mathbb{R}^{8}\to\mathbb{R}^{4(N-1)}italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT d end_POSTSUPERSCRIPT : blackboard_R start_POSTSUPERSCRIPT 8 end_POSTSUPERSCRIPT → blackboard_R start_POSTSUPERSCRIPT 4 ( italic_N - 1 ) end_POSTSUPERSCRIPT, where the superscript dd\mathrm{d}roman_d stands for discrete. One should be aware that there is a numerical error in 𝐲isuperscript𝐲𝑖\mathbf{y}^{i}bold_y start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT compared to the exact solution and the size of this error will pose a limit to the accuracy of the neural network approximation.

4.1 Numerical experiments

This section provides experimental support to the proposed learning framework using the machine learning library PyTorch [49]. The experiments of this section are run on a CPU machine. We perform a series of experiments varying some hyperparameters in the training procedure. We fix the batch size B𝐵Bitalic_B to 32 and use the Adam optimiser [48] for the training with learning rate 103superscript10310^{-3}10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT and weight decay set to 00. In (11) we use the weight matrix

W=I+γGG,𝑊𝐼𝛾superscript𝐺top𝐺W=I+\gamma G^{\top}G,italic_W = italic_I + italic_γ italic_G start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_G ,

where G=S4I𝐺superscript𝑆4𝐼G=S^{4}-Iitalic_G = italic_S start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT - italic_I with S𝑆Sitalic_S the forward shift operator on vectors of 4(N1)superscript4𝑁1\mathbb{R}^{4(N-1)}blackboard_R start_POSTSUPERSCRIPT 4 ( italic_N - 1 ) end_POSTSUPERSCRIPT. This choice of G𝐺Gitalic_G allows us to compute differences between corresponding entries of the input associated with neighbouring nodes. We determine the number of epochs for training both the discrete and continuous networks based on experimental evidence. We fix a high enough number which allows us to achieve qualitatively accurate predictions and ensure that both training and validation losses start to plateau a few epochs before the set maximum. We consider a multi-layer perceptron with the hyperbolic tangent as an activation function, and we vary the number of layers and the number of hidden nodes in each layer. We also test different values of the parameter γ𝛾\gammaitalic_γ in the weight matrix W𝑊Witalic_W. We rely on the software framework Optuna [50], which employs Bayesian optimisation methods to automate and efficiently conduct the search for the combination that yields the best result. We collect in Table 4 the hyperparameters with the corresponding ranges and in Table 5 the selected values. The resulting training error on the both-end data set is 1.141071.14superscript1071.14\cdot 10^{-7}1.14 ⋅ 10 start_POSTSUPERSCRIPT - 7 end_POSTSUPERSCRIPT, the validation error is 2.1511072.151superscript1072.151\cdot 10^{-7}2.151 ⋅ 10 start_POSTSUPERSCRIPT - 7 end_POSTSUPERSCRIPT, and the test error is 4.0091074.009superscript1074.009\cdot 10^{-7}4.009 ⋅ 10 start_POSTSUPERSCRIPT - 7 end_POSTSUPERSCRIPT. Figure 1 compares test trajectories for 𝐪𝐪\mathbf{q}bold_q and 𝐪superscript𝐪\mathbf{q}^{\prime}bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT. We remark that, as already clear from the low value of the training and test errors, the network can accurately replicate the behaviour of the training and test data. Furthermore, we have zero errors at the end nodes since the network is trained only on the internal nodes and the boundary values are appended to the predicted solution in a post-processing phase. On the other hand, since this discrete approach does not relate the components as evaluations of a smooth curve, there is no regular behaviour in the error.

Refer to caption
Refer to caption
Refer to caption
Figure 1: Comparison over test trajectories for 𝐪𝐪\mathbf{q}bold_q and 𝐪superscript𝐪\mathbf{q}^{\prime}bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT for the discrete network q𝝆dsuperscriptsubscript𝑞𝝆dq_{\boldsymbol{\rho}}^{\textrm{d}}italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT d end_POSTSUPERSCRIPT tested on the both-ends data set with 80%10%10%percent80percent10percent1080\%\--10\%\--10\%80 % - 10 % - 10 % splitting into training, validation, and test sets. The mean squared error on the test set equals 4.0091074.009superscript1074.009\cdot 10^{-7}4.009 ⋅ 10 start_POSTSUPERSCRIPT - 7 end_POSTSUPERSCRIPT. For presentation purposes, only 10 randomly selected trajectories are considered in the first two plots.

As an additional evaluation of the deep learning framework’s behaviour, we conduct experiments to assess how the learning process performs when the number of training data varies, i.e., with different splittings of the data set into training, validation, and test sets. We report the results in Table 2 and summarise the corresponding hyperparameters in Table 5 of the Appendix.

Data set splittingTraining - validation - testData set splittingTraining - validation - test\begin{array}[]{c}\text{Data set splitting}\\ \text{Training \-- validation \-- test}\end{array}start_ARRAY start_ROW start_CELL Data set splitting end_CELL end_ROW start_ROW start_CELL Training - validation - test end_CELL end_ROW end_ARRAY Training accuracy Validation accuracy Test accuracy
10% - 10% - 10% 2.3311052.331superscript1052.331\cdot 10^{-5}2.331 ⋅ 10 start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT 3.8741053.874superscript1053.874\cdot 10^{-5}3.874 ⋅ 10 start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT 8.5451048.545superscript1048.545\cdot 10^{-4}8.545 ⋅ 10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT
20% - 10% - 10% 1.8521061.852superscript1061.852\cdot 10^{-6}1.852 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT 1.3271061.327superscript1061.327\cdot 10^{-6}1.327 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT 1.3611041.361superscript1041.361\cdot 10^{-4}1.361 ⋅ 10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT
40% - 10% - 10% 4.8021074.802superscript1074.802\cdot 10^{-7}4.802 ⋅ 10 start_POSTSUPERSCRIPT - 7 end_POSTSUPERSCRIPT 4.7931074.793superscript1074.793\cdot 10^{-7}4.793 ⋅ 10 start_POSTSUPERSCRIPT - 7 end_POSTSUPERSCRIPT 1.2951061.295superscript1061.295\cdot 10^{-6}1.295 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT
80% - 10% - 10% 1.1401071.140superscript1071.140\cdot 10^{-7}1.140 ⋅ 10 start_POSTSUPERSCRIPT - 7 end_POSTSUPERSCRIPT 2.1511072.151superscript1072.151\cdot 10^{-7}2.151 ⋅ 10 start_POSTSUPERSCRIPT - 7 end_POSTSUPERSCRIPT 4.0091074.009superscript1074.009\cdot 10^{-7}4.009 ⋅ 10 start_POSTSUPERSCRIPT - 7 end_POSTSUPERSCRIPT
Table 2: Behaviour of the discrete network q𝝆dsuperscriptsubscript𝑞𝝆dq_{\boldsymbol{\rho}}^{\textrm{d}}italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT d end_POSTSUPERSCRIPT tested on the both-ends data set with fewer training data points. The size of the training set varies, while that of the validation and the test sets is fixed. The last row corresponds to the results in Figure 1.

We also report results obtained by merging the both-end and the right-end trajectories, with 80%10%10%percent80percent10percent1080\%\--10\%\--10\%80 % - 10 % - 10 % splitting of the whole new data set into training, validation, and test sets. The results are shown in Figure 2 and are obtained with 3 layers, 616 hidden nodes, and γ=7.323103𝛾7.323superscript103\gamma=7.323\cdot 10^{-3}italic_γ = 7.323 ⋅ 10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT. The resulting training, validation, and test errors are, respectively, 9.8931089.893superscript1089.893\cdot 10^{-8}9.893 ⋅ 10 start_POSTSUPERSCRIPT - 8 end_POSTSUPERSCRIPT, 1.1261071.126superscript1071.126\cdot 10^{-7}1.126 ⋅ 10 start_POSTSUPERSCRIPT - 7 end_POSTSUPERSCRIPT, and 7.8541087.854superscript1087.854\cdot 10^{-8}7.854 ⋅ 10 start_POSTSUPERSCRIPT - 8 end_POSTSUPERSCRIPT.

Refer to caption
Refer to caption
Refer to caption
Figure 2: Comparison over test trajectories for 𝐪𝐪\mathbf{q}bold_q and 𝐪superscript𝐪\mathbf{q}^{\prime}bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT for the discrete network q𝝆dsuperscriptsubscript𝑞𝝆dq_{\boldsymbol{\rho}}^{\textrm{d}}italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT d end_POSTSUPERSCRIPT tested on the both-ends + right-end data set with 80%10%10%percent80percent10percent1080\%\--10\%\--10\%80 % - 10 % - 10 % splitting into training, validation, and test sets. The mean squared error on the test set equals 7.8541087.854superscript1087.854\cdot 10^{-8}7.854 ⋅ 10 start_POSTSUPERSCRIPT - 8 end_POSTSUPERSCRIPT. For presentation purposes, only 10 randomly selected trajectories are considered in the first two plots.

5 The continuous network

The approach described in the previous section shows accurate results, given a large enough amount of beam discretisations with a fixed number of nodes N+1𝑁1N+1italic_N + 1, equally distributed in [0,L]0𝐿[0,L][ 0 , italic_L ]. It seems reasonable to expect the parametric model’s approximation quality to improve when the number of discretisation nodes increases. However, in this approach, the dimension of the predicted vector grows with N𝑁Nitalic_N, and hence minimising the loss function (11) becomes more difficult. In addition, the fact that the discrete network approach depends on the spatial discretisation of the training data restricts the output dimension to a specific number of nodes. Consequently, there would be two main options to assess the solution at different locations: training the network once more, or interpolating the previously obtained approximation. These limitations make such a discrete approach less appealing and suggest that having a neural network that is a smooth function of the arc length coordinate s𝑠sitalic_s can be beneficial. This modelling assumption would also be compatible with different discretisations of the curve and would not suffer from the curse of dimensionality if more nodes were added. In this setting, the discrete node sksubscript𝑠𝑘s_{k}italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT at which an approximation of the solution is available, is included in the input data together with the boundary conditions. As a result, we work with the following data set

Ω={(sk,𝐱i),𝐲ki}k=0,,Ni=1,,M,Ωsuperscriptsubscriptsubscript𝑠𝑘superscript𝐱𝑖superscriptsubscript𝐲𝑘𝑖𝑘0𝑁𝑖1𝑀\Omega=\left\{\left(s_{k},\,\mathbf{x}^{i}\right),\;{\mathbf{y}}_{k}^{i}\right% \}_{k=0,\dots,N}^{i=1,\dots,M},roman_Ω = { ( italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) , bold_y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_k = 0 , … , italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i = 1 , … , italic_M end_POSTSUPERSCRIPT ,

where, as in the previous section,

𝐱i=(𝐪0i,(𝐪0i),𝐪Ni,(𝐪Ni))8,superscript𝐱𝑖subscriptsuperscript𝐪𝑖0superscriptsubscriptsuperscript𝐪𝑖0subscriptsuperscript𝐪𝑖𝑁superscriptsubscriptsuperscript𝐪𝑖𝑁superscript8\mathbf{x}^{i}=\left(\,\mathbf{q}^{i}_{0},\,(\mathbf{q}^{i}_{0})^{\prime},\,% \mathbf{q}^{i}_{N},\,(\mathbf{q}^{i}_{N})^{\prime}\right)\in\mathbb{R}^{8},bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT = ( bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , ( bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT , ( bold_q start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) ∈ blackboard_R start_POSTSUPERSCRIPT 8 end_POSTSUPERSCRIPT ,

and

𝐲ki=(𝐪^ki,(𝐪^ki)).superscriptsubscript𝐲𝑘𝑖superscriptsubscript^𝐪𝑘𝑖superscriptsuperscriptsubscript^𝐪𝑘𝑖{\mathbf{y}}_{k}^{i}=\left(\hat{\mathbf{q}}_{k}^{i},\,(\hat{\mathbf{q}}_{k}^{i% })^{\prime}\right).bold_y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT = ( over^ start_ARG bold_q end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT , ( over^ start_ARG bold_q end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) .

Here 𝐪^kisuperscriptsubscript^𝐪𝑘𝑖\hat{\mathbf{q}}_{k}^{i}over^ start_ARG bold_q end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT is the numerical solution 𝐪^^𝐪\hat{\mathbf{q}}over^ start_ARG bold_q end_ARG on the node sksubscript𝑠𝑘s_{k}italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, satisfying the i𝑖iitalic_i-th boundary conditions in Equation (10). Let us introduce the neural network

q𝝆c:8𝒞([0,L],2),:superscriptsubscript𝑞𝝆csuperscript8superscript𝒞0𝐿superscript2q_{\boldsymbol{\rho}}^{\mathrm{c}}:\mathbb{R}^{8}\to\mathcal{C}^{\infty}\left(% [0,L],\mathbb{R}^{2}\right),italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT : blackboard_R start_POSTSUPERSCRIPT 8 end_POSTSUPERSCRIPT → caligraphic_C start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ( [ 0 , italic_L ] , blackboard_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) ,

and the differential operator

𝒟:𝒞([0,L],2)𝒞([0,L],2),𝒟(q𝝆c(𝐱i))(sk)=dds(q𝝆c(𝐱i))(s)|s=sk,:𝒟formulae-sequencesuperscript𝒞0𝐿superscript2superscript𝒞0𝐿superscript2𝒟superscriptsubscript𝑞𝝆csuperscript𝐱𝑖subscript𝑠𝑘evaluated-at𝑑𝑑𝑠superscriptsubscript𝑞𝝆csuperscript𝐱𝑖𝑠𝑠subscript𝑠𝑘\mathcal{D}:\mathcal{C}^{\infty}\left([0,L],\mathbb{R}^{2}\right)\to\mathcal{C% }^{\infty}\left([0,L],\mathbb{R}^{2}\right),\,\,\mathcal{D}\left(q_{% \boldsymbol{\rho}}^{\mathrm{c}}\left(\mathbf{x}^{i}\right)\right)(s_{k})=\frac% {d}{ds}\left(q_{\boldsymbol{\rho}}^{\rm c}(\mathbf{x}^{i})\right)(s)\Big{|}_{s% =s_{k}},caligraphic_D : caligraphic_C start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ( [ 0 , italic_L ] , blackboard_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) → caligraphic_C start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ( [ 0 , italic_L ] , blackboard_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) , caligraphic_D ( italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ) ( italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) = divide start_ARG italic_d end_ARG start_ARG italic_d italic_s end_ARG ( italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ) ( italic_s ) | start_POSTSUBSCRIPT italic_s = italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ,

so that we can define

y𝝆(𝐱i)(sk):=(q𝝆c(𝐱i)(sk),𝒟(q𝝆c(𝐱i))(sk)).assignsubscript𝑦𝝆superscript𝐱𝑖subscript𝑠𝑘superscriptsubscript𝑞𝝆csuperscript𝐱𝑖subscript𝑠𝑘𝒟superscriptsubscript𝑞𝝆csuperscript𝐱𝑖subscript𝑠𝑘y_{\boldsymbol{\rho}}\left(\mathbf{x}^{i}\right)(s_{k}):=\left(q_{\boldsymbol{% \rho}}^{\mathrm{c}}\left(\mathbf{x}^{i}\right)(s_{k}),\,\mathcal{D}\left(q_{% \boldsymbol{\rho}}^{\mathrm{c}}\left(\mathbf{x}^{i}\right)\right)(s_{k})\right).italic_y start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ( italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) := ( italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ( italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) , caligraphic_D ( italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ) ( italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ) .

To train the network q𝝆csuperscriptsubscript𝑞𝝆cq_{\boldsymbol{\rho}}^{\mathrm{c}}italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT, we define the loss function

Loss(𝝆)=Loss𝝆absent\displaystyle\textrm{Loss}(\boldsymbol{\rho})=Loss ( bold_italic_ρ ) = 14M(N+1)i=1Mk=0N(y𝝆(𝐱i)(sk)𝒚ki22+\displaystyle\frac{1}{4M(N+1)}\sum_{i=1}^{M}\sum_{k=0}^{N}\left(\left\|{y_{% \boldsymbol{\rho}}}\left(\mathbf{x}^{i}\right)(s_{k})-{\boldsymbol{y}}_{k}^{i}% \right\|^{2}_{2}+\right.divide start_ARG 1 end_ARG start_ARG 4 italic_M ( italic_N + 1 ) end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_k = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT ( ∥ italic_y start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ( italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) - bold_italic_y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT + (12)
γ(π𝒟(y𝝆(𝐱i)(sk))221)2),\displaystyle\left.\gamma\left(\left\|\pi_{\mathcal{D}}\left(y_{\boldsymbol{% \rho}}\left(\mathbf{x}^{i}\right)(s_{k})\right)\right\|_{2}^{2}-1\right)^{2}% \right),italic_γ ( ∥ italic_π start_POSTSUBSCRIPT caligraphic_D end_POSTSUBSCRIPT ( italic_y start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ( italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 1 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) ,

where π𝒟:84:subscript𝜋𝒟superscript8superscript4\pi_{\mathcal{D}}:\mathbb{R}^{8}\to\mathbb{R}^{4}italic_π start_POSTSUBSCRIPT caligraphic_D end_POSTSUBSCRIPT : blackboard_R start_POSTSUPERSCRIPT 8 end_POSTSUPERSCRIPT → blackboard_R start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT is the projection on the second component 𝒟(q𝝆c(𝐱i))(sk)𝒟superscriptsubscript𝑞𝝆csuperscript𝐱𝑖subscript𝑠𝑘\mathcal{D}(q_{\boldsymbol{\rho}}^{\rm c}(\mathbf{x}^{i}))(s_{k})caligraphic_D ( italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ) ( italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ), and γ0𝛾0\gamma\geq 0italic_γ ≥ 0 weighs the violation of the normality constraint. The map q𝝆csuperscriptsubscript𝑞𝝆cq_{\boldsymbol{\rho}}^{\textrm{c}}italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT is now a neural network that associates each set of boundary conditions 𝐱isuperscript𝐱𝑖\mathbf{x}^{i}bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT with a smooth curve q𝝆c(𝐱i):[0,L]2:superscriptsubscript𝑞𝝆csuperscript𝐱𝑖0𝐿superscript2q_{\boldsymbol{\rho}}^{\textrm{c}}\left(\mathbf{x}^{i}\right):[0,L]\to\mathbb{% R}^{2}italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) : [ 0 , italic_L ] → blackboard_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT that can be evaluated at every point s[0,L]𝑠0𝐿s\in[0,L]italic_s ∈ [ 0 , italic_L ]. We denote this network with the superscript cc\mathrm{c}roman_c since this curve is, in particular, continuous. The outputs q𝝆c(𝐱i)(s)2superscriptsubscript𝑞𝝆csuperscript𝐱𝑖𝑠superscript2q_{\boldsymbol{\rho}}^{\textrm{c}}\left(\mathbf{x}^{i}\right)(s)\in\mathbb{R}^% {2}italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ( italic_s ) ∈ blackboard_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT are approximations of the configuration of the beam at s[0,L]𝑠0𝐿s\in[0,L]italic_s ∈ [ 0 , italic_L ].

We point out that, contrary to the discrete case, we learn approximations of 𝐪(s)𝐪𝑠\mathbf{q}(s)bold_q ( italic_s ) also on the end nodes, i.e., at s=0𝑠0s=0italic_s = 0 and s=L𝑠𝐿s=Litalic_s = italic_L. This is because we do not impose the boundary conditions by construction. Even though there are multiple approaches to embed them into the network architecture, the one we try in our experiments made the optimisation problem too complex, thus we only impose the boundary conditions weakly in the loss function.

Another strategy is to compute the angles θksubscript𝜃𝑘\theta_{k}italic_θ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT between the tangents (𝐪^k)superscriptsubscript^𝐪𝑘(\hat{\mathbf{q}}_{k})^{\prime}( over^ start_ARG bold_q end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT and the x𝑥xitalic_x-axis and to use them as training data. To this end, we define the neural network

θ𝝆c:8𝒞([0,L],):superscriptsubscript𝜃𝝆csuperscript8superscript𝒞0𝐿\theta_{\boldsymbol{\rho}}^{\textrm{c}}:\mathbb{R}^{8}\to\mathcal{C}^{\infty}% \left([0,L],\mathbb{R}\right)italic_θ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT : blackboard_R start_POSTSUPERSCRIPT 8 end_POSTSUPERSCRIPT → caligraphic_C start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ( [ 0 , italic_L ] , blackboard_R )

as θ𝝆c=θ^𝝆cπsuperscriptsubscript𝜃𝝆csuperscriptsubscript^𝜃𝝆c𝜋\theta_{\boldsymbol{\rho}}^{\textrm{c}}=\hat{\theta}_{\boldsymbol{\rho}}^{% \textrm{c}}\circ\piitalic_θ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT = over^ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT ∘ italic_π, where

θ^𝝆c:2𝒞([0,L],):superscriptsubscript^𝜃𝝆csuperscript2superscript𝒞0𝐿\hat{\theta}_{\boldsymbol{\rho}}^{\textrm{c}}:\mathbb{R}^{2}\to\mathcal{C}^{% \infty}\left([0,L],\mathbb{R}\right)over^ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT : blackboard_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT → caligraphic_C start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ( [ 0 , italic_L ] , blackboard_R ) (13)

is a neural network, and the function π:82:𝜋superscript8superscript2\pi:\mathbb{R}^{8}\to\mathbb{R}^{2}italic_π : blackboard_R start_POSTSUPERSCRIPT 8 end_POSTSUPERSCRIPT → blackboard_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT extracts the tangential angles from the boundary conditions, i.e., π(𝐱i)=(θ0i,θNi)𝜋superscript𝐱𝑖superscriptsubscript𝜃0𝑖superscriptsubscript𝜃𝑁𝑖\pi\left(\mathbf{x}^{i}\right)=\left(\theta_{0}^{i},\theta_{N}^{i}\right)italic_π ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) = ( italic_θ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT , italic_θ start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ). Such a network should approximate the angular function θ:[0,L]s:𝜃contains0𝐿𝑠\theta:[0,L]\ni s\to\mathbb{R}italic_θ : [ 0 , italic_L ] ∋ italic_s → blackboard_R, so that

τ𝝆c(𝐱i)(s):=(cos(θ𝝆c(𝐱i)(s)),sin(θ𝝆c(𝐱i)(s)))2assignsuperscriptsubscript𝜏𝝆csuperscript𝐱𝑖𝑠superscriptsubscript𝜃𝝆csuperscript𝐱𝑖𝑠superscriptsubscript𝜃𝝆csuperscript𝐱𝑖𝑠superscript2\tau_{\boldsymbol{\rho}}^{\mathrm{c}}\left(\mathbf{x}^{i}\right)(s):=\left(% \cos\left(\theta_{\boldsymbol{\rho}}^{\textrm{c}}\left(\mathbf{x}^{i}\right)(s% )\right),\sin\left(\theta_{\boldsymbol{\rho}}^{\textrm{c}}\left(\mathbf{x}^{i}% \right)(s)\right)\right)\in\mathbb{R}^{2}italic_τ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ( italic_s ) := ( roman_cos ( italic_θ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ( italic_s ) ) , roman_sin ( italic_θ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ( italic_s ) ) ) ∈ blackboard_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT (14)

gets close to the tangent vector 𝐪(s)superscript𝐪𝑠\mathbf{q}^{\prime}(s)bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_s ). As a result, the constraint on the unit norm of the tangents is satisfied by construction, and the inextensibility of the elastica is guaranteed. The curve

𝐪(s)=𝐪0+0s𝐪(s¯)ds¯𝐪𝑠subscript𝐪0superscriptsubscript0𝑠superscript𝐪¯𝑠differential-d¯𝑠\mathbf{q}(s)=\mathbf{q}_{0}+\int_{0}^{s}\mathbf{q}^{\prime}(\bar{s})\mathrm{d% }\bar{s}bold_q ( italic_s ) = bold_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_s end_POSTSUPERSCRIPT bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( over¯ start_ARG italic_s end_ARG ) roman_d over¯ start_ARG italic_s end_ARG

can then be approximated through the reconstruction formula

q𝝆c(𝐱i)(s)=𝐪0+(τρc(𝐱i))(s),superscriptsubscript𝑞𝝆csuperscript𝐱𝑖𝑠subscript𝐪0superscriptsubscript𝜏𝜌csuperscript𝐱𝑖𝑠q_{\boldsymbol{\rho}}^{\textrm{c}}\left(\mathbf{x}^{i}\right)(s)=\mathbf{q}_{0% }+\mathcal{I}\left(\tau_{\rho}^{\mathrm{c}}\left(\mathbf{x}^{i}\right)\right)(% s),italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ( italic_s ) = bold_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + caligraphic_I ( italic_τ start_POSTSUBSCRIPT italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ) ( italic_s ) , (15)

where the operator :𝒞([0,L],2)𝒞([0,L],2):superscript𝒞0𝐿superscript2superscript𝒞0𝐿superscript2\mathcal{I}:\mathcal{C}^{\infty}\left([0,L],\mathbb{R}^{2}\right)\to\mathcal{C% }^{\infty}\left([0,L],\mathbb{R}^{2}\right)caligraphic_I : caligraphic_C start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ( [ 0 , italic_L ] , blackboard_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) → caligraphic_C start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ( [ 0 , italic_L ] , blackboard_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) is such that

(τρc(𝐱i))(s)0sτρc(𝐱i)(s¯)ds¯.superscriptsubscript𝜏𝜌csuperscript𝐱𝑖𝑠superscriptsubscript0𝑠superscriptsubscript𝜏𝜌csuperscript𝐱𝑖¯𝑠differential-d¯𝑠\mathcal{I}\left(\tau_{\rho}^{\mathrm{c}}\left(\mathbf{x}^{i}\right)\right)(s)% \approx\int_{0}^{s}\tau_{\rho}^{\mathrm{c}}\left(\mathbf{x}^{i}\right)(\bar{s}% )\mathrm{d}\bar{s}.caligraphic_I ( italic_τ start_POSTSUBSCRIPT italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ) ( italic_s ) ≈ ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_s end_POSTSUPERSCRIPT italic_τ start_POSTSUBSCRIPT italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ( over¯ start_ARG italic_s end_ARG ) roman_d over¯ start_ARG italic_s end_ARG .

In the numerical experiments, \mathcal{I}caligraphic_I is based on the 3333-point Gaussian quadrature formula applied to a partition of the interval [0,L]0𝐿[0,L][ 0 , italic_L ], see [7, Chapter 9]. As done previously, we define the vector

y𝝆(𝐱i)(sk):=(q𝝆c(𝐱i)(sk),τ𝝆c(𝐱i)(sk)),assignsubscript𝑦𝝆superscript𝐱𝑖subscript𝑠𝑘superscriptsubscript𝑞𝝆csuperscript𝐱𝑖subscript𝑠𝑘superscriptsubscript𝜏𝝆csuperscript𝐱𝑖subscript𝑠𝑘y_{\boldsymbol{\rho}}\left(\mathbf{x}^{i}\right)(s_{k}):=\left(q_{\boldsymbol{% \rho}}^{\mathrm{c}}\left(\mathbf{x}^{i}\right)(s_{k}),\,\tau_{\boldsymbol{\rho% }}^{\rm c}\left(\mathbf{x}^{i}\right)(s_{k})\right),italic_y start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ( italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) := ( italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ( italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) , italic_τ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ( italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ) , (16)

with components defined as in Equations (14) and (15). This allows us to train the network θ𝝆csuperscriptsubscript𝜃𝝆c\theta_{\boldsymbol{\rho}}^{\rm c}italic_θ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT by minimising the same loss function as in Equation (12), where this time y𝝆csuperscriptsubscript𝑦𝝆cy_{\boldsymbol{\rho}}^{\rm c}italic_y start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT is given by Equation (16). Furthermore, since by construction this case satisfies π𝒟(y𝝆c(𝐱i)(s))2=τ𝝆c(𝐱i)(s)21subscriptnormsubscript𝜋𝒟superscriptsubscript𝑦𝝆csuperscript𝐱𝑖𝑠2subscriptnormsuperscriptsubscript𝜏𝝆csuperscript𝐱𝑖𝑠21\left\|\pi_{\mathcal{D}}\left(y_{\boldsymbol{\rho}}^{\rm c}(\mathbf{x}^{i})(s)% \right)\right\|_{2}=\left\|\tau_{\boldsymbol{\rho}}^{\mathrm{c}}\left(\mathbf{% x}^{i}\right)(s)\right\|_{2}\equiv 1∥ italic_π start_POSTSUBSCRIPT caligraphic_D end_POSTSUBSCRIPT ( italic_y start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ( italic_s ) ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = ∥ italic_τ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ( italic_s ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ≡ 1, we set γ=0𝛾0\gamma=0italic_γ = 0. We present numerical experiments for the two proposed continuous networks q𝝆csuperscriptsubscript𝑞𝝆cq_{\boldsymbol{\rho}}^{\textrm{c}}italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT and θ𝝆csuperscriptsubscript𝜃𝝆c\theta_{\boldsymbol{\rho}}^{\textrm{c}}italic_θ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT. In the latter case, by neural network architecture, we refer to θ^𝝆csuperscriptsubscript^𝜃𝝆c\hat{\theta}_{\boldsymbol{\rho}}^{\rm c}over^ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT rather than θ𝝆csuperscriptsubscript𝜃𝝆c\theta_{\boldsymbol{\rho}}^{\rm c}italic_θ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT in what follows. We analyse q𝝆csuperscriptsubscript𝑞𝝆cq_{\boldsymbol{\rho}}^{\textrm{c}}italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT more thoroughly in Section 5.1, mirroring most of the discrete case experiments. In Section 5.2 we study how the results are affected when we impose the arc length parametrisation and enforce the boundary conditions to be exactly satisfied by the network θ𝝆csuperscriptsubscript𝜃𝝆c\theta_{\boldsymbol{\rho}}^{\textrm{c}}italic_θ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT.

5.1 Numerical experiments with q𝝆csuperscriptsubscript𝑞𝝆cq_{\boldsymbol{\rho}}^{\textrm{c}}italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT

As for the case of the discrete network, we perform an in-depth investigation of this learning setting. In this case, the experiments are run on a GPU-P100 machine. For this continuous setup, the standard MLP architecture does not provide accurate results even after a hyperparameter optimisation routine. Given, hence, that the simple MLP architecture does not seem to be flexible enough to capture the complexity of the elastica solution in this continuous framework, we move to a different architecture that we call MULT for the presence of multiplicative interactions in its architecture. This network has demonstrated superior performance to standard fully connected neural networks in the context of operator learning, see e.g. [51]. Details on this architecture can be found in Appendix A. We fix the learning rate η𝜂\etaitalic_η to 51035superscript1035\cdot 10^{-3}5 ⋅ 10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT and only vary the number of layers and of hidden nodes in the training procedure, with the range of options reported in Appendix B, Table 6. In this case, we define the loss as in Equation (12), with γ=102𝛾superscript102\gamma=10^{-2}italic_γ = 10 start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT. The weight decay is systematically set to 00. For the both-ends data set, this leads to a training error equal to 3.5541063.554superscript1063.554\cdot 10^{-6}3.554 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT, a validation error equal to 4.7791064.779superscript1064.779\cdot 10^{-6}4.779 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT, and a test error equal to 4.3541064.354superscript1064.354\cdot 10^{-6}4.354 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT. In Figure 3, the comparison over test trajectories for 𝐪𝐪\mathbf{q}bold_q and 𝐪superscript𝐪\mathbf{q}^{\prime}bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT is shown. As we can see in the plot showing the mean error over the trajectories, the error on the end nodes is nonzero, since we are not imposing boundary conditions by construction. This is in contrast to the corresponding plot for the discrete network in Figure 1.

Refer to caption
Refer to caption
Refer to caption
Figure 3: Comparison over test trajectories for 𝐪𝐪\mathbf{q}bold_q and 𝐪superscript𝐪\mathbf{q}^{\prime}bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT for the continuous network q𝝆csuperscriptsubscript𝑞𝝆cq_{\boldsymbol{\rho}}^{\textrm{c}}italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT tested on the both-ends data set with 80%10%10%percent80percent10percent1080\%\--10\%\--10\%80 % - 10 % - 10 % splitting into training, validation, and test sets. The mean squared error on the test set equals 4.3541064.354superscript1064.354\cdot 10^{-6}4.354 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT. For presentation purposes, only 10 randomly selected trajectories are considered in the first two plots.

Also in this case, we examine the behaviour of the learning process with different splittings of the data set into training and test sets. We display the results in Table 3 and summarise the corresponding hyperparameters in Appendix B, Table 7.

Data set splittingTraining - validation - testData set splittingTraining - validation - test\begin{array}[]{c}\text{Data set splitting}\\ \text{Training \-- validation \-- test}\end{array}start_ARRAY start_ROW start_CELL Data set splitting end_CELL end_ROW start_ROW start_CELL Training - validation - test end_CELL end_ROW end_ARRAY Training accuracy Validation accuracy Test accuracy
10% - 10% - 10% 2.1461042.146superscript1042.146\cdot 10^{-4}2.146 ⋅ 10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT 1.2521031.252superscript1031.252\cdot 10^{-3}1.252 ⋅ 10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT 8.8111048.811superscript1048.811\cdot 10^{-4}8.811 ⋅ 10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT
20% - 10% - 10% 4.1871054.187superscript1054.187\cdot 10^{-5}4.187 ⋅ 10 start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT 4.2391054.239superscript1054.239\cdot 10^{-5}4.239 ⋅ 10 start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT 6.2791056.279superscript1056.279\cdot 10^{-5}6.279 ⋅ 10 start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT
40% - 10% - 10% 7.0371067.037superscript1067.037\cdot 10^{-6}7.037 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT 8.3571068.357superscript1068.357\cdot 10^{-6}8.357 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT 8.4341068.434superscript1068.434\cdot 10^{-6}8.434 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT
80% - 10% - 10% 3.5541063.554superscript1063.554\cdot 10^{-6}3.554 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT 4.7791064.779superscript1064.779\cdot 10^{-6}4.779 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT 4.3541064.354superscript1064.354\cdot 10^{-6}4.354 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT
Table 3: Behaviour of the continuous network q𝝆csuperscriptsubscript𝑞𝝆cq_{\boldsymbol{\rho}}^{\textrm{c}}italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT tested on the both-ends data set with fewer training data points. The size of the training set varies, while that of the validation and the test sets is fixed. The last row corresponds to the results in Figure 3.

5.2 Numerical experiments with θ𝝆csuperscriptsubscript𝜃𝝆c\theta_{\boldsymbol{\rho}}^{\textrm{c}}italic_θ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT

Here we consider a neural network approximation of the angle θ(s)𝜃𝑠\theta(s)italic_θ ( italic_s ) that parametrises the tangent vector 𝐪(s)=(cos(θ(s)),sin(θ(s)))superscript𝐪𝑠𝜃𝑠𝜃𝑠\mathbf{q}^{\prime}(s)=(\cos(\theta(s)),\sin(\theta(s)))bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_s ) = ( roman_cos ( italic_θ ( italic_s ) ) , roman_sin ( italic_θ ( italic_s ) ) ). By design, the approximation τ𝝆csuperscriptsubscript𝜏𝝆c\tau_{\boldsymbol{\rho}}^{\mathrm{c}}italic_τ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT of the tangent vector 𝐪superscript𝐪\mathbf{q}^{\prime}bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT satisfies the constraint τ𝝆c(𝐱i)(s)2=1subscriptnormsuperscriptsubscript𝜏𝝆csuperscript𝐱𝑖𝑠21\|\tau_{\boldsymbol{\rho}}^{\mathrm{c}}(\mathbf{x}^{i})(s)\|_{2}=1∥ italic_τ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ( italic_s ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = 1 for every s[0,L]𝑠0𝐿s\in[0,L]italic_s ∈ [ 0 , italic_L ] and 𝐱i8superscript𝐱𝑖superscript8\mathbf{x}^{i}\in\mathbb{R}^{8}bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT 8 end_POSTSUPERSCRIPT. We also analyse how the neural network approximation behaves when the boundary conditions τ𝝆c(𝐱i)(0)=𝐪(0)superscriptsubscript𝜏𝝆csuperscript𝐱𝑖0superscript𝐪0\tau_{\boldsymbol{\rho}}^{\mathrm{c}}(\mathbf{x}^{i})(0)=\mathbf{q}^{\prime}(0)italic_τ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ( 0 ) = bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( 0 ) and τ𝝆c(𝐱i)(L)=𝐪(L)superscriptsubscript𝜏𝝆csuperscript𝐱𝑖𝐿superscript𝐪𝐿\tau_{\boldsymbol{\rho}}^{\mathrm{c}}(\mathbf{x}^{i})(L)=\mathbf{q}^{\prime}(L)italic_τ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ( italic_L ) = bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_L ) are imposed by construction. To do so, we model the parametric function θ^𝝆csuperscriptsubscript^𝜃𝝆c\hat{\theta}_{\boldsymbol{\rho}}^{\textrm{c}}over^ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT, defined in Equation (13), in one of the two following ways:

θ^𝝆c(𝐱i)(s)=f𝝆(s,θ0i,θNi),superscriptsubscript^𝜃𝝆csuperscript𝐱𝑖𝑠subscript𝑓𝝆𝑠superscriptsubscript𝜃0𝑖superscriptsubscript𝜃𝑁𝑖\hat{\theta}_{\boldsymbol{\rho}}^{\textrm{c}}(\mathbf{x}^{i})(s)=f_{% \boldsymbol{\rho}}(s,\theta_{0}^{i},\theta_{N}^{i}),over^ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ( italic_s ) = italic_f start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT ( italic_s , italic_θ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT , italic_θ start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) , (17)
θ^𝝆c(𝐱i)(s)=f𝝆(s,θ0i,θNi)+(θ0if𝝆(0,θ0i,θNi))e100s2+(θNif𝝆(L,θ0i,θNi))e100(sL)2,superscriptsubscript^𝜃𝝆csuperscript𝐱𝑖𝑠subscript𝑓𝝆𝑠superscriptsubscript𝜃0𝑖superscriptsubscript𝜃𝑁𝑖superscriptsubscript𝜃0𝑖subscript𝑓𝝆0superscriptsubscript𝜃0𝑖superscriptsubscript𝜃𝑁𝑖superscript𝑒100superscript𝑠2superscriptsubscript𝜃𝑁𝑖subscript𝑓𝝆𝐿superscriptsubscript𝜃0𝑖superscriptsubscript𝜃𝑁𝑖superscript𝑒100superscript𝑠𝐿2\begin{split}\hat{\theta}_{\boldsymbol{\rho}}^{\textrm{c}}(\mathbf{x}^{i})(s)&% =f_{\boldsymbol{\rho}}(s,\theta_{0}^{i},\theta_{N}^{i})+(\theta_{0}^{i}-f_{% \boldsymbol{\rho}}(0,\theta_{0}^{i},\theta_{N}^{i}))e^{-100s^{2}}\\ &+(\theta_{N}^{i}-f_{\boldsymbol{\rho}}(L,\theta_{0}^{i},\theta_{N}^{i}))e^{-1% 00(s-L)^{2}},\end{split}start_ROW start_CELL over^ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ( italic_s ) end_CELL start_CELL = italic_f start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT ( italic_s , italic_θ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT , italic_θ start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) + ( italic_θ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT - italic_f start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT ( 0 , italic_θ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT , italic_θ start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ) italic_e start_POSTSUPERSCRIPT - 100 italic_s start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL + ( italic_θ start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT - italic_f start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT ( italic_L , italic_θ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT , italic_θ start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ) italic_e start_POSTSUPERSCRIPT - 100 ( italic_s - italic_L ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT , end_CELL end_ROW (18)

where f𝝆:3:subscript𝑓𝝆superscript3f_{\boldsymbol{\rho}}:\mathbb{R}^{3}\to\mathbb{R}italic_f start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT : blackboard_R start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT → blackboard_R is any neural network, and we recall that π(𝐱i)=(θ0i,θNi)𝜋superscript𝐱𝑖superscriptsubscript𝜃0𝑖superscriptsubscript𝜃𝑁𝑖\pi(\mathbf{x}^{i})=(\theta_{0}^{i},\theta_{N}^{i})italic_π ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) = ( italic_θ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT , italic_θ start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ). We remark that, in the case of the parameterisation in Equation (18), one gets θ𝝆c(𝐱i)(0)=θ0isuperscriptsubscript𝜃𝝆csuperscript𝐱𝑖0superscriptsubscript𝜃0𝑖\theta_{\boldsymbol{\rho}}^{\textrm{c}}(\mathbf{x}^{i})(0)=\theta_{0}^{i}italic_θ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ( 0 ) = italic_θ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT and θ𝝆c(𝐱i)(L)=θNisuperscriptsubscript𝜃𝝆csuperscript𝐱𝑖𝐿superscriptsubscript𝜃𝑁𝑖\theta_{\boldsymbol{\rho}}^{\textrm{c}}(\mathbf{x}^{i})(L)=\theta_{N}^{i}italic_θ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT ( bold_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) ( italic_L ) = italic_θ start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT up to machine precision, due to the fast decay of the Gaussian function. As in the previous sections, we collect the hyperparameter and architecture options with the respective range of choices in Table 8, and we report the results without imposing the boundary conditions in Figure 4, while those imposing them in Figure 5, in both cases using the both-ends data set, with 80%10%10%percent80percent10percent1080\%\--10\%\--10\%80 % - 10 % - 10 % splitting into training, validation, and test sets. The results shown in the two figures correspond respectively to training errors of 6.2881066.288superscript1066.288\cdot 10^{-6}6.288 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT and 5.3011065.301superscript1065.301\cdot 10^{-6}5.301 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT, validation errors 5.8741065.874superscript1065.874\cdot 10^{-6}5.874 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT and 5.0651065.065superscript1065.065\cdot 10^{-6}5.065 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT, and test errors of 5.0891065.089superscript1065.089\cdot 10^{-6}5.089 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT and 4.3851064.385superscript1064.385\cdot 10^{-6}4.385 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT. The best-performing hyperparameter combinations can be found in Table 9.

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 4: Comparison over test trajectories for 𝐪𝐪\mathbf{q}bold_q and 𝐪superscript𝐪\mathbf{q}^{\prime}bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, for the case θ𝝆csuperscriptsubscript𝜃𝝆c\theta_{\boldsymbol{\rho}}^{\textrm{c}}italic_θ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT is modelled as in Equation (17), with 80%10%10%percent80percent10percent1080\%\--10\%\--10\%80 % - 10 % - 10 % splitting of the both-ends data set into training, validation, and test sets. The mean squared error on the test set equals 6.2891066.289superscript1066.289\cdot 10^{-6}6.289 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT. For presentation purposes, only 10 randomly selected trajectories are considered in the first two plots.
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 5: Comparison over test trajectories for 𝐪𝐪\mathbf{q}bold_q and 𝐪superscript𝐪\mathbf{q}^{\prime}bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, for the case θ𝝆csuperscriptsubscript𝜃𝝆c\theta_{\boldsymbol{\rho}}^{\textrm{c}}italic_θ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT is modelled as in Equation (18), with 80%10%10%percent80percent10percent1080\%\--10\%\--10\%80 % - 10 % - 10 % splitting of the both-ends data set into training, validation, and test sets. The mean squared error on the test set equals 4.3851064.385superscript1064.385\cdot 10^{-6}4.385 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT. For presentation purposes, only 10 randomly selected trajectories are considered in the first two plots.

6 Discussion

The results in Figures 4 and 5 are comparable, especially looking at the mean error plots. This suggests that the imposition of the boundary conditions, in the proposed way, is not limiting the expressivity of the considered network. Thus, given the boundary value nature of our problem, these figures advocate the enforcement of the boundary conditions on the network θ𝝆csuperscriptsubscript𝜃𝝆c\theta_{\boldsymbol{\rho}}^{\textrm{c}}italic_θ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT. However, due to the chosen reconstruction procedure in Equation (15) for the variable 𝐪𝐪\mathbf{q}bold_q, we are able to impose the boundary conditions on 𝐪𝐪\mathbf{q}bold_q only on the left node. Other more symmetric reconstruction procedures can be adopted, but the proposed one has provided better experimental results.

Comparing the results related to q𝝆csuperscriptsubscript𝑞𝝆cq_{\boldsymbol{\rho}}^{\textrm{c}}italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT with those of θ𝝆csuperscriptsubscript𝜃𝝆c\theta_{\boldsymbol{\rho}}^{\textrm{c}}italic_θ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT, we notice similar performances in terms of training and test errors. In both cases, they have one order of magnitude more than the corresponding training and test errors of the discrete network q𝝆dsuperscriptsubscript𝑞𝝆dq_{\boldsymbol{\rho}}^{\textrm{d}}italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT d end_POSTSUPERSCRIPT. Thus, as a result of our experiments, we can conclude that

  • if the accuracy and the efficient evaluation of the model at the discrete nodes are of interest, the discrete network is the best option;

  • for a more flexible model, not restricted to the discrete nodes, the continuous network is a better choice; among the two proposed modelling strategies, working with q𝝆csuperscriptsubscript𝑞𝝆cq_{\boldsymbol{\rho}}^{\textrm{c}}italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT is more suitable for an easy parametrisation of both 𝐪𝐪\mathbf{q}bold_q and 𝐪superscript𝐪\mathbf{q}^{\prime}bold_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, while θ𝝆csuperscriptsubscript𝜃𝝆c\theta_{\boldsymbol{\rho}}^{\textrm{c}}italic_θ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT is more suitable to impose geometrical structure and constraints.

The total accuracy error of a neural network model can be defined by splitting it into three components: approximation error, optimisation error, and generalisation error (see e.g. [17]). To achieve excellent agreement between predicted and reference trajectories, it is crucial to select the appropriate architecture and fine-tune the model hyperparameters. Our results demonstrate that we can construct a network that is expressive enough to provide a small approximation error and with very good generalisation capability.

Lastly, we have compared the time cost of the Neural Network prediction against the traditional approach with numerical solvers as described in Section 2.2. The discrete and the continuous approaches outperformed the traditional solvers with an average speedup of 105.000 times and 260.000 times, respectively, across the test trajectories. The training time of the continuous network is 1.25 times larger than that of the discrete network. It’s important to note that these results are subject to certain limitations, such as the specific choice of the hyperparameters or the machine used to train and test the network. These findings suggest that using neural networks to predict new solutions of the elastica for unseen boundary conditions is much more time efficient than the classical numerical methods, although requiring intensive offline training.

6.1 Future work

In the methods presented in this paper, the mathematical problem and the neural network model do not interact once the data set is created. To improve the results presented here, one could include Euler’s elastica model directly into the training process. This could be done either by directly imposing in the loss function that 𝐪(s)𝐪𝑠\mathbf{q}(s)bold_q ( italic_s ) satisfies the differential equations (5), or one could add the constrained action integral from Equation (4) into the loss function that is minimised, see e.g. [30, 10, 12, 11].

There are many promising directions to follow up on this work. One is to consider 3D versions of Euler’s elastica, another is to look at the dynamical problem, and finally one may examine industrial applications, as mentioned in the introduction.


CRediT author statement.

Elena Celledoni: Conceptualisation, Validation, Writing - Review & Editing, Supervision, Funding acquisition. Ergys Çokaj: Validation, Investigation, Visualisation, Writing - Review & Editing. Andrea Leone: Methodology, Software, Investigation, Writing - Original Draft, Review & Editing. Sigrid Leyendecker: Conceptualisation, Writing - Review & Editing, Supervision, Funding acquisition. Davide Murari: Methodology, Software, Investigation, Writing - Original Draft, Review & Editing. Brynjulf Owren: Conceptualisation, Validation, Writing - Review & Editing, Supervision, Funding acquisition. Rodrigo T. Sato Martín de Almagro: Methodology, Validation, Writing - Review & Editing, Supervision. Martina Stavole: Methodology, Software, Investigation, Writing - Original Draft, Review & Editing.


Acknowledgments. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 860124. This work was partially supported by a grant from the Simons Foundation (DM). This contribution reflects only the authors’ view, and the Research Executive Agency and the European Commission are not responsible for any use that may be made of the information it contains.

Appendix A Architecture for the continuous network

We provide the expression of the forward propagation of the multiplicative network MULT used for the experiments in Section 5:

𝐔=σ(𝐖1𝐱+𝐛1),𝐕=σ(𝐖2𝐱+𝐛2)formulae-sequence𝐔𝜎subscript𝐖1𝐱subscript𝐛1𝐕𝜎subscript𝐖2𝐱subscript𝐛2\displaystyle\mathbf{U}=\sigma(\mathbf{W}_{1}\mathbf{x}+\mathbf{b}_{1}),\quad% \mathbf{V}=\sigma(\mathbf{W}_{2}\mathbf{x}+\mathbf{b}_{2})bold_U = italic_σ ( bold_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT bold_x + bold_b start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) , bold_V = italic_σ ( bold_W start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT bold_x + bold_b start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) (19)
𝐇1=σ(𝐖3𝐱+𝐛3)subscript𝐇1𝜎subscript𝐖3𝐱subscript𝐛3\displaystyle\mathbf{H}_{1}=\sigma(\mathbf{W}_{3}\mathbf{x}+\mathbf{b}_{3})bold_H start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = italic_σ ( bold_W start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT bold_x + bold_b start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ) (20)
𝐙j=σ(𝐖jz𝐇j+𝐛jz),j=1,,formulae-sequencesubscript𝐙𝑗𝜎subscriptsuperscript𝐖𝑧𝑗subscript𝐇𝑗subscriptsuperscript𝐛𝑧𝑗𝑗1\displaystyle\mathbf{Z}_{j}=\sigma(\mathbf{W}^{z}_{j}\mathbf{H}_{j}+\mathbf{b}% ^{z}_{j}),\;j=1,\dots,\ellbold_Z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = italic_σ ( bold_W start_POSTSUPERSCRIPT italic_z end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT bold_H start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + bold_b start_POSTSUPERSCRIPT italic_z end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) , italic_j = 1 , … , roman_ℓ (21)
𝐇j+1=(1𝐙j)𝐔+𝐙j𝐕,j=1,,formulae-sequencesubscript𝐇𝑗1direct-product1subscript𝐙𝑗𝐔direct-productsubscript𝐙𝑗𝐕𝑗1\displaystyle\mathbf{H}_{j+1}=(1-\mathbf{Z}_{j})\odot\mathbf{U}+\mathbf{Z}_{j}% \odot\mathbf{V},\;j=1,\dots,\ellbold_H start_POSTSUBSCRIPT italic_j + 1 end_POSTSUBSCRIPT = ( 1 - bold_Z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) ⊙ bold_U + bold_Z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ⊙ bold_V , italic_j = 1 , … , roman_ℓ (22)
f𝝆MULT(𝐱)=𝐖𝐇+1+𝐛,superscriptsubscript𝑓𝝆MULT𝐱subscript𝐖𝐇1𝐛\displaystyle f_{\boldsymbol{\rho}}^{\mathrm{MULT}}(\mathbf{x})=\mathbf{W}% \mathbf{H}_{\ell+1}+\mathbf{b},italic_f start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_MULT end_POSTSUPERSCRIPT ( bold_x ) = bold_WH start_POSTSUBSCRIPT roman_ℓ + 1 end_POSTSUBSCRIPT + bold_b , (23)

where direct-product\odot denotes the component-wise multiplications. In this case, 𝝆={𝐖1,𝐛1,𝐖2,𝐛2,𝐖3,𝐛3,(𝐖jz,𝐛jz)j=1,𝐖,𝐛}𝝆subscript𝐖1subscript𝐛1subscript𝐖2subscript𝐛2subscript𝐖3subscript𝐛3superscriptsubscriptsubscriptsuperscript𝐖𝑧𝑗subscriptsuperscript𝐛𝑧𝑗𝑗1𝐖𝐛\boldsymbol{\rho}=\left\{\mathbf{W}_{1},\mathbf{b}_{1},\mathbf{W}_{2},\mathbf{% b}_{2},\mathbf{W}_{3},\mathbf{b}_{3},\left(\mathbf{W}^{z}_{j},\mathbf{b}^{z}_{% j}\right)_{j=1}^{\ell},\mathbf{W},\mathbf{b}\right\}bold_italic_ρ = { bold_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_b start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_W start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , bold_b start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , bold_W start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , bold_b start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , ( bold_W start_POSTSUPERSCRIPT italic_z end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , bold_b start_POSTSUPERSCRIPT italic_z end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT , bold_W , bold_b }, and the weight matrices and biases have shapes that allow for the expressions (19)-(23) to be well-defined. This architecture is inspired by neural attention mechanisms and was introduced in [52] to improve the gradient behaviour. A further motivation for our choice of including this architecture is experimental since it has proven effective in solving the task of interest, while still having a similar number of parameters to the MLP architecture. Throughout the paper, we refer to this architecture as multiplicative since it includes component-wise multiplications, which help capture multiplicative interactions between the variables.

Appendix B Details on hyperparameter optimisation

We provide here further details on the hyperparameter optimisation strategy with Optuna, relative to the results in Sections 4 and 5. The tables below display the hyperparameters we optimise for in each of the networks, the ranges and the distribution, as well as the selected combinations used to perform the experiments in the paper.

Hyperparameter Range Distribution
#layers \ellroman_ℓ {0,,10}010\{0,...,10\}{ 0 , … , 10 } discrete uniform
#hidden nodes [10,1000]101000[10,1000]\cap\mathbb{N}[ 10 , 1000 ] ∩ blackboard_N discrete uniform
γ𝛾\gammaitalic_γ [0,1102]01superscript102[0,1\cdot 10^{-2}][ 0 , 1 ⋅ 10 start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT ] uniform
Table 4: Hyperparameter ranges for the discrete network q𝝆dsuperscriptsubscript𝑞𝝆dq_{\boldsymbol{\rho}}^{\textrm{d}}italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT d end_POSTSUPERSCRIPT. The first column of the table reports the hyperparameters we test. The second describes the set of allowed values for each, while the third specifies how such values are explored through Optuna.
HyperparametercombinationHyperparametercombination\begin{array}[]{c}\text{Hyperparameter}\\ \text{combination}\end{array}start_ARRAY start_ROW start_CELL Hyperparameter end_CELL end_ROW start_ROW start_CELL combination end_CELL end_ROW end_ARRAY % of trajectories of the whole dataset in the training set
10% 20% 40% 80%
# layers \ellroman_ℓ 4 4 4 4
#hidden nodes 950 978 997 985
γ𝛾\gammaitalic_γ 7.0441037.044superscript1037.044\cdot 10^{-3}7.044 ⋅ 10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT 6.3361036.336superscript1036.336\cdot 10^{-3}6.336 ⋅ 10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT 9.0041039.004superscript1039.004\cdot 10^{-3}9.004 ⋅ 10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT 3.8531033.853superscript1033.853\cdot 10^{-3}3.853 ⋅ 10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT
Table 5: Choice of hyperparameters for the training of the discrete network q𝝆dsuperscriptsubscript𝑞𝝆dq_{\boldsymbol{\rho}}^{\textrm{d}}italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT d end_POSTSUPERSCRIPT tested on the both-ends data set with different sizes of the training set, with the validation and test sets each containing 10%percent1010\%10 % of trajectories of the dataset.
Hyperparameter Range Distribution
#layers \ellroman_ℓ {5,,10}510\{5,\ldots,10\}{ 5 , … , 10 } discrete uniform
#hidden nodes [10,250]10250[10,250]\cap\mathbb{N}[ 10 , 250 ] ∩ blackboard_N discrete uniform
Table 6: Hyperparameter ranges for the continuous network q𝝆csuperscriptsubscript𝑞𝝆cq_{\boldsymbol{\rho}}^{\textrm{c}}italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT. The first column of the table reports the hyperparameters we test. The second describes the set of allowed values for each, while the third specifies how such values are explored through Optuna.
HyperparametercombinationHyperparametercombination\begin{array}[]{c}\text{Hyperparameter}\\ \text{combination}\end{array}start_ARRAY start_ROW start_CELL Hyperparameter end_CELL end_ROW start_ROW start_CELL combination end_CELL end_ROW end_ARRAY % of trajectories of the whole dataset in the training set
10% 20% 40% 80%
# layers \ellroman_ℓ 6 7 8 6
#hidden nodes 139 185 181 106
Table 7: Choice of hyperparameters for the training of the continuous network q𝝆csuperscriptsubscript𝑞𝝆cq_{\boldsymbol{\rho}}^{\textrm{c}}italic_q start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT tested on the both-ends data set with different sizes of the training set, with the validation and test sets each containing 10%percent1010\%10 % of trajectories of the dataset.
Hyperparameter Range Distribution
#layers \ellroman_ℓ {1,,10}110\{1,\ldots,10\}{ 1 , … , 10 } discrete uniform
#hidden nodes [50,200]50200[50,200]\cap\mathbb{N}[ 50 , 200 ] ∩ blackboard_N discrete uniform
Table 8: Hyperparameter ranges for the continuous network θ𝝆csuperscriptsubscript𝜃𝝆c\theta_{\boldsymbol{\rho}}^{\textrm{c}}italic_θ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT.The first column of the table reports the hyperparameters we test. The second describes the set of allowed values for each, while the third specifies how such values are explored through Optuna.
HyperparametercombinationHyperparametercombination\begin{array}[]{c}\text{Hyperparameter}\\ \text{combination}\end{array}start_ARRAY start_ROW start_CELL Hyperparameter end_CELL end_ROW start_ROW start_CELL combination end_CELL end_ROW end_ARRAY θ𝝆csuperscriptsubscript𝜃𝝆c\theta_{\boldsymbol{\rho}}^{\textrm{c}}italic_θ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT
θ𝝆csuperscriptsubscript𝜃𝝆c\theta_{\boldsymbol{\rho}}^{\textrm{c}}italic_θ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT as in (17) θ𝝆csuperscriptsubscript𝜃𝝆c\theta_{\boldsymbol{\rho}}^{\textrm{c}}italic_θ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT as in (18)
# layers \ellroman_ℓ 8 8
#hidden nodes 93 58
Table 9: Choice of the hyperparameters for the training of the continuous network θ𝝆csuperscriptsubscript𝜃𝝆c\theta_{\boldsymbol{\rho}}^{\textrm{c}}italic_θ start_POSTSUBSCRIPT bold_italic_ρ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT c end_POSTSUPERSCRIPT tested on the both-ends data set with 80%10%10%percent80percent10percent1080\%\--10\%\--10\%80 % - 10 % - 10 % splitting into training, validation, and test sets. The second column shows the combination of hyperparameters yielding the best result corresponding to Figure 4, while the third column that corresponding to Figure 5.

References

  • \bibcommenthead
  • [1] Saad, Y.: Iterative Methods for Sparse Linear Systems. SIAM, (2003)
  • [2] Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, (1999)
  • Marsden and West [2001] Marsden, J.E., West, M.: Discrete mechanics and variational integrators. Acta numerica 10, 357–514 (2001)
  • [4] Hairer, E., Nørsett, S.P., Wanner, G.: Solving Ordinary Differential Equations I, Nonstiff Problems, Second revised edition edn. Springer, (1993)
  • [5] Hairer, E., Wanner, G.: Solving Ordinary Differential Equations II, Stiff and Differential-Algebraic Problems, Second revised edition edn. Springer, (1996)
  • [6] Brenner, S.C.: The Mathematical Theory of Finite Element Methods. Springer, (2008)
  • [7] Quarteroni, A., Sacco, R., Saleri, F.: Numerical Mathematics vol. 37. Springer, (2006)
  • Cuomo et al. [2022] Cuomo, S., Di Cola, V.S., Giampaolo, F., Rozza, G., Raissi, M., Piccialli, F.: Scientific machine learning through physics–informed neural networks: Where we are and what’s next. Journal of Scientific Computing 92(3), 88 (2022)
  • Brunton and Kutz [2023] Brunton, S.L., Kutz, J.N.: Machine learning for partial differential equations. arXiv:2303.17078 (2023)
  • Raissi et al. [2019] Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics 378, 686–707 (2019)
  • Samaniego et al. [2020] Samaniego, E., Anitescu, C., Goswami, S., Nguyen-Thanh, V.M., Guo, H., Hamdia, K., Zhuang, X., Rabczuk, T.: An energy approach to the solution of partial differential equations in computational mechanics via machine learning: Concepts, implementation and applications. Computer Methods in Applied Mechanics and Engineering 362, 112790 (2020)
  • E and Yu [2018] E, W., Yu, B.: The Deep Ritz Method: A Deep Learning-Based Numerical Algorithm for Solving Variational Problems. Communications in Mathematics and Statistics 6(1), 1–12 (2018)
  • Gu and Ng [2023] Gu, Y., Ng, M.K.: Deep neural networks for solving large linear systems arising from high-dimensional problems. SIAM Journal on Scientific Computing 45(5), 2356–2381 (2023)
  • Kadupitiya et al. [2022] Kadupitiya, J., Fox, G.C., Jadhao, V.: Solving Newton’s equations of motion with large timesteps using recurrent neural networks based operators. Machine Learning: Science and Technology 3(2), 025002 (2022)
  • Liu et al. [2020] Liu, Y., Kutz, J., Brunton, S.: Hierarchical deep learning of multiscale differential equation time-steppers, arxiv. arXiv:2008.09768 (2020)
  • Mattheakis et al. [2022] Mattheakis, M., Sondak, D., Dogra, A.S., Protopapas, P.: Hamiltonian neural networks for solving equations of motion. Physical Review E 105(6), 065305 (2022)
  • Lu et al. [2021a] Lu, L., Jin, P., Pang, G., Zhang, Z., Karniadakis, G.E.: Learning nonlinear operators via deeponet based on the universal approximation theorem of operators. Nature machine intelligence 3(3), 218–229 (2021)
  • Lu et al. [2021b] Lu, L., Meng, X., Mao, Z., Karniadakis, G.E.: Deepxde: A deep learning library for solving differential equations. SIAM review 63(1), 208–228 (2021)
  • Chevalier et al. [2022] Chevalier, S., Stiasny, J., Chatzivasileiadis, S.: Accelerating dynamical system simulations with contracting and physics-projected neural-newton solvers. In: Learning for Dynamics and Control Conference, PMLR, pp. 803–816 (2022)
  • Li et al. [2022] Li, Y., Zhou, Z., Ying, S.: Delisa: Deep learning based iteration scheme approximation for solving pdes. Journal of Computational Physics 451, 110884 (2022)
  • Schiassi et al. [2021] Schiassi, E., Furfaro, R., Leake, C., De Florio, M., Johnston, H., Mortari, D.: Extreme theory of functional connections: A fast physics-informed neural network method for solving ordinary and partial differential equations. Neurocomputing 457, 334–356 (2021)
  • De Florio et al. [2022] De Florio, M., Schiassi, E., Furfaro, R.: Physics-informed neural networks and functional interpolation for stiff chemical kinetics. Chaos: An Interdisciplinary Journal of Nonlinear Science 32(6) (2022)
  • Fabiani et al. [2023] Fabiani, G., Galaris, E., Russo, L., Siettos, C.: Parsimonious physics-informed random projection neural networks for initial value problems of ODEs and index-1 DAEs. Chaos: An Interdisciplinary Journal of Nonlinear Science 33(4) (2023)
  • Mortari et al. [2019] Mortari, D., Johnston, H., Smith, L.: High accuracy least-squares solutions of nonlinear differential equations. Journal of computational and applied mathematics 352, 293–307 (2019)
  • Loc Vu-Quoc [2023] Loc Vu-Quoc, A.H.: Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics. Computer Modeling in Engineering & Sciences 137(2), 1069–1343 (2023)
  • Brunton and Kutz [2022] Brunton, S.L., Kutz, J.N.: Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control, 2nd edn. Cambridge University Press, ??? (2022)
  • [27] Yagawa, G., Oishi, A.: Computational Mechanics with Deep Learning: An Introduction. Springer, (2022)
  • Fabiani et al. [2021] Fabiani, G., Calabrò, F., Russo, L., Siettos, C.: Numerical solution and bifurcation analysis of nonlinear partial differential equations with extreme learning machines. Journal of Scientific Computing 89(2), 44 (2021)
  • Galaris et al. [2022] Galaris, E., Fabiani, G., Gallos, I., Kevrekidis, I., Siettos, C.: Numerical bifurcation analysis of PDEs from lattice boltzmann model simulations: a parsimonious machine learning approach. Journal of Scientific Computing 92(2), 34 (2022)
  • Lagaris et al. [1998] Lagaris, I.E., Likas, A., Fotiadis, D.I.: Artificial neural networks for solving ordinary and partial differential equations. IEEE transactions on neural networks 9(5), 987–1000 (1998)
  • [31] Kollmannsberger, S., D’Angella, D., Jokeit, M., Herrmann, L.: Deep Learning in Computational Mechanics. Springer, (2021)
  • Ntarladima et al. [2023] Ntarladima, K., Pieber, M., Gerstmayr, J.: A model for contact and friction between beams under large deformation and sheaves. Nonlinear Dynamics, 1–18 (2023)
  • Stavole et al. [2022] Stavole, M., Almagro, R.T.S.M., Lohk, M., Leyendecker, S.: Homogenization of the constitutive properties of composite beam cross-sections. In: ECCOMAS Congress 2022-8th European Congress on Computational Methods in Applied Sciences and Engineering (2022)
  • Manfredo et al. [2023] Manfredo, D., Dörlich, V., Linn, J., Arnold, M.: Data based constitutive modelling of rate independent inelastic effects in composite cables using preisach hysteresis operators. Multibody System Dynamics, 1–16 (2023)
  • Saadat and Durville [2023] Saadat, M.A., Durville, D.: A mixed stress-strain driven computational homogenization of spiral strands. Computers & Structures 279, 106981 (2023)
  • Euler [1744] Euler, L.: De Curvis Elastici. Additamentum in Methodus inveniendi lineas curvas maximi minimive proprietate gaudentes, sive solutio problematis isoperimetrici lattissimo sensu accepti, Lausanne (1744)
  • Love [1863 - 1940] Love, A.E.H.: A Treatise on the Mathematical Theory of Elasticity. Cambridge University Press, Cambridge (1863 - 1940)
  • Matsutani [2010] Matsutani, S.: Euler’s elastica and beyond. Journal of Geometry and Symmetry in Physics 17, 45–86 (2010)
  • Singer [2008] Singer, D.A.: Lectures on elastic curves and rods. In: AIP Conference Proceedings, vol. 1002. American Institute of Physics, pp. 3–32 (2008)
  • Rohrhofer et al. [2022] Rohrhofer, F.M., Posch, S., Gößnitzer, C., Geiger, B.C.: On the role of fixed points of dynamical systems in training physics-informed neural networks. Transactions on Machine Learning Research (2022)
  • Colombo et al. [2016] Colombo, L., Ferraro, S., Diego, D.: Geometric integrators for higher-order variational systems and their application to optimal control. Journal of Nonlinear Science 26, 1615–1650 (2016)
  • Ferraro et al. [2021] Ferraro, S.J., Diego, D.M., Almagro, R.T.S.M.: Parallel iterative methods for variational integration applied to navigation problems. IFAC-PapersOnLine 54(19), 321–326 (2021)
  • [43] Timoshenko, S.P., Gere, J.M.: Theory of Elastic Stability. McGraw-Hill Book Company, (1961)
  • [44] Bigoni, D.: Nonlinear Solid Mechanics: Bifurcation Theory and Material Instability. Cambridge University Press, (2012)
  • Virtanen et al. [2020] Virtanen, P., Gommers, R., Oliphant, T.E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S.J., Brett, M., Wilson, J., Millman, K.J., Mayorov, N., Nelson, A.R.J., Jones, E., Kern, R., Larson, E., Carey, C.J., Polat, İ., Feng, Y., Moore, E.W., VanderPlas, J., Laxalde, D., Perktold, J., Cimrman, R., Henriksen, I., Quintero, E.A., Harris, C.R., Archibald, A.M., Ribeiro, A.H., Pedregosa, F., van Mulbregt, P., SciPy 1.0 Contributors: SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods 17, 261–272 (2020)
  • Higham and Higham [2019] Higham, C.F., Higham, D.J.: Deep learning: An introduction for applied mathematicians. Siam review 61(4), 860–891 (2019)
  • [47] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT press, (2016)
  • Kingma and Ba [2015] Kingma, D., Ba, J.: Adam: A method for stochastic optimization. In: International Conference on Learning Representations (ICLR), San Diega, CA, USA (2015)
  • Paszke et al. [2019] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32 (2019)
  • Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 2623–2631 (2019)
  • Wang and Perdikaris [2023] Wang, S., Perdikaris, P.: Long-time integration of parametric evolution equations with physics-informed deeponets. Journal of Computational Physics 475, 111855 (2023)
  • Wang et al. [2021] Wang, S., Teng, Y., Perdikaris, P.: Understanding and mitigating gradient flow pathologies in physics-informed neural networks. SIAM Journal on Scientific Computing 43(5), 3055–3081 (2021)