ML-based identification of the interface regions for coupling local and nonlocal models
Abstract
Local-nonlocal coupling approaches provide a means to combine the computational efficiency of local models and the accuracy of nonlocal models. However, the coupling process can be challenging, requiring expertise to identify the interface between local and nonlocal regions. This study introduces a machine learning-based approach to automatically detect the regions in which the local and nonlocal models should be used in a coupling approach. This identification process takes as input the loading functions evaluated at the grid points and provides as output the selected model at those points. Training of the networks is based on datasets provided by classes of loading functions for which reference coupling configurations are computed using accurate coupled solutions, where accuracy is measured in terms of the relative error between the solution to the coupling approach and the solution to the nonlocal model. We study two approaches that differ from one another in terms of the data structure. The first approach, referred to as the full-domain input data approach, inputs the full load vector and outputs a full label vector. In this case, the classification process is carried out globally. The second approach consists of a window-based approach, where loads are preprocessed and partitioned into windows and the problem is formulated as a node-wise classification approach in which the central point of each window is treated individually. The classification problems are solved via deep learning algorithms based on convolutional neural networks. The performance of these approaches is studied on one-dimensional numerical examples using F1-scores and accuracy metrics. In particular, it is shown that the windowing approach provides promising results, achieving an accuracy of 0.96 and an F1-score of 0.97. 
These results underscore the potential of the approach to automate coupling processes, leading to more accurate and computationally efficient solutions for material science applications.
Keywords coupling methods, local and nonlocal models, loading functions, machine learning, convolutional neural networks
1 Introduction
Recent advances in engineering mechanics have seen an increasing emphasis on coupling nonlocal models with classical local models, an approach driven by the need to address the excessive computational costs of high fidelity models, such as nonlocal equations, for modeling and simulation of complex behaviors in materials science. Nonlocal models, such as nonlocal diffusion and peridynamics, offer a more comprehensive representation of phenomena that are not adequately captured by classical Partial Differential Equations (PDEs). These phenomena include fracture and cracks in mechanics [14]. The scope of nonlocal models extends beyond diffusion and mechanics, finding applications in diverse areas like subsurface transport [3, 46, 25, 24, 49], phase transitions [2, 6, 8] and image processing [17, 31].
However, the implementation and simulation of nonlocal models present several challenges. These include the complexity of handling nonlocal boundary conditions [16] and the high computational costs associated with their numerical solutions. To address these challenges, approaches that couple local and nonlocal models have been proposed. Coupling methods aim to merge the computational efficiency of PDEs with the accuracy of nonlocal models, particularly in cases where nonlocal effects are confined to specific parts of the domain and the rest of the system can be effectively described by a PDE.
For a general overview of local-nonlocal coupling methods, classified as either force-based or energy-based formulations, we refer to the recent survey [18]. Our focus is on the coupling of the bond-based peridynamic model with the classical linear elasticity continuum model, using a force-based coupling formulation. In this context, various coupling approaches have been proposed; these are based on matching the displacements (the Matching Displacement Coupling Method) or the stresses (the Matching Stress Coupling Method) either over an overlap region or a sharp interface obtained by shrinking the size of the nonlocal horizon when transitioning from a nonlocal to a local region. The latter approach is known as the variable horizon coupling method (VHCM); the interested reader can find an extended description in [15].
A key aspect of the coupling process is that it requires domain expertise to effectively choose the interface between the local and nonlocal regions. In this work, we propose to circumvent this challenge by letting machine learning detect such an interface region using a supervised approach. Specifically, the use of machine learning involves training neural network architectures on datasets containing local-nonlocal coupling regions that yield a certain accuracy of the coupled solution. Once trained, the networks can automatically identify the regions of the computational domain where nonlocal effects are likely to be significant, solely based on external loading information. This approach represents a significant advance, as it automates the identification of regions by removing the expert-in-the-loop.
Although, to the best of our knowledge, the use of machine learning for local-nonlocal coupling has never been pursued before, machine learning for nonlocal simulations and/or computational mechanics is not a new concept. In what follows, we briefly review the most relevant research. We first focus on machine learning for peridynamics. Haghighat et al. [22] presented a nonlocal physics-informed framework with the peridynamic differential operator [33] using deep learning. Kim et al. [26] presented Peri-net to analyze crack patterns using Deep Neural Networks. Nguyen et al. [34] presented a peridynamic-based machine learning approach for up to two-dimensional structures. Xu et al. [50] presented a machine-learning-based framework for peridynamic material models with physical constraints. Concerning nonlocal models, the following literature is available. Tao et al. [47] investigated nonlocal neural networks, nonlocal diffusion, and nonlocal modeling. Feng et al. [19] analyzed stochastic nonlocal damage using machine learning. Zhou et al. [56] learned nonlocal constitutive models using neural networks. You et al. [54, 52, 53] used data-driven learning of nonlocal physics from high-fidelity synthetic data. In [9], de Moraes et al. used machine learning to predict the evolution of nonlocal micro-structural defects in crystalline materials. Lal et al. [29] predicted nonlocal elasticity parameters using machine learning and molecular dynamics simulations.
Concerning local models, we restrict ourselves to classical continuum mechanics. For a review of machine learning and data-mining approaches for continuum material mechanics, we refer the reader to [5]; for computational solid mechanics, we refer to [28]. A machine-learning approach for physics-based material models for multiscale solid mechanics was presented in [40].
Finally, the use of machine learning for coupling methods (not involving nonlocal or peridynamics models) is also not new; in fact, Raymond [39] used machine learning to model coupled solid and fluid mechanics with mesh-free methods.
The paper is organized as follows. Section 2 briefly introduces the models, coupling approaches, and discretization. Section 3 describes the methodology and the ML model, namely the convolutional neural network (CNN). Section 4 covers the data generation. In Section 5, we present numerical results for two cases: multiple-node classification in Subsection 5.1 and the main case of this work, node-wise classification, in Subsection 5.2. Finally, Section 6 concludes the paper.
2 Description of the models and coupling approach
The model problem that we consider in this work deals with the quasi-static simulation of a one-dimensional bar with mixed boundary conditions at the two extremities. The deformations within the bar are assumed to be small enough to be described by linear elasticity. This section briefly describes the local and nonlocal models: linear elasticity and linearized bond-based peridynamics [44]. We present the continuous formulations first and then provide their discretizations using finite differences for the local model and a collocation approach for peridynamics.
2.1 Classical linear elasticity model
We assume a one-dimensional bar of length $\ell$ occupying the open domain $\Omega = (0, \ell)$. Denoting the closure of $\Omega$ as $\overline{\Omega}$, we seek the displacement $u = u(x)$, $x \in \overline{\Omega}$, such that:
$$-EA\,u''(x) = f_b(x), \quad \forall x \in \Omega, \tag{1}$$
$$u(x) = 0, \quad \text{at } x = 0, \tag{2}$$
$$EA\,u'(x) = g, \quad \text{at } x = \ell, \tag{3}$$
where $E$ is the modulus of elasticity, $A$ denotes the cross-sectional area of the bar, $f_b$ is the external body force density (per unit length), and $g$ is a traction force applied at $x = \ell$. We assume that $f_b$ is chosen smoothly enough so that the solution can be differentiated as many times as needed. For the sake of simplicity in the presentation, but without loss of generality, we assume a constant Young’s modulus and a constant cross-sectional area for the bar. The model is presented here with a homogeneous Dirichlet boundary condition at $x = 0$ and a Neumann condition at $x = \ell$, but we shall also consider examples with homogeneous Dirichlet conditions at both ends in the numerical experiments.
2.2 Linearized bond-based peridynamic model
The static peridynamic equation [44] reads in one dimension as
(4) |
where the so-called horizon determines the neighborhood of point
In addition, is the peridynamic stiffness parameter evaluated at point and is the displacement for the deformed configuration. Following [15], the peridynamic stiffness parameter is chosen such that the solution to the linearized peridynamic bond-based model is compatible with that to the continuum local model, i.e.,
(5) |
or, equivalently,
(6) |
As far as the boundary conditions are concerned, we consider the same local boundary conditions (2) and (3), which are imposed using the variable horizon method described below. The solution to this problem will serve as the reference solution. For more details about boundary conditions for the peridynamic model, we refer the reader to, e.g., [12, 37].
2.3 Variable horizon coupling approach
We consider in this work the Variable Horizon Coupling Method (VHCM) as introduced in [15]. A similar coupling approach was proposed in [43, 35]. For a broader review of coupling methods, including optimization-based and energy-based coupling methods, we refer to [11]. As shown in Figure 1, we define the region in which we shall use the nonlocal model (4) as and the region for the local model (1) as . We suppose here that is decomposed into the two subdomains and so that the end points of the bar are both adjacent to a domain governed by the local model. The main idea of the method is to make the horizon decrease to zero as one approaches the interfaces and so as to avoid the need to introduce an overlapping region around the interfaces. In this section, we describe the main ingredients of the VHCM method.
In the VHCM, the horizon depends on the position ; it is usually constant in the nonlocal region far from the local-nonlocal interface and shrinks to zero when approaching the interfaces and . Several choices for the transition of are possible. In one dimension, it was suggested in [43, 37, 15] to consider a piece-wise linear function, that is, given constant,
(7) |
as shown in Figure 2. It follows that Equation (6) for the stiffness constant now becomes:
(8) |
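The piecewise-linear horizon described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nonlocal region is taken to be an interval `[a, b]`, the horizon is assumed to equal the distance to the nearest interface until it reaches the constant value `delta`, and the function name and arguments are hypothetical rather than taken from the source.

```python
def variable_horizon(x, a, b, delta):
    """Piecewise-linear horizon on an assumed nonlocal region [a, b].

    Assumption: the horizon grows linearly with the distance to the nearest
    interface, is capped at the constant value `delta`, and vanishes at the
    interfaces and outside the nonlocal region.
    """
    if x <= a or x >= b:
        return 0.0
    return min(delta, x - a, b - x)


# Example: nonlocal region [1, 2] with constant horizon 0.25 in its interior.
samples = [variable_horizon(x, 1.0, 2.0, 0.25) for x in (1.0, 1.125, 1.5, 1.875, 2.0)]
```

With these inputs the horizon ramps up from zero at the left interface, stays at 0.25 in the interior, and ramps down to zero at the right interface.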
Using the variable horizon method, the coupling problem consists then in finding the displacements and such that
(9) | ||||
In other words, the displacements and stresses from the two models are matched at the interfaces, i.e. and , where the peridynamic stresses [42] are defined as
(10) |
2.4 Discretization
Several approaches for discretizing the coupled problem (9) are available [14]. Here we choose to discretize the classical linear elasticity model by a second-order finite differences scheme and the peridynamic model by a collocation approach, as introduced by Silling and Askari [45].
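To make the local part of the discretization concrete, the following sketch applies a second-order central finite difference scheme to the local model with homogeneous Dirichlet conditions at both ends (one of the boundary configurations mentioned in Section 2.1). It is a minimal illustration, not the authors' implementation: the function name `solve_local_bar`, the constant stiffness `EA`, and the use of the Thomas algorithm for the tridiagonal system are assumptions made for this example.

```python
def solve_local_bar(f, length, EA=1.0):
    """Solve -EA u'' = f with u(0) = u(length) = 0 by central differences.

    `f` holds the load at the n + 1 uniformly spaced grid points (the two end
    values are not used). Returns the n + 1 nodal displacements.
    """
    n = len(f) - 1
    h = length / n
    m = n - 1                        # number of interior unknowns
    diag = [2.0 * EA / h**2] * m     # main diagonal of the tridiagonal matrix
    off = -EA / h**2                 # constant sub- and super-diagonal entry
    rhs = [f[i] for i in range(1, n)]

    # Thomas algorithm, forward elimination.
    for i in range(1, m):
        w = off / diag[i - 1]
        diag[i] -= w * off
        rhs[i] -= w * rhs[i - 1]

    # Back substitution; u[0] and u[n] keep the Dirichlet value 0.
    u = [0.0] * (n + 1)
    u[n - 1] = rhs[m - 1] / diag[m - 1]
    for i in range(m - 2, -1, -1):
        u[i + 1] = (rhs[i] - off * u[i + 2]) / diag[i]
    return u


# Constant unit load on a bar of unit length: the exact solution is x (1 - x) / 2.
u_local = solve_local_bar([1.0] * 9, length=1.0)
```

Because the central difference is exact for quadratic polynomials, the computed nodal values agree with the exact solution up to round-off in this example.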
We now describe the discrete model associated with (9) on the configuration shown in Figure 3. As customarily done in the literature [45, 36], we fix the value of the horizon and introduce a uniform grid spacing chosen such that be a multiple of , i.e. , with a positive integer. Moreover, we choose the grid size to be the same in each of the subregions of the computational domain. In other words, the numbers of intervals , , and in , , and , respectively, are taken such that
(11) |
Since we assume that the grid points from the two models coincide at the interfaces and and that the grid is uniform in the three regions , , and , the grid points are simply given by:
(12) |
where . The numbering of the nodes is shown in Figure 3. In particular, we note that and . We consider the same numbering for the degrees of freedom associated with the displacements, that is, the nodal values will be denoted by:
(13) |
where we have used the continuity of the displacement at the two interfaces, that is, at and . The total number of degrees of freedom is then equal to , each degree of freedom being associated with one grid point.
The discretization of the VHC problem (9) leads to the following system of equations, in which we assume :
1. Dirichlet BC at : (14)
2. In : (15)
3. At interface : (16)
4. In : (17)
5. At interface : (18)
6. In : (19)
7. Neumann BC at : (20)
We recall that if and are constant, then the quantity remains also constant. Moreover, the stresses in Equations (16) and (18) are obtained as approximations of (10). More specifically, assuming that the solution is sufficiently regular in , the stresses are approximated using one-sided third-order finite differences stencils of the first derivative as
This choice ensures that the discretization method has a degree of precision of three, in other words, that any polynomial solution of degree up to three will be computed exactly.
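The exactness claim can be checked numerically with the standard four-point one-sided stencils for the first derivative, which are third-order accurate. The sketch below is illustrative only: the coefficients are the textbook ones, and whether they match the stencils used in the source term by term is an assumption; the function names are hypothetical.

```python
def forward_derivative(u, i, h):
    """Third-order one-sided (forward) approximation of u'(x_i)."""
    return (-11.0 * u[i] + 18.0 * u[i + 1] - 9.0 * u[i + 2] + 2.0 * u[i + 3]) / (6.0 * h)


def backward_derivative(u, i, h):
    """Third-order one-sided (backward) approximation of u'(x_i)."""
    return (11.0 * u[i] - 18.0 * u[i - 1] + 9.0 * u[i - 2] - 2.0 * u[i - 3]) / (6.0 * h)


# Check on a cubic polynomial p(x) = 2x^3 - x^2 + 3x - 1, with p'(x) = 6x^2 - 2x + 3.
h = 0.1
grid = [k * h for k in range(11)]
p = [2.0 * x**3 - x**2 + 3.0 * x - 1.0 for x in grid]
dp = [6.0 * x**2 - 2.0 * x + 3.0 for x in grid]
forward_error = abs(forward_derivative(p, 2, h) - dp[2])
backward_error = abs(backward_derivative(p, 8, h) - dp[8])
```

Both errors are at the level of round-off, which is consistent with the statement that polynomial solutions of degree up to three are reproduced exactly.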
3 Methodology
We propose to employ deep learning techniques to identify the regions in which one should use the local model or the nonlocal model, by solely using load information (i.e. external forces), as illustrated in Figure 4. More specifically, we shall use CNNs as our deep learning tool. We describe below CNNs and the specific architecture that we considered to classify the nature of the grid points.
3.1 Overview
Our main objective is to develop an automated approach to identify the nonlocal region and the local regions and , shown in Figure 3, to be used in a coupling method. From now on, the nodal values of the solution located in the regions and will be indexed by LM (Local Model), whereas the nodal values located in will be indexed by NLM (Nonlocal Model). Although this is a simplified setting that does not fully reflect the numerical tests presented below, we choose to describe the method using one single nonlocal region for the sake of clarity. The extension to more than one nonlocal domain is straightforward.
An overview of the proposed automated identification process is shown in Figure 4. The input data consists of the external load vector , , evaluated at all interior grid points , see Figure 4a. The output of the process is a label associated with each node that takes the value 0, if the node is classified as belonging to the local region, or 1 otherwise (i.e. the node belongs to the nonlocal region). It is essential to note that the nature of our approach is supervised learning, that is, each load input during training corresponds to a known classification. As this requires training data, the load input (see Figure 4a) is also used to generate the ground-truth data that will serve as the reference configuration during training. The generation, as shown in Figure 4b, is conducted using the following procedure: a coupling configuration (local-nonlocal domain assignment) is added to the training set when it provides a coupled solution whose error, with respect to the fully nonlocal solution, is below a given tolerance. More specifically, we first compute the reference displacement using the nonlocal model in which all nodal points are NLM nodes . Then, given a coupling configuration (in which the nodes are either LM nodes or NLM nodes ), we compute the displacement using the method described in the previous section, see Figure 4b. As mentioned above, the reference configuration is included in the training dataset if the relative error in the norm is below a given tolerance , namely
(21) |
For more details about the data generation, the reader is referred to Section 4. The loads and the reference configurations of the local-nonlocal splitting provide the training data for the CNN in a supervised manner. Once trained, the CNN model takes as input an (unseen) external load , see Figure 4c, and outputs the predicted configuration (i.e. it classifies the nodes as either LM or NLM at each grid point) in the domain, see Figure 4d. More details of the CNN architecture are given in the following section.
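The acceptance test and the label encoding described in this overview can be sketched as follows. This is a hedged illustration: the discrete Euclidean norm is an assumption (the norm is not recoverable from the text here), nodes lying on the interfaces are assumed to carry the NLM label, and all function names are hypothetical.

```python
import math


def relative_error(u_ref, u_coupled):
    """Relative error of the coupled solution with respect to the fully
    nonlocal reference, in the discrete Euclidean norm (assumed)."""
    num = math.sqrt(sum((r - c) ** 2 for r, c in zip(u_ref, u_coupled)))
    den = math.sqrt(sum(r ** 2 for r in u_ref))
    if den == 0.0:
        raise ValueError("reference solution is identically zero")
    return num / den


def accept_configuration(u_ref, u_coupled, tol):
    """A coupling configuration enters the training set if its error is below tol."""
    return relative_error(u_ref, u_coupled) < tol


def make_labels(x, a, b):
    """Label 0 for LM nodes and 1 for NLM nodes; [a, b] is the assumed nonlocal region."""
    return [1 if a <= xi <= b else 0 for xi in x]


labels = make_labels([0.0, 0.5, 1.0, 1.5, 2.0], a=0.5, b=1.5)
```

In a full pipeline, `u_ref` and `u_coupled` would come from the nonlocal and coupled solvers of Section 2; here they are just plain lists of nodal values.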
3.2 Convolution neural networks
In recent years, deep learning has yielded impressive results across a spectrum of challenges, such as in visual recognition, speech understanding, natural language processing, and even inference of physical systems responses. Among different deep neural network architectures, CNNs have been investigated and analyzed in depth [21]. CNNs are particularly effective for tasks involving data with a grid-like topology, such as images, audio spectrograms, and sequential data arising for instance in natural language processing, with applications ranging from image classification and object tracking to speech and natural language processing [21, 30].
The CNN structure consists of various fundamental components, including convolutional, pooling, and fully connected layers. The standard structure involves a series of convolution layers followed by a pooling layer. These layers are then succeeded by one or more fully connected layers. The phase in which input data is transformed into output through the layers is the so-called forward propagation process [51]. Below, we give a brief description of the main features of CNNs for the non-expert reader.
1. Convolutional Layers: Convolutional layers are the cornerstone of CNNs. These layers use learnable kernels that convolve over the input data, enabling the network to detect local patterns and features. More precisely, a kernel is typically a small array, usually of size 3 × 3, which is systematically passed over an input array, also referred to as a tensor. At each position within the tensor, an element-wise multiplication is performed between the elements of the kernel and the corresponding elements in the input tensor. A bias is then added to the sum of these products in order to yield the output value, which occupies the corresponding position in the output tensor, termed the “feature map”. This process is repeated with multiple kernels to generate diverse feature maps. These feature maps encapsulate various attributes of the input tensors, with distinct kernels serving as distinct feature extractors. The operation of a convolutional layer can be expressed as:
$$Z = \sum_{i=1}^{N} X_i * K_i + b, \tag{22}$$
where $X_i$ denotes an input matrix, $K_i$ a kernel matrix, $N$ is the number of feature maps from the previous layer, and $b$ a bias value. Finally, the rectified linear unit activation function [41],
$$\mathrm{ReLU}(z) = \max(0, z),$$
is applied element-wise to the matrix $Z$, resulting in the output matrix $Y$. The primary task of the learning process involves identifying sets of suitable kernel matrices capable of extracting distinctive and discriminative features for the output tasks. To achieve this, the back-propagation algorithm, commonly employed to optimize the connection weights in neural networks, can be adapted. In this context, it serves to train both the kernel matrices and biases, functioning as shared neuron connection weights.
2. Pooling Layers: Pooling layers downsample the dimensions of feature maps while retaining the salient information; thus, their main effect is an intelligent feature-dimension reduction. To effectively reduce the number of output neurons, pooling algorithms usually combine neighboring elements in the convolution output matrices. Notable among these algorithms are max-pooling and average-pooling [1]. In this study, we implement a max-pooling layer with a 2 × 1 size, which selects the highest value from the two neighboring elements of the input feature map, thereby generating a single element for the output feature map.
3. Fully Connected Layers: Fully connected layers process the flattened feature vectors to make the final predictions. A fully connected layer operation with input $x$ and output $y$ is represented as
$$y = \sigma(Wx + b), \tag{23}$$
where $y$ is the output, $x$ the input vector, $W$ the weight matrix, $b$ the bias vector, and $\sigma$ a given activation function [41].
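The three building blocks above can be illustrated with a dependency-free forward pass on a one-dimensional signal. This is a didactic sketch under simplifying assumptions: a single input channel, no padding, and the cross-correlation convention (no kernel flip) that deep learning libraries commonly call convolution. The function names are illustrative.

```python
def conv1d(x, kernel, bias=0.0):
    """Slide the kernel over x, summing element-wise products and adding the bias."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]


def relu(z):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in z]


def max_pool(z, size=2):
    """Keep the largest value in each non-overlapping group of `size` elements."""
    return [max(z[i:i + size]) for i in range(0, len(z) - size + 1, size)]


def dense(x, weights, bias):
    """Affine map W x + b; an activation can be applied to the result afterwards."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


signal = [1.0, 3.0, 2.0, 5.0, 4.0]
feature_map = relu(conv1d(signal, kernel=[1.0, -1.0]))   # [0.0, 1.0, 0.0, 1.0]
pooled = max_pool(feature_map, size=2)                   # [1.0, 1.0]
output = dense(pooled, weights=[[0.5, 0.5]], bias=[0.25])  # [1.25]
```

In a trained network the kernel, weights, and biases are learned by back-propagation rather than fixed by hand as in this example.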
The selected CNN architecture:
We propose to use the following CNN architecture for the identification of local and nonlocal regions. First, as a preprocessing step, the input of the network is normalized to a load vector with unit variance and zero mean. The architecture is composed of convolutional layers with ReLU activation functions for feature extraction, followed by pooling layers for spatial reduction, and fully connected layers for the final output. This is shown in Fig. 5. The first convolution layer uses 32 filters with kernels. The second convolution layer has 64 filters with 3 kernels, and the third convolution layer has 128 filters with kernels. Each convolution layer is followed by a max pooling layer with pool size of two. The output tensor from these three conv-pooling blocks has a dimension of . The following two layers consist of one flatten layer, followed by a fully connected layer with 64 neurons. The last layer of our CNN model is a fully connected (dense) layer, where the number of neurons matches the output size. In our study, we explore two cases with distinct output sizes. In the first case, the CNN receives as input the full vector of the second derivative of the load functions at the grid points. The resulting output is a vector of labels of the same size as the input load vector. Here, the output size is 257, leading to a fully connected layer with 257 neurons. In the second case, the CNN receives a window as input, and the resulting output is a label for the central node. Here, the output size is one, corresponding to a single neuron in the final layer. Further details on these cases are discussed in Section 4.4. The sigmoid activation function is used in the last layer in both cases. We note that we experimented with a variety of network architectures, using different numbers of layers and neurons; the architecture presented here is the one that produced the best results.
Computational resources and model parameters:
The CNN model is trained using the Adam optimizer [27] with a batch size of 32. The training proceeds for a maximum of 200 epochs; however, an early stopping callback is used to avoid overfitting: training ends as soon as the loss on the validation dataset stops improving [4]. The learning rate is set to the commonly used value of 0.001. The proposed model is implemented in TensorFlow 2.10, with Keras in Python 3.10.11.
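A Keras sketch of the architecture and training setup described above might look as follows. Several choices are assumptions not stated in the text and are marked as such in the comments: one-dimensional convolutions with kernel size 3 for all three layers, `"same"` padding, a ReLU activation on the 64-neuron dense layer, a binary cross-entropy loss, and an early stopping patience of zero. The function names are hypothetical.

```python
def build_cnn(input_len, output_size, kernel_size=3):
    """Three conv/max-pool blocks, a 64-neuron dense layer, and a sigmoid output.

    Assumptions: 1D convolutions, kernel size 3, "same" padding, ReLU on the
    hidden dense layer, binary cross-entropy loss.
    """
    # Imported lazily so that defining the function does not require TensorFlow.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        layers.Input(shape=(input_len, 1)),
        layers.Conv1D(32, kernel_size, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(64, kernel_size, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(output_size, activation="sigmoid"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-3),
        loss="binary_crossentropy",   # assumed loss for 0/1 labels
        metrics=["accuracy"],
    )
    return model


def make_early_stopping():
    """Stop as soon as the validation loss fails to improve (patience assumed 0)."""
    from tensorflow import keras
    return keras.callbacks.EarlyStopping(monitor="val_loss", patience=0)


# Typical use (x_train, y_train, x_val, y_val are placeholders for the datasets):
# model = build_cnn(input_len=257, output_size=257)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=200, batch_size=32, callbacks=[make_early_stopping()])
```

For the window-based case, the same function would be called with the window length as `input_len` and `output_size=1`.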
3.3 Classification evaluation metrics
In the context of binary classification problems such as ours, the classification capability of the trained network is usually evaluated with a confusion matrix. The confusion matrix displays the correctly or incorrectly predicted results. It is based on four key elements: true positives (TP, correctly predicted LM), false positives (FP, wrongly predicted LM), true negatives (TN, accurately predicted NLM), and false negatives (FN, wrongly predicted NLM). To evaluate the confusion matrix, we use two induced metrics, namely the accuracy metric and the F1-score, which are the most used metrics in classification [23, 32]:
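1. The accuracy metric measures the ratio of correct predictions over the total number of instances evaluated. Higher values for accuracy are indicative of better performance. Formally,
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}. \tag{24}$$
2. The F1-score is the harmonic mean of “precision” and “recall”. It ranges from 0 to 1, where a higher value indicates better classification performance. Formally,
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}, \tag{25}$$
where
$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}.$$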
We note that in case 1, see Section 5.1, we report the average over all samples for all metrics (i.e. confusion matrix, accuracy, and F1-score).
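The metrics above are straightforward to compute from predicted and reference label vectors. The sketch below follows the convention stated in this section that the positive class is LM, which corresponds to label 0 in the encoding of Section 3.1; the function names and the zero-division guards are choices made for this example.

```python
def confusion_counts(y_true, y_pred, positive=0):
    """Return (TP, FP, TN, FN), treating `positive` (LM = 0 here) as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of correctly classified nodes."""
    return (tp + tn) / (tp + fp + tn + fn)


def f1_score(tp, fp, tn, fn):
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)   # (3, 1, 3, 1)
```

For this small example, both the accuracy and the F1-score evaluate to 0.75.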
4 Data generation
The CNN used for classification is trained on input-output pairs of external loads and corresponding coupling configurations (inducing a coupled solution with a certain accuracy); this pipeline is illustrated in Figure 4. In this section, we introduce the definition of the load functions and present the steps involved in the generation of the coupling regions.
4.1 Families of loading functions
In this section, we consider several classes of loading functions, including singular functions that induce discontinuous solutions with a finite jump to mimic fracture phenomena in higher dimensions. One advantage of peridynamics is that the model does not feature any spatial derivatives thus allowing for solutions with jumps.
1. The first family of load functions, , is based on functions [7] featuring a singularity at and defined as:
(26)
The function and the corresponding fully nonlocal solution are shown in Figure 6(a-b) for = 0.03 and . In cases where the singularity is located at a point different from , we will consider the loading function .
2. The second family of load functions, , featuring a singularity at [10], is defined by:
(27)
The function and the corresponding fully nonlocal solution are plotted in Figure 6(c-d) for = 0.03 and . In cases where the singularity is located at a point different from , we will evaluate the loading function as .
3. The study also includes the family of loads characterized by polynomial solutions of degree three and lower. The loading functions are in this case linear, of the form:
(28)
an example of which is shown in Fig. 6(e-f) with and .
4. We consider as well the following family of load functions:
(29)
where the parameter controls the variation in the solution around the point . An example of function is shown in Fig. 7(a-b) for .
5. Finally, we also include a family of smooth load functions that induce continuous solutions with local behavior:
(30)
Such a function is a smooth representation of a point-wise load applied at the point . The load and corresponding solution are shown in Fig. 7(c-d) for .
4.2 Generation of the coupling regions
We present the steps used to generate the output data for the CNN training, i.e. we describe how the reference configurations of the coupling regions are constructed based on the loading functions.
For the sake of simplicity, all experiments focus on a one-dimensional bar with a material property of and cross-sectional area , resulting in . We consider the configuration of Figure 3 with and interface locations and . We choose the values of and such that grid points coincide with the two interfaces. The domain is partitioned into sub-intervals of size , with given. The horizon is taken as a multiple of , i.e. , with . We acknowledge that implementing non-uniform grids in our approach poses significant challenges, particularly in integrating them with the standard CNN architecture and our coupling method. Therefore, for the purposes of this proof-of-concept, we employ a fixed discretization grid to evaluate the innovative aspects of our proposed approach. To avoid any issues with applying the local boundary conditions, we assume that is such that and . We will nevertheless vary the positions and of the interfaces in order to consider solutions with a discontinuity located in the interval .
We now describe the process of generating the reference coupling regions used in the training for the special cases and . Since these loads are smooth away from their respective singularities, it seems sensible that the region where one would use the nonlocal model in the local-nonlocal coupling approach should contain the singular point. We thus compute the fully nonlocal reference solution and choose the smallest interval around the singularity, i.e. where is a multiple of the mesh size , such that the relative error (21) between the fully nonlocal and coupled solutions is below a given tolerance . In our experiments, we fix to . It is clear that the proposed process is directly influenced by the specific features of the chosen loading functions.
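The search for the smallest admissible nonlocal interval can be sketched as a simple loop that widens the interval one grid spacing at a time. This is an illustration only: the interval is assumed to be symmetric around the singularity, and `coupled_error` is a hypothetical callable standing in for "solve the coupled problem on this configuration and return the relative error (21)".

```python
def smallest_nonlocal_interval(x_s, h, max_steps, coupled_error, tol):
    """Return the smallest interval (a, b) around x_s whose coupled error is below tol.

    The half-width is a multiple k * h of the mesh size; symmetry around the
    singularity x_s is an assumption. Returns None if no admissible interval
    is found within max_steps.
    """
    for k in range(1, max_steps + 1):
        a, b = x_s - k * h, x_s + k * h
        if coupled_error(a, b) < tol:
            return a, b
    return None


# Stand-in error model for demonstration: the error decays as the interval widens.
def toy_error(a, b, h=0.125):
    return 0.1 / ((b - a) / (2.0 * h))


interval = smallest_nonlocal_interval(1.5, 0.125, max_steps=10,
                                      coupled_error=toy_error, tol=0.03)
```

With the toy error model, the loop stops at a half-width of four grid spacings; in practice each call to `coupled_error` would involve one solve of the coupled system of Section 2.4.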
In Figure 8, we show an example of an optimal nonlocal region configuration using and with singularity at . The displacements and using are shown in Figure 8(a). The value of using is . Figure 8(b) presents the displacement and using the loading type . The value of in this case is . In both cases, the error in the coupled solution is below the selected threshold. We also show the pointwise error between the nonlocal solution (NLM) and the coupled solution (VHCM) in Figure 8(c-d), to illustrate that the local error remains relatively small at all points.
As mentioned in Section 2, the solutions to the local and the nonlocal models coincide for polynomials of degree up to three; furthermore, it is known that the fourth derivative of the solution is an indicator of the extent of the nonlocality of the solution [7]. As a pre-processing step, we leverage this information by taking the second derivative of the loading functions rather than the original functions themselves. Here, the second derivatives are simply evaluated using finite differences approximations of the loading functions. This step will help in accurately capturing the underlying behavior for the load and serve as a nonlocality indicator. The second derivatives of these loads are then used as inputs for our ML model.
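The pre-processing step can be sketched with a central finite difference on the uniform grid. The treatment of the two end points (copying the nearest interior value so that the output keeps the input length) is an assumption made for this example, as is the function name.

```python
def second_derivative(f, h):
    """Central-difference second derivative of the sampled load f on a uniform grid.

    Interior points use (f[i-1] - 2 f[i] + f[i+1]) / h^2; the two end values are
    copied from their interior neighbors (assumed edge handling).
    """
    n = len(f)
    d2 = [(f[i - 1] - 2.0 * f[i] + f[i + 1]) / h**2 for i in range(1, n - 1)]
    return [d2[0]] + d2 + [d2[-1]]


h = 0.25
grid = [i * h for i in range(9)]
linear_load = [2.0 * x + 1.0 for x in grid]
quadratic_load = [x**2 for x in grid]
d2_linear = second_derivative(linear_load, h)
d2_quadratic = second_derivative(quadratic_load, h)
```

A linear load yields a vanishing second derivative at every node, and the quadratic load yields the constant value 2, as expected.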
We now proceed to explain the process when using load . The coefficients and in Eq. (28) are random values within the range and , respectively. In this case, the reference configuration is characterized by a fully local region as the local model and the nonlocal model lead to the same solution. This choice is made to anchor the training of our model on an instance where the behavior is fully local. This is pivotal for enriching the learning process of the model, enabling it to discern and accurately predict the nuances of fully local configurations alongside the more complex local-nonlocal configurations. Notably, the second derivative of is always zero and is used as input for our ML model.
4.3 Approaches for input-output selection
We recall that we consider two strategies. In the first one, we utilize the complete dataset over the full computational domain as input. In this scenario, the nature of the problem is akin to the simultaneous classification of all nodes. In the second approach, we transform the datasets into windowed inputs, and solve a point-wise classification problem.
4.3.1 Full-domain input data
The first strategy consists of considering the second derivative of the load functions at all grid points simultaneously, following the generation of data explained in Section 4.2. In this context, we increase the model’s data volume and enhance its ability to generalize by introducing specific transformations to the functions (26) and (27). Specifically, we incorporate the negation operation, which complements our existing data by providing contrasting cases. The second derivatives of , , and are estimated and used here as input vectors. In this approach, each input-output pair consists of the full vector of second derivatives of the load and the full vector of labels. The CNN receives the full second derivative vector as input. The resulting output is a vector of labels of the same size as the input vector. Each output node has either the LM or NLM label, according to the region (local or nonlocal) it belongs to. After prediction, we implement a post-processing step where nonlocal regions containing fewer than eight nodes are treated as local (LM) since we are using the variable horizon coupling method (VHCM) with .
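The post-processing step described above amounts to scanning the predicted label vector for runs of NLM labels and relabeling the runs that are too short. A minimal sketch, assuming labels are stored as 0 (LM) and 1 (NLM) and using a hypothetical function name:

```python
def relabel_short_nonlocal_runs(labels, min_nodes=8):
    """Relabel as local (0) every run of nonlocal labels (1) shorter than min_nodes."""
    out = list(labels)
    i, n = 0, len(out)
    while i < n:
        if out[i] == 1:
            j = i
            while j < n and out[j] == 1:   # find the end of the current NLM run
                j += 1
            if j - i < min_nodes:
                out[i:j] = [0] * (j - i)
            i = j
        else:
            i += 1
    return out


predicted = [0, 1, 1, 1, 0, 0] + [1] * 8 + [0]
cleaned = relabel_short_nonlocal_runs(predicted)
```

In this example the isolated run of three NLM nodes is turned into LM nodes, while the run of eight NLM nodes is kept.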
Table 1(A): Dataset Distribution | Case 1: Full-domain Data | Case 2: Windowing Data |
Train | 589 | 22802 |
Test | 119 | 4498 |
Validation | 78 | 3476 |
Total | 786 | 30776 |
Table 1(B): Processing Time (seconds) | Case 1: Full-domain Data | Case 2: Windowing Data |
Generation Time | 138.77 | 182.37 |
Train Time | 30.19 | 137.17 |
Test Time | 0.6 | 10.9 |
4.3.2 Window-based input data
The second (and more effective) strategy proposed in this paper is the windowing approach, which proves particularly effective for handling data with several singularities. In this approach, each node, denoted as , is treated individually, and the classification problem is reformulated as a point-wise classification; an illustration of the window selection is presented in Figure 9. By extracting a window around each data point, we segment the second derivative of the load, so that the classification at each point relies only on the load characteristics within that window. The induced input-output pairs are then given by a window and a label: the CNN receives a window of the second derivative of the load as input and returns a label for the central node. The label is either LM or NLM, according to the region (local or nonlocal) the central node belongs to. In a post-processing step, we apply the same process detailed in Section 4.3.1. This approach aims at enhancing the model’s capability to detect patterns and variations in the data, making the analysis more precise and adaptable to different load scenarios.
We present in Figure 9 an illustration of the window selection for the window-based approach. Each window spans two horizons to the left and two horizons to the right of its central point, thereby covering a total of four horizons. Figure 9 shows two central nodes and , each with its associated windows and . These windows are used as input data for the CNN, which assigns a classification label according to the region of the central node. The output labels are then LM for since it is located in the local region (represented by ) and NLM for since it is located in the nonlocal region (represented by ).
We note that this approach can also be applied in higher dimensions. For instance, in two or three dimensions, the windows would change from one-dimensional segments to two-dimensional areas or three-dimensional volumes. By varying these windows, we aim to capture and understand how nonlocal effects act within specific regions or volumes across the broader dimensional space.
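The window selection can be sketched as below, under two assumptions that are not stated in the text: the horizon is expressed as an integer number of grid spacings (`nodes_per_horizon`, a hypothetical parameter), and nodes closer than two horizons to the boundary are handled by replicating the end values of the vector.

```python
import numpy as np

def extract_windows(d2_load, nodes_per_horizon):
    """Return one window per node, centred on that node.

    Each window covers two horizons on each side of the central node,
    i.e. 4 * nodes_per_horizon + 1 values. Boundary nodes are handled by
    edge replication (an assumption of this sketch).
    """
    half = 2 * nodes_per_horizon
    padded = np.pad(np.asarray(d2_load, dtype=float), half, mode="edge")
    width = 2 * half + 1
    # Row i of the result is padded[i : i + width], centred on node i.
    return np.lib.stride_tricks.sliding_window_view(padded, width).copy()

# Hypothetical input: 257 nodes and a horizon of 4 grid spacings.
d2 = np.arange(257, dtype=float)
windows = extract_windows(d2, nodes_per_horizon=4)
```

Each row of `windows` is then one CNN input, paired with the LM or NLM label of its central node.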
4.4 Case studies
In this section, we describe the different scenarios and case studies considered in our tests. A summary is reported in Table 2.
4.4.1 Case 1: Full-domain input data
In this case, we generate the data as explained in Section 4.3.1. These simulations result in a total of 786 data samples, as shown in Table 1(A). The data is first split into training, validation, and testing sets. The training set is used to train the CNN model, where the model learns the relationship between inputs and outputs, adjusting its parameters (like weights and biases) accordingly. The validation set is used during the training process; it helps in evaluating the performance of the model on data it has not been trained on and in preventing overfitting. The testing set is used to check how well the model generalizes, that is, to evaluate its performance on new, unseen data [38]. We split the dataset by assigning 75% of the samples to the training set, 10% to the validation set and 15% to the testing set. The size of each data sample is 257, resulting in a training set of size 589 × 257 and a validation set of size 78 × 257, while the testing set has size 119 × 257. The generation and partitioning of the dataset into training, validation, and testing sets require a total of 138.77 seconds (Table 1(B)).
In the full-domain input data approach, the CNN receives as input the full vector of the second derivatives of the load functions at the grid points. The resulting output is a vector of labels of the same size as the input load vector. Each output node carries either the LM or the NLM label, according to the region (local or nonlocal) it belongs to. The CNN model is structured as a multiple-node classification model; the results are detailed in Section 5.1. A summary of this test case is shown in Table 2.
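A minimal sketch of a 75%/10%/15% split is given below. The shuffling, the seed, and the rounding rule (truncation for the training and validation sets, remainder assigned to the testing set) are assumptions of this sketch; with 786 samples, this particular rule happens to reproduce the counts of Table 1(A).

```python
import numpy as np

def split_indices(n_samples, train_frac=0.75, val_frac=0.10, seed=0):
    """Shuffle sample indices and split them into train/validation/test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(train_frac * n_samples)  # truncation (assumed rule)
    n_val = int(val_frac * n_samples)
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]         # remainder goes to the test set
    return train, val, test

train_idx, val_idx, test_idx = split_indices(786)
# Sizes: 589 training, 78 validation, and 119 testing samples.
```

Splitting on whole samples before any windowing keeps every window of a given load in a single set.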
Case Study | Case 1 | Case 2 |
Section | Section 5.1 | Section 5.2 |
Input | Full Load’s Second Derivative Vector | Windows |
Output | Full Labels Vector | One Node Label |
Load Dataset | , . | , . |
ML Model Type | Multiple node Classification | Node wise Classification |
Interpretation | Demonstrates the feasibility of the proposed approach for region detection, as a “proof of concept”. | Highly effective strategy for handling data with varying numbers of discontinuities without the need for retraining the model for each specific scenario. |
4.4.2 Case 2: Window-based input data
In this case, we use the window-based approach, in which the dataset is generated and preprocessed using the methodology explained in Section 4.3.2. The initial dataset, consisting of 786 complete input-output vector pairs, is partitioned into distinct sets for training, validation, and testing. The training set constitutes 75% of the total data, the validation set 10%, and the testing set the remaining 15%. We then implement the windowing approach, segmenting each vector into windows of data. Subsequently, we perform a deduplication process, removing duplicated pairs of input window and output label within each set and between sets (train-test, train-validation, and validation-test), ensuring unique samples across all sets. The resultant windowed dataset comprises a total of 30776 data samples (Table 1(A)). The training set, the validation set, and the testing set contain 22802, 3476, and 4498 data samples, respectively. We note that the size of the input samples depends on the window size. The generation of the data and its splitting into windows require a total of 182.37 seconds (Table 1(B)). We point out that we introduced the deduplication step to ensure fairness in the testing process. While straightforward in 1D experiments with uniform grids, deduplication is nontrivial in practice, especially in higher dimensions. Moreover, solutions often have common features in certain regions of a computational domain, so that having duplicates is natural and actually helps the learning process. Thus, our results show that even when we stress test the model by removing favorable data, the model still performs well (Section 5.2).
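The deduplication step can be sketched as follows. The text does not specify which copy of a duplicated pair is retained or how equality is tested; this sketch assumes exact floating-point equality and keeps the first occurrence, processing the sets in the order train, validation, test. The function name and the integer labels are hypothetical.

```python
import numpy as np

def deduplicate(windows, labels, seen=None):
    """Drop (window, label) pairs already present in `seen` or repeated here.

    Returns the filtered windows and labels together with the updated set
    of seen pairs, so that the same set can be passed to the next split.
    """
    seen = set() if seen is None else seen
    windows = np.asarray(windows, dtype=float)
    labels = np.asarray(labels)
    keep = []
    for i, (w, y) in enumerate(zip(windows, labels)):
        key = (w.tobytes(), int(y))  # exact match on values and label
        if key not in seen:
            seen.add(key)
            keep.append(i)
    keep = np.array(keep, dtype=int)
    return windows[keep], labels[keep], seen

# Tiny illustrative sets: one duplicate inside train, one shared with test.
train_w, train_y, seen = deduplicate([[0, 0, 1], [0, 0, 1], [1, 2, 3]], [0, 0, 1])
test_w, test_y, seen = deduplicate([[1, 2, 3], [4, 5, 6]], [1, 0], seen)
```

Passing the same `seen` set through the three calls removes duplicates both within each set and between sets.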
The CNN inputs are the windows and the corresponding output is the label (local or nonlocal) of the central node. Thus, the model has been restructured as a node-wise classification model. We recall that the labels are either LM or NLM depending on the classification (local or nonlocal). An overview of this case is summarized in Table 2.
5 Numerical results
5.1 Case 1: Full-domain input data
In the process of training our CNN model, both training and validation losses were computed to monitor the model’s learning progress and its ability to generalize. The training proceeds for a maximum of 200 epochs; however, an early stopping callback is implemented to avoid overfitting. The training time is notably brief, amounting to only 30.19 seconds, while testing the model requires 0.6 seconds (Table 1(B)). Figure 10(a) presents the training and validation losses over the training epochs. The performance of the model is assessed using the evaluation metrics in Section 3.3 and is shown in Table 3. The model’s performance evaluation produces an average accuracy of 0.99 and an F1-score value of 0.94 (Table 3). We present the corresponding average confusion matrix in Figure 10(b). Within the confusion matrix, the model predicts correctly 99.73% of the LM instances and 96.34% of the NLM ones. These percentage-based values illustrate the model’s high precision and recall scores, and show its robustness in accurately classifying nodes within local and nonlocal regions for the training dataset.
To evaluate the results after region detection, we estimate the error in Eq. (21) between the fully nonlocal displacement and the coupled displacement , whose nonlocal region is the one predicted by the CNN. An example of the results for both and is reported in Figure 11, where, in both cases, the discontinuity is at . The value of using is , while using is .
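For reference, accuracy, precision, recall, and F1-score can be computed from predicted and reference node labels as sketched below. The sketch uses the standard binary definitions and assumes that NLM (encoded as 1) is the positive class; the definitions in Section 3.3 may differ in such details.

```python
import numpy as np

def classification_summary(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary labelling."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == positive) & (y_pred == positive)))
    tn = int(np.sum((y_true != positive) & (y_pred != positive)))
    fp = int(np.sum((y_true != positive) & (y_pred == positive)))
    fn = int(np.sum((y_true == positive) & (y_pred != positive)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": 2.0 * precision * recall / denom if denom else 0.0,
    }

# Eight nodes, with one LM node and one NLM node misclassified.
summary = classification_summary([0, 0, 0, 0, 1, 1, 1, 1],
                                 [0, 0, 0, 1, 1, 1, 1, 0])
```

When the two classes are imbalanced, accuracy and F1-score can differ noticeably, which is why both are reported.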
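The error evaluation can be illustrated with the short sketch below. Eq. (21) is not restated in this section, so the choice of a discrete relative Euclidean norm is an assumption of the sketch and may not coincide with the norm used in the paper; the function name and the sample values are hypothetical.

```python
import numpy as np

def relative_error(u_nonlocal, u_coupled):
    """Relative discrete 2-norm error of the coupled solution (assumed norm)."""
    u_nonlocal = np.asarray(u_nonlocal, dtype=float)
    u_coupled = np.asarray(u_coupled, dtype=float)
    return np.linalg.norm(u_coupled - u_nonlocal) / np.linalg.norm(u_nonlocal)

# Hypothetical displacement values at two nodes, for illustration only.
err = relative_error([3.0, 4.0], [3.0, 4.5])
```

A predicted configuration would then be considered acceptable when this value stays below the tolerance used to build the reference configurations.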
For the test set, the predicted nonlocal region consistently induces an error in the coupled solution much lower than the tolerance used during training . Across all cases, the estimated errors fall within the range of to using and to using (Table 3).
Accuracy | F1-score | Load | ||
Case 1: Full-domain Data | 0.99 | 0.94 | – | |
– | ||||
() | ||||
() | ||||
Case 2: Windowing data | 0.96 | 0.97 | – | |
– | ||||
() | ||||
() | ||||
∗Here the coupled solution corresponds to the fully local solution . The estimated error is then computed as the difference between and .
5.2 Case 2: Window-based input data
The results in the previous Section 5.1 showed promise that machine learning could support the selection of local-nonlocal coupling regions; however, the full-load strategy does not have satisfactory generalization properties when testing using unseen data. In this section, we illustrate how the windowing approach circumvents these limitations.
As before, the maximum number of epochs is set to 200; however, an early stopping callback is used to avoid overfitting. Training the model with windowed data takes 137.17 seconds, while testing requires 10.9 seconds (Table 1(B)). Figure 12(a) presents the training and validation losses over the training epochs. Both losses decrease overall, and the close tracking between training and validation loss suggests that the model is not overfitting to the training data. The plot shows that the callback stopped training at epoch 50, when the validation loss showed no further improvement. The performance of the model, assessed using the evaluation metrics in Section 3.3, is shown in Table 3. The performance metrics show an accuracy of 0.96 and an F1-score of 0.97; the corresponding confusion matrix, reported in Figure 12(c), shows that the model correctly predicted 94.03% of the LM instances and 97.18% of the NLM ones. Also in this case, the model has high precision and recall scores.
For the test set, the predicted nonlocal region induces an error in the coupled solution much lower than the tolerance used during training in all cases, using both and . Across all cases, the estimated errors fall within the range of to using and to using (Table 3). The lowest errors are obtained when using the original version of loads and , that is, without transformations.
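The logic of an early stopping criterion can be sketched in a framework-independent way as follows. The patience value and the improvement threshold `min_delta` are hypothetical, since the text does not report the settings of the callback.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return the (1-based) epoch at which training would stop.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs; if this never
    happens, all provided epochs are run.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)

# Illustrative loss history: no improvement after the third epoch.
stop = early_stopping_epoch([1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.6], patience=3)
```

In this illustrative history, training stops at epoch 6, before the later improvement at epoch 7 is seen, which shows the trade-off involved in choosing the patience.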
To evaluate the windowing approach’s ability to identify multiple discontinuities, we test our pre-trained model, which has not been trained on loads with more than one discontinuity, using a load with two discontinuities. The results are presented in Figure 13. The predicted configurations of local and nonlocal regions are shown in Figure 13(a), while the displacements (represented by ) and (represented by ) are shown in Figure 13(b). The corresponding error, denoted as , is .
In addition, we test our model using load (Eq. 29) and (Eq. 30); these loading functions do not belong to the training set. Thus, with these examples, we test mild generalization properties of the algorithm.
For , we consider two scenarios; one with t=0.05 and another with t=0.0005. In all cases, the second derivative is estimated and subsequently segmented into windows, serving as the input for the pre-trained model. Once again, to evaluate the precision of the region detection, we estimate the error between the fully nonlocal displacement () and the coupled displacement () using Eq. (21). Figure 14(Left) shows the predicted configurations of local and nonlocal regions. Figure 14(Right) shows , represented by and , represented by .
For with t=0.05, the prediction labels and configurations of local and nonlocal regions are showcased in Figure 14(a-b). The corresponding error value, denoted as , is . When testing with t=0.0005, the findings are similarly reported in Figures 14(c-d), with an error value of .
For , as defined in Eq. 30, we follow the same process. The results are presented in Figure 14(e-f), and the corresponding error is .
6 Conclusion
This paper shows a proof of concept for an ML-based identification of local-nonlocal coupling regions that is independent of the local and nonlocal models utilized. Specifically, the load used as identification input is independent of the two models and the method used to generate local and nonlocal solutions is irrelevant. Furthermore, while we only considered a non-overlapping approach, a coupling approach with overlapping regions could also be used (examples can be found in [15]).
Among the two approaches we presented, the full-domain approach yields promising results for the classification of the local and nonlocal regions as long as the solutions do not feature multiple discontinuities. To overcome this limitation, we introduced the windowing approach based on a node-wise classification. The latter shows promising results even in the presence of several discontinuities without retraining (see Section 5.2). This model demonstrated robust performance, as reflected by high evaluation metrics, with an accuracy of 0.96 and an F1-score of 0.97. The results were further validated by the corresponding confusion matrix (see Figure 12(c)), which provides a detailed breakdown of the model’s classification accuracy. Nevertheless, a minor limitation in our model’s predictive precision is noticed. Specifically, our analysis identified a misclassification rate of 5.21% of instances as LM (nodes in the local region) and 1.14% as NLM (nodes in the nonlocal region). These relatively low fractions of the total predictions highlight areas of potential improvement.
The proof of concept approach introduced in this work opens several avenues of improvement and extension. In the results presented in this study, the horizon is constant; for more realistic tests, varying values should be investigated (the work in [13] can be used as a guide for the choice of and ). As for the boundary conditions, we use homogeneous Dirichlet boundary conditions, while in real-world applications mixed conditions are usually prescribed. More importantly, an extension to two- and three-dimensional problems is mandatory and it is the subject of our current work. Furthermore, stress testing the approach by using more complex forcing functions from a larger subspace should also be explored (e.g. randomly generated functions with prescribed regularity properties).
Another relevant aspect not directly addressed in this work is the presence of cracks and fractures. These features were mimicked in the current study by the presence of discontinuities. In two- and three-dimensional settings, we plan to incorporate these features into the machine learning model and validate our results against real experiments.
In terms of machine learning algorithms, the windowing approach showed promising results and is the candidate strategy to proceed with in higher dimensions. However, the open question is the choice of the shape of the window. Possible candidates in two dimensions could be rectangles or ellipses. More generally, the shape of the window should be chosen such that it matches the support of the nonlocal kernel function. Alternatively, one could parameterize a family of two-dimensional shapes, in which case the parameters would become learnable parameters in the training process. Another open question is the architecture of the CNN for higher dimensions as the input layer and output layer need to be adapted to the higher dimensions.
Additionally, because standard CNNs are not directly applicable to nonuniform unstructured grids, our future exploration includes the use of graph neural networks (GNNs) [55, 48, 20]. Given the graph-like structure of our problem, leveraging GNNs could offer a compelling approach to model and predict complex interactions among nodes in local and nonlocal regions. The ability of GNNs to capture relationships among the nodes of different regions makes them a promising approach, especially for higher dimensions.
7 Funding statement
Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.
This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.
References
- [1] S. Albawi, T. A. Mohammed, and S. Al-Zawi. Understanding of a convolutional neural network. In 2017 international conference on engineering and technology (ICET), pages 1–6. IEEE, 2017.
- [2] P. W. Bates and A. Chmaj. An integrodifferential model for phase transitions: stationary solutions in higher space dimensions. Journal of statistical physics, 95:1119–1139, 1999.
- [3] D. A. Benson, S. W. Wheatcraft, and M. M. Meerschaert. Application of a fractional advection-dispersion equation. Water resources research, 36(6):1403–1412, 2000.
- [4] E. Bisong. Regularization for deep learning. Building Machine Learning and Deep Learning Models on Google Cloud Platform: A Comprehensive Guide for Beginners, pages 415–421, 2019.
- [5] F. E. Bock, R. C. Aydin, C. J. Cyron, N. Huber, S. R. Kalidindi, and B. Klusemann. A review of the application of machine learning and data mining approaches in continuum materials mechanics. Frontiers in Materials, 6:110, 2019.
- [6] C. Chen and P. C. Fife. Nonlocal models of phase transitions in solids. Advances in Mathematical Sciences and Applications, 10(2):821–849, 2000.
- [7] X. Chen and M. Gunzburger. Continuous and discontinuous finite element methods for a peridynamics model of mechanics. Computer Methods in Applied Mechanics and Engineering, 200(9-12):1237–1250, 2011.
- [8] K. Dayal and K. Bhattacharya. Kinetics of phase transformations in the peridynamic formulation of continuum mechanics. Journal of the Mechanics and Physics of Solids, 54(9):1811–1842, 2006.
- [9] E. A. B. de Moraes, M. D’Elia, and M. Zayernouri. Machine learning of nonlocal micro-structural defect evolutions in crystalline materials. Computer Methods in Applied Mechanics and Engineering, 403:115743, 2023.
- [10] M. D’Elia and M. Gunzburger. Optimal distributed control of nonlocal steady diffusion problems. SIAM Journal on Control and Optimization, 52(1):243–273, 2014.
- [11] M. D’Elia, X. Li, P. Seleson, X. Tian, and Y. Yu. A review of local-to-nonlocal coupling methods in nonlocal diffusion and nonlocal mechanics. Journal of Peridynamics and Nonlocal Modeling, 2021.
- [12] M. D’Elia, X. Tian, and Y. Yu. A physically-consistent, flexible and efficient strategy to convert local boundary conditions into nonlocal volume constraints. SIAM Journal on Scientific Computing, 42(4):A1935–A1949, 2020.
- [13] P. Diehl, F. Franzelin, D. Pflüger, and G. C. Ganzenmüller. Bond-based peridynamics: a quantitative study of mode I crack opening. International Journal of Fracture, 201:157–170, 2016.
- [14] P. Diehl, R. Lipton, T. Wick, and M. Tyagi. A comparative review of peridynamics and phase-field models for engineering fracture mechanics. Computational Mechanics, 69(6):1259–1293, 2022.
- [15] P. Diehl and S. Prudhomme. Coupling approaches for classical linear elasticity and bond-based peridynamic models. Journal of Peridynamics and Nonlocal Modeling, 4(3):336–366, 2022.
- [16] P. Diehl, S. Prudhomme, and M. Lévesque. A review of benchmark experiments for the validation of peridynamics models. Journal of Peridynamics and Nonlocal Modeling, 1:14–35, 2019.
- [17] M. D’Elia, J. C. De Los Reyes, and A. Miniguano-Trujillo. Bilevel parameter learning for nonlocal image denoising models. Journal of Mathematical Imaging and Vision, 63(6):753–775, 2021.
- [18] M. D’Elia, X. Li, P. Seleson, X. Tian, and Y. Yu. A review of local-to-nonlocal coupling methods in nonlocal diffusion and nonlocal mechanics. Journal of Peridynamics and Nonlocal Modeling, pages 1–50, 2021.
- [19] Y. Feng, Q. Wang, D. Wu, W. Gao, and F. Tin-Loi. Stochastic nonlocal damage analysis by a machine learning approach. Computer Methods in Applied Mechanics and Engineering, 372:113371, 2020.
- [20] R. J. Gladstone, H. Rahmani, V. Suryakumar, H. Meidani, M. D’Elia, and A. Zareei. GNN-based physics solver for time-independent PDEs. arXiv preprint arXiv:2303.15681, 2023.
- [21] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, et al. Recent advances in convolutional neural networks. Pattern recognition, 77:354–377, 2018.
- [22] E. Haghighat, A. C. Bekar, E. Madenci, and R. Juanes. A nonlocal physics-informed deep learning framework using the peridynamic differential operator. Computer Methods in Applied Mechanics and Engineering, 385:114012, 2021.
- [23] M. Hossin and M. N. Sulaiman. A review on evaluation metrics for data classification evaluations. International journal of data mining & knowledge management process, 5(2):1, 2015.
- [24] A. Katiyar, S. Agrawal, H. Ouchi, P. Seleson, J. T. Foster, and M. M. Sharma. A general peridynamics model for multiphase transport of non-Newtonian compressible fluids in porous media. Journal of Computational Physics, 402:109075, 2020.
- [25] A. Katiyar, J. T. Foster, H. Ouchi, and M. M. Sharma. A peridynamic formulation of pressure driven convective fluid transport in porous media. Journal of Computational Physics, 261:209–229, 2014.
- [26] M. Kim, N. Winovich, G. Lin, and W. Jeong. Peri-net: analysis of crack patterns using deep neural networks. Journal of Peridynamics and Nonlocal Modeling, 1:131–142, 2019.
- [27] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization, 2014.
- [28] S. Kumar and D. M. Kochmann. What machine learning can do for computational solid mechanics. In Current trends and open problems in computational mechanics, pages 275–285. Springer, 2022.
- [29] H. P. Lal, B. Abhiram, and D. Ghosh. Prediction of nonlocal elasticity parameters using high-throughput molecular dynamics simulations and machine learning. European Journal of Mechanics-A/Solids, page 105175, 2023.
- [30] Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou. A survey of convolutional neural networks: analysis, applications, and prospects. IEEE transactions on neural networks and learning systems, 2021.
- [31] Y. Lou, X. Zhang, S. Osher, and A. Bertozzi. Image recovery via nonlocal operators. Journal of Scientific Computing, 42(2):185–197, 2010.
- [32] J. Lu, L. Tan, and H. Jiang. Review on convolutional neural network (CNN) applied to plant leaf disease classification. Agriculture, 11(8):707, 2021.
- [33] E. Madenci, A. Barut, and M. Futch. Peridynamic differential operator and its applications. Computer Methods in Applied Mechanics and Engineering, 304:408–451, 2016.
- [34] C. T. Nguyen, S. Oterkus, and E. Oterkus. A peridynamic-based machine learning model for one-dimensional and two-dimensional structures. Continuum Mechanics and Thermodynamics, 35(3):741–773, 2023.
- [35] J. Nikpayam and M. A. Kouchakzadeh. A variable horizon method for coupling meshfree peridynamics to FEM. Computer Methods in Applied Mechanics and Engineering, 355:308–322, 2019.
- [36] M. L. Parks, R. B. Lehoucq, S. J. Plimpton, and S. A. Silling. Implementing peridynamics within a molecular dynamics code. Computer Physics Communications, 179(11):777–783, 2008.
- [37] S. Prudhomme and P. Diehl. On the treatment of boundary conditions for bond-based peridynamic models. Computer Methods in Applied Mechanics and Engineering, 372:113391, 2020.
- [38] S. Raschka. Model evaluation, model selection, and algorithm selection in machine learning. arXiv:1811.12808, 2018.
- [39] S. J. Raymond. Combining numerical simulation and machine learning-modeling coupled solid and fluid mechanics using mesh free methods. PhD thesis, Massachusetts Institute of Technology, 2020.
- [40] I. Rocha, P. Kerfriden, and F. van der Meer. Machine learning of evolving physics-based material models for multiscale solid mechanics. Mechanics of Materials, page 104707, 2023.
- [41] S. Sharma, S. Sharma, and A. Athaiya. Activation functions in neural networks. Towards Data Sci, 6(12):310–316, 2017.
- [42] S. Silling. Local-nonlocal coupling in Emu/PDMS. Sandia Report (SAND2020-11382), 2020.
- [43] S. Silling, D. Littlewood, and P. Seleson. Variable horizon in a peridynamic medium. Journal of Mechanics of Materials and Structures, 10(5):591–612, 2015.
- [44] S. A. Silling. Reformulation of elasticity theory for discontinuities and long-range forces. Journal of the Mechanics and Physics of Solids, 48(1):175–209, 2000.
- [45] S. A. Silling and E. Askari. A meshfree method based on the peridynamic model of solid mechanics. Computers & Structures, 83(17-18):1526–1535, 2005.
- [46] J. L. Suzuki, M. Gulian, M. Zayernouri, and M. D’Elia. Fractional modeling in action: A survey of nonlocal models for subsurface transport, turbulent flows, and anomalous materials. Journal of Peridynamics and Nonlocal Modeling, 5(3):392–459, 2023.
- [47] Y. Tao, Q. Sun, Q. Du, and W. Liu. Nonlocal neural networks, nonlocal diffusion and nonlocal modeling. Advances in Neural Information Processing Systems, 31, 2018.
- [48] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu. A comprehensive survey on graph neural networks. IEEE transactions on neural networks and learning systems, 32(1):4–24, 2020.
- [49] X. Xu, M. D’Elia, C. Glusa, and J. T. Foster. Machine-learning of nonlocal kernels for anomalous subsurface transport from breakthrough curves. arXiv preprint arXiv:2201.11146, 2022.
- [50] X. Xu, M. D’Elia, and J. T. Foster. A machine-learning framework for peridynamic material models with physical constraints. Computer Methods in Applied Mechanics and Engineering, 386:114062, 2021.
- [51] R. Yamashita, M. Nishio, R. K. G. Do, and K. Togashi. Convolutional neural networks: an overview and application in radiology. Insights into imaging, 9:611–629, 2018.
- [52] H. You, Y. Yu, S. Silling, and M. D’Elia. Data-driven learning of nonlocal models: from high-fidelity simulations to constitutive laws. Accepted in AAAI Spring Symposium: MLPS, 2021.
- [53] H. You, Y. Yu, S. Silling, and M. D’Elia. A data-driven peridynamic continuum model for upscaling molecular dynamics. Computer Methods in Applied Mechanics and Engineering, 389:114400, 2022.
- [54] H. You, Y. Yu, N. Trask, M. Gulian, and M. D’Elia. Data-driven learning of nonlocal physics from high-fidelity synthetic data. Computer Methods in Applied Mechanics and Engineering, 374:113553, 2021.
- [55] J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun. Graph neural networks: A review of methods and applications. AI open, 1:57–81, 2020.
- [56] X.-H. Zhou, J. Han, and H. Xiao. Learning nonlocal constitutive models with neural networks. Computer Methods in Applied Mechanics and Engineering, 384:113927, 2021.