Theory of Machine Learning Assisted Structrual Optimization Algorith and Its Application

AIAA JOURNAL
Vol. 61, No. 10, October 2023
Theory of Machine Learning Assisted Structural Optimization

Algorithm and Its Application
Yi Xing∗ and Liyong Tong†

University of Sydney, Sydney, New South Wales 2006, Australia
https://doi.org/10.2514/1.J062195
The machine learning assisted structural optimization (MLASO) algorithm has recently been proposed to expedite
topology optimization. In the MLASO algorithm, the machine learning model learns and predicts the update of the
chosen optimization quantity in routine and prediction iterations. The routine and prediction iterations are activated
with a predefined learning and predicting scheme; and in the prediction iterations, the design variable can be updated
using the predicted quantity without running a finite element analysis and sensitivity analysis, and thus the
computational time can be saved. Based on the MLASO algorithm, this work first proposes a novel generic
criterion-driven learning and predicting (CDLP) scheme that allows the algorithm to autonomously activate
prediction iterations in the solution procedure. Second, this work presents the convergence analysis and the
Downloaded by University of Georgia on January 16, 2024 | http://arc.aiaa.org | DOI: 10.2514/1.J062195
computational efficiency analysis of the MLASO algorithm with the CDLP scheme. The MLASO algorithm is
then embedded within the solid isotropic material with penalization topology optimization method to solve two-
dimensional and three-dimensional problems. Numerical examples and results demonstrate the prediction accuracy
and the computational efficiency of the MLASO algorithm, and that the CDLP scheme can remarkably improve the
computational efficiency of the MLASO algorithm.
Nomenclature Nt = number of terms in test function

b = vector of bias n = sample size for dynamic move limit (predefined
bs = coefficient for the mapping of range learning and predicting)
c = difference between predicted and exact gradients ns = sample size
E = elastic module P = constraint function
F = global load vector p = material penalty factor
f = objective function q = penalty in the error function
fe = error function R = element centroid distance
fm = mapping function r = constant to control the spread of the radial basis
g = gradient of the objective function weightage
= the hidden layer rl = learning rate
h
ru = upper bound of initial weights and biases
K = global stiffness matrix
s = predicted gradient
Kn = total number of iterations at convergence
t = total computational time
Kp = total number of prediction iterations at convergence
td = total computational time difference
Kr = total number of routine iterations at convergence tg = total computational time for one calculation of gra-
k = iteration index dient information
ki = the set of iteration indices for all iterations tPred = total computational time for one prediction
kip = the set of iteration indices for prediction iterations ts = total time savings
kir = the set of iteration indices for routine iterations tTrain = total computational time for one training
kPL = the index of leading predictions tup = computational time of one design variable update
kp = prediction iteration index U = global displacement vector
kr = routine iteration index u = element displacement vector
krc = routine iteration index at convergence V = design domain volume
ks = entry point (predefined learning and predicting) V = volume limit
k0 = stiffness matrix for a solid element ve = element volume
L = Lipschitz constant w = matrix of weights
lx ; ly ; lz = lengths of design domains in x; y, and z directions ws = coefficient for the mapping of range
move = step size in solid isotropic material with penalization w = user-defined weight matrix
Ne = number of minimizers or elements x = design variable vector
N in = number of training data points for the training input α = move step size
Nm = number of training data points for the hidden layer β = dynamic smoothing factor (predefined learning and
N out = number of training data points for the training output predicting)
Np = number of consecutive predictions γ0 = initial dynamic weightage (predefined learning and
predicting)
Δk = increment of iteration number
Received 2 July 2022; revision received 15 May 2023; accepted for Δγ = the increment of dynamic weightage (predefined
publication 24 May 2023; published online 14 July 2023. Copyright © learning and predicting)
2023 by the American Institute of Aeronautics and Astronautics, Inc. All δ = collective representation of the rescaled g and the
rights reserved. All requests for copying and permission to reprint should be predicted s
submitted to CCC at www.copyright.com; employ the eISSN 1533-385X
ϵf = relative difference of objective function in leading
to initiate your request. See also AIAA Rights and Permissions www.aiaa.
org/randp. prediction iterations
*Ph.D. Student, School of Aerospace, Mechanical and Mechatronic εf = relative difference in the final objective function
Engineering. εKr = relative difference in the total number of routine
†
Professor, School of Aerospace, Mechanical and Mechatronic Engineering. iterations
4664
XING AND TONG 4665
ϵm = relative difference between the predicted gradient Due to the unique characteristic of TO, the FEA and sensitivity
and its exact value analysis can be computationally expensive: in particular, for large-
λ = Lagrange multiplier scale and complicated physical problems. To improve the computa-
μ = range of the gradient tional efficiency of TO, some available works have managed to skip
σ = activation function or accelerate FEA runs and sensitivity analyses in selected iterations
τc = convergence criterion or entirely skip all iterations using ML-based methods [4,7–17] or
τt = training convergence criterion non-ML-based methods [18–21]. For those ML-based methods, an
Φ = neural network input vector offline-training strategy is typically used to train the ML model with a
Ψ = neural network output vector large set of training samples before solving the target TO problem [7–
Ψ = neural network prediction output 16]. In the offline-training strategy, the generation of training samples
ω = collective representation of w and b is accomplished by solving many TO problems using conventional TO
methods, which can be time consuming. For instance, in Ref. [15],
Subscripts 15,000 training samples (including 12,000 samples for training and
3000 samples for validation) were generated for the three-dimensional
e = element e (3-D) simply supported stiffened panel with the distributed-load TO
i, j = element index numbers i and j problem. The training samples were generated by solving the con-
MLASO = results obtained using machine learning assisted sidered TO problems with 15,000 different loading conditions using
structural optimization a conventional gradient-based TO method. To generate these training
max = maximum value samples, eight CPUs were employed to work simultaneously using a
min = minimum value single CPU for each single loading case calculation, and the total
pred = values calculated in prediction iterations computational time for training sample generation was 94 h
real = values calculated using routine optimization (≈4 days). The trained machine learning model obtained using the
methods offline-training strategy can normally solve the design problems
ref = results obtained using gradient descent or solid iso- within a scope instantly and accurately, but additional training sam-
tropic material with penalization ples may need to be generated if design problems beyond the scope
are encountered. For example, in Ref. [16], 1000 training samples
Superscripts were used to train a generative adversarial network (GAN) for
structural and heat conduction TO problems, respectively. However,
ep = epoch number in a training loop to solve the multiphysics TO problems, the training for the GAN
k = iteration index required 2400 training samples in total, including 1000 structural,
kp = prediction iteration index 1000 heat conduction and 400 multiphysics training samples. To
kr = routine iteration index improve the training efficiency, an online-learning and online-
prediction strategy has been recently adopted [4,17]. By using the
online-learning and online-prediction strategy, the training process is
embedded into the iterative process of TO; and training samples are
I. Introduction collected from the historical data of the chosen optimization quan-
T HE structural design of aircraft and spacecraft needs to optimize

both weight and performance, and such optimum design is often
obtained through shape and sizing optimizations. In practice, top-
tities computed by running routine TO steps. In selected iterations,
the trained ML model is used to predict the chosen optimization
quantities so that the routine TO steps in these iterations can be
ology optimization (TO) is a powerful design tool for shape and skipped. The online-learning and online-prediction strategy can be
sizing optimizations, but its computational efficiency can sometimes instantly implemented to accelerate various types of targeted TO
be a limiting factor when solving large and complicated physical problems without extra computational effort in preparing training
problems. In recent years, a strong effort has been devoted to using samples.
machine learning (ML) to speed up TO [1–3] with various proposed In Ref. [4], the authors proposed a MLASO-d algorithm that
algorithms, among which is the machine learning assisted structural contains a simple and efficient online-training module and an
optimization (MLASO) algorithm [4]. Based on the previous work in online-prediction module that can be embedded into the popular
Ref. [4], this study presents the mathematical theory of the MLASO solid isotropic material with penalization (SIMP) method [5]. In
algorithm with a novel generic criterion-driven learning and predic- the MLASO-d algorithm, each iteration can be either a routine
tion (CDLP) scheme. iteration or a prediction iteration. In routine iterations, routine TO
In general, TO is an iterative process to redistribute the material in a steps are performed, and the exact value of the gradient of the
design domain under specified constraints, loads, and boundary objective function is calculated and used to update the design vari-
conditions until certain design objectives are optimized; and it can able. At the same time, in each routine iteration, there is a training
be formulated as a constrained optimization problem as module in which a ML model is trained to update the gradient of the
objective function using its historical data as training samples. In the
min∕max ∶ fx prediction iterations, the FEA, the sensitivity analysis, and the train-
subject to ∶ h1 x ≤ 0 ing are skipped; and the trained ML model is used to predict the
gradient of the objective function in the current iteration based on that
h2 x 0 (1) calculated in the previous iteration, which is subsequently used to
update the design variable. The routine iteration and the prediction
where x is the design variable, fx is the objective function, and iteration can be performed according to a predefined learning and
h1 x and h2 x are inequality and equality constraints. Some TO predicting (PDLP) scheme, in which both routine and prediction
methods use gradient information to search for the optima, and they iterations are performed alternatively once the first prediction iteration
are categorized as gradient-based methods. Usually, in gradient- is activated. With the MLASO-d embedded SIMP method, FEA runs
based methods, there are three key steps in each iteration: finite and sensitivity analyses in prediction iterations are replaced by com-
element analysis (FEA), sensitivity analysis, and design variable putationally cost-effective ML model predictions; thus, the overall
update. The necessary gradient information is often computed by computational efficiency is improved. Note that the ML model pre-
running FEA and sensitivity analysis, and then it is used to solve dictions in prediction iterations are data-driven computations, which
Eq. (1) directly with a gradient-based mathematical programming enable the MLASO-d to be incorporated into a variety of gradient-
method or with an optimality criteria (OC) method [5] through based optimization methods by replacing costly high-dimensional
transforming Eq. (1) into an unconstrained problem using the simulation and sensitivity analysis with the fast data-driven computa-
Lagrange multiplier method [6]. tions in selected iterations. The computational efficiency and the
4666 XING AND TONG
prediction accuracy of the MLASO-d algorithm are demonstrated size of N out × 1. In MLASO-d, the scaled gradient of the objective
and compared with the top88 algorithm [22] by using the numerical function obtained in the previous and the current iterations are used as
results of the two-dimensional (2-D) TO problems in Ref. [4]. the training sample input and output, respectively [4]. Let w be the
In this work, the computational efficiency of the MLASO-d algo- user-defined weight matrix with the size of N m × N in connecting the
rithm is further improved, and the mathematical background of the input and the hidden layer. For any given input Φ ∈ R, the output of
MLASO-d algorithm is studied. The first contribution of this work is the hidden layer is
to propose a criterion-driven learning and predicting scheme for the
MLASO-d algorithm. With the CDLP scheme, the MLASO-d algo- hΦ wΦ (4a)
rithm can autonomously decide the timing of activating single or
multiple prediction iterations. Compared with the PDLP scheme, the Let w and b be the weight matrix and bias vector with sizes of
implementation of the CDLP scheme significantly improves the N out × N m and N out × 1 connecting the hidden layer and the output
computational efficiency of the MLASO-d algorithm when solving layer; also, let σ⋅ be the rectified linear activation function for the
TO problems. Furthermore, we also demonstrate that MLASO-d can output layer, and the output of the NN is
cooperate with the multigrid preconditioned conjugate-gradient
(MGCG) method [18], which is a non-ML based TO acceleration
method, to remarkably reduce the computational time of TO. Ψ σ σ whΦ b (4b)
The second contribution of this work is to establish the mathematical
theory of the MLASO-d algorithm from two aspects: the convergence The error function (denoted as fe ) measures the sum of the mean
analysis, and the computational efficiency analysis. In the convergence square error between the target output Ψ and the NN output Ψ for N s
analysis, we assume that MLASO-d is embedded into the gradient training samples, and it is defined as
descent (GD) algorithm to solve unconstrained convex optimization

problems. GD is an iterative process to find the optima using the Ns
qn T
gradient of the objective function. In each iteration, the gradient of fe ω Ψn − Ψn Ψn − Ψn (5)
the objective function is calculated and used to update the design n1
2
variable. In the computational efficiency analysis, we compare the total
computational time needed to converge when solving the same opti- where ω is the collective representation of the weight matrices and
mization problem using GD with and without MLASO-d embedded. bias vector; qn is the user-defined penalty and, in MLASO, we choose
Based on the mathematical theory of the MLASO-d algorithm for qNs 1 and qn 0 for n 0; 1; 2; : : : ; N s − 1.
unconstrained convex optimization problems, the feasibility of apply- The error function fe can be minimized by finding the optimum
ing the MLASO-d algorithm for nonconvex and unconstrained opti- weights and bias (collectively denoted as ω ) via training, i.e.,
mization problems, like TO problems, is discussed in the present work.
Furthermore, numerical examples for 2-D and 3-D TO problems, ω arg min fe ω (6)
including an aircraft engine pylon design problem, are presented to
numerically demonstrate the convergence and the computational effi- To reduce the computational cost of training, we define w to be a
ciency of the MLASO-d embedded SIMP method for TO problems. fixed weight so that w only needs to be computed once before
running the optimization. For simplicity, when embedding the
II. MLASO-d Algorithm and Theory MLASO-d into the GD method, we define w to be an identity matrix.
As for the weight w and bias b, they are updated iteratively via a
A. Problem Statement and Algorithm
backpropagation algorithm based on the gradient descent method. In
In this section, we present the problem statement of unconstrained every training, w and b are updated by
optimization problems and the GD method; we also introduce a
method to implement the MLASO-d in GD. In the rest of this work, ∂fe
we refer to the MLASO-d embedded GD method and the MLASO-d ωep1 ωep − rl ep 0;1; 2 : : : (7)
∂ωep
embedded TO methods as the MLASO-d algorithm.
Consider an unconstrained optimization problem defined as where ωep denotes the weight and bias at the epth epoch in the current
training. The weight and bias at the end of the current training are
x arg minn fx (2) saved and used as ω0 for the next training, and ω0 for the first training
x∈R
are set to be a random number in the range of [0, ru ].
where f∶Rn → R is convex and continuously differentiable, and the After introducing the error function and training, let us introduce
gradient of function f is Lipschitz continuous with a Lipschitz the online prediction. The prediction is accomplished by a forward
constant of L > 0; x is a vector of minimizers with the size of calculation of the NN with the latest weight and bias, i.e.,
N e × 1, and N e is the total number of minimizers. The optimization
problem defined in Eq. (2) can be solved using the GD method with a s~k σ σ whΦ b (8)
constant move step size. In the GD method, the minimizer x is usually
updated iteratively in the negative gradient direction using Eq. (3),
where s~k is the predicted gradient in the kth iteration.
xk1 xk − α∇fxk k 0; 1; 2; : : : (3) In MLASO-d, all iterations can be either routine iterations or
prediction iterations. A routine iteration updates the design variable
where xk is the minimizer in kth iteration, α > 0 is the constant using the exact gradient calculated via the routine GD step and trains
move step size, and ∇fxk is the gradient of the objective function; the ML model, whereas a prediction iteration updates the design
and we define gk ∇fxk . The iterative process stops when xk1 variable using the predicted gradient calculated using the ML model.
reaches x and the objective function reaches its minimum value as Let us define ki as the set of iteration indices for all iterations and kir
f fx . and kip as the sets of iteration indices for routine and prediction
Then, let us introduce the MLASO-d algorithm. The ML model iterations, respectively, and so kir ⊆ ki kip ⊆ ki kir ∩ kip ∅, and
used in MLASO-d is a feedforward neural network (NN) with an kir ∪ kip ki . In routine iterations, the design variable is updated
input layer, an output layer, and one hidden layer. The input layer and using the exact gradient gk :
the hidden layer are fully connected, whereas the hidden layer and the
output layer are locally connected. The number of neurons in these xk1 xk − αgk ; ∀k ∈ kir (9)
layers is N in N out , and N m , respectively. Assume the training sample
pair is fΦ; Ψg, which consists of training sample input matrix Φ with Training is conducted in every routine iteration. To ensure the
a size of N in × 1, and its corresponding target output matrix Ψ has the selection of the training parameters is independent of the optimization
XING AND TONG 4667
problems in MLASO, we scale the exact gradient to the range of Based on Eq. (13e), we define our activation criterion using
[0, 1]. The scaled exact gradient (denoted as g~ k−1 ) or the predicted Eq. (14a):
gradient in the previous iteration (denoted as s~k−1 ) is used as the p
training sample input (i.e., Φ δk−1 ; δ is the collective representa- ϵa ksk − gk−1 k αLkgk−1 k − 1 − αLksk k ≤ 0 k > 0
tion for the rescaled g~ and predicted s), ~ whereas the scaled exact (14a)
gradient in the current iteration is used as the training sample output
for the current training (i.e., Ψ g~k ). so that when the activation criterion is satisfied, we have
In prediction iterations, the input to the NN is the scaled exact
gradient g~ k−1 or the predicted gradient s~k−1 computed in the previous p
iteration (i.e., Φ δk−1 ), and the prediction output s~k is in the range ksk − gk−1 k αLkgk−1 k − 1 − αLksk k
of [0, 1]. Before using the predicted gradient to update the design p
variable, s~k needs to be scaled to an estimated range of [μmin ; μmax ], × ksk − gk−1 k αLkgk−1 k 1 − αLksk k ≤ 0 (14b)
where
2
μ ws μkr bs (10a) ksk − gk−1 k αLkgk−1 k ≤ 1 − αLksk k2 (14c)
and μkr is the range of g in the most recent routine iteration; ws and bs Noting Eqs. (13e) and (14c), one can obtain Eq. (14d) for all
are the coefficients for a mapping from the second most recent routine activated prediction iterations:
iteration (denoted as μkr −1 ) to μkr [i.e., fm ws ; bs ∶μkr −1 → μkr ], and
kck k2 ≤ 1 − αLksk k2 (14d)

kr −1
μ ws μ
kr
bs (10b)
The Lipschitz constant L is difficult to calculate. So, in practice, we
Here, we denote the predicted gradient after scaling as sk , and the approximately let αL αLk in the kth iteration so that the activation
design variable is updated as criterion can be examined at the beginning of iteration k > 1, and
αLk can be calculated as
xk1 xk − αsk ; ∀k ∈ kip (11)
αLk−1 αLk−2
αLk k > 1 (15a)
The routine and prediction iterations are activated according to the 2
CDLP scheme. An activation criterion needs to be examined at
the beginning of the following iteration of every routine iteration: if with L0 L1 being assumed, and
the activation criterion is satisfied, the prediction iterations will be
activated; otherwise, a routine iteration will be conducted. kδk−2 − δk−1 k
αLk−1 α ; k > 2 (15b)
kxk−2 − xk−1 k
B. CDLP Scheme
In CDLP, the prediction iteration is activated when the difference kδk−3 − δk−2 k
αLk−2 α ; k > 2 (15c)
between the predicted gradient and its exact gradient is small. Let us kxk−3 − xk−2 k
define ck as a measurement for the difference between the predicted
sk and its exact value gk , and where δk represents the exact gradient if the kth iteration is a routine
iteration, and δk is the rescaled predicted gradient if the kth iteration is
ck sk − gk (12) a prediction iteration.
In the CDLP scheme, MLASO-d starts with two routine iterations
The value of gk is usually unknown at the beginning of an iteration, and checks the activation criterion at the beginning of the third
but the norm-2 of ck can be calculated as iteration. If the activation criterion is not satisfied, a routine iteration
will be conducted in the third iteration; otherwise, single or multiple
kck k ksk − gk k prediction iterations will be performed in the third and following
iterations, and the number of consecutive predictions is controlled by
ksk − gk−1 gk−1 − gk k (13a)
a user-defined parameter N p (N p ≥ 1). For the default setting, N p
1 and only one prediction iteration will be activated once the activa-
where gk−1 is the exact gradient calculated in the k − 1th iteration. tion criterion is satisfied. The multiprediction function is enabled if
Thus, by using the triangle inequality, one obtains N p > 1 and N p consecutive predictions will be performed. We define
the first prediction iteration after each activation as the leading
kck k ≤ ksk − gk−1 k kgk−1 − gk k (13b)
prediction, and the following N p − 1 prediction iterations will be
activated automatically without checking the activation criterion.
Assume the gradient of function f is Lipschitz continuous on Q and
After the prediction iterations, a routine iteration is conducted; and
there is a constant of L ≥ 0 such that, for all x; y ∈ Q, we have [23]
the activation criterion is examined again at the beginning of the
following iteration. The MLASO-d algorithm with the CDLP
kgx − gy k ≤ Lkx − yk (13c)
scheme is summarized and listed in Algorithm 1.
where L is also named the Lipschitz constant for function f on Q. Let
C. Mathematical Theory
gx and x be gk−1 and xk−1 , and let gy and y be gk and xk :
In this section, we present the mathematical theory of the MLASO-d
kck k ≤ ksk − gk−1 k Lkxk−1 − xk k with the CDLP scheme from two aspects: the convergence analysis,
and the computational efficiency analysis. To analyze the conver-
≤ ksk − gk−1 k αLkgk−1 k (13d) gence of the MLASO-d algorithm, we consider using MLASO-d
to solve unconstrained optimization problems defined by Eq. (2).
and hence To analyze the computational efficiency, we compare the total
computational time needed to reach the same convergence criterion
kck k2 ≤ ksk − gk−1 k αLkgk−1 k
2
(13e) when solving the same unconstrained optimization problem using
MLASO-d and GD.
4668 XING AND TONG
Algorithm 1: Algorithm of MLASO-d with CDLP scheme

1: Initialization: x0 , routine 0
2: Input: N p
3: for k 0; 1; 2; 3; 4; : : :
4: if k > 1
5: Do prediction, and calculate the scaled gradient sk ;
6: αLk αLk−1 αLk−2 ∕2;
7: if routine 0,
p
8: ϵa ksk − gk−1 k αLk kgk−1 k − 1 − αLk ksk k;
9: else
10: ϵa 0;
11: end if
12: if ϵa ≤ 0, ▸ Check activation criterion
13: routine routine 1.001∕N p 1; ▸ Limit the number of consecutive predictions to N p
14: if routine is less than one,
15: δk sk ; ▸ Update δk
16: else
17: routine 1;
18: end if
19: else
20: routine 1;
21: end if
22: end if
23: if routine ≥1 || k ≤ 1, ▸ If prediction is not activated, do routine iteration
24: gk ∇fxk ; ▸ Calculate the exact gradient
25: δk gk ; ▸ Update δk
26: Do training, update w and b;
27: routine 0;
28: end if
29: xk1 xk − αδk ; ▸ Update design variable
30: if k > 0,
kδk−1 −δk k
31: αLk α kx k−1 −xk k
;
32: end if
33: end for
1. Convergence Analysis Proposition 2: By using the CDLP scheme in MLASO-d,

The convergence analysis requires the following assumptions: Eq. (14d) holds in all prediction iterations. Based on Eq. (14d), we
Assumption 1: The objective function fx is a lower-bounded can derive Eq. (19) for all prediction iterations with x; y ∈ m; n,
convex function. noting that at point x, we have gx ∇fxsx gx cx and
Assumption 2: Note that fx is differentiable and L smooth, and
its gradient is Lipschitz continuous with constant L > 0 and 1 − αLksx k2 ≥ kcx k2
0 < Lα ≤ 1; thus, kgx − gy k ≤ Lkx − yk for any x and y in the range
of [m, n]. ksx k2 − kcx k2 ≥ αLksx k2
First, we reiterate and derive some statements for the convergence kgx cx k2 − kcx k2 ≥ αLksx k2
analysis of the MLASO-d algorithm using the following proposition
and lemmas: kgx k2 2hgx ; cx i ≥ αLksx k2
Proposition 1: If Assumptions 1 and 2 are valid, by using the
1 ks k2
Newton–Leibniz formula, the multivariate Taylor theorem with 12 ; cx ≥ αL x 2
the integral remainder term, and the integration by part formula, gx kgx k
the second-order approximation for f around fx is 1 ks k2
hgx ; x − yi 2 ; cx hgx ; x − yi ≥ αL x 2 hgx ; x − yi
fy fx gTx y − x gx kgx k
ksx k2

1
gx τb y − x − gx T y − x dτb (16) 2hcx ; x − yi ≥ hgx ; x − yi αL −1
kgx k2
0
(19)
By using the Cauchy–Schwarz inequality and Assumption 2, for
any x and y in the range of [m, n], the second-order approximation for
fy can be rewritten as in Eq. (17): The following Lemma proves the descent property of the MLASO-d
algorithm.
L Lemma 1: Supposing Assumptions 1 and 2 are valid, with the
fy ≤ fx gTx y − x ky − xk2 ; ∀ x; y ∈ m; n (17)
2 CDLP scheme, the objective function fx is monotonically non-
increasing in all iterations; i.e.,
Also, due to the convexity and the smoothness of function f, the
theorem in Ref. [23] proves Eq. (18) holds for any x and y in the range
of [m, n]: fxk1 ≤ fxk ; ∀k ≥ 0 (20)
1 Proof: Let us replace y and x in Eq. (17) with xk1 and xk ; then, we
k∇fx − ∇fyk2 ≤ h∇fx − ∇fy; x − yi (18)
L have the following:
XING AND TONG 4669
For routine iterations, Proof: Because Rk Lα ≤ 0,
L Rk Lα ≤ 0 (24)
fxk1 ≤ fxk gTk xk − αgk − xk kx − αgk − xk k2
2 k k∈kip
Lα
≤ fxk − α 1 − kgk k2 (21a) In routine iterations, due to Eq. (21b),
2
Lα
Based on Assumption 2, α ≤ 1∕L; so, in routine iterations, α 1− kgk k2 ≤ fxk − fxk1 (25a)
2
Lα In prediction iterations, due to Eq. (22c),

fxk1 − fxk ≤ −α 1 − kgk k2 ≤ 0 (21b)
2
α
kg k2 − Rk Lα ≤ fxk − fxk1 (25b)
2 k
For prediction iterations,
So, by taking the summation of fxk − fxk1 over all iterations
L (i.e., from k 0 to k K n − 1), noting 0 < Lα ≤ 1,
fxk1 ≤ fxk gTk xk − αsk − xk kxk − αsk − xk k2
2
(22a) α Lα
kg k2 − Rk Lα α 1 − kgk k2
2 k 2
k∈kip k∈kir
Based on Eq. (12), substituting sk gk ck into Eq. (22a) yields
≤ fxk − fxk1
k∈ki
Lα2
fxk1 ≤ fxk − αgTk gk ck kgk ck k2 α
2 kg k2 − Rk Lα
2 k
≤ fxk − αgTk gk − αhgk ⋅ ck i k∈kip
Lα2 ≤ fx0 − fx1 fx1 − fx2 : : : − fxKn

kgk k2 hgk ⋅ ck i hck ⋅ gk i kck k2
2 α
kg k2 − Rk Lα ≤ fx0 − fxKn (26)
Lα Lα2 2 k
≤ fxk − α 1 − kgk k2 kck k2 k∈kip
2 2
α1 − Lα Because of Eq. (24), we have
− hck ⋅ gk i hgk ⋅ ck i
2
α Lα2 Rk Lα ≥ −fx0 fxKn (27a)
≤ fxk − kgk k2 kck gk k2 k∈kip
2 2
α
− kck gk k2 − kck k2 Because of Assumption 1 and Lemma 1, function fx is lower
2
α α bounded and monotonically nonincreasing, and one can obtain
≤ fxk − kgk k2 Lα − 1kck gk k2 kck k2
2 2
(22b) Rk Lα ≥ −fx0 fx (27b)
k∈kip
Letting
Equations (24) and (27b) correspond to the left- and right-hand
sides of Eq. (23), and
α
Rk Lα −1 − Lαkck gk k2 kck k2
2 Rk Lα
k∈kip
with the CDLP scheme, Eq. (14d) must hold in all prediction iter-
ations; thus, Rk Lα ≤ 0 and must be a negative finite number.
The following theorem proves the convergence of the MLASO-d
α algorithm for unconstrained optimization with convex and smooth
fxk1 − fxk ≤ − kgk k2 Rk Lα ≤ 0 (22c)
2 functions.
Theorem 1: If Assumptions 1 and 2 are valid, starting from any
Equations (21b) and (22c) illustrate the objective function in both arbitrary initial value x0 , the objective function defined in Eq. (2) can
prediction and routine iterations is monotonically nonincreasing; always converge to its minimum value [denoted as fx ] if design
therefore, Eq. (20) is proved. variable x is updated using Eqs. (9) and (11) with a CDLP scheme.
The next lemma shows, with the CDLP scheme, the summation of The convergence of the objective function can be represented as
Rk Lα, over all prediction iterations is a negative finite number.
Lemma 2: If Assumptions 1 and 2 hold, the summation of Rk Lα lim fxKn − fx 0 (28)
for all prediction iterations is a negative finite number within the K n →∞
range
Proof: In routine iterations,
0≥ Rk Lα ≥ −fx0 fx (23)
kxk1 − x k2 kxk − x − αgk k2
k∈kip
kxk − x k2 − 2αhxk − x ; gk i α2 kgk k2 (29)

where fx0 and fx are the objective function values calculated
using the initial value of the minimizer x0 and using the optima Substituting x and y in Eq. (18) with xk and x , noting
value x . gk ∇fxk ∇fx 0, gives
4670 XING AND TONG
1 Because of Eq. (21b), in routine iterations,

kg k2 ≤ hgk ; xk − x i (30)
L k Lα
fxk1 − fx ≤ fxk − fx − α 1 − kgk k2
2
Thus, from Eq. (29), we have α 1 − Lα
2
≤ fxk − fx − fxk − fx 2
2α kx0 − x k2
kxk1 − x k2 ≤ kxk − x k2 − kgk k2 α2 kgk k2 (38a)
L
2
≤ kxk − x k2 − α2 − 1 kgk k2 (31) multiplying Eq. (38a) by
Lα
1
In prediction iterations, fxk1 − fx fxk − fx
we have
kxk1 − x k2 kxk − x − αsk k2
kxk − x k2 − 2αhxk − x ; sk i α2 ksk k2 1 1 α 1 − Lα fxk − fx
≤ −
2
fxk − fx fxk1 − fx kx0 − x k fxk1 − fx
2
kxk − x k2 − 2αhxk − x ; gk ck i α2 ksk k2
α 1 − Lα fxk − fx 1 1
kxk − x k2 − 2αhxk − x ; gk i
2
≤ −
kx0 − x k fxk1 − fx fxk1 − fx fxk − fx

2
− 2αhxk − x ; ck i α2 ksk k2 (32) (38b)
We substitute x; y; sx ; gx , and cx in Eq. (19) with xk ; x ; sk ; gk , and Because fxk1 − fx ≤ fxk − fx ,
ck ; and
1 1 α 1 − Lα
2
ks k2 − ≥ (38c)
2hck ; xk − x i ≥ hgk ; xk − x i αL k 2 − 1 (33a) fxk1 − fx fxk − fx kx0 − x k2
kgk k
In prediction iterations, based on Eq. (22c),
Because of Eq. (30), we have α
fxk1 − fx ≤ fxk − fx − kgk k2 Rk Lα
2
1 ks k2 α
2hck ; xk − x i ≥ kgk k2 αL k 2 − 1
≤ fxk − fx − fxk − fx 2
L kgk k 2kx0 − x k2
ksk k2 1 Rk Lα (39a)
≥ kgk k2 α −
kgk k2 L
multiplying Eq. (39a) by
1
≥ αksk k2 − kgk k2 (33b)
L 1
fxk1 − fx fxk − fx
Substituting Eqs. (30) and (33b) into Eq. (32) yields
we have
kxk1 − x k2 ≤ kxk − x k2− 2αhxk − x ; g ki − α2 ks k k2 1 1 α fxk − fx
α ≤ −
kgk k2 α2 ksk k2 fxk − fx fxk1 − fx 2kx0 − x k fxk1 − fx
2
L
α Rk Lα
≤ kxk − x k2 − kgk k2 (34)
L fxk1 − fx fxk − fx
1 1 α
− ≥
Because 0 < Lα ≤ 1, we can conclude from Eqs. (31) and (34) that fxk1 − fx fxk − fx 2kx0 − x k2
kxk − x k2 is a nonincreasing sequence for all iterations and Rk Lα
− (39b)
kxk − x k2 ≤ kx0 − x k2 (35)
By taking the sum of Eqs. (38c) and (39b) over all iterations (i.e.,
Due to the convexity of the considered problem and the Cauchy– from k 0 to k Kn − 1),
Schwarz inequality, in both routine and prediction iterations, we have
1 1 1
≥ −
fxKn − fx fxKn − fx fx0 − fx

fx ≥ fxk gTk x − xk (36a)
α α 1 − Lα
≥ K
2 p
2
Kr
2kx0 − x k kx0 − x k2
fxk − fx ≤ gTk xk − x ≤ kgk kkxk − x k ≤ kgk kkx0 − x k
Rk Lα
(36b) − fx − fx
k∈kip
fxk1 − fx k
α α 1 − Lα
So, ≥
Kp 2
Kr
2kx0 − x k 2 kx0 − x k2
fxk − fx Rk Lα
≤ kgk k (37) − 2
(40a)
kx0 − x k k∈kip
fx0 − fx
XING AND TONG 4671
where K p and Kr are the total numbers of routine and prediction The reduction in routine iterations is accompanied by extra com-
iterations. Because of Lemma 2, let putational costs for training and prediction, and so the secondary
condition for saving total computational time is that the time saved by
Rk Lα reducing routine iterations must cover the extra time costs of training
zR
k∈kip
fx0 − fx 2 and prediction. Here, we define the computational time used to do
one calculation of gradient information, one design variable update,
one training, and one prediction as tg ; tu , ttrain , and tpred ; and the
and zR is a negative finite number; then, difference between the total computational time when solving the
same optimization problem by using a gradient-based optimization
1 α α 1 − Lα
≥
Kp 2
Kr (40b) method and MLASO-d is defined as td . Based on the primary and
fxKn − fx 2kx0 − x k2 kx0 − x k2 secondary conditions for time savings, we establish the following
theorem:
Theorem 2: Note that td ≥ 0 when the following condition is
Because Lα ≤ 1, we can derive from Eq. (40b) that satisfied:
1 α Np 1
≥ K Kr (41a) 1 ≥ −εKr ≥ (46)
fxKn − fx 2kx0 − x k2 p tg
tu ttrain tpred
and thus
Proof: Let us define the total computational time when solving the
2kx0 − x k2 same optimization problem using a gradient-based optimization

fxKn − fx ≤ (41b) method and MLASO-d as t 0 and t, and
αKp K r
t K r × tg ttrain tu K n − K r × tu Kn − 2 × tpred
Thus, when K n Kp Kr ∞, the right-hand side of Eq. (41b)
approaches zero, and Theorem 1 is proved. □ K r × tg ttrain K n × tpred tu − 2 × tpred
≤ Kr × tg ttrain K n × tpred tu (47a)
2. Computational Efficiency Analysis
Let us define the stopping criterion for the unconstrained optimi-
t 0 K n0 × tg tu (47b)
zation problem as fxk − fx ≤ τc using Eq. (41b); the total
number of routine and prediction iterations needed to converge to
the stopping criterion in MLASO-d is given by Hence, td can be calculated as
2kx0 − x k2 td t 0 − t ≥ K n0 × tg tu − K r × tg ttrain − Kn × tpred tu

Kn Kp Kr ≤ (42)
ατc ≥ Kn0 − K r × tg K n0 − K n × tu − Kr × ttrain
The convergence of the GD method with the constant move step − Kn × tpred (48)
size α for the same unconstrained optimization problem can be
proved using the same method as in Theorem 1 by considering all In Eq. (48), if K n0 ≥ Kn , noting Kn ≥ Kr , we have
iterations as routine iterations. Because the GD method has been
proved to be monotonically nonincreasing [24], fxk1 − fx ≤ td ≥ K n0 − K r × tg K n0 − Kn0 × tu − Kn0 × ttrain − Kn0 × tpred
fxk − fx . Based on Eq. (38c), the sum of
≥ K n0 − K r × tg − K n0 × ttrain tpred (49)
1 1
− which implies td ≥ 0 when
tg K0 1
over all iterations is (i.e., from k 0 to k Kn0 − 1, where K n0 is the ≥ 0 n (50)
total number of iterations in GD) ttrain tpred Kn − Kr −εKr
1 1 1 In Eq. (48), if K r ≤ Kn0 ≤ K n , we have

≥ −
fxKn0 − fx fxKn0 − fx fx0 − fx
0
td ≥ K n0 − K r × tg 0 − Kn × tu − K n × ttrain − Kn × tpred
α 1 − Lα
2 Kn
≥ 2
(43) ≥ K n0 − K r × tg − K n tu ttrain tpred (51)
kx0 − x k
Noting the stopping criterion and Lα ≤ 1, one has which implies td ≥ 0 when
kx0 − x k2 2kx0 − x k2 tg Kn Kn
Kn0 ≤ ≤ ≥ (52)
α 1 − Lα ατc
(44) tu ttrain tpred K n0 − Kr K n0 −εKr
2 τc
MLASO-d targets to reduce the number of routine iterations to Because K r ≤ Kn0 ,

save the total computational time; thus, the primary condition for
saving computational time is K r ≤ Kn0 . Let us define the relative Kn Kn K
≤ ≤ n (53)
difference between the number of routine iterations at convergence Kn0 Kr Krmin
for GD and MLASO-d as εKr :
and Krmin is the minimum number of routine iterations in MLASO-d
K − K0 with the CDLP scheme. In the CDLP, the more the prediction
εK r r 0 n (45)
Kn iterations are activated, the less the routine iteration is conducted;
so, K rmin can be achieved by activating predictions after every single
and 0 ≤ −εKr ≤ 1 if Kr ≤ K n0 . routine iteration since the second routine iteration, and
4672 XING AND TONG
Kn − 2 K −2 We also use εf to measure the relative difference between the final

Krmin floor 2≥ n 1 (54)
Np 1 Np 1 objective function and
Hence, fMLASO − fref

εf (60)
fref
Kn K K Np 1
≤ n ≤ −2 n ≤ Np 1 (55)
K n0 Krmin NKn1 1 1 N p −1 where fref and fMLASO are the final objective functions calculated by
p Kn
using GD and MLASO-d. To measure the difference between −εKr
So, the condition in Eq. (52) can be satisfied if the following strict and −εKr , we define
condition is satisfied:
ϵc max 0; −εKr εKr (61)
tg Np 1
≥ (56)
tu ttrain tpred −εKr Based on Theorem 2, MLASO-d saves computational time when
ϵc > 0.
Combining Eqs. (50) and (56), one can conclude that td ≥ 0 if the All problems, including the TO problems presented in Sec. III.C,
conditions in Eq. (57) hold: are solved by using an Intel® Core™ i7-8700 CPU with 3.20 GHz
and a PC with 32 GB of RAM.
K n0 ≥ K r Table 2 lists the total number of routine iterations at convergence
when GD and MLASO-d with CDLP (N p 1) are used to solve
tg Np 1 (57)
≥ considered test functions. Compared to the GD, the MLASO-d

tu ttrain tpred −εKr reduces the total number of routine iterations when solving all test
functions. Table 2 also lists the objective function at convergence
Note that εKr is unknown before solving an optimization problem; when GD and MLASO-d are used. The relative differences between
but from Eq. (57), we can estimate the expected relative difference of the final objective functions calculated by GD and MLASO-d
routine iterations (denoted as εKr ) using Eq. (58a): (denoted as εf ) are within 1% for test functions P1 and P2 and
0% for test function P3. Evidently, the MLASO-d algorithm reduces
Np 1 the number of routine iterations as compared to the GD algorithm,
−εKr tg (58a) and the minimum values of the objective function predicted by
tu ttrain tpred MLASO-d are almost the same as those calculated by GD.
When solving optimization problems as simple as test functions
Thus, to save total computational time, one requires P1, P2, and P3 using MLASO-d (N p 1), the condition for time
1 ≥ −εKr ≥ −εKr ; and Theorem 2 is proved. □ savings, as proposed by Theorem 2, is not satisfied because their
Based on Theorem 2, 1 > −εKr is necessary to guarantee savings gradient information can be calculated rapidly and the actual reduc-
in the total computational time, which suggests that when solving tion in routine iterations is much smaller than expected. Here, we
optimization problems with CDLP, N p can be selected from the extend the number of terms in test function P3 by repeating the first
following range: four terms multiple times to simulate test functions that take more
time than P3 to calculate the gradient information. We denote the
tg number of terms as N t (N t 4 in test function P3) and extend test
− 1 > Np ≥ 1 (58b)
tu ttrain tpred function P3 to N t 8; 16; 32; 64, and 128 (see Appendix A for the
extended test function P3). Increasing N t increases the computation
where tg ; tu ; ttrain , and tpred can be acquired in the first several time of gradient calculations tg , thus lowering the expected routine
iterations. iteration reduction −εKr . Based on Eq. (46), we observe that ϵc ≥ 0 as
N t ≥ 32 and N p 1, which suggests that the implementation of
D. Numerical Examples for Unconstrained Optimization MLASO-d (N p 1) guarantees the saving in total computational
In this section, we use numerical examples for unconstrained opti- time when N t is larger than 32. The actual relative time savings ts for
mization to demonstrate the convergence and the computational effi- solving these extended test functions are depicted in Fig. 1. Because
ciency for MLASO-d. In these examples, GD and MLASO-d are used N p 1, MLASO-d starts to save the total computational time as
to minimize test functions listed in Table 1 [25] (see Appendix A for N t ≥ 8, and more time is saved as N t increases from eight to 128,
details of test functions) with a constant move step size; and small reaching ts −44% at N t 128. The time-saving result for N p 1
move step sizes are chosen to ensure the convergence of GD. For the implies that Theorem 2 provides a reasonable estimation for the
structure of the NN, we define three layers with the same number of computational efficiency of MLASO-d, although it seems to be a
neurons, where N in N out N m N e and the hidden layer is con- strict condition. In addition, the time-saving results for different N t
nected singly to the output layer. The upper bound of the initial weight cases imply that MLASO-d can be more suitable for accelerating TO
and bias is ru 1, the learning rate is rl 0.2; and the stopping problems because calculating gradient information in TO typically
criterion for training is fe ω ≤ τt , with τt 10−6 . The optimization involves large-scale and time-consuming finite element simulations.
is assumed to be converged if kfxk − f k ≤ 1 × 10−18 , and we use When solving problems with larger tg, the upper bound of N p to
ts to measure the relative difference between the total computational guarantee a savings in computational time also increases. It is sug-
time used by GD and the MLASO-d; and gested by Eq. (58b) that N p needs to be smaller than or equal to 125
and 51 when N t increases from 16 to 128, which signifies that more
t − t0
ts (59)
t0 Table 2 Number of routine iterations and minimum objective
function for solving test functions using GD and MLASO-d with CDLP
scheme (Np 1)
Table 1 The list of test functions
Test function α Kn0 fref Kn Kr fMLASO εKr ; %
Function index Function name Ne f Reference
P1 0.0012 146 9.33 × 10−19 158(80) 9.32 × 10−19 −45
P1 Extended DENSCHNF 30,000 0 [25]
P2 Generalized quartic 30,000 0 [25] P2 0.065 150 7.92 × 10−19 158(87) 7.90 × 10−19 −42
P3 DIXMAANA-DISMAANL(A) 30,000 1 [25] P3 0.055 204 1 215(109) 1 −47
XING AND TONG 4673
A. Problem Statement
SIMP is a popular gradient-based TO method for solving the
minimum compliance TO problems with volume constraint [5,22].
In SIMP, the design domain Ω is discretized with N e elements. In
each iteration, the density of these elements is redistributed to min-
imize compliance. Let us define the density for the eth element,
which is also the design variable, as xe e 1; 2; 3 : : : N e ; and the
material model in SIMP is defined as [5]
Ee Emin xe p E0 − Emin (62)
where Ee , E0 , and Emin are the Young’s modulus for the eth element,
solid elements, and void elements; and a penalty factor of p 3 is
often used [5].
We also define the minimum compliance TO problem using [5]
Ne
min∶ fx C UT KU Ee uTe k0 ue
Fig. 1 The value of ϵc and the actual time savings ts achieved by solving
e1
test function P3 with Nt terms using MLASO-d with CDLP scheme.
KU F
e ve xe
subject to∶ Vf (63)
predictions can be activated to save more computational time for test V
functions with a larger N t ; for example, with N t 128, more com- 0 ≤ xe ≤ 1
putational time can be reduced if N p increases from N p 1 to
N p 11. Note that Eq. (58b) offers a guideline for selecting N p that where the objective function fx calculates the structural compli-
can be safely used; but typically, higher N p can be used to push the ance; ue is the displacement of the eth element, k0 is the stiffness
limit of the computational efficiency of the MLASO-d algorithm. matrix for a solid element; K and U are the global stiffness matrix and
However, if N p is too large, the prediction quality can be affected; and the global displacement vector, F is the load vector; ve and V are the
more trainings need to be conducted to find the correct mapping from volume of element e and the volume of the design domain; and V f is
the predicted gradient in previous iterations to the exact gradient the volume fraction.
calculated in the current iteration. Intuitively, at N t 8, MLASO-d
should have more time savings when N p is larger than nine ; but in B. Solution Method and Algorithm
fact, as N p increases from N p 9 to N p 11, the time or cost of one The framework of SIMP includes three key steps in each iteration:
training doubles, resulting in less time savings. In general, N p > 1 the FEA, the sensitivity analysis, and the design variable update
seems to be the best option for solving the considered test functions [5,22]. By implementing MLASO-d into the SIMP method, the
because MLASO-d (N p > 1) reduces the total computational time framework of SIMP remains in routine iterations; but in prediction
for almost every N t cases. iterations, the FEA and the sensitivity analysis are skipped, and the
gradient of the objective function is predicted via machine learning.
E. Applicability Discussion Let us represent the design variable in MLASO-d as a function of
The problem in Eq. (1) is generic, and it can be convex or gradient information xe δ, and δ is the exact gradient information
nonconvex. The MLASO-d algorithm determines the gradient of calculated using routine SIMP TO steps in routine iterations (i.e.,
the objective function in the next few iterations from these gra- δ g); whereas δ is the scaled predicted gradient calculated using
dients in the previous few iterations via machine learning. In other machine learning in prediction iterations (i.e., δ s). Equation (63)
words, the MLASO-d algorithm can be viewed as an alternative is solved by updating the design variable xe δ iteratively to minimize
method of determining the gradient of the objective function at the compliance. The first two iterations are routine iterations. Starting
some iterations in a broad feasible gradient-based optimization from the third iteration, the predicted gradient information for each
method. In Sec. II.C, it is mathematically proven that the gradient element s~e is calculated, and the activation criterion for prediction
of the objective function determined by using the MLASO-d iteration needs to be examined to decide the activation of routine and
algorithm can guarantee convergence and efficiency when the prediction iterations; then, the following steps will be performed
problem in Eq. (1) is convex. Although this mathematical proof accordingly:
cannot be readily extended to the generic nonconvex case, it is In routine iterations, the nodal displacements are calculated by
believed that MLASO-d algorithm should be capable of efficiently solving the equilibrium equation KU F, and the objective function
solving some nonconvex problems within the framework of a is computed based on the nodal displacements [5]; the predicted
feasible gradient-based solution method. Therefore, Sec. III aims gradient information is discarded, and the exact gradient of the
to illustrate how the MLASO-d algorithm can be integrated into objective function ge for element e is calculated via a sensitivity
the SIMP method, as well as to numerically demonstrate the analysis using Eq. (64) [5]. To guarantee that the selection of the
expected convergence and efficiency. training parameters is independent of the TO problems, ge is scaled to
[0, 1] before the training and the scaled gradient (denoted as g~ e ) is
rescaled back to its original range once the training is completed; the
training process is conducted to update the ML model by using the
III. MLASO-d Embedded SIMP Method scaled g~ e in the previous and current iterations as the training input
This section presents a method that embeds the MLASO-d algo- and target output:
rithm within the solid isotropic material with penalization method for
solving the minimum compliance TO problems. First, we present the ∂C
ge −pxe δp−1 E0 − Emin uTe k0 ue (64)
problem statement for the minimum compliance TO problems, fol- ∂xe
lowed by an introduction to the implementation of MLASO-d in
SIMP. The computational efficiency and the prediction accuracy of In prediction iterations, the predicted gradient information s~e is
the MLASO-d algorithm are demonstrated by solving 2-D and 3-D adopted. The value of the predicted gradient information s~e is in the
numerical examples. range of [0, 1], and it needs to be scaled to an estimated range
4674 XING AND TONG
[μmin ; μmax ]; we denote the scaled gradient for the eth element as se . where Rij is the centroid distance between elements i and j, and
The algorithms for the routine and prediction iterations are demon- r 0.1; the maximum radius used to define the local connectivity
strated in Appendix B. between the hidden layer and the output layer is Rmin [4], and
Once the value of δe is calculated or predicted, the optimization Rmin 1. The learning rate is rl 0.2.
problem in Eq. (63) can be transformed into an unconstrained opti- 3) The convergence criteria for the 2-D and 3-D problems are the
mization problem by using the standard OC method [5]. In the OC same as the ones used in Ref. [5], and they are defined in Eqs. (68a)
method, the Lagrange multiplier λ is calculated to satisfy the volume and (68b). In this study, τc 1 × 10−5 for 2-D design problems and
constraint; then, Be is calculated as [5] τc 1 × 10−6 for 3-D design problems:
−δe ∂V kxk−1 − xk k
Be ; and 1 (65) p ≤ τc (68a)
∂V ∂xe
λ ∂xe Ne
kxk−1 − xk k
Based on Be , the design variable xe can be updated using the ≤ τc (68b)
heuristic updating scheme as [5] Ne
maxxmin ; xe − move if xe B0.5

e ≤ maxxmin ; xe − move
xnew
e xe B0.5
e if maxxmin ; xe − move ≤ xe B0.5
e ≤ min1; xe move (66)
min1; xe move if min1; xe move ≤ xe B0.5

e
where xmin is the minimum allowable value for xe ; move is the move Before we present the numerical results, let us denote the final
limit in SIMP; and, normally, xmin 0 and move 0.2. compliance computed by top99neo, top3D125 as fref , and the one
By using the CDLP scheme, Eq. (66) is transformed into predicted by MLaSO-d as fMLASO , and we use Eq. (60) to calculate
the relative difference between fref and fMLASO . Also, Eqs. (45) and
xnew
e xe − αδe (67a) (59) are used to calculate the relative differences between the total
number of routine iterations and the overall computational time when
where solving TO problems; Kn0 and tref are the total number of routine
xe − xmin if xe − αδe ≤ xmin

αδe − max −move; min move; −xe 1 − B0.5
e if xmin ≤ xe − αδe ≤ 1 (67b)
xe − 1 if xe − αδe ≥ 1
Then, αLk can be calculated using Eq. (15a) and used to examine iterations and the total computational time for top99neo and
the activation criterion in the next iteration. The same procedure will top3D125 algorithms.
be repeatedly conducted until compliance is minimized.
1. 2-D Topology Optimization Examples
C. Numerical Results and Discussion Four 2-D design problems are considered in the section: the
The top99neo and top3D125 algorithms [5] are the latest versions Messerschmitt–Bölkow–Blohm (MMB) beam problem, the mid
of the SIMP method for solving 2-D and 3-D TO problems, respec- cantilever problem, the Michell beam problem, and the distributed-
tively. Compared to top88 [22], top99neo and top3D125 accelerate load beam problem. In this work, these four design problems are
referred to as problems A, B, C, and D. The geometry, boundary, and
the FEA computation and design variable updates. In this section, we
loading conditions, as well as the material properties and volume
assess the computational efficiency and the prediction accuracy of the
fraction for each design problem, are presented in Fig. 2. For all
MLASO-d algorithm by comparing 2-D and 3-D numerical results
obtained by using MLASO-d with those obtained by top99neo and
top3D125 algorithms. In addition, we also compare the prediction
accuracy and the computational efficiency of the MLASO-d algo-
rithm with CDLP and PDLP schemes to illustrate the superiority of
the CDLP scheme.
The following general settings are defined when using MLASO-d:
1) When using the PDLP scheme, ks 5 for all design problems,
where ks is the total number of initial routine iterations before
the first ML prediction iteration; and the following parameters
of the exponential moving average filter [4] are defined:
γ 0 0.1; Δγ 0.1; n 3, and βmin 0.6.
2) For both CDLP and PDLP schemes, the upper bound of the
initial weights and bias is ru 1 × 10−9 . The fixed weight between
the input layer and the hidden layer is defined as [4]
Rij 2
wij exp − i 1; 2; : : : ; N e ; j 1; 2; : : : ; N e Fig. 2 Problem setup for a) MMB Beam, b) mid cantilever beam,
r2 c) Michell beam, and d) distribute-load beam.
XING AND TONG 4675
considered 2-D TO problems using CDLP. In addition, we use Fig. 3

to depict the difference between the actual routine iteration reduction
and the expected reduction of routine iterations (denoted as ϵc )
calculated using Eq. (61). Because of Theorem 2, the ϵc values in
Fig. 3 further suggest that using 1 ≤ N p < 7 may result in better
computational efficiency when solving the four considered problems
using MLASO-d because ϵc > 0 for these N p cases.
Table 3 lists a comparison of computational efficiencies between
the top99neo and MLASO-d algorithms for solving four selected
design problems. In general, both MLASO-d with PDLP and CDLP
schemes reduces the total computational time as compared to top99-
neo, and MLASO-d with the CDLP scheme has better computational
efficiency than using the PDLP scheme for problems A and C. For
problem B, the MLASO-d algorithm with the CDLP scheme has less
total time savings as compared to the PDLP scheme when N p 1 but
the computational efficiency of CDLP is improved as N p increases,
and the CDLP scheme saves more than twice as much time as the
PDLP scheme with N p 10 For problem D. The CDLP scheme
starts to be more efficient than PDLP as N p increases to two, reaching
Fig. 3 The difference between −εKr and −^εKr (denoted as ϵc ) for solving the best computational efficiency at N p 3 with εKr 67% and
problems A, B, C, and D with PDLP and CDLP (Np 1, 2, 4, 7, and 10).

ts −56%. However, when N p increases to four and 10, the pre-
diction quality seems to be affected; and the computational efficiency
considered design problems, the design domain is discretized with of CDLP is reduced for problem D. These time savings demonstrate
300 × 100 square elements with a unit side length, and we use density that MLASO-d has better computational efficiency than top99neo,
filtering with a radius that is 0.04 times the width of the design and the CDLP scheme has better computational efficiency than the
domain. PDLP scheme when 2 ≤ N p ≤ 3.
When solving design problems A, B, C, and D using MLASO-d, Table 4 depicts the optimized topologies and the final compliance
the computational times spent on one gradient information calcula- obtained by running top99neo and MLASO-d with PDLP and CDLP
tion tg are (on average) 0.15, 0.15, 0.16, and 0.17 s, which are 10, 10, schemes. For design problems A, B, C, and D, the optimized topol-
11, and 13 times the sums of tu ; ttrain , and tpred for N p 1. Thus, ogies computed by both MLASO-d algorithms are identical to the
based on Eq. (58b), we suggest using N p < 10 when solving the four one computed by top99neo. The relative difference between the final
Table 3 Total computation time (and ts ) for solving 2-D problems using top99neo and MLASO-d with
PDLP and CDLP schemes
Problem CDLP CDLP CDLP CDLP

(V f ) top99neo PDLP Np 1 Np 2 Np 4 N p 10
A 160.7 98.0 95.0 73.7 48.0 29.3
(0.45) (−39.0%) (−40.9%) (−54.1%) (−70.1%) (−81.8%)
B 135.4 66.2 81.6 60.1 45.7 32.2
(0.5) (−51.1%) (−39.7%) (−55.6%) (−66.2%) (−76.2%)
C 129.2 81.5 71.3 54.2 41.3 27.5
(0.3) (−36.9%) (−44.8%) (−58.0%) (−68.0%) (−78.7%)
D 157.1 89.6 97.0 67.6 96.0 132.1
(0.35) (−43.0%) (−38.3%) (−57.0%) (−38.9%) (−15.9%)
Table 4 Optimized topologies and final compliance calculated by using top99neo and MLASO-d with
CDLP CDLP CDLP CDLP

Problem top99neo PDLP Np 1 Np 2 Np 4 N p 10
A
f 271.97 271.95 271.95 271.95 271.93
εf 271.95 (0.01%) (0.00%) (0.00%) (0.01%) (−0.01%)
f 227.10 227.06 227.06 227.06 227.04

εf 227.06 (0.02%) (0.00%) (0.00%) (0.00%) (−0.01%)
f 31.73 31.73 31.73 31.73 31.74

εf 31.73 (0.00%) (0.00%) (0.00%) (0.00%) (0.03%)
D
f 8.73 8.73 8.73 8.73 8.73
εf 8.73 (0.00%) (0.00%) (0.00%) (0.00%) (0.00%)
4676 XING AND TONG
compliances εf computed by using the MLASO-d algorithms are 48 × 24 × 24; 80 × 40 × 40, and 112 × 56 × 56. The Young’s moduli
within 0.03% for all problems. With N p 1; 2, and 4, the relative for the solid and void elements are E0 1 and Emin 1 × 10−9 , and
difference εf for the CDLP scheme is lower than or equal to the one the Poisson’s ratio is v 0.3. The SIMP penalty is three,pthe volume
for the PDLP scheme for all considered problems. The results dem-
lx
fraction is 0.12, and the size of the density filter is 48 × 3.
onstrate that the MLASO-d algorithm can calculate the optimized As reported in Ref. [5], the direct solver in top3D125 needs to be
topology and the minimum compliance accurately for 2-D TO prob- replaced with the multigrid preconditioned conjugate-gradient solver
lems with both PDLP and CDLP schemes, and CDLP can have more [18] when solving 3-D problems with meshes finer than 48 × 24 × 24.
accurate prediction results than PDLP when N p 1; 2, and 4. The implementation of the MGCG solver also accelerates the com-
putation of gradient information. When solving the 3-D cantilever
2. 3-D Topology Optimization Examples beam example with a 48 × 24 × 24 mesh using top3D125, the aver-
a. 3-D Cantilever Beam. In this example, we assess the computa- age computational costs of one calculation of gradient information
tional efficiency and the prediction accuracy of MLASO-d with are 8.1 and 1.6 s for direct and MGCG solvers, and they are 338 and
PDLP and CDLP schemes in the optimization of a 3-D cantilever 67 times the sum of tu ; ttrain , and tpred for MLASO-d with CDLP.
beam. The geometry and boundary, as well as the loading conditions, According to Eq. (58b), the ratio of
are demonstrated in Fig. 4; and the load is a sine-shaped load, as tg
defined in Ref. [18]. The design domain is discretized by cubic tpred ttrain tu
elements with a unit side length and three sets of length lx , width
lz , and height ly of the design domain are considered, which are for the 3-D cantilever beam problem with the 48 × 24 × 24 mesh
suggests that N p can theoretically be as large as 337 and 66 when
using the direct solver and the MGCG solver, and the upper bound of
N p can be higher if the mesh is finer.
Tables 5 and 6 list the computational efficiencies of the top3D125
and MLASO-d algorithms for the 3-D cantilever problem. As listed
in Table 6, when the mesh is 48 × 24 × 24, the top3D125 with the
MGCG solver is around 4.6 times faster than the top3D125 with the
direct solver. When MLASO-d with PDLP and CDLP (N p 1) are
implemented, the numbers of routine iterations are reduced by 55.1
and 42.9%, which result in 52.5 and 41.3% total time reductions as
compared to the top3D125 with the MGCG solver. For the cases of
48 × 24 × 24; 80 × 40 × 40, and 112 × 56 × 56 meshes, the PDLP
scheme shows better computational efficiency than CDLP with
N p 1, but the computational efficiency of CDLP increases as N p
increases from one to two, four, and 10; and the CDLP scheme
(N p 10) is 2.3, 2.2, and 1.8 times faster than the PDLP scheme.
Note that based on the upper bound of our estimated N p ; we can
choose a larger N p to save more computation time, but the prediction
accuracy will deteriorate.
The optimized topologies and the final compliance computed by
Fig. 4 Problem setup for 3-D Cantilever beam. using top3D125 with the direct and MGCG solver and MLASO-d
Table 5 Number of routine iterations needed to converge (and εKr ) for solving 3-D cantilever problem using top99neo
and MLASO-d with PDLP and CDLP schemes
Problem top3D CDLP CDLP CDLP CDLP

(lx × ly × lz ) FEA solver 125 PDLP Np 1 Np 2 Np 4 N p 10
Cantilever 146 174 137 97 57
Direct solver 305
(48 × 24 × 24) (−52.1%) (−42.9%) (−55.1%) (−68.2%) (−81.3%)
Cantilever 146 174 137 97 57
MGCG 305
(48 × 24 × 24) (−52.1%) (−42.9%) (−55.1%) (−68.2%) (−81.3%)
Cantilever 77 79 62 45 32
MGCG 152
(80 × 40 × 40) (−49.3%) (−48.0%) (−59.2%) (−70.4%) (−78.9%)
Cantilever 47 53 42 32 25
MGCG 88
(112 × 56 × 56) (−46.6%) (−39.8%) (−52.3%) (−63.6%) (−71.6%)
Table 6 Total computation time (and ts ) for solving 3-D cantilever problem using top99neo and MLASO-d with
Problem top3D CDLP CDLP CDLP CDLP

(lx × ly × lz ) FEA solver 125 PDLP Np 1 Np 2 Np 4 N p 10
Cantilever 1050.6 1329.6 1042.0 772.8 479.6
Direct solver 2192.2
(48 × 24 × 24) (−52.1%) (−39.4%) (−52.5%) (−64.7%) (−78.1%)
Cantilever 237.8 274.5 224.8 166.8 102.9
MGCG 467.8
(48 × 24 × 24) (−49.2%) (−41.3%) (−51.9%) (−64.3%) (−78.0%)
Cantilever 631.2 664.3 522.9 392.5 288.0
MGCG 1188.1
(80 × 40 × 40) (−46.9%) (−44.1%) (−56.0%) (−67.0%) (−75.8%)
Cantilever 980.2 1107.2 896.3 694.4 551.5
MGCG 1727.3
(112 × 56 × 56) (−43.3%) (−35.9%) (−48.1%) (−59.8%) (−68.1%)
XING AND TONG 4677
Table 7 Optimized topologies and final compliance for 3-D cantilever beam problem, calculated using
top3D125 and MLASO-d with PDLP and CDLP schemes
Size top3D CDLP CDLP CDLP CDLP

(lx × ly × lz ) 125 PDLP Np 1 Np 2 Np 4 N p 10
48 × 24 × 24
f 3607.83 3633.17 3607.74 3602.48 3602.76 3639.05

εf (0.70%) (−0.00%) (−0.15%) (−0.14%) (0.86%)
80 × 40 × 40
f 6075.14 6131.30 6094.65 6093.02 6105.41 6175.37

εf (0.92%) (0.32%) (0.29%) (0.50%) (1.65%)
112 × 56 × 56
f εf 8679.32 8784.80 8692.23 8699.08 8741.51 8935.05

(1.22%) (0.15%) (0.23%) (0.72%) (2.9%)
with PDLP and CDLP schemes are demonstrated in Table 7. The unit side length, and the size of the design domain is
optimized topologies predicted by MLASO-d are similar to the ones lx 100; with ly lz 16, The Young’s moduli for solid and void
obtained by using top3D125 algorithms with direct and MGCG elements are E0 1 and Emin 1 × 10−9 ; and the Poisson’s ratio is
solvers. The relative differences in the final compliances calculated v 0.3. The SIMP penalty isp three, the volume fraction is 0.25, and
by CDLP with N p 1; 2, and 4 are within 1% for the three the size of the density filter is 3.
considered meshes; and they are lower than the one calculated by When solving the engine pylon problem with CDLP, the average
the PDLP scheme. However, with N p 10, the relative difference computational cost of one calculation of gradient information is 62
computed by the CDLP scheme is higher than the one calculated by the times the sum of tu , ttrain , and tpred , which implies that the upper bound
PDLP scheme for all considered meshes; and the relative difference is of N p can be as large as 61. Considering a large N p may deteriorate the
higher than 1% for the 80 × 40 × 40 and 112 × 56 × 56 meshes. The prediction accuracy, here, we use N p 1; 2; 4, and 10. The computa-
results of the final compliance suggest that using N p 10 can degrade tional efficiencies of top3D125 and MLASO-d for the design of the
the prediction accuracy for the 3-D cantilever beam problem, although aircraft engine pylon are listed in Table 8. The MLASO-d with PDLP
it can also lead to remarkable computational efficiency. schemes can use less routine iteration runs, and thus less computational
time to finish the design as compared to the top3D125; and the CDLP
b. 3-D Aircraft Engine Pylon. To demonstrate the computational scheme shows an even better computational efficiency than the PDLP
efficiency and prediction accuracy of MLASO-d in the design of an scheme. With N p 10, the MLASO-d with CDLP is 2.8 times faster
aircraft component, we present the 3-D aircraft engine pylon design than using the PDLP scheme, and it is 3.9 times faster than the
problem. The aircraft engine pylon is the structure that connects top3D125 with the MGCG solver. Table 8 also lists the optimized
the engine to the wing. Each pylon is usually attached to the wing at topologies and the final compliance calculated by top3D125 with the
the fore and aft attachment points, whereas the engine is mounted to the
MGCG solver and MLASO-d with PDLP and CDLP schemes. The
pylon at the fore and aft engine mounts. The conceptual design of the
optimized topologies obtained using MLASO-d algorithms are similar
engine pylon can be accomplished using the gradient-based TO
to the one obtained by top3D125, and the final compliances computed
method [26]. In this work, we consider the design of the engine pylon
by MLASO-d with PDLP and CDLP schemes (with N p < 10) are
with respect to the minimum compliance requirement; the aft and fore
attachment points are at x 0 and x 0.3lx , and fixed boundary lower than the one calculated by top3D125 with the MGCG solver; the
conditions are used to simulate the attachment to the wing. The aft and relative error of the final objective function values is within 1% for all
fore engine mounts are at x 0.6lx and x lx , and unit downward MLASO-d results.
loads are used to illustrate the engine weight; see Fig. 5 for the setup of The convergence history of the engine pylon design is depicted in
the problem. The design domain is discretized by cubic elements with a Fig. 6, and the indexes of the routine iteration at convergence
(denoted as) krc are labelled. When using the PDLP and CDLP
scheme, the first prediction is activated after kr 5 and kr 12
and thus, the objective function value starts to drop faster than the one
calculation by top3D125 at kr 6 and.kr 13 The PDLP scheme
activates prediction iteration earlier than the CDLP scheme, resulting
in a lower objective function in the range of kr 6 to kr 12. But,
once the prediction is activated in the CDLP scheme, the objective
function of CDLP drops sharply, reaching the same level as that
calculated by PDLP with only one routine iteration when N p 10.
The convergence history of the objective function again demonstrates
the superior computational efficiency of the MLASO-d with the
CDLP scheme.
To measure the prediction accuracy of each leading prediction, let
us use ϵm to represent the relative error between the predicted gradient
and its exact value. Note that
ks − gk
ϵm
Fig. 5 Problem setup for aircraft engine pylon design. kgk
4678 XING AND TONG
Table 8 Calculation results and computational efficiency for solving engine pylon problem using top3D125 and
MLASO-d with PDLP and CDLP schemes
CDLP CDLP CDLP CDLP

top3D125 PDLP Np 1 Np 2 Np 4 N p 10
Topology
f 1572.06 1580.75 1581.13 1573.00 1582.75

1582.22
εf (−0.64%) (−0.09%) (−0.07%) (−0.58%) (0.03%)
Kr 189 165 110 88 67
305
εKr (−38.0%) (−46.6%) (−63.9%) (−71.1%) (−78.0%)
t 463.6 394.2 268.2 211.7 164.5
634.7
(ts ) (−27.0%) (−37.9%) (−57.7%) (−66.6%) (−74.1%)
Also, we use ϵf to measure the relative error between the predicted increase in the first three and five leading predictions; and this is
compliance and its exact value, and because the prediction is activated too early. On the contrary, when
the CDLP scheme is used (with N p 1; 2; 4, and 10), the prediction
kfpred − freal k
ϵf is activated once the ML model is prepared; Thus, both ϵm and ϵf are

kfreal k lower than 10% for the first prediction, and they drop continuously
afterward, resulting in faster convergence than the ϵm and ϵf calcu-
The values of ϵm and ϵf for the kpL th leading prediction are depicted lated by using the PDLP scheme.
in Fig. 7. When using the PDLP scheme, both ϵm and ϵf appear to The numerical results for the 2-D and 3-D examples demonstrated
earlier in this paper suggest that the implementation of MLASO-d
can result in faster convergence and less computational time as
compared to top99neo and top3D125 with direct and MGCG solvers
when both PDLP and CDLP schemes are used. Between the PDLP
and the CDLP schemes, all evidence shows that the CDLP scheme
can have superior computational efficiency and better prediction
accuracy than the PDLP scheme when N p is around two to three.
Finally, we highlight that MLASO-d is a generic algorithm that
provides a method to determine the gradient of the objective func-
tion rapidly using machine learning in selected iterations for
gradient-based structural optimization methods. In the present work,
MLASO-d is implemented into the SIMP method for minimum
compliance TO problems to demonstrate the computational effi-
ciency and the prediction accuracy of the MLASO-d, but the
MLASO-d can be further incorporated into other TO methods for
the pointwise density field-based TO. For example, it is possible to
implement the MLASO-d into the level set [27] and the isogeometric
[28] TO methods to replace the FEA [27] and the isogeometric
analysis [28] and their corresponding sensitivity analyses using
data-driven computations in selected iterations. Besides, MLASO-
d can be adapted to accelerate the solution for any arbitrary and
Fig. 6 The convergence history of the objective function for the aircraft generic optimization problems, including multiphysics TO problems
engine pylon problem calculated by top3D125 as well as MLASO-d with [16,29,30] and nonlinear TO problems [31,32], without the require-
PDLP and CDLP schemes.
ment of generating specific training samples.
IV. Conclusions
In this work, a novel generic CDLP scheme is proposed to control
the activation of predictions in the MLASO algorithm. The CDLP
scheme reduces the prediction error in early predictions and
improves the computational efficiency of MLASO when solving
TO problems. Based on the CDLP scheme, the mathematical theory
to demonstrate the convergence and the computational efficiency of
the MLASO-d algorithm for unconstrained optimization problems
is also established, and it is shown that the mathematical theory is
valid for the TO problems. To support this mathematical theory, the
MLASO-d is embedded into top99neo and top3D125 algorithms to
solve 2-D and 3-D TO problems. The implementation of MLASO-d
reduces the total computational time as compared to using top99neo
and top3D125 when solving the selected 2-D and 3-D TO problems.
Based on the present numerical results, it is concluded that the
MLASO-d with the CDLP scheme has superior computational
Fig. 7 The convergence history of relative errors ϵm and ϵf for the efficiency with multiple consecutive predictions as compared to
aircraft engine pylon problem calculated using MLASO-d with PDLP the top99neo, the top3D125, and the MLASO-d with the PDLP
and CDLP schemes. scheme.
XING AND TONG 4679
Appendix A: Unconstrained Optimization Test Functions For extended test function P3, the extended DIXMAANA-DIX-
The following test functions are used in Sec. II.D: MAANL(A) function (nt is the index of each term, and 0 ≤ nt ≤ N t ) is
For test function P1, the extended Dennis and Schnabel test
Mb B1
problems (version F), denoted as DENSCHNF function [con- i
strained and unconstrained testing environment (CUTE)], is fx 1 a1 x2i
Mb
i1 nt 1
n∕2 Mb −1 B2
i
fx 2x2i−1 − 22 x2i−1 − 22 − 82 a2 x2i xi1 x2i1
i1 Mb
i1 nt 2
5x22i−1 x2i − 32 − 92 ; x0 2; 0; 2; 0; : : : ; 2; 0 2Ma B3
i
a3 x2i x2iMa
For test function P2, the generalized quartic function is Mb
i1 nt 3
Ma B4
n−1 i
a4 xi xi2Ma
fx x2i xi1 x2i 2 ; x0 1; 1; 1; : : : ; 1 Mb
i1 nt 4
i1
Mb B1
i
For test function P3, the DIXMAANA-DIXMAANL(A) function is ::: a1 x2i
Mb
i1 nt N t −3
Mb Mb −1
i B1 i B2 Mb −1 B2
fx 1 a1 x2i a2 x2i xi1 x2i1 i
Mb Mb a2 x2i xi1 x2i1
i1 i1 Mb
i1 nt N t −2
2Ma B3 Ma B4
i i 2Ma
a3 x2i x2iMa a4 xi xi2Ma i B3
Mb Mb a3 x2i x2iMa
i1 i1 Mb
i1 nt N t −1
a1 1; a2 0; a3 0.125; a4 0.125; Ma
i B4
Mb a4 xi xi2Ma
B1 B2 B3 B4 0; Ma ; x0 2; 2; : : : ; 2 Mb
3 i1 nt N t
Appendix B: Algorithms of MLASO-d
Algorithm B1: Prediction iteration in MLASO-d
1: Define μkr −1 ws μkr −2 bs

kr −2 kr −1 kr −2 kr −1
2: Let J 1
2×2 ws μmin bs − μmin 2
ws μmax bs − μmax 2
∂J ∂J
3: Calculate ws and bs by letting ∂w s
0 and ∂bs
0
4: μ^ ws μkr −1 bs
5: s~k σ wkr −1 hδk−1 bkr −1 ▸ Predicting s~
6: δk s~k
s~k −mins~k
7: δk sk μ^ min maxs~k −mins~k × ^μmax − μ^ min ▸ Rescaling s~ to range [^μmin , μ^ max ]
Algorithm B2: Routine iteration in MLASO-d
1: U Kxk −1 F ▸ FEA

2: ge −−pxke p−1 E0 − Emin uTe k0 ue ▸ Calculating the exact value of g
3: w0 wkr −1 , b0 bkr −1 ▸ Initiation for training
4: μkr μkmax
r
; μkmin
r
maxgk ; mingk ; ▸ Recording the range of g
gk −μkmin
5: g~k g~k ▸ Rescaling g to range [0, 1]
r μkmax −μkmin
6: δk g~k
7: If k > 0,
8: hδk−1 wδk−1 ▸Calculating hidden layer output
9: For ep 0; 1; 2; 3; 4 : : : ▸Start training
10: zep σ wep hδk−1 bep − g~k
11: if 12 zTep zep ≤ τ t ▸ Checking converging criterion for training
12: wkr wep and bkr bep
13: else
14: wep1 wep − rl zep zep zep · · · zep fhg ▸ Updating weight matrix
15: bep1 bep − rl zep ▸ Updating bias vector
16: end if
17: end for
18: δk μkmin g~ k −ming~ k
r
maxg~ k −ming~ k × μkmax
r
− μkmin
r
▸ Rescaling g~ to range [μkmin
r
, μkmax
r
]
19: kr kr 1 ▸ Updating routine iteration index number
4680 XING AND TONG
Acknowledgment [16] Kazemi, H., Seepersad, C., and Kim, H. A., “Topology Optimization
Integrated Deep Learning for Multiphysics Problems,” AIAA Science
L. Tong would like to acknowledge the support of the Australian and Technology Forum and Exposition, AIAA Paper 2022-0802, 2022.
Research Council (grant number DP170104916). https://doi.org/10.2514/6.2022-0802
[17] Chi, H., Zhang, Y., Tang, T. L. E., Mirabella, L., Dalloro, L., Song, L.,
References and Paulino, G. H., “Universal Machine Learning for Topology Opti-
mization,” Computer Methods in Applied Mechanics and Engineering,
[1] Mukherjee, S., Lu, D., Raghavan, B., Breitkopf, P., Dutta, S., Xiao, M., Vol. 375, March 2021, Paper 112739.
and Zhang, W., “Accelerating Large-Scale Topology Optimization:
https://doi.org/10.1016/j.cma.2019.112739
State-of-the-Art and Challenges,” Archives of Computational Methods
[18] Amir, O., Aage, N., and Lazarov, B. S., “On Multigrid-CG for Efficient
in Engineering, Vol. 28, No. 7, 2021, pp. 4549–4571.
Topology Optimization,” Structural and Multidisciplinary Optimiza-
https://doi.org/10.1007/s11831-021-09544-3
tion, Vol. 49, No. 5, 2014, pp. 815–829.
[2] Maksum, Y., Amirli, A., Amangeldi, A., Inkarbekov, M., Ding, Y.,
Romagnoli, A., Rustamov, S., and Akhmetov, B., “Computational https://doi.org/10.1007/s00158-013-1015-5
Acceleration of Topology Optimization Using Parallel Computing and [19] Gogu, C., “Improving the Efficiency of Large Scale Topology Optimi-
Machine Learning Methods—Analysis of Research Trends,” Journal of zation Through On-the-Fly Reduced Order Model Construction,”
Industrial Information Integration, Vol. 28, July 2022, Paper 100352. International Journal for Numerical Methods in Engineering, Vol. 101,
https://doi.org/10.1016/j.jii.2022.100352 No. 4, 2015, pp. 281–304.
[3] Brunton, S. L., Nathan Kutz, J., Manohar, K., Aravkin, A. Y., https://doi.org/10.1002/nme.4797
Morgansen, K., Klemisch, J., Goebel, N., Buttrick, J., Poskin, J., [20] Kim, Y. Y., and Yoon, G. H., “Multi-Resolution Multi-Scale Topology
Blom-Schieber, A. W., Hogan, T., and McDonald, D., “Data-Driven Optimization—A New Paradigm,” International Journal of Solids and
Aerospace Engineering: Reframing the Industry with Machine Learn- Structures, Vol. 37, No. 39, 2000, pp. 5529–5559.
https://doi.org/10.1016/S0020-7683(99)00251-6
ing,” AIAA Journal, Vol. 59, No. 8, 2021, pp. 2820–2847.

https://doi.org/10.2514/1.J060131 [21] Wu, J., Dick, C., and Westermann, R., “A System for High-Resolution
[4] Xing, Y., and Tong, L., “A Machine Learning Assisted Structural Topology Optimization,” IEEE Transactions on Visualization and
Optimization Scheme for Fast-Tracking Topology Optimization,” Computer Graphics, Vol. 22, No. 3, 2015, pp. 1195–1208.
Structural and Multidisciplinary Optimization, Vol. 65, No. 4, 2022, https://doi.org/10.1109/TVCG.2015.2502588
pp. 1–19. [22] Andreassen, E., Clausen, A., Schevenels, M., Lazarov, B. S., and
https://doi.org/10.1007/s00158-022-03181-5 Sigmund, O., “Efficient Topology Optimization in Matlab Using 88
[5] Ferrari, F., and Sigmund, O., “A New Generation 99 Line Matlab Code Lines of Code,” Structural and Multidisciplinary Optimization, Vol. 43,
for Compliance Topology Optimization and Its Extension to 3D,” No. 1, 2011, pp. 1–16.
Structural and Multidisciplinary Optimization, Vol. 62, No. 4, 2020, https://doi.org/10.1007/s00158-010-0594-7
pp. 2211–2228. [23] Nesterov, Y., Introductory Lectures on Convex Optimization: A Basic
https://doi.org/10.1007/s00158-020-02629-w Course, Springer, New York, 2004, pp. 56–58.
[6] Kim, N. H., Dong, T., Weinberg, D., and Dalidd, J., “Generalized https://doi.org/10.1007/978-1-4419-8853-9
Optimality Criteria Method for Topology Optimization,” Applied Sci- [24] Vishnoi, N. K., Algorithms for Convex Optimization, Cambridge Univ.
ences, Vol. 11, No. 7, 2021, Paper 3175. Press, Cambridge, England, U.K., 2021, pp. 90–94.
https://doi.org/10.3390/app11073175 [25] Andrei, N., “An Unconstrained Optimization Test Functions Collec-
[7] Banga, S., Gehani, H., Bhilare, S., Patel, S., and Kara, L., “3D Topology tion,” Advanced Modeling and Optimization, Vol. 10, No. 1, 2008,
Optimization Using Convolutional Neural Networks,” Preprint, submit- pp. 147–161.
ted 22 Aug. 2018, https://arxiv.org/abs/1808.07440. [26] Coniglio, S., Gogu, C., Amargier, R., and Morlier, J., “Engine Pylon
[8] Cang, R., Yao, H., and Ren, Y., “One-Shot Generation of Near-Optimal Topology Optimization Framework Based on Performance and Stress
Topology Through Theory-Driven Machine Learning,” Computer- Criteria,” AIAA Journal, Vol. 57, No. 12, 2019, pp. 5514–5526.
Aided Design, Vol. 109, April 2019, pp. 12–21. https://doi.org/10.2514/1.J058117
https://doi.org/10.1016/j.cad.2018.12.008 [27] Wang, Y., and Kang, Z., “MATLAB Implementations of Velocity Field
[9] Kallioras, N. A., Nordas, A. N., and Lagaros, N. D., “Deep Learning- Level Set Method for Topology Optimization: An 80-Line Code for 2D
Based Accuracy Upgrade of Reduced Order Models in Topology Opti- and a 100-Line Code for 3D Problems,” Structural and Multidiscipli-
mization,” Applied Sciences, Vol. 11, No. 24, 2021, Paper 12005. nary Optimization, Vol. 64, No. 6, 2021, pp. 4325–4342.
https://doi.org/10.3390/app112412005 https://doi.org/10.1007/s00158-021-02958-4
[10] Deng, H., and To, A. C., “A Parametric Level Set Method for Topology [28] Gao, J., Wang, L., Luo, Z., and Gao, L., “IgaTop: An Implementation
Optimization Based on Deep Neural Network,” Journal of Mechanical
of Topology Optimization for Structures Using IGA in MATLAB,”
Design, Vol. 143, No. 9, 2021, Paper 091702.
Structural and Multidisciplinary Optimization, Vol. 64, No. 3, 2021,
https://doi.org/10.1115/1.4050105
pp. 1669–1700.
[11] Nie, Z., Lin, T., Jiang, H., and Kara, L. B., “Topologygan: Topology
Optimization Using Generative Adversarial Networks Based on Physi- https://doi.org/10.1007/s00158-021-02858-7
cal Fields over the Initial Domain,” Journal of Mechanical Design, [29] Gomes, P., and Palacios, R., “Aerodynamic-Driven Topology Optimi-
Vol. 143, No. 3, 2021, Paper 031715. zation of Compliant Airfoils,” Structural and Multidisciplinary Opti-
https://doi.org/10.1115/1.4049533 mization, Vol. 62, No. 4, 2020, pp. 2117–2130.
[12] Kollmann, H. T., Abueidda, D. W., Koric, S., Guleryuz, E., and Sobh, https://doi.org/10.1007/s00158-020-02600-9
N. A., “Deep Learning for Topology Optimization of 2D Metamateri- [30] Yan, S., Wang, F., Hong, J., and Sigmund, O., “Topology Optimization
als,” Materials and Design, Vol. 196, Nov. 2020, Paper 109098. of Microchannel Heat Sinks Using a Two-Layer Model,” International
https://doi.org/10.1016/j.matdes.2020.109098 Journal of Heat and Mass Transfer, Vol. 143, Nov. 2019, Paper 118462.
[13] Xue, L., Liu, J., Wen, G., and Wang, H., “Efficient, High-Resolution https://doi.org/10.1016/j.ijheatmasstransfer.2019.118462
Topology Optimization Method Based on Convolutional Neural Net- [31] Chen, Q., Zhang, X., and Zhu, B., “A 213-Line Topology Optimization
works,” Frontiers of Mechanical Engineering, Vol. 16, No. 105, 2021, Code for Geometrically Nonlinear Structures,” Structural and Multi-
pp. 80–96. disciplinary Optimization, Vol. 59, No. 5, 2019, pp. 1863–1879.
https://doi.org/10.1007/s11465-020-0614-2 https://doi.org/10.1007/s00158-018-2138-5
[14] Xiang, C., Chen, A., and Wang, D., “Real-Time Stress-Based Topology [32] Han, Y., Xu, B., and Liu, Y., “An Efficient 137-Line MATLAB Code for
Optimization via Deep Learning,” Thin-Walled Structures, Vol. 181, Geometrically Nonlinear Topology Optimization Using Bi-Directional
Dec. 2022, Paper 110055. Evolutionary Structural Optimization Method,” Structural and Multi-
https://doi.org/10.1016/j.tws.2022.110055 disciplinary Optimization, Vol. 63, No. 5, 2021, pp. 2571–2588.
[15] Seo, J., and Kapania, R. K., “Development of Deep Convolutional https://doi.org/10.1007/s00158-020-02816-9
Neural Network for Structural Topology Optimization,” AIAA Journal,
Vol. 61, No. 1, 2022, pp. 1–14. R. K. Kapania
https://doi.org/10.2514/1.J061664 Associate Editor

Theory of Machine Learning Assisted Structrual Optimization Algorith and Its Application

Uploaded by

Copyright:

Available Formats

Theory of Machine Learning Assisted Structrual Optimization Algorith and Its Application

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Theory of Machine Learning Assisted Structrual Optimization Algorith and Its Application

Uploaded by

Copyright:

Available Formats

AIAA JOURNAL

Vol. 61, No. 10, October 2023

Theory of Machine Learning Assisted Structural Optimization

Yi Xing∗ and Liyong Tong†

Nomenclature Nt = number of terms in test function

T HE structural design of aircraft and spacecraft needs to optimize

descent (GD) algorithm to solve unconstrained convex optimization

kck k2 ≤ 1 − αLksk k2 (14d)

Algorithm 1: Algorithm of MLASO-d with CDLP scheme

1. Convergence Analysis Proposition 2: By using the CDLP scheme in MLASO-d,

For routine iterations, Proof: Because Rk Lα ≤ 0,

Lα In prediction iterations, due to Eq. (22c),

Lα2 ≤ fx0 − fx1  fx1 − fx2  : : : − fxKn

1 Because of Eq. (21b), in routine iterations,

kx0 − x k fxk1 − fx fxk1 − fx fxk − fx

− 2αhxk − x ; ck i  α2 ksk k2 (32) (38b)

2kx0 − x k2 same optimization problem using a gradient-based optimization

2kx0 − x k2 td t 0 − t ≥ K n0 × tg  tu − K r × tg  ttrain − Kn × tpred  tu

1 1 1 In Eq. (48), if K r ≤ Kn0 ≤ K n , we have

MLASO-d targets to reduce the number of routine iterations to Because K r ≤ Kn0 ,

Kn − 2 K −2 We also use εf to measure the relative difference between the final

Hence, fMLASO − fref

≥ considered test functions. Compared to the GD, the MLASO-d

Ee Emin  xe p E0 − Emin (62)

maxxmin ; xe − move if xe B0.5

min1; xe  move if min1; xe  move ≤ xe B0.5

xe − xmin if xe − αδe ≤ xmin

considered 2-D TO problems using CDLP. In addition, we use Fig. 3

problems A, B, C, and D with PDLP and CDLP (Np 1, 2, 4, 7, and 10).

Problem CDLP CDLP CDLP CDLP

CDLP CDLP CDLP CDLP

f 227.10 227.06 227.06 227.06 227.04

f 31.73 31.73 31.73 31.73 31.74

Problem top3D CDLP CDLP CDLP CDLP

Problem top3D CDLP CDLP CDLP CDLP

Size top3D CDLP CDLP CDLP CDLP

f 3607.83 3633.17 3607.74 3602.48 3602.76 3639.05

f 6075.14 6131.30 6094.65 6093.02 6105.41 6175.37

f εf 8679.32 8784.80 8692.23 8699.08 8741.51 8935.05

(1.22%) (0.15%) (0.23%) (0.72%) (2.9%)

CDLP CDLP CDLP CDLP

f 1572.06 1580.75 1581.13 1573.00 1582.75

ϵf is activated once the ML model is prepared; Thus, both ϵm and ϵf are

Appendix B: Algorithms of MLASO-d

Algorithm B1: Prediction iteration in MLASO-d

1: Define μkr −1 ws μkr −2  bs

Algorithm B2: Routine iteration in MLASO-d

1: U Kxk −1 F ▸ FEA

ing,” AIAA Journal, Vol. 59, No. 8, 2021, pp. 2820–2847.

You might also like

Lα2 ≤ fx0 − fx1 fx1 − fx2 : : : − fxKn

kx0 − x k fxk1 − fx fxk1 − fx fxk − fx

− 2αhxk − x ; ck i α2 ksk k2 (32) (38b)

2kx0 − x k2 same optimization problem using a gradient-based optimization

2kx0 − x k2 td t 0 − t ≥ K n0 × tg tu − K r × tg ttrain − Kn × tpred tu

Hence, fMLASO − fref

Ee Emin xe p E0 − Emin (62)

min1; xe move if min1; xe move ≤ xe B0.5

f 227.10 227.06 227.06 227.06 227.04

f 31.73 31.73 31.73 31.73 31.74

f 3607.83 3633.17 3607.74 3602.48 3602.76 3639.05

f 6075.14 6131.30 6094.65 6093.02 6105.41 6175.37

f εf 8679.32 8784.80 8692.23 8699.08 8741.51 8935.05

f 1572.06 1580.75 1581.13 1573.00 1582.75

1: Define μkr −1 ws μkr −2 bs