Spe 187430 Pa

J187430 DOI: 10.
2118/187430-PA Date: 1-December-18 Stage: Page: 2428 Total Pages: 16
Enhancing the Performance of the

Distributed Gauss-Newton Optimization
Method by Reducing the Effect of
Numerical Noise and Truncation Error
With Support-Vector Regression
Zhenyu Guo, University of Tulsa; Chaohui Chen, Shell International Exploration and Production Incorporated;
Guohua Gao, Shell Global Solutions US Incorporated; and Jeroen Vink, Shell Global Solutions International
Downloaded from http://onepetro.org/SJ/article-pdf/23/06/2428/2116211/spe-187430-pa.pdf/1 by Indian Institute of Technology (ISM) Dhanbad user on 28 March 2023
Summary
Numerical optimization is an integral part of many history-matching (HM) workflows. However, the optimization performance can be
affected negatively by the numerical noise existent in the forward models when the gradients are estimated numerically. As an unavoid-
able part of reservoir simulation, numerical noise refers to the error caused by the incomplete convergence of linear or nonlinear solvers
or truncation errors caused by different timestep cuts. More precisely, the allowed solver tolerances and allowed changes of pressure
and saturation imply that simulation results no longer smoothly change with changing model parameters. For HM with linear-
distributed Gaussian-Newton (L-DGN), caused by the discontinuity of simulation results, the sensitivity matrix computed by linear
interpolation might be less accurate, which might result in slow convergence or, even worse, failure of convergence.
Recently, we have developed an HM workflow by integrating the support-vector regression (SVR) with the distributed-Gaussian-
Newton (DGN) method optimization method referred to as SVR-DGN. Unlike L-DGN that computes the sensitivity matrix with a
simple linear proxy, SVR-DGN computes the sensitivity matrix by taking the gradient of the SVR proxies. In this paper, we provide
theoretical analysis and case studies to show that SVR-DGN can compute a more-accurate sensitivity matrix than L-DGN, and SVR-DGN
is insensitive to the negative influence of numerical noise. We also propose a cost-saving training procedure by replacing bad-training
points, which correspond to relatively large values of the objective function, with those training-data points (simulation data) that have
smaller values of the objective function and are generated at most-recent iterations for training the SVR proxies.
Both the L-DGN approach and the newly proposed SVR-DGN approach are tested first with a 2D toy problem to show the effect of
numerical noise on their convergence performance. We find that their performance is comparable when the toy problem is free of
numerical noise. As the numerical-noise level increases, the performance of the L-DGN degrades sharply. By contrast, the SVR-DGN
performance is quite stable. Then, both methods are tested using a real-field HM example. The convergence performance of the SVR-DGN
is quite robust for both the tight and loose numerical settings, whereas the performance of the L-DGN degrades significantly when loose
numerical settings are applied.
Introduction
Numerical-optimization techniques have been used widely for closed-loop reservoir management (Jansen et al. 2005, 2009; Chen et al.
2010; Chen and Reynolds 2016; Guo et al. 2018c), in which assisted HM (AHM) (Reynolds et al. 2006; Yu et al. 2007; Guo et al.
2018b) is one of the essential steps. The optimization problem for AHM is defined as searching for a set (or several sets) of parameters
defining the reservoir-simulation model(s). More specifically, AHM minimizes an objective function that measures the error between
the simulated-flow responses and the observed historical data plus a regularization term. HM problems usually are solved by gradient-
based optimization algorithms when the analytical gradient of the objective function with respect to model parameters is available.
One efficient way to obtain the gradient is the adjoint-based method (Li et al. 2003; Reynolds et al. 2004; Gao and Reynolds 2006;
Kahrobaei et al. 2013), which saves a significant amount of computational time (Oliver et al. 2008) compared with those derivative-free
optimization methods.
Unfortunately, for most (commercial) simulators, the adjoint-based gradient is unavailable. An alternative way is to compute
numerically the gradient of the objective function or the sensitivity matrix (the gradients of simulated responses). A common way to
obtain the sensitivity matrix is a finite-difference approximation (Nash and Sofer 1996). By perturbing the HM parameters one by one,
the corresponding perturbed-flow responses are obtained (e.g., by running reservoir simulations), and the partial derivatives of flow
responses with respect to different parameters are computed by single-sided or two-sided (central) finite difference. However, the finite-
difference method has the following two drawbacks: (1) When the number of parameters for HM is large, the computational cost to
numerically estimate the gradient of the objective function or the sensitivity matrix is quite expensive; (2) the numerically estimated
gradient (or sensitivity matrix) may become unacceptably inaccurate because of numerical errors (Gao et al. 2016a).
To reduce the computational cost of traditional optimization algorithms and to take advantage of gradient-based HM, the DGN
method was proposed by Gao et al. (2016b, 2017a). The basic idea of DGN is as follows: An ensemble of initial search points (also
referred to as initial base cases) is generated by randomly sampling them from either their prior probability distribution or from a uni-
form distribution. Subsequently, a specifically designed algorithm is applied to update iteratively the set of base cases and, thus, to grad-
ually decrease their objective-function values in parallel. In DGN, a new search point for each base case is determined from solving a
trust-region subproblem (Nocedal and Wright 1999; Gao et al. 2017a). This finds the global minimum of a quadratic approximation of
the objective function in a limited searching area, namely, the trust region. Both the gradient and the Hessian of the quadratic model
Copyright V
C 2018 Society of Petroleum Engineers
This paper (SPE 187430) was accepted for presentation at the SPE Annual Technical Conference and Exhibition, San Antonio, Texas, USA, 9–11 October 2017, and revised for publication.
Original manuscript received for review 5 December 2017. Revised manuscript received for review 5 July 2018. Paper peer approved 9 July 2018.
2428 December 2018 SPE Journal
ID: jaganm Time: 11:34 I Path: S:/J###/Vol00000/180083/Comp/APPFile/SA-J###180083

J187430 DOI: 10.2118/187430-PA Date: 1-December-18 Stage: Page: 2429 Total Pages: 16
can be computed from the sensitivity matrix, which is evaluated at each of the current base cases using the Gauss-Newton (GN) formu-
lation. When an analytically calculated sensitivity matrix is unavailable, each flow response assessed at a given point (a target point) is
approximated by a linear combination of model parameters on the basis of the first-order Taylor-series expansion. The coefficients for
the linear relationships between model parameters and flow responses are obtained by solving a series of linear systems using the model
parameters and the flow responses of the target point and its surrounded points. Compared with finite difference, the computational effi-
ciency when applying DGN is enhanced significantly because no extra perturbation simulations are required to compute the sensitivity
matrix evaluated at each base case. The local-search DGN optimization method using a linear proxy is referred to as L-DGN in
this paper.
However, the sensitivity matrix obtained in this way might not be sufficiently accurate because of numerical errors (Gao et al.
2016a). First, when the training-data points used to compute the coefficients defining the linear relationships have relatively large dis-
tances among each other, the approximation to use a linear relationship between flow responses and model parameters is not accurate
(because of significant truncation errors caused by neglecting higher-order nonlinear terms in the Taylor-series expansion). Second,
when the training-data points used to compute the coefficients defining the linear relationships have relatively small distances between
each other, the effect of numerically induced discontinuities (or “noise”) becomes pronounced.
Guo et al. (2017, 2018a) proposed to enhance the performance of the L-DGN optimization method through integrating the
parallelized local-search optimizer DGN (Gao et al. 2016b, 2017a) with the SVR technique (Cortes and Vapnik 1995; Platt 1998;
Suykens and Vandewalle 1999; Suykens et al. 2002; Smola and Schölkopf 2004; Demyanov et al. 2010; De Brabanter 2011; Guo and
Reynolds 2018); it is referred to as the SVR-DGN optimization method. In Guo et al. (2017), the sensitivity matrix was computed from
an SVR proxy that was built using a set of reservoir models and their simulated-flow responses generated during the DGN iteration. As
an example, they applied the SVR-DGN to quantify the uncertainty of the estimated ultimate recovery of an unconventional asset with
13 uncertainty parameters. Their numerical test results confirm that the SVR-DGN is more efficient than the L-DGN method. However,
they did not provide sufficient theoretical analyses and discussions to explain why the SVR-DGN performs better than the
L-DGN. The SVR-DGN optimization method proposed by Guo et al. (2017) has a major limitation: It becomes less efficient when
applied to HM problems with more parameters and more observed data.
The major objectives of this work include (1) further investigating the underlying mechanisms that can effectively enhance the per-
formance of the DGN optimization method with SVR proxies, both theoretically and numerically, and (2) effectively reducing the com-
putational cost of building SVR proxies by removing training-data points (or simulation cases) with large values of the objective
function from the training-data set.
In this paper, we demonstrate that the performance of the SVR-DGN is very stable to numerical noise, whereas the original method,
L-DGN, is very-sensitive to numerical noise. When HM is performed with the SVR-DGN, reasonably loose convergence criteria for
linear and nonlinear solvers can be applied to reduce further the computational cost on forward reservoir-simulation runs. We also dem-
onstrate that the computational-cost-saving version of the SVR-DGN optimizer can be applied effectively to HM problems with hun-
dreds of uncertainty parameters, without the loss of stability and efficiency.
The remainder of this paper is organized as follows: (1) We introduce some fundamentals about formulating HM problems within
the Bayesian framework. (2) A brief discussion of SVR formulation is summarized. (3) We propose a computational cost-saving SVR
and integrate it with the DGN optimization method. (4) We investigate the underlying mechanisms that effectively enhance the per-
formance of the DGN optimization method with SVR proxies by providing more-detailed discussions about the negative effects of
numerical noise and truncation errors on the performance of numerical-optimization algorithms, especially the DGN algorithm. (5) The
proposed method is validated by a 2D toy problem with multiple best matches. (6) The proposed method is then applied to a real HM
problem. (7) We draw conclusions in the last section.
Formulating HM Problems Within the Bayesian Framework

Within the Bayesian framework, the maximum a posteriori (MAP) estimate (Oliver et al. 2008) can be obtained by minimizing the
objective function defined by
1 1
OðmÞ ¼ ðm mpr ÞT C1 T 1
M ðm mpr Þ þ ½yðmÞ d obs CD ½yðmÞ d obs ; . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð1Þ
2 2
where m is the N m -dimensional model-parameter vector; mpr is the prior model-parameter vector; yðmÞ is the N d -dimensional response
vector; d obs is the N d -dimensional observed data vector; CM is the prior model-covariance matrix; and CD is the observation-error
covariance matrix.
To quantify the uncertainty of the model parameters and the production forecasts after conditioning to production-history data, the
randomized-maximum-likelihood (RML) method (Kitanidis 1995; Oliver 1996) can be applied to generate a set of conditional realiza-
tions that approximates the posterior distribution of the reservoir models. To generate these realizations, a set of “perturbed” objective
functions must be minimized,
1 1
OðmÞ ¼ ðm muc ÞT C1 T 1
M ðm muc Þ þ ½yðmÞ d uc CD ½yðmÞ d uc : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð2Þ
2 2
The only change in the objective function defined in Eq. 1 is to replace mpr in Eq. 1 with muc in Eq. 2 and to replace d obs in Eq. 1
with d uc in Eq. 2. Here, muc is an unconditional realization generated by sampling the prior probability-density function (PDF), and
1=2
d uc ¼ d obs þ CD z with z being the Nd -dimensional random Gaussian vector with a zero mean and an identity covariance matrix.
Gradient-based optimization methods require the gradient (and the Hessian matrix when applicable) of OðmÞ in Eq. 1 or Eq. 2. The
gradient of OðmÞ is given by differentiating Eq. 1 or Eq. 2:
rOðmÞ ¼ C1 T 1
M Dm þ G ðmÞCD DyðmÞ; . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð3Þ
where Dm ¼ m mpr or Dm ¼ m muc represents the model shift, and DyðmÞ ¼ yðmÞ d obs or yðmÞ d uc represents the data shift.
G is the sensitivity matrix that is defined as the derivatives of the flow-response vector ( y) with respect to the model parameters (m) as
given by
GðmÞ ¼ ½ryð1Þ ðmÞ; ryð2Þ ðmÞ; … ; ryðNd Þ ðmÞT ; . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .ð4Þ
December 2018 SPE Journal 2429

where the yð jÞ ðmÞ; j ¼ 1; 2; … Nd represents the jth simulated-flow response. The Hessian matrix is usually approximated by the GN for-
mulation, which is given by
HðmÞ ¼ C1 T 1
M þ G ðmÞCD GðmÞ: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð5Þ
The response function yð jÞ ðmÞ; j ¼ 1; 2; … Nd is a nonlinear function of the model-parameter vector m; therefore, both the sensitivity
matrix GðmÞ and the Hessian matrix HðmÞ depend on m. The minimization of the nonlinear objective function OðmÞ is generally on an
iteration-by-iteration basis. Letting m be the optimal solution of the current iteration, we expect to find a new search point, m þ Dm,
such that Oðm þ DmÞ is minimized in a neighborhood around m . The neighborhood of m is also called the trust region. If the
response function yð jÞ ðmÞ (for j ¼ 1, 2, … Nd ) is smooth, then Oðm þ DmÞ can be approximated by the following quadratic model in
the trust region of m ,
1
Oðm þ DmÞ qðDmÞ ¼ Oðm Þ þ ½rOðm ÞT Dm þ ðDmÞT Hðm ÞDm: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð6Þ
2
The new search point m ¼ m þ Dm can be determined by solving the trust-region subproblem (Gao et al. 2017b) [i.e., by finding the
global minimum of the quadratic model qðDmÞ defined in Eq. 6 within the given trust region].
SVR
SVR is a widely used machine-learning method for solving nonlinear-regression problems. With some training input and output data,
SVR “learns” how the input data are mapped to the output data, and generates a function that yields a reasonably good prediction of the
output for any given input data that fall within the range of the training data. For building an SVR proxy, let a training data set S be
fðxk ; yk Þ; k ¼ 1; 2; … Ns g, where Ns is the number of training points, xk is the kth input vector with a dimension of Nm , and yk is the
true response of the system corresponding to the kth input xk . For simplicity, we assume that yk is a scalar. The proxy function to esti-
mate the response given the input x is defined as
y^ ¼ wT uðxÞ þ b; . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð7Þ
where uðxÞ is a function that maps the original input-parameter space into a higher dimension of the feature space with dimension equal
to Nw , b is a scalar-bias term, and w is a weighting vector,
w ¼ ½w1 ; w2 ; … ; wNw T : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð8Þ
The training procedure of an SVR proxy model is to minimize an objective function (Suykens and Vandewalle 1999; Suykens et al.
2002; Smola and Schölkopf 2004; De Brabanter 2011) given by
1 1 XNs
J ¼ wT w þ c ½wT uðxk Þ þ b yk 2 ; . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð9Þ
2 2 k¼1
where c is the regularization factor that trades off the complexity of the SVR model presented by 12 wT w and the training error repre-
1 XNs
sented by ½wT uðxk Þ þ b yk 2 . The minimization of Eq. 9 is difficult to solve in the primal space because of the high dimen-
2 k¼1
sionality of w. The usual way is to transform the minimization of Eq. 9 to the dual space so that the optimization problem can be solved
with less computational cost. By introducing an independent variable ek , the minimization of Eq. 9 is equivalent to
1 1 XNs
minimize J ¼ wT w þ c e2 ; . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð10Þ
2 2 k¼1 k
subject to
ek ¼ yk y^k ; for k ¼ 1; 2; … Ns : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð11Þ
To solve the constrained optimization problem given by Eqs. 10 and 11, we define the Lagrangian as
Lðw; b; e; aÞ ¼ Jðw; eÞ RNk¼1

s
ak ½wT uðxk Þ þ b þ ek yk ; . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð12Þ
where the scalar ak is the Lagrange multiplier and a ¼ fa1 ; a2 ; … ; aNs g; e represents the error term and is defined as e ¼
fe1 ; e2 ; … ; eNs gT . The unique solution of the optimization problem of Eqs. 10 and 12 is obtained by solving the linear system, rL ¼ 0.
By rearranging rL ¼ 0, w and e are eliminated (Guo et al. 2017). Then, the prediction function of Eq. 7 is rewritten as
y^ðxÞ ¼ RNk¼1
s
ak Kðxk ; xÞ þ b; . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð13Þ
where Kðxk ; xÞ is a kernel function that satisfies Mercer’s condition (Mercer 1909), which is given by
Kðxk ; xÞ ¼ uðxk ÞT uðxÞ: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð14Þ
For the examples in this paper, we use a radial-basis-function kernel:

Kðxk ; xÞ ¼ expð k xk x k22 =r2 Þ; . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð15Þ
where r is the specified-kernel bandwidth and, by the rule of thumb, r is given by
r ¼ 0:2 k xmax xmin k2 ; . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð16Þ

where each element of xmax and xmin , respectively, represents the maximal and minimal value over xk ; k ¼ 1; 2; … Ns . To approximate
a reservoir-simulation model, the SVR proxy model is built to emulate the nonlinear relationship between the simulated responses and
the model-parameter vector. For a specific flow response yð iÞ , j ¼ 1; 2; … Nd , the training data set for building an SVR model is written
ð iÞ
as fðmk ; yk Þ; k ¼ 1; 2; … Ns g, where mk is the kth training input representing a realization of a reservoir model and yk is the corre-
sponding flow response predicted by the simulation run. Through the previous-training procedure, the SVR model for approximating
yð iÞ ðmÞ is obtained and denoted by y^ð iÞ ðmÞ. In the same way, we can obtain a combination of SVR models, ^y ðmÞ ¼
y ð1Þ ðmÞ; y^ð2Þ ðmÞ; … ; y^ðNd Þ ðmÞ, as a proxy of yðmÞ. On the basis of the closed-form formulation of ^y ðmÞ, the sensitivity matrix is cal-
½^
culated analytically as r½^ y ðmÞT T .
Computational Cost-Saving SVR and Its Integration With the DGN

In the L-DGN HM workflow (Gao et al. 2017a), the following four steps are repeated iteratively until convergence. First, initialize (for
the first iteration) or generate (for later iterations) an ensemble of search points. Second, construct a reservoir-simulation model for
each search point, distribute all simulation jobs to different computer cores using a parallelized high-performance computing system,
and report relevant simulation results for completed simulation jobs. Third, update the training-data set by adding relevant simulation
results of all completed-simulation jobs, and update each base case by its corresponding search point if it improves the objective func-
tion. Fourth, approximate the sensitivity matrix with linear interpolation, and then update the quadratic model around each base case.
In the first step of the first iteration, an ensemble of initial search points (also used as initial base cases) is generated by randomly
sampling from either the prior probability distribution or the uniform distribution. In the first step of later iterations, each search point is
the global minimum of the quadratic model that approximates the objective function in the neighborhood (called trust region) of each
base case, solved from a trust-region subproblem (Gao et al. 2017b).
In the last step, the quadratic model around each base case is updated using the GN formulation. The gradient and the GN Hessian
of the objective function are computed using Eqs. 4 and 6, with the sensitivity matrix approximated by linear interpolation of the Nm
points in the training data set that are closest to the base case. Letting the ensemble size be Ne , each member (base case) of the ensemble
is denoted by mj , j ¼ 1; 2; … Ne , and the ensemble is denoted by {mj , j ¼ 1; 2; … Ne }. Around each base case, mj , j ¼ 1; 2; … Ne , Nd
local linear functions y^j ðmÞ are built as the proxies of the Nd data responses, yj ðmÞ, where yj ðmÞ is given by
yj ðmÞ ¼ ½yð1Þ ðmÞ; yð2Þ ðmÞ; … yðNd Þ ðmÞT : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð17Þ
Then the sensitivity matrix ½Gðmj Þ; j ¼ 1; 2; … Ne for each base case is computed by differentiating the corresponding local linear
proxy ½^ y j ðmÞ.
The major difference between the L-DGN and the SVR-DGN lies in differently computing Gðmj Þ for j ¼ 1; 2; … Ne . The L-DGN
builds a set of Nd local linear proxies to estimate the sensitivity matrices; the SVR-DGN builds a set of Nd global SVR proxies of the
Nd data responses, ^y ðmÞ, using all training-data points recorded in the training-data set that includes relevant simulation results of simu-
lation jobs successfully completed at the current iteration and at previous iterations (Guo et al. 2017).
Fig. 1 illustrates one iteration of HM workflow using the SVR-DGN. In Fig. 1, fmj ; j ¼ 1; 2; … Ne g represents the ensemble of
search points generated at the current iteration, and fyj ; j ¼ 1; 2; … Ne g represents the corresponding set of data responses generated by
running reservoir simulations; f½mk ; yðmk Þ; k ¼ 1; 2; … Ns g represents the training-data set for building an SVR proxy, where Ns repre-
sents the number of training samples and GðmÞ ¼ r^y ðmÞ represents the sensitivity matrix estimated by taking the gradient of the SVR proxy
^y ðmÞ with respect to m. Similar to the L-DGN, the workflow has four steps for one iteration; the difference is that the SVR-DGN computes
the sensitivity matrix by differentiating the SVR proxy, whereas the L-DGN computes the sensitivity matrix by linear interpolation.
Reservoir simulator ˆ
SVR proxy: y(m)
N ],
... k )
1.
2, (m
s}
)T
2. (m
)T
*
1, k ,y
(y j ,j =
(m
, j=
i = {[m
j 1,2
(yˆ
1,2 , ..
Δ
3.
, .. .N
)=
.N e)
(m
e)
G
4.
DGN
Fig. 1—The SVR-DGN workflow for assisted HM. The numbers 1 to 4 represent the four steps of one SVR-DGN iteration.
In Guo et al. (2017), simulation results generated in all previous iterations are used as the training-data points for SVR. It is fine for
problems with only a few parameters and a small number of observed data (e.g., the problem they tested having only 13 uncertainty
parameters and 150 observed data). However, it becomes less efficient when applied to problems with more parameters and more
observed data.
The number of training-data points (Ns ) can be estimated approximately by Ns ¼ Ne NItr , where NItr is the number of iterations
required for the DGN to converge. NItr ranges from 15 to 50 for most problems that we tested. More iterations are required for

more-complicated problems with highly nonlinear responses. In our current implementation of the DGN optimization algorithm,
Ne > Nm is required to guarantee the unique solution of the linear-interpolation equation for the sensitivity-matrix estimation.
Because an SVR proxy is constructed for each response, the total computational cost increases as the number of observed data Nd
increases. Eq. 13 indicates that the computational cost of evaluating the SVR proxy y^ðxÞ for each individual datum increases as the
number of training-data points (Ns ) increases. Therefore, the computational cost of building Nd SVR proxies might become extremely
expensive for problems with large Nm and large Nd .
The following four different HM problems are used as examples to show the effect of the number of observed data (Nd ) and the
number of training-data points (Ns ) on the computational cost of SVR training, as shown in Fig. 2.
• Case 1: A real-field channelized reservoir model with 235 parameters and 1,701 observed data (Gao et al. 2017a)
• Case 2: A synthetic reservoir model (SPE-1) with 8 parameters and 116 observed data
• Case 3: A synthetic reservoir model (SPE-1) with 8 parameters and 73 observed data
• Case 4: A real-field unconventional reservoir with 13 parameters and 42 observed data (Guo et al. 2017)
1,000 1 0
y = 9.5367×10–14x 3.5453×10
Observed Data (minutes)
SVR-Training Cost for all
Observed Data (minutes)

R 2 = 9.7446×10–1
SVR-Training Cost per

100 0.1
10 0.01
1 0.001
Case 1 Case 1
Case 2 Case 2
0.1 0.0001
Case 3 Case 3
Case 4 Case 4
0.01 0.00001
100 1,000 10,000 100 1,000 10,000
Number of Training Data Points Number of Training Data Points
(a) Cost for all observed data (b) Cost per observed data
Fig. 2—The SVR training cost for responses of all observed data (a) and per observed datum (b), tested with different
HM problems.
In Fig. 2, the horizontal axis represents the number of training-data points (Ns ). In Fig. 2a, the vertical coordinate represents the
computational cost (in minutes) of building the SVR proxies for all Nd simulated responses. Results shown in Fig. 2a indicate clearly
that the total computational cost increases as the number of observed data Nd increases. Consider Case 1 as an example: It might take
more than 300 minutes (quite expensive) to build the 1,701 SVR proxies when the number of training-data points is more than 3,000.
In Fig. 2b, the vertical coordinate represents the average computational cost (in minutes) of building the SVR proxy for an individual
simulated response (i.e., the total computational cost shown in Fig. 2a divided by the number of observed data). Results shown in
Fig. 2b indicate that the computational cost of building the SVR proxy for an individual simulated response is affected mainly by the
number of training-data points (Ns ), following a power-law function with an exponent of 3.55. To reduce the computational cost of
SVR training, we need to limit the number of training-data points.
In this paper, instead of using all training-data points recorded in the training-data set, only a part of them is used for SVR. The max-
imal number of training-data points used for SVR is limited by Ns Nsmax ¼ Ne NSVR , where NSVR , which represents the number of
most-recent DGN iterations that are incorporated in training an SVR model, is used to limit the number of training-data points for
SVR proxies.
If the number of training-data points in the training-data set is fewer than Nsmax (i.e., Ns Nsmax ), then the training-data set is
updated by simply adding a new search point (simulation case) with successful simulations. Otherwise, the training-data set is updated
by replacing the oldest training-data point for a realization (i.e., the search point generated at the earliest iteration) with the new search
point for the same realization if it is successful.
Search points generated in earlier iterations generally have larger values of the objective function, and removing them from the
training-data set might affect the global accuracy or the global quality of the SVR proxies. However, it has a minimal impact on the
local accuracy of the SVR proxies near the current best solutions (or base cases), because more search points will be clustered around
those local minima of the objective function after a few iterations. When integrated with the DGN optimization method, the SVR
proxies are used to approximate the local sensitivity matrix at the best solution (with the smallest value of the objective function)
obtained in the current iteration for each realization. Therefore, removing search points generated in earlier iterations from the training-
data set has only a minimal impact on the convergence performance of the proposed SVR-DGN optimization method.
Effect of Numerical Noise and Truncation Error on the Performance of Optimization

As addressed by Vugrin (2005), inverse problems (or parameter-estimation problems) always involve some errors that are often
expressed as the discrepancy between the models’ predictions and the true-system response. These errors include observation error
(caused by inaccurate measurements or observations), model-structure error (caused by undermodeling), and forward-solution error
(caused by numerical simulation).
In this paper, the term “numerical noise” is used to denote the part of the forward-solution error. The forward-solution error includes
both the discretization (or truncation) error and the residual error of iterative linear and nonlinear solvers.
Discretization (e.g., using gridblocks and timesteps) may introduce some truncation errors. The accuracy of simulation results
depends on the sizes of gridblocks and timesteps. The simulation results are more accurate using a smaller gridblock size and a smaller
timestep. For most commercial reservoir simulators, the timestep will be cut when the pressure and saturation changes exceed their
maximal limits (parts of numerical settings) specified by the user. Obviously, a larger timestep will be generated during simulation
when using larger maximal limits (looser numerical settings), and larger truncation errors will be introduced. Because both pressure
change and saturation change are functions of reservoir-model parameters, timesteps generated for one reservoir model are generally

different from those for another model. Therefore, when reservoir-model parameters change gradually (e.g., with very small perturba-
tion), the simulated responses may change randomly because of a different timestep cut.
When iterative methods are applied to solve linear or nonlinear equations, residual errors are unavoidable. Numerical noise caused
by residual errors often occurs when predicting a flow response using a reservoir simulator. For a particular timestep, the number of iter-
ations for the iterative linear solver, or the Newton-Rapson (NR) nonlinear solver, required to converge might be entirely different for
two reservoir models even though they have only a slight difference in reservoir properties. Even worse, various simulation results
might be obtained when the same reservoir-simulation model is run in various computer clusters or with a different number of cores
(e.g., for parallel computing). For example, the NR method is generally applied to solve nonlinear equations iteratively, and it is termi-
nated whenever the residual is smaller than the tolerance (part of numerical settings) specified by the user. Take the bottomhole pressure
(BHP) in a producer simulated at a certain timestep as an example. We assume that the exact solution is PBHP; Exact , and PBHP; Sim is the
simulated BHP using the tolerance ecr . Obviously, jPBHP; Sim PBHP; Exact j increases as the tolerance ecr > 0 increases. Therefore, we
have PBHP; Sim ¼ PBHP; Exact þ ep e, where 1 e 1 is a random number and ep represents the magnitude of the numerical noise
caused by a residual error. The relatively tighter numerical settings used for iterative solvers will generate numerical noise with a
smaller magnitude (ep ) than do the relatively looser settings. Therefore, the objective function and data responses obtained with the
tighter settings are generally more accurate than those obtained with the looser settings.
The effects of numerical noise to optimization performance have been widely studied (Burman and Gebart 2001; Borggaard et al.
2002; Moré and Wild 2011; Gao et al. 2015, 2016a). Depending on its level or magnitude, the numerical noise may lead to completely
incorrect numerical derivatives, which in turn may yield a bad-search direction when line-search optimization methods are applied
(Borggaard et al. 2002; Gao et al. 2016a). In the deep-learning community, researchers (Szegedy et al. 2014) similarly found that a
well-performed neural network can fail in predicting an image when adding an imperceptibly small nonrandom error. In practical appli-
cations, unfortunately, the numerical noise is inevitable because it is computationally infeasible to reduce the convergence tolerance of
linear and nonlinear solvers so much that the effect of numerical noise becomes negligible. In fact, to reduce the run time of a forward-
simulation run, the convergence tolerances of linear or nonlinear solvers are usually set to maximally large, acceptable values.
To mitigate the adverse effect of numerical noise on the performance of optimization, Gao et al. (2015, 2016a) developed a hybrid
optimization algorithm: Simultaneous Perturbation and Multivariate Interpolation (SPMI). The basic idea of SPMI is to hybridize a
quasi-Newton or a GN-optimization algorithm with a direct-pattern-search (DPS) -optimization algorithm (Hooke and Jeeves 1961). At
each iteration, the SPMI simultaneously generates perturbation points and searching points. Also, to perform a parallel DPS, the
perturbation points are used to approximate the numerical gradient (or sensitivity matrix). Then, the Hessian can be approximated
either by a quasi-Newton formulation (e.g., Broyden-Fletcher-Goldfarb-Shanno) or by a GN formulation. In the next iteration, different
searching points are generated using both a trust region with different searching radii and line-search methods with different step sizes.
With the help of a high-performance compute cluster, all simulation jobs with reservoir models represented by all the perturbation
points and searching points are executed in parallel.
The applications to different HM problems show that the SPMI performs much more efficiently (regarding the time used to complete
an HM task) and more robustly (regarding the finally converged objective-function value) than other model-based derivative-free opti-
mization methods (Chen et al. 2012). However, a major drawback of the SPMI is that only one history-matched model can be obtained
at one time. To generate multiple conditional realizations using the SPMI together with the randomized RML method, multiple optimi-
zation tasks starting from different initial guesses must run independently multiple times (Chen et al. 2016). The procedure is computa-
tionally expensive, especially when the number of uncertain parameters is large.
Although numerical tests (Gao et al. 2017a; Chen et al. 2017b) indicate that the L-DGN performs quite efficiently for noise-free toy
problems, it also suffers from the same issue as other gradient-based or model-based derivative-free optimization methods (i.e., degra-
dation in performance caused by the adverse effect of numerical noise).
According to the Taylor-series expansion, a truncation-error term will be introduced using a linear approximation by neglecting
higher-order terms. In addition to the truncation-error term, numerical noise might also affect the accuracy of the linear approximation.
We take the one-sided finite-difference method to approximate the first-order derivative of a response dðmÞ with respect to the ith
parameter mi as an example (Gao et al. 2016a),

@dðmÞ @dðmÞ ep ðeþ e0 Þ
¼ þ oðDmi Þ þ : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð18Þ
@mi @mi 0 Dmi

@dðmÞ
In Eq. 18, denotes the actual or exact derivative evaluated at m. Dmi is the perturbation size of mi . The term oðDmi Þ
@mi 0
represents the truncation error caused by neglecting the higher-order terms in the Taylor-series expansion to derive the one-sided finite-
ep ðeþ e0 Þ
difference equation. In Eq. 18, we define ENN ¼ , which represents the effect of numerical noise, where ep is the level or
Dmi
magnitude of numerical noise, and eþ and e0 are the normalized numerical noises introduced from numerical simulations at the current
point (mi ) and the perturbation point (mi þ Dmi ), ranging from –1.0 to 1.0.
To reduce the truncation error, a smaller perturbation size should be used. However, ENN may become unacceptably large using a
very small perturbation size. To overcome this conflicting issue, Gao et al. (2016a) developed an auto-adaptive perturbation-size updat-
ing algorithm to find the proper perturbation size by striking the balance between the truncation error and the error introduced from
numerical noise.
The estimation of the sensitivity matrix for the DGN optimization method is similar to the one-sided finite-difference approximation
of derivatives for the SPMI optimization method. In the DGN formulation, simulated responses are approximated by a linear proxy by
neglecting the higher-order nonlinear terms. As discussed in Gao et al. (2017a), the sensitivity matrix evaluated at a point m is approxi-
mated by interpolating the point m and the N m (the number of uncertainty parameters) points in the training-data set that are closest to
m , to reduce the truncation error. We should reemphasize that training-data points are searching points generated in previous iterations
during the DGN minimization process. Among the N m training points selected to estimate the sensitivity matrix at m by linear interpo-
lation, some of them may be quite close to each other, and some others are quite far away from each other. Interpolating those points
that are very close to each other may introduce a large error caused by numerical noise, similar to the last term (ENN ) in Eq. 18. On the
other hand, interpolating those points that are quite far away from each other may introduce a large truncation error, in the order of
o½max k mð jÞ mk2 . Using more than Nm points for linear fitting may suppress the negative effect of numerical noise, but it will
increase the truncation error.

Depending on the level or scale of numerical noise, the sensitivity matrix obtained with the linear interpolation as used in the
L-DGN may become very inaccurate, which may result in generating bad search points and thus significantly degrade the performance
of DGN. The linear interpolation in the L-DGN requires that the linear proxies must honor the simulated data exactly, which is ideal
when the simulated data are noise-free; however, when the simulated data are mingled with numerical noise, the data-exact strategy
will incur a significant error for predictions, especially when partial derivatives (or the sensitivity matrix) are used to guide the search
for GN-type optimization algorithms. Unlike linear interpolation, an SVR proxy model is more like a well-designed smooth, nonlinear-
response surface that does not honor the simulated data exactly, but generates the minimal distance (or training error) with the simulated
data as well as keeping the surface as flat as possible to prevent data overfitting. The nonlinear feature of the SVR proxy can effectively
suppress the truncation error because higher-order nonlinear terms are not completely neglected. On the other hand, the smoothing fea-
ture of the SVR proxy through the minimization of the distance between the SVR proxy and actual training-data points can significantly
reduce the error introduced from numerical noise. Because of these good features of an SVR proxy model, the SVR-DGN is expected
to perform more robustly against numerical noise and to provide more-accurate sensitivity matrix than does the L-DGN.
Validation With a 2D Synthetic Toy Problem

The same 2D synthetic toy problem in Gao et al. (2017a) is used to validate that the SVR-DGN is insensitive to the level of numerical
noise, by comparing the history-matched results between the SVR-DGN and the L-DGN under different noise levels. For the synthetic
2D toy problem, the flow response is given in an analytical formulation. The numerical noise is artificially added to the calculated-
flow response.
This 2D synthetic toy problem was originally designed by Gao et al. (2017a) to show that the DGN can detect multiple posterior
modes after HM. The analytical-flow responses of this 2D problem is given by
y0 ðx; tÞ ¼ 2 sin2 ð2px1 Þsin t þ 6 cos2 ð2px2 Þcos t þ ep ð2e 1Þ; x1 ; x2 2 ½0; 2; . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð19Þ
where x1 and x2 are two parameters that define this 2D problem; y0 is the “noise-free” flow-response function as a function of time t and
the parameter vector x, where x ¼ ½x1 ; x2 .
To mimic the numerical noise generated in the process of numerical simulation, a term of ep ð2e 1Þy0 ðx; tÞ is introduced, where
ep > 0 represents the relative level of numerical noise added to the “noise-free” data responses and e is a random number uniformly dis-
tributed within [0,1]. The flow response mingled with artificial numerical noise is
yðx; t; ep Þ ¼ y0 ðx; tÞ½1 þ ep ð2e 1Þ: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð20Þ
Also, the objective function for HM is given by

XNd
yðx; j; ep Þ dobs; j 2
Oðx; ep Þ ¼ ; . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð21Þ
j¼1
1=3
where dobs; j for j ¼ 1; 2; … Nd represents the observed data at jth timestep and Nd ¼ 10 and 1=3 represents the standard deviation of the
measurement error. The objective function only includes the data-mismatch part. The surface map of the objective function for ep ¼ 0
(noise-free) is shown in Fig. 3. The objective function has 16 local minima (or equivalently, 16 local MAP estimates). The observed
data are obtained from inputting the parameters at any one of the 16 local minima into the analytical flow-response function defined in
Eq. 19 without adding any noise. In this example, three different numerical-noise levels are tested: noise free (ep ¼ 0), ep ¼ 0:01, and
ep ¼ 0:1.
700
600
x1
500
2 400
1,000
1.5 300
500
x2 0 1 200
2
0.5 100
1.5
1
0.5
0
0
Fig. 3—Illustration of the surface map of the objective function (copied from Gao et al. 2017a).
The example is now used to demonstrate the robustness of the SVR-DGN against numerical noise. Also, the results from the
L-DGN with linear interpolation are computed as a reference.
In this example, 100 realizations of x ¼ ½x1 ; x2 are sampled from a uniform distribution between the interval [0,2] as the initial guess
of the base cases. During the DGN optimization, the 100 base cases are updated iteratively using a new formulation of the GN trust-
region method (Gao et al. 2017b), so that the corresponding objective functions are decreasing on an iteration-by-iteration basis. At
each iteration, an SVR proxy y^ðx; tÞ as a surrogate of yðx; tÞ is built using simulation results generated at a current iteration and those
recorded during the previous iterations. Based on the SVR proxy, the sensitivity matrix for the next iteration can be analytically

calculated and used for updating the quadratic model approximated at each base case. Note that, for linear interpolation, we generate a
linear-proxy model for each base case using the points close to the base case. The reason that we cannot use the same amount of data to
generate the linear proxy is that the linear relationship will decay very fast if the training samples are far away from each base case,
which is caused by the inaccuracy of linear interpolation in approximating the nonlinear relationship between the model parameters and
the data response. However, unlike linear interpolation, the SVR can use more data points as a training-data set, and, therefore, it pro-
vides a better proxy. As shown in Figs. 4a through 6a, the natural logarithm of the median value of the objective functions for the
100 base-case points at each iteration is used to characterize the performance of the optimization process for both the SVR-DGN and
the L-DGN. As shown in Fig. 4a, if the noise level is zero, the medians obtained by the SVR-DGN and the L-DGN fall on top of each
other, indicating that the L-DGN performs equally well as the SVR-DGN. When the noise level is increased to 0.01, the advantage of
the SVR-DGN is already clearly visible in Fig. 5a, which shows that the median of the objective function (SVR-DGN) reaches its mini-
mal value with fewer iterations than required by the L-DGN. When the noise level is increased further to 0.1, the advantage of the
SVR-DGN is even more pronounced: It takes only seven iterations for the SVR-DGN to converge according to the medians of objective
functions, whereas the L-DGN cannot converge within ten iterations; see results shown in Fig. 6a. Note that all the SVR-DGN history-
matched results obtained with different noise levels can find all the 16 local minima of the objective functions. Here, we show only the
representative one that is without noise in Fig. 7. As one can see, 16 local minima are found using the SVR-DGN, which correspond to
all the local minima of the objective function of Eq. 21.
0
Median In(obj)
–5
SVR-DGN
–10 L-DGN
–15
–20
0 2 4 6 8 10
Iteration Number
Fig. 4—Median of ln(obj) vs. the iteration number obtained with the SVR-DGN and the L-DGN when the noise level is zero. The two
approaches, the SVR-DGN and the L-DGN, perform equally efficiently in this case. Hence, the curves overlap.
5
Median In(obj )
SVR-DGN
L-DGN
–5
–10
0 2 4 6 8 10
Iteration Number
Fig. 5—Median of ln(obj) vs. the iteration number obtained with the SVR-DGN (full line) and the L-DGN (dashed line) when the noise
level is 1%.
5
Median In(obj )
SVR-DGN
L-DGN
–5
–10
0 2 4 6 8 10
Iteration Number
Fig. 6—Median of ln(obj) vs. the iteration number obtained with the SVR-DGN (full line) and the L-DGN (dashed line) when the noise
level is 10%.

1.0
0.8
0.6
0.4
0.2
0.0
0.0 0.2 0.4 0.6 0.8 1.0
Fig. 7—Local minima obtained with the SVR-DGN at the 10th iteration. Black nodes represent different local minima. Around each
local minimum, a set of realizations is labeled with the same number that represents that they are from the same local basin.
From the comparisons shown in Figs. 4a through 6a, we can see that the SVR-DGN is more tolerant to numerical noise for this
simple example. By using the SVR-DGN, we may set a larger convergence tolerance for both linear and nonlinear solvers when running
forward reservoir-simulation models, which will reduce further the time to complete one simulation in each iteration.
Because the SVR proxies are used to estimate the sensitivity matrix evaluated at the best solutions (the base cases) for each realiza-
tion in the current iteration, the accuracy of the estimated sensitivity matrix is more important for the DGN optimization method. In this
paper, the Frobenius norm (summation of the squared values of all elements) of the matrix DGðmÞ ¼ GActual ðmÞ GEstimate ðmÞ,
averaged over the bases cases obtained in the same iteration, is used to quantify the accuracy of SVR proxies and linear proxies.
Figs. 4b, 5b, and 6b compare the accuracy of SVR proxies and linear proxies in terms of the averaged Frobenius norm of DGðmÞ. As
expected, the SVR proxies are more accurate than linear proxies, especially when numerical noise presents.
Application to a Real-Field Example With Synthetic Data

Reservoir Description. This field example features a channelized reservoir with three different facies—the channel-sand facies, the
levee facies, and the shale facies. An object-based modeling approach is applied to model-facies distribution in a channelized reservoir.
The static model of this reservoir contains 106,680 active gridblocks, in which the porosity and absolute-permeability fields in grid-
blocks with the same facies are obtained with sequential-Gaussian simulation. A total of 1,410 unconditional realizations is generated
using the object-based modeling approach together with the sequential Gaussian-simulation technique. There is one gas injector and
seven horizontal producers. The production data to be history matched are generated synthetically as follows: A total of 1,400 uncondi-
tional realizations is used for pluri-principal-component analysis (pluri-PCA) (Chen et al. 2015), which reparameterizes the reservoir
parameters from the original 426,720-gridblock-based variables (facies indicator, porosity, permeability, and net-to-gross ratio in each
gridblock) to 200 principal-component coefficients. In addition to the 200 variables, 18 parameters that define the rock-type rule and
17 global parameters, such as aquifer strength, fault transmissibility, and others, are also included as uncertain parameters to be tuned
for HM. One of the unconditional realizations among the remaining ten models that is not included for pluri-PCA analysis is selected as
the “true” reservoir model. Fig. 8 shows the areal and cross-sectional views of the facies distribution in the true model. The total liquid-
production-rate data observed in each producer and gas-injection-rate data observed in the gas injectors, provided by the real asset, are
used as well-operating conditions for the forward-simulation run. The synthetic-production historical data are generated from the
reservoir-simulation run on the basis of the true model using the “tight” settings. The synthetic data include gas- and water-production
rates in all producers and BHPs in all wells. The total number of synthetic production data is 1,701. The synthetic data are subsequently
perturbed by adding uncorrelated Gaussian random noise with zero-mean values and standard deviations specified per different types of
data (i.e., 100 STB/D for water-production rate, 100 psi for BHP, and 500,000 scf/D for gas-production rate). A total of 300 uncondi-
tional realizations is used as the initial base cases.
Effect of Removing Searching Points Generated in Earlier Iterations From the SVR Training-Data Set. First, we will validate
that removing searching points generated in earlier iterations from the SVR training-data set will reduce the SVR training time signifi-
cantly, but has minimal impact on the convergence performance of the proposed SVR-DGN optimization method, using this real-field
example. Ne ¼ 300 unconditional realizations are generated from the prior Gaussian distribution, and are used as initial guesses. The
RML formulation defined in Eq. 2 is used as the objective function to be minimized. The normalized objective function, as shown in
Figs. 9 and 10, is the objective function defined in Eq. 2 divided by Nd ¼ 1; 701 for this example.
Fig. 9 compares the cumulative SVR training costs (in minutes) between using more training-data points (red curve, NSVR ¼ 10) and
using fewer training-data points (black curve, NSVR ¼ 5). The red curve indicates that the cumulative SVR training cost might become
extremely expensive using more training-data points. It takes more than 3,000 minutes (approximately 2 days) cumulatively to train
SVR proxies using training-data points collected in the last 10 iterations. In contrast, it takes only 250 minutes (approximately 4 hours)
cumulatively to train SVR proxies using training-data points collected in the last five iterations.
Fig. 10a compares the convergence profiles between using more training-data points (red curve, NSVR ¼ 10) and using fewer
training-data points (black curve, NSVR ¼ 5). The two curves are almost identical, which indicates clearly that using fewer training-data
points will not affect the convergence performance very much. In Fig. 10a, the vertical coordinate is the median value of the normalized
objective function evaluated among the 300 base cases (the best solution for each realization) in each iteration. After approximately
10 iterations, both curves converge to a value that is quite close to 1.0, the theoretically expected value (Tarantola 2005). Fig. 10b com-
pares the cumulative-probability-distribution function (CDF) plots of the normalized objective function evaluated at the 300 uncondi-
tional realizations or initial reservoir models (solid curves) and 300 RML samples or history-matched reservoir models (dashed curves)

obtained at the 10th iteration. As an example, Fig. 11 compares the marginal CDF plots of the first uncertain parameters. The light-blue
solid curve is generated from unconditional realizations. The red-dashed curve is obtained by the SVR-DGN using more training-data
points (NSVR ¼ 10), and the black-dashed curve is obtained by the SVR-DGN using fewer training-data points (NSVR ¼ 5). The solid-
black curve and the solid-red curve are identical, which means that the same 300 unconditional realizations are used for both cases. The
fact that the black-dashed curve and the red-dashed curve are almost identical validates that they both obtain equally good RML sam-
ples with a negligible difference for uncertainty quantification.
Shale
Levee
Sand
(a) Areal view (b) Cross-sectional view
Fig. 8—Areal view (a) and a cross-sectional view (b) of the facies distribution for the true reservoir model.
3,500
Cumulative SVR-Training Cost
3,000
2,500
(minutes)
2,000
NSVR = 10
1,500 NSVR = 5
1,000
500
0
0 5 10 15 20
Iteration
Fig. 9—Comparison of SVR training costs between using more (red curve with NSVR 5 10) and fewer (black curve with NSVR 5 5)
training-data points.
Normalized Objective Function
18 1
16 NSVR = 10 0.9
14 NSVR = 5 0.8
0.7
12
0.6 NSVR = 5, Iteration = 0
10
CDF
0.5 NSVR = 5, Iteration = 10

8 0.4 NSVR = 10, Iteration = 0
6 0.3 NSVR = 10, Iteration = 10
4 0.2
2 0.1
0 0
0 5 10 15 0.1 1 10 100 1,000
Iteration Normalized Objective Function
(a) Convergence profiles (b) CDF plots of the normalized objective function
Fig. 10—Comparison of convergence profiles (a) and CDF plots of the normalized objective function between using more (black
curves, NSVR 5 10) and fewer (red curves, NSVR 5 5) training-data points.

0.8
0.6 Unconditional
CDF
Iteration 10, NSVR = 10
0.4
Iteration 10, NSVR = 5
0.2
0
–3 –2 –1 0 1 2 3
x1
Fig. 11—Comparison of marginal CDF plots of the first uncertain parameter between using more (dashed red curve, NSVR 5 10) and
fewer (dashed black curve, NSVR 5 5) training-data points.
How to determine the best value of NSVR is still an open question. On the basis of the numerical tests, we recommend selecting NSVR
such that Ne NSVR ¼ bNm with b ranging from 3 to 10, and using a larger b for HM problems with highly nonlinear responses.
Effects of Numerical Noise. Next, the effect of numerical noise is explored for this example. Specifically, the synthetic data of this
field example are history matched using the SVR-DGN and the L-DGN with relatively loose (denoted as “default” next) and tight
solver-convergence settings for the reservoir simulator. A summary of the default and tight settings is listed in Table 1. As shown, it
only takes one-third of central-processing-unit (CPU) time to complete one reservoir simulation using the default setting compared with
using the tight setting when the same reservoir model is applied. In Table 1, for nonlinear solver, Solver Tolerance Multiplier represents
the maximum-allowed relative-error multiplier applied to five nonlinear-solver tolerances (i.e., well-constraint equations, well-rate
equations, well-pressure equations, volume-balance equations, and component-flow equations), whereas, for linear solver, it represents
the required relative tolerance to converge.
Solver Tolerance
Multiplier
Numerical Non- Linear Simulation Time Newton Linear Solver Max.-Error Cumulative
Settings linear Solver (seconds) Timesteps Steps Steps Mass Balance
–5 –6
Tight 0.01 1x10 1,718 173 438 11,855 2.5x10
–3 –4
Default 1 1x10 487 173 248 2,463 3x10
Table 1—Summary of simulation run time using the default and tight settings.
Fig. 12 shows the CDF curves obtained with the SVR-DGN and the L-DGN at the 10th iteration with two different numerical set-
tings of solvers. In Fig. 12, the horizontal axis is the value of the natural logarithm of the objective function, and the vertical axis is the
corresponding value of the CDF. At the same iteration count, the SVR-DGN obtains almost the same CDF curves using different
numerical settings: tight-convergence criteria (as shown by the orange curve in Fig. 12) and default-convergence criteria (as demon-
strated by the blue curve in Fig. 12). Numerical results for the real-field HM problem also validate that numerical settings (or numerical
noise) have very little influence on the performance of the SVR-DGN (i.e., the SVR-DGN is quite robust against numerical noise). On
the contrary, at the same iteration count, the L-DGN obtains relatively smaller objective-function values (the CDF curve is shifted
toward the left, as shown by the purple curve in Fig. 12) using the tight tolerance compared with using the default settings (as demon-
strated by the yellow curve in Fig. 12). In other words, the numerical noise has a significant impact on the performance of the L-DGN.
1
CDF of Normalized OBJ
0.8
0.6 SVR-DGN default

SVR-DGN tight
0.4 L-DGN default
L-DGN tight
0.2
0
2 4 6 8 10
Normalized OBJ
Fig. 12—Cumulative PDF obtained with the L-DGN and the SVR-DGN with the different noise level at the 10th iteration; “default”
denotes default numerical setting, and “tight” refers to the tight numerical setting. Each curve represents a distribution obtained
with 300 history-matched models.

As an example, Fig. 13 compares the marginal CDF plots of the first uncertain parameters with samples generated by different meth-
ods (the L-DGN in Fig. 13a and the SVR-DGN in Fig. 13b) with different numerical settings (the Default settings in red and the Tight
settings in black). The light-blue solid curve is generated from unconditional realizations. As shown in Fig. 13a, the wide gap between
the red-dashed curve and the black-dashed curve indicates that numerical noise has a significant impact on the distribution of condi-
tional samples generated by the L-DGN optimizer. In contrast, numerical noise has a minimal impact on the results of the SVR-DGN
optimizer, as indicated by the fact that the red-dashed curve and black-dashed curve shown in Fig. 13b are almost identical. Although
not shown in this paper, similar results are also observed for other uncertain parameters.
1 1
0.8 0.8
0.6 0.6
CDF
CDF
0.4 0.4
Unconditional Unconditional
Linear default SVR default
0.2 0.2
Linear tight SVR tight
0 0
–3 –2 –1 0 1 2 3 –3 –2 –1 0 1 2 3
x1 x1
(a) L-DGN (b) SVR DGN
Fig. 13—Comparison of marginal CDF plots of the first uncertain parameter between using the default settings (red-dashed curve)
and tight settings (black-dashed curve): (a) the L-DGN and (b) the SVR-DGN.
We also compare the performances of the SVR-DGN with the L-DGN in Table 2 regarding the number of simulation runs required
to obtain one acceptable conditional realization (or history-matched model) on average. The L-DGN takes approximately 75 simulation
runs to obtain one acceptable conditional realization using the default numerical setting for linear and nonlinear solvers, whereas it
takes 46 simulations using the tight-convergence criterion. The difference between using the tight setting and the default setting of solv-
ers is quite significant when the L-DGN is applied. In contrast, it takes almost the same number of simulation runs (approximately 11)
on average for the SVR-DGN to obtain one conditional realization using the two different numerical settings (“tight” and “default”).
With these numerical-test results, we may conclude that the SVR-DGN performs in a significantly more-stable manner against
numerical noise than does the L-DGN. In other words, the computational cost spent on forward-simulation runs can be reduced signifi-
cantly by applying the default, or even further-loosened, numerical settings for solvers of reservoir simulators when we apply the
SVR-DGN for solving HM problems, without sacrificing the efficiency of finding history-matched models.
SVR Default SVR Tight Linear Default Linear Tight

Simulation numbers 3,000 3,000 4,800 4,800
Conditional realizations 277 278 64 107
Simulations/conditional realizations 10.83 10.8 75 45.86
Table 2—Comparison between the L-DGN and the SVR-DGN with the different noise levels.
Performance Comparison Among Different HM Methods. Finally, AHM is performed using the L-DGN and the SVR-DGN meth-
ods with the same default numerical setting for the simulation solvers. The performances of the L-DGN and the SVR-DGN are com-
pared in Fig. 14, where the horizontal axis is the natural logarithm of the objective function and the vertical axis is the CDF of the
objective-function values evaluated from the simulation results of the 300 history-matched models. The orange and blue curves are the
CDFs of the objective-function values obtained with the SVR-DGN and the L-DGN at the 10th iteration. Overall, the objective-
function values obtained by the SVR-DGN are considerably smaller than those obtained with the L-DGN. The two curves in purple and
yellow show that, to achieve the similar CDF curve of the objective-function values, the L-DGN requires 16 iterations, whereas the
SVR-DGN requires only six iterations.
1
CDF of Normalized OBJ
0.8
0.6 L-DGN Iteration 10

SVR-DGN Iteration 10
0.4 L-DGN Iteration 16
SVR-DGN Iteration 6
0.2
0
2 4 6 8 10
Normalized OBJ
Fig. 14—CDF obtained with the L-DGN and the SVR-DGN using the default-convergence settings at different iteration numbers.

We summarize the quantitative comparison among the SPMI, the L-DGN, and the SVR-DGN in Table 3. The following four aspects
are listed: the initial sample size, the number of acceptable conditional realizations obtained after HM, the number of iterations required
to achieve convergence, and the average simulation runs required to obtain one acceptable conditional realization. In this example, a
history-matched model is accepted as a conditional realization if the value of the objective function is smaller than 3,000 (1.76 Nd ).
Because it is very computationally expensive for the SPMI to generate one conditional realization (e.g., it requires 470 simulation runs
per iteration to evaluate the numerical gradient for one realization), the initial sample size for the SPMI is limited to five, much smaller
than that used by the L-DGN and the SVR-DGN. From Table 3, we can see that the SVR-DGN performs the best compared with the
SPMI and the L-DGN. For example, the SVR-DGN only needs 10.8 simulation runs on average to obtain one history-matched model,
whereas the L-DGN needs 75 simulation runs, and the SPMI needs 4,700 simulation runs. For this real-field HM example, the SVR-
DGN reduces the computational cost by a factor of 7 compared with the L-DGN, and a reduction factor of 460 compared with the SPMI,
in terms of the average number of simulation runs required to obtain one acceptable history-matched model or conditional realization.
SPMI L-DGN SVR-DGN
Sample size per iteration 5 300 300
Conditional realizations (OBJ < 3,000) 5 64 277
Iterations to converge 10 16 10
Simulation runs per conditional realizations 4,700 75 10.8
Table 3—Comparison among the SPMI, the L-DGN, and the SVR-DGN. OBJ 5 objective function.
Production Forecasts. For illustration, production forecasts from one producer are shown in Fig. 14. Green curves in Fig. 15 are gen-
erated by simulating the 300 initial models (or unconditional realizations), and blue curves are generated from the 277 acceptable
history-matched models (or conditional realizations) obtained by the SVR-DGN. In this example, the synthetic observed production
data (denoted by the red curve in Fig. 14) in the first 4.7 normalized times are used for HM, and the synthetic observed production data
at the remaining period are used for validation and future-performance prediction. In the production-forecasting period, all producers
apply constant total-liquid rate given at the end of the HM period, as well as operating conditions with the minimal-target BHP and the
maximal-target gas-production-rate set. After the liquid-production rate yields a BHP value smaller than this minimal-target value, the
well is automatically converted to BHP control with this target value; after the gas-production rate is more than the maximal gas rate,
the well control is automatically converted to gas-rate control with the constant maximal gas rate.
2 2
1.5 1.5
1 1
0.5
0.5
0
0
5 10 15 20 5 10 15 20
Normalized Time Normalized Time
(a) Normalized oil rate (b) Normalized water rate
0.9
1.2 0.8
1
0.7
0.8
0.6
0.6
0.5
0.4
0.2 0.4
0 0.3
5 10 15 20 5 10 15 20
Normalized Time Normalized Time
(c) Normalized gas rate (d) Normalized BHP
Fig. 15—Comparison of production forecasts generated from 300 unconditional realizations (green curves) and 277 conditional
realizations (blue curves). Red curves denote the “true” data, and red dots indicate the observed data. The vertical black-dashed
line separates the HM period and the forecast period. (a) Normalized oil rate, (b) normalized water rate, (c) normalized gas rate,
(d) normalized BHP.

In Fig. 15, the flow responses obtained with the 300 initial models (or unconditional realizations) in the green cannot match the his-
torical data. However, after HM, the flow responses in blue generated from the 277 conditional realizations are in good agreement with
the observed data. Furthermore, in the prediction period, the production forecasts generated from the “true” model are at almost the
center of the uncertainty range of the forecasts generated from the 277 conditional realizations.
The results shown in Fig. 15 demonstrate that, by using the SVR-DGN, we can more efficiently obtain multiple conditional realiza-
tions with an equal or better quality (measured by the value of the objective function) than those obtained with our previous L-DGN
method, as well as for the geologically complex reservoir featured in this field case.
Our numerical results of the real-field example indicate that the proposed SVR-DGN HM workflow can be applied to generate con-
ditional realizations. However, as noted by Chen et al. (2017b), applying a local-search optimization method together with the RML
method might generate biased samples of the posterior PDF, especially when it is applied to highly nonlinear systems. One feasible
way of generating unbiased samples of the posterior PDF is to find multiple local MAP estimates using the local-search DGN (e.g., the
SVR-DGN discussed in this paper), construct a Gaussian-mixture model (GMM) to approximate the posterior PDF, and then generate
conditional realizations by sampling from the GMM, as suggested by Gao et al. (2016b). An alternative is to create approximate condi-
tional realizations by integrating the global-search DGN with the RML methods as proposed by Chen et al. (2017b). The major objec-
tive of this paper, however, is to demonstrate the capability of enhancing the overall performance (robustness and efficiency) of the
L-DGN HM workflow through the proper integration of the machine-learning technique with the DGN optimization method. Results
shown in Fig. 14 indicate that the proposed SVR-DGN optimization method can find solutions (or history-matched reservoir models)
that generate production profiles matching the observed data reasonably well. In the future, we intend to address the challenge of prop-
erly generating an unbiased ensemble suitable for uncertainty quantification of forecast results (Chen et al. 2017a).
Conclusion
Forward-solution error caused by numerical simulation is unavoidable, and it might degrade the performance of the L-DGN optimiza-
tion. Although using tighter settings in the reservoir simulation might improve the performance of the L-DGN, the overall run time will
increase, which will obviously jeopardize the overall efficiency for HM. In this paper, we have investigated the mechanisms of perform-
ance enhancement of the DGN optimization using the SVR technique, both theoretically and numerically. The nonlinear feature of the
SVR proxy can effectively suppress the truncation error, whereas the smoothing feature of the SVR proxy can significantly reduce the
error introduced from numerical noise. On the basis of our theoretical discussions and numerical experiments, we can draw the follow-
ing conclusions:
• The SVR-DGN performs more robustly against numerical noise than does the L-DGN.
• The SVR-DGN performs more efficiently than the SPMI and the L-DGN, and it requires fewer reservoir-simulation runs to generate
a conditional realization.
• The SVR-DGN can generate a set of history-matched models with smaller values (or at least equal values) of the objective function
than those obtained with our previously developed L-DGN method.
• The SVR-DGN can reduce the overall computational cost, and therefore improve the overall efficiency of HM, using the relatively
loose numerical settings without sacrificing the quality or accuracy of history-matched models.
Because the SVR is one of the machine-learning or data-analytics approaches, there might be other approaches that can be applied
to provide even more-effective and -robust regression methods that can be integrated with our DGN optimization method. We will
investigate other methods in future research.
Nomenclature
Alpha ¼ permeability reduction coefficient
b ¼ parameter of SVR model
CD ¼ covariance matrix for Gaussian measurement error of observed data
CM ¼ covariance matrix of prior models M
dobs ¼ vector of observed data
e ¼ error term or numerical noise
g ¼ gradient
G ¼ sensitivity matrix
H ¼ Hessian matrix
J ¼ square error for estimating SVR tuning parameter
K ¼ kernel function
m ¼ vector of model parameters
mpr ¼ prior mean of the model-parameter vector
Nd ¼ number of observed data
Ne ¼ number of RML samples per iteration
NItr ¼ number of iterations required for DGN to converge
Nm ¼ number of uncertainty parameters
Ns ¼ number of training samples for SVR model
Nw ¼ dimension of weighting vector
O ¼ objective function
q ¼ quadratic model
w ¼ weighting vector
y ¼ response of the reservoir model
a ¼ parameter of LS-SVR model
e ¼ threshold, a small positive number
ep ¼ magnitude of numerical noise
r ¼ bandwidth of Gaussian kernel
c ¼ regularization factor for SVR
u ¼ mapping function

Superscripts
1 ¼ inverse matrix
T ¼ transpose matrix
* ¼ best solution at current GN iteration
Subscripts
cr ¼ threshold
i ¼ index of the ith observed data or response
j ¼ index of realization
k ¼ index of training point or iteration
min ¼ minimum
max ¼ maximum
uc ¼ unconditional realization
Acknowledgments
We want to thank Shell Global Solutions (US) Incorporated and Shell International Exploration and Production Incorporated for their
support to publish this paper.
References
Borggaard, J., Pelletier, D., and Vugrin, K. 2002. On Sensitivity Analysis for Problems With Numerical Noise. Presented at the 9th AIAA/NASA/USAF/
ISSMO Symposium on Multidisciplinary Analysis and Optimization, Atlanta, Georgia, USA, 4–6 September. https://doi.org/10.2514/6.2002-5553.
Burman, J. and Gebart, B. 2001. Influence of Numerical Noise in the Objective Function for Flow Design Optimization. International Journal of Numer-
ical Methods for Heat & Fluid Flow 11 (1): 6–19. https://doi.org/10.1108/09615530110364051.
Chen, C., Li, G., and Reynolds, A. 2010. Closed-Loop Reservoir Management on the Brugge Test Case. Computational Geosciences 14 (4): 691–703.
https://doi.org/10.1007/s10596-010-9181-7.
Chen, C., Jin, L., Gao, G. et al. 2012. Assisted History Matching Using Three Derivative-Free Optimization Algorithms. Presented at the SPE Europec/
EAGE Annual Conference, Copenhagen, Denmark, 4–7 June. SPE-154112-MS. https://doi.org/10.2118/154112-MS.
Chen, C., Gao, G., Ramirez, B. A. et al. 2015. Assisted History Matching of Channelized Models Using Pluri-Principal Component Analysis. Presented
at the SPE Reservoir Simulation Symposium, Houston, 23–25 February. SPE-173192-MS. https://doi.org/10.2118/173192-MS.
Chen, C., Li, R., Gao, G. et al. 2016. EUR Assessment of Unconventional Assets Using Parallelized History Matching Workflow Together With the
RML Method. Presented at the Unconventional Resources Technology Conference, San Antonio, Texas, 1–3 August. URTeC-2429986-MS. https://
doi.org/10.15530/URTeC-2016-2429986-MS.
Chen, B. and Reynolds, A. C. 2016. Ensemble-Based Optimization of the Water-Alternating-Gas-Injection Process. SPE J. 21 (3): 786–798. SPE-
173217-PA. https://doi.org/10.2118/173217-PA.
Chen, B., He, J., Wen, X.-H. et al. A. C. 2017a. Uncertainty Quantification and Value of Information Assessment Using Proxies and Markov Chain
Monte Carlo Method for a Pilot Project. Journal of Petroleum Science and Engineering 157: 328–339. https://doi.org/10.1016/j.petrol.2017.07.039.
Chen, C., Gao, G., Li, R. et al. 2017b. Integration of Distributed Gauss-Newton With Randomized Maximum Likelihood Method for Uncertainty Quanti-
fication of Reservoir Performance. Presented at the SPE Reservoir Simulation Conference, Montgomery, Texas, 20–22 February. SPE-182639-MS.
https://doi.org/10.2118/182639-MS.
Cortes, C. and Vapnik, V. 1995. Support-Vector Networks. Machine Learning 20 (3): 273–297. https://doi.org/10.1007/BF00994018.
De Brabanter, K. 2011. Least Squares Support Vector Regression With Applications to Large-Scale Data: A Statistical Approach. Faculty of Engineer-
ing, KU Leuven, Katholieke Universiteit Leuven.
Demyanov, V., Pozdnoukhov, A., Christie, M. et al. 2010. Detection of Optimal Models in Parameter Space With Support Vector Machines. In GeoENV
VII—Geostatistics for Environmental Applications. Quantitative Geology and Geostatistics, ed. P. Atkinson and C. Lloyd C., Vol. 16. Dordrecht:
Springer.
Gao, G., and Reynolds, A. C. 2006. An Improved Implementation of the LBFGS Algorithm for Automatic History Matching. SPE J. 11 (1): 5–17. SPE-
90058-PA. https://doi.org/10.2118/90058-PA.
Gao, G., Vink, J. C., Alpak, F. O. et al. 2015. An Efficient Optimization Work Flow for Field-Scale In-Situ Upgrading Developments. SPE J. 20 (4):
701–715. SPE-163634-PA. https://doi.org/10.2118/163634-PA.
Gao, G., Vink, J. C., Chen, C. et al. 2016a. A Parallelized and Hybrid Data-Integration Algorithm for History Matching of Geologically Complex Reser-
voirs. SPE J. 21 (6): SPE-175039-PA. https://doi.org/10.2118/175039-PA.
Gao, G., Vink, J. C., Chen, C. et al. 2016b. Uncertainty Quantification for History Matching Problems With Multiple Best Matches Using a Distributed
Gauss-Newton Method. Presented at the SPE Annual Technical Conference and Exhibition, Dubai, 26–28 September. SPE-181611-MS. https://
doi.org/10.2118/181611-MS.
Gao, G., Vink, J. C., Chen, C. et al. 2017a. Distributed Gauss-Newton Optimization Method for History Matching Problems With Multiple Best Matches.
Computational Geosciences 21 (5–6): 1325–1342. https://doi.org/10.1007/s10596-017-9657-9.
Gao, G., Jiang, H., Hagen, P. V. et al. 2017b. A Gauss–Newton Trust Region Solver for Large-Scale History Matching Problems. Presented at the SPE
Reservoir Simulation Conference, Montgomery, Texas, 20–22 February. SPE-182602-MS. https://doi.org/10.2118/182602-MS.
Guo, Z., Chen, C., Gao, G. et al. 2017. EUR Assessment of Unconventional Assets Using Machine Learning and Distributed Computing Techniques.
Presented at the SPE/AAPG/SEG Unconventional Resources Technology Conference, Austin, Texas, 24–26 July. URTeC-2659996-MS. https://
doi.org/10.15530/URTeC-2017-2659996-MS.
Guo, Z., Chen, C., Gao, G. et al. 2018a. Integration of Support Vector Regression With Distributed Gauss-Newton Optimization Method and Its Applications
to Uncertainty Assessment of Unconventional Assets. SPE Res Eval & Eng 21 (4): 1007–1026. SPE-191373-PA. https://doi.org/10.2118/191373-PA.
Guo, Z., Reynolds, A. C., and Zhao, H. 2018b. A Physics-Based Data-Driven Model for History-Matching, Prediction, and Characterization of Water-
flooding Performance. SPE J. 23 (2): 367–395. SPE-182660-PA. https://doi.org/10.2118/182660-PA.
Guo, Z., Reynolds, A. C., and Zhao, H. 2018c. Waterflooding Optimization With the INSIM-FT Data-Driven Model. Computational Geosciences 22 (3):
745–761. https://doi.org/10.1007/s10596-018-9723-y.
Guo, Z. and Reynolds, A. C. 2018. Robust Life-Cycle Production Optimization With a Support-Vector-Regression Proxy. SPE J. 23 (6): 2409–2427.
SPE-191378-PA. https://doi.org/10.2118/191378-PA.

Hooke, R. and Jeeves, T. A. 1961. “Direct Search” Solution of Numerical and Statistical Problems. Journal of the ACM 8 (2): 212–229. https://doi.org/
10.1145/321062.321069.
Jansen, J. D., Brouwer, D., Naevdal, G. et al. 2005. Closed-Loop Reservoir Management. First Break 23 (1079): 43–48. https://doi.org/10.3997/1365-
2397.2005002.
Jansen, J. D., Douma, S. D., Brouwer, D. R. et al. 2009. Closed-Loop Reservoir Management. Presented at the SPE Reservoir Simulation Symposium,
The Woodlands, Texas, 2–4 February. SPE-119098-MS. https://doi.org/10.2118/119098-MS.
Kahrobaei, S., Van Essen, G., Van Doren, J. et al. 2013. Adjoint-Based History Matching of Structural Models Using Production and Time-Lapse Seis-
mic Data. Presented at the SPE Reservoir Simulation Symposium, The Woodlands, Texas, 18–20 February. SPE-163586-MS. https://doi.org/
10.2118/163586-MS.
Kitanidis, P. K. 1995. Quasi-Linear Geostatistical Theory for Inversing. Water Resources Research 31 (10): 2411–2419. https://doi.org/10.1029/
95WR01945.
Li, R., Reynolds, A. C., and Oliver, D. S. 2003. Sensitivity Coefficients for Three-Phase Flow History Matching. J Can Pet Technol 42 (4): 70–77.
PETSOC-34-04-04. https://doi.org/10.2118/03-04-04.
Mercer, J. 1909. Functions of Positive and Negative Type, and Their Connection With the Theory of Integral Equations. In Philosophical Transactions
of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, Vol. 209, 415–446.
Moré, J. J. and Wild, S. M. 2011. Estimating Computational Noise. SIAM Journal on Scientific Computing 33 (3): 1292–1314. https://doi.org/10.1137/
100786125.
Nash, S. G. and Sofer, A. 1996. Linear and Nonlinear Programming. Blacklick, Ohio: McGraw-Hill Science/Engineering/Math.
Nocedal, J. and Wright, S. J. 1999. Numerical Optimization. New York: Springer.
Oliver, D. S. 1996. Multiple Realizations of the Permeability Field From Well-Test Data. SPE J. 1 (2): 145–155. SPE-27970-PA. https://doi.org/
10.2118/27970-PA.
Oliver, D. S., Reynolds, A. C., and Liu, N. 2008. Inverse Theory for Petroleum Reservoir Characterization and History Matching. Cambridge University
Press. https://doi.org/10.1017/CB09780511535642.
Platt, J. 1998. Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. In Advances in Kernel Methods—Support
Vector Learning. TechReport. No. MSR-TR-98-14.
Reynolds, A. C., Li, R., and Oliver, D. S. 2004. Simultaneous Estimation of Absolute and Relative Permeability by Automatic History Matching of
Three-Phase Flow Production Data. J Can Pet Technol 43 (3): 37–46. PETSOC-04-03-03. https://doi.org/10.2118/04-03-03.
Reynolds, A. C., Zafari, M., and Li, G. 2006. Iterative Forms of the Ensemble Kalman Filter. In Proc., 10th European Conference on the Mathematics of
Oil Recovery, Amsterdam, 4–7 September. https://doi.org/10.3997/2214-4609-201402496.
Smola, A. J. and Schölkopf, B. 2004. A Tutorial on Support Vector Regression. Statistics and Computing 14 (3): 199–222. https://doi.org/10.1023/
B:STCO.0000035301.49549.88.
Suykens, J. A. and Vandewalle, J. 1999. Least Squares Support Vector Machine Classifiers. Neural Processing Letters 9 (3): 293–300. https://doi.org/
10.1023/A:1018628609742.
Suykens, J. A., De Brabanter, J., Lukas, L. et al. 2002. Weighted Least Squares Support Vector Machines: Robustness and Sparse Approximation. Neuro-
computing 48 (1): 85–105. https://doi.org/10.1016/S0925-2312(01)00644-0.
Szegedy, C., Zaremba, W., Sutskever, I. et al. 2014. Intriguing Properties of Neural Networks. Cornell University Library. arXiv:1312.6199.
Tarantola, A. 2005. Inverse Problem Theory and Methods for Model Parameter Estimation. Society of Industrial and Applied Mathematics (SIAM).
https://doi.org/10.1137/1.9780898717921.
Vugrin, K. E. 2005. On the Effects of Noise on Parameter Identification Optimization Problems. PhD dissertation, Virginia Polytechnic Institute and
State University, Blacksburg, Virginia (April 2005).
Yu, T., Wilkinson, D., and Castellini, A. 2007. Applying Genetic Programming to Reservoir History Matching Problem. In Genetic Programming Theory
and Practice IV. Genetic and Evolutionary Computation, ed. R. Riolo, T. Soule, and B. Worzel. Boston, Massachusetts: Springer.
Zhenyu Guo currently is an analytics engineer with Occidental Petroleum Corporation. The work of this paper was performed
when he was a PhD-degree student at the University of Tulsa. Guo’s research interests include data-driven modeling, machine
learning, AHM, uncertainty quantification, and production optimization. He earned a BA degree and an MS degree from the
China University of Geosciences, Beijing, and a PhD degree from the University of Tulsa, all in petroleum engineering. Guo is a
member of SPE.
Chaohui Chen is a reservoir engineer in Shell Exploration and Production Company, Incorporated. His interests include reservoir
modeling, optimization, uncertainty quantification, and reservoir management. Chen has worked for Shell for more than 7 years,
with experience in deepwater, onshore, and unconventional reservoir-modeling projects. He holds a PhD degree in petroleum
engineering from the University of Tulsa. Chen is a member of SPE and currently serves as a SPE Journal technical editor.
Guohua Gao is a staff reservoir engineer in the Modeling and Optimization Department of Shell Global Solutions US Incorpo-
rated. He has been an active researcher in the oil and gas industry for 30 years. Gao’s research covers reservoir simulation,
subsurface-uncertainty quantification, HM, production optimization, and oilwell-tubular mechanics. He has published more than
20 papers in peer-reviewed journals. Gao is an associate editor for SPE Journal and a technical reviewer for eight prestigious
international journals. He was the recipient of the 2010 and 2011 SPE Journal Outstanding Associate Editor Award and received
the SPE A Peer Apart honor in 2015. Gao earned a PhD degree in petroleum engineering from the University of Tulsa.
Jeroen C. Vink is principal reservoir engineer and subject-matter expert in reservoir-simulation technology in Shell Global Solu-
tions International. He has worked on many aspects of subsurface-flow simulation (linear and nonlinear solvers, parallelization,
AHM, thermal enhanced oil recovery) and on in-situ conversion and upgrading processes. Vink serves on the organizing commit-
tee of the European Conference on the Mathematics of Oil Recovery. He earned a PhD degree in theoretical physics from the
University of Amsterdam.

Spe 187430 Pa

Uploaded by

Copyright:

Available Formats

Spe 187430 Pa

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Spe 187430 Pa

Uploaded by

Copyright:

Available Formats

J187430 DOI: 10.

2118/187430-PA Date: 1-December-18 Stage: Page: 2428 Total Pages: 16

Enhancing the Performance of the

2428 December 2018 SPE Journal

ID: jaganm Time: 11:34 I Path: S:/J###/Vol00000/180083/Comp/APPFile/SA-J###180083

Formulating HM Problems Within the Bayesian Framework

GðmÞ ¼ ½ryð1Þ ðmÞ; ryð2Þ ðmÞ; … ; ryðNd Þ ðmÞT ; . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .ð4Þ

December 2018 SPE Journal 2429

ID: jaganm Time: 11:34 I Path: S:/J###/Vol00000/180083/Comp/APPFile/SA-J###180083

ek ¼ yk y^k ; for k ¼ 1; 2; … Ns : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð11Þ

Lðw; b; e; aÞ ¼ Jðw; eÞ RNk¼1

For the examples in this paper, we use a radial-basis-function kernel:

where r is the specified-kernel bandwidth and, by the rule of thumb, r is given by

r ¼ 0:2 k xmax xmin k2 ; . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð16Þ

2430 December 2018 SPE Journal

ID: jaganm Time: 11:34 I Path: S:/J###/Vol00000/180083/Comp/APPFile/SA-J###180083

Computational Cost-Saving SVR and Its Integration With the DGN

yj ðmÞ ¼ ½yð1Þ ðmÞ; yð2Þ ðmÞ; … yðNd Þ ðmÞT : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð17Þ

December 2018 SPE Journal 2431

ID: jaganm Time: 11:34 I Path: S:/J###/Vol00000/180083/Comp/APPFile/SA-J###180083

Observed Data (minutes)

SVR-Training Cost per

Effect of Numerical Noise and Truncation Error on the Performance of Optimization

2432 December 2018 SPE Journal

ID: jaganm Time: 11:34 I Path: S:/J###/Vol00000/180083/Comp/APPFile/SA-J###180083

December 2018 SPE Journal 2433

ID: jaganm Time: 11:34 I Path: S:/J###/Vol00000/180083/Comp/APPFile/SA-J###180083

Validation With a 2D Synthetic Toy Problem

yðx; t; ep Þ ¼ y0 ðx; tÞ½1 þ ep ð2e 1Þ: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ð20Þ

Also, the objective function for HM is given by

2434 December 2018 SPE Journal

ID: jaganm Time: 11:34 I Path: S:/J###/Vol00000/180083/Comp/APPFile/SA-J###180083

December 2018 SPE Journal 2435

ID: jaganm Time: 11:34 I Path: S:/J###/Vol00000/180083/Comp/APPFile/SA-J###180083

Application to a Real-Field Example With Synthetic Data

2436 December 2018 SPE Journal

ID: jaganm Time: 11:34 I Path: S:/J###/Vol00000/180083/Comp/APPFile/SA-J###180083

(a) Areal view (b) Cross-sectional view

0.5 NSVR = 5, Iteration = 10

December 2018 SPE Journal 2437

ID: jaganm Time: 11:34 I Path: S:/J###/Vol00000/180083/Comp/APPFile/SA-J###180083

0.6 SVR-DGN default

2438 December 2018 SPE Journal

ID: jaganm Time: 11:34 I Path: S:/J###/Vol00000/180083/Comp/APPFile/SA-J###180083

SVR Default SVR Tight Linear Default Linear Tight

0.6 L-DGN Iteration 10

December 2018 SPE Journal 2439

ID: jaganm Time: 11:34 I Path: S:/J###/Vol00000/180083/Comp/APPFile/SA-J###180083

2440 December 2018 SPE Journal

ID: jaganm Time: 11:34 I Path: S:/J###/Vol00000/180083/Comp/APPFile/SA-J###180083

December 2018 SPE Journal 2441

ID: jaganm Time: 11:34 I Path: S:/J###/Vol00000/180083/Comp/APPFile/SA-J###180083

2442 December 2018 SPE Journal

ID: jaganm Time: 11:35 I Path: S:/J###/Vol00000/180083/Comp/APPFile/SA-J###180083

December 2018 SPE Journal 2443

ID: jaganm Time: 11:35 I Path: S:/J###/Vol00000/180083/Comp/APPFile/SA-J###180083

You might also like