URTeC: 2659996

EUR Assessment of Unconventional Assets Using Machine Learning
and Distributed Computing Techniques
Zhenyu Guo, Chaohui Chen*, Shell International Exploration & Production Inc.; Guohua
Gao, Shell Global Solutions (US) Inc.; Richard Cao, Ruijian Li, Chunlei Liu, Shell
Exploration & Production Co.
Copyright 2017, Unconventional Resources Technology Conference (URTeC) DOI 10.15530/urtec-2017-2659996

This paper was prepared for presentation at the Unconventional Resources Technology Conference held in Austin, Texas, USA, 24-26 July 2017.

The URTeC Technical Program Committee accepted this presentation on the basis of information contained in an abstract submitted by the author(s). The contents of this paper
have not been reviewed by URTeC and URTeC does not warrant the accuracy, reliability, or timeliness of any information herein. All information is the responsibility of, and, is
subject to corrections by the author(s). Any person or entity that relies on any information obtained from this paper does so at their own risk. The information herein does not
necessarily reflect any position of URTeC. Any reproduction, distribution, or storage of any part of this paper without the written consent of URTeC is prohibited.

Summary

Reservoir model parameters generally have large uncertainty ranges and need to be calibrated by history
matching the available production data. The ensemble of calibrated models has a direct impact on business decisions
such as EUR assessment and well-spacing optimization. Multi-realization history matching (MHM)
techniques have been applied to quantify EUR uncertainty. In practice, MHM requires a large
number of reservoir simulations on a distributed computing environment. In the current low-oil-price
environment, there is strong pressure to reduce the computational cost of MHM without compromising forecasting quality.
To address this challenge, this paper proposes a novel EUR assessment method that integrates a machine learning
technique with an advanced distributed computing technique in a history matching workflow.

Starting from an initial ensemble of reservoir models, each realization is calibrated iteratively with a previously
published Distributed Gauss-Newton method (DGN). Responses of the reservoir realizations are generated by
running simulations on a high-performance cluster (HPC) concurrently. The responses generated during iterations
are added to the training data set, which is used to train a set of support vector regression (SVR) models. Because the
sensitivity matrix for each realization can be estimated analytically from the SVR models, the DGN can use the
sensitivity matrix to generate better search points so that the objective function value decreases more rapidly.
The procedure is repeated until convergence.

The proposed method is applied to assess EUR for several wells in the Permian Liquid Rich Shale Reservoir field
with complicated subsurface oil/water/gas multiphase flows. The uncertain parameters include reservoir static
properties, hydraulic-fracture properties, and parameters defining dynamic properties such as relative permeabilities.
With the integration of SVR in DGN, the new method saves about 65% of the simulation runs compared to the
method without using SVR. The efficiency gain comes from learning from the simulation results of previous iterations. The case
study indicates that the new method provides faster EUR forecasts and comparable uncertainty ranges compared with
those obtained without SVR. More importantly, the new method enhances the statistical learning of reservoir
performance and therefore significantly increases capital efficiency for exploiting unconventional resources.

Introduction

Assisted history matching is an essential part of reservoir management, in which one or a set of reservoir models is
obtained whose simulated responses agree with the observed historical production data. It is a complicated, ill-
conditioned inverse problem for which an infinite number of solutions exist that all match the observed data.
From a Bayesian point of view, the desired solutions, represented by a set of reservoir models, should follow a
posterior distribution conditioned to the observed data. Current research has focused on obtaining at least an
approximation of correct sampling from the posterior distribution, which can properly reflect the uncertainty in
the model space. To achieve this goal, a set of history-matched models, instead of a single model, should be obtained
to approximate the posterior distribution.

The commonly used methods to perform assisted history matching can be roughly divided into two categories:
ensemble-based methods and traditional methods. Ensemble-based methods are able to simultaneously generate a
set of reservoir models that match the historical data and therefore provide a means to quantify the uncertainty of
both model parameters and production forecasts. A well-known ensemble-based history matching method is the
ensemble Kalman Filter (EnKF) introduced by Evensen (1994), which has been extensively applied to many
different areas. Since the first application of EnKF to history matching reservoir performance (Nævdal et al.
2002), interest in EnKF within the oil industry has grown significantly. To overcome the
disadvantage of EnKF that it requires recurrent simulation restarts, Emerick and Reynolds (2012, 2013a, 2013b)
developed an iterative version of the ensemble smoother (ES) (van Leeuwen and Evensen 1996), the ensemble smoother with multiple
data assimilation (ES-MDA), which has better accuracy than EnKF at comparable computational cost. Further
studies have extended ES-MDA to history matching the production data of highly channelized non-
Gaussian reservoirs (Le et al. 2015a, 2015b; Le and Reynolds 2015) and of waterflooding reservoirs using a data-
driven model as the forward model (Guo et al. 2017).

On the other hand, the traditional methods regard the history matching problem as an optimization problem whose
objective function usually consists of a model-mismatch term and a data-mismatch term. One or multiple history-matched
models can be found by applying different types of optimization algorithms, such as gradient-based optimization
algorithms, model-based derivative-free optimization algorithms, direct pattern-search derivative-free optimization
algorithms, stochastic derivative-free optimization algorithms, and their hybrid counterparts. Starting from an initial
guess of the reservoir model, the optimization procedure iteratively updates the model until a local or global
minimum of the objective function is found. Multiple history-matched models can be found by starting from different
initial guesses.

When an adjoint-based gradient is available (e.g., using the Shell in-house reservoir simulator MoReS), gradient-
based optimization algorithms (Li et al. 2003; Reynolds et al. 2004; Gao and Reynolds 2006; Kahrobaei et al. 2013)
perform better than optimization algorithms that do not use the adjoint gradient. One of the popular gradient-based
optimization methods used for history matching is the Gauss-Newton method, where the Hessian matrix of the
objective function can be evaluated analytically using the first-order derivatives of the data (or the sensitivity matrix).
Alternatively, one may apply a quasi-Newton method such as limited-memory Broyden-Fletcher-Goldfarb-Shanno
(LBFGS) (Liu and Nocedal 1989). As benchmarked by Zhou and Zhang (2010) and Gao et al. (2016a), Gauss-
Newton methods perform better than quasi-Newton methods for history matching or least-squares problems.

Unfortunately, the adjoint information is not always available from commercial reservoir simulators, which
restricts the application of adjoint-based history-matching methods. To take advantage of gradient-based
optimizers, the gradient of the objective function can be obtained in different ways. One is finite differencing with a one-
at-a-time perturbation of each variable (Nash and Sofer 1996) when the adjoint information is not available. Unlike
adjoint-based methods, which compute the gradient from a single reservoir simulation, evaluating the gradient with a
finite-difference scheme requires a number of simulation runs proportional to the number of parameters for
history matching, which becomes computationally infeasible when the number of reservoir parameters is
considerably large. An alternative method is to estimate both the gradient and the Hessian matrix by multivariate
perturbation (Powell 2004). With the numerically estimated derivative information, these gradient-based
optimization methods can also be applied to assisted history matching.

In addition, since the flow responses of the perturbation points needed to compute a numerical gradient can be
evaluated independently of each other, the efficiency of gradient evaluation can be further improved by
evaluating the flow responses of several perturbed reservoir models concurrently. Based on this understanding, the
Simultaneous Perturbation and Multivariate Interpolation (SPMI) method was developed (Chen et al. 2012; Gao et
al. 2015, 2016a) for solving the history matching optimization problem. At each iteration, SPMI generates
perturbation points and search points simultaneously. The perturbation points are used to compute the
numerical gradient (and Gauss-Newton Hessian matrix) of the objective function for the next SPMI iteration, in a
way similar to direct-pattern-search optimization with the pattern size adaptively determined. The search points are
selected by use of both trust-region and line-search methods with different search radii. With the help of an HPC,
the reservoir simulations for the set of models represented by all the perturbation points and search points can be
run in parallel. However, one drawback of SPMI is that only one history-matched model can be obtained at a time,
which leads to an expensive computational cost if a multitude of history-matched realizations is desired (Chen et al. 2016).

To reduce the computational cost of SPMI for history matching while retaining the advantages of a gradient-based
history matching method, the Distributed Gauss-Newton (DGN) method was proposed by Gao et al. (2016b, 2016c).
The basic idea of DGN is that an ensemble of prior model realizations is initially generated as the set of base cases,
and an optimization method is applied to this set of base cases to iteratively decrease their objective function values.
The key feature of DGN is to evaluate the Hessian together with a trust-region search strategy based on the Gauss-
Newton formulation. The trust-region method minimizes a quadratic approximation of the objective function within a
limited search area, and it has been proven much more efficient and robust than line-search algorithms (Gao et
al. 2017). The two key parts of the quadratic model, the Gauss-Newton Hessian matrix and the gradient of the objective
function, are computed from the corresponding set of linear proxies of the data responses, which are approximated using
the base-case point and its adjacent points. Specifically, in order to compute the gradient and the Gauss-Newton Hessian
matrix, the sensitivity matrix with respect to the reservoir parameters is obtained analytically by taking the
derivative of the set of closed-form linear proxies. One advantage of DGN over SPMI is that DGN requires
no extra simulation runs to obtain perturbation points for derivative estimation, because simulation cases
(also called training data points) completed in previous iterations of the DGN minimization serve as the
perturbation points used to build the linear proxies. Second, unlike SPMI, which generates only one
history-matched reservoir model, DGN can simultaneously generate multiple realizations of reservoir
models that are all conditioned to the observed data. In other words, DGN combines the advantages of
ensemble-based and gradient-based methods for assisted history matching, while avoiding the
difficulty of modifying the source code of a reservoir simulator to integrate adjoint code. More
importantly, because the linear proxy for each base-case point is built from the linear interpolation (or
regression) of the surrounding training data points that are closest to the base case in terms of L2-norm distance
in the model space, the estimated sensitivity matrices of different base cases are generally different from one another.
Therefore, for a strongly nonlinear problem where the objective function may have multiple local minima, the DGN
method is able to find multiple local minima of the objective function, or local maximum a posteriori (MAP)
estimates, by searching around different base-case realizations. When DGN is integrated with the randomized
maximum likelihood (RML) approach (Chen et al. 2017), it is able to generate an ensemble of conditional
realizations that are approximately distributed over different local basins of the posterior distribution. These
features help avoid some issues of ensemble-based approaches, e.g., ensemble collapse in EnKF, or biased
sampling due to the unrealistic assumption of a linear relationship between data responses and model parameters
made in ES-MDA.

However, the use of linear proxies in DGN to compute the sensitivity matrix of flow responses with respect to
model parameters is not flawless. Since a practical simulation problem has a highly nonlinear relationship between
flow responses and model parameters, linear proxies are not accurate enough to represent this nonlinear relationship,
which may result in poor search points for DGN and hence more iterations to converge. In order to improve the
computational efficiency of DGN, it is necessary to propose better search points for the optimization problem. An
obvious way is to build a more accurate response surface model (or proxy model) than the previously used linear model.
This new response surface model should be built at an affordable cost, and it should provide closed-form formulations
for computing the flow responses and the sensitivity matrix. After examining different methods for building response
surfaces in the machine-learning literature, we find that the support vector regression (SVR) approach is a good candidate to
replace the linear proxy model.

The objective of this research is to integrate the SVR proxy model into DGN so that it provides a better, more
accurate proxy than the linear proxy for computing the sensitivity matrix. As a result, the computational
cost (e.g., in terms of the average number of simulation runs required to generate an acceptable conditional realization)
of assisted history matching using the SVR proxy model is expected to be significantly lower than with
the original linear proxy model. The proposed history matching method is applied to EUR
assessment for several wells in the Permian Liquid Rich Shale Reservoir field with complicated subsurface
oil/water/gas multiphase flows. In the following, we denote the original DGN with linear proxies by L-DGN
and the new method with SVR by SVR-DGN.

Integration of SVR with DGN



The history matching problem is often regarded as an optimization problem. Within the Bayesian framework, the
MAP estimate can be found by minimizing the objective function defined by
O(\mathbf{m}) = \frac{1}{2}\left(\mathbf{m}-\mathbf{m}_{pr}\right)^{T} C_{M}^{-1}\left(\mathbf{m}-\mathbf{m}_{pr}\right) + \frac{1}{2}\left(\mathbf{y}(\mathbf{m})-\mathbf{d}_{obs}\right)^{T} C_{D}^{-1}\left(\mathbf{y}(\mathbf{m})-\mathbf{d}_{obs}\right),    (1)

where m is the N_m-dimensional model parameter vector; m_pr is the prior model parameter vector; y(m) is the N_d-
dimensional flow-response vector; d_obs is the N_d-dimensional observed data vector; C_M is the prior model
covariance matrix; and C_D is the observation error covariance matrix, usually in the form of a diagonal matrix. To
quantify the uncertainty of the history-matched models, the RML method can be applied to generate approximate
conditional realizations. The only change is to replace m_pr with m_uc and d_obs with d_uc in Eq. 1, where m_uc
is an unconditional realization generated by sampling the prior probability density function (PDF), and
d_uc = d_obs + C_D^{1/2} z, with z being an N_d-dimensional random Gaussian vector with zero mean and identity covariance
matrix.
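As an illustration, the objective function of Eq. 1 with the RML substitution can be evaluated as in the following minimal sketch, assuming NumPy arrays for the model vector and the covariance inverses, and a `simulate` callable standing in for the reservoir simulator (all names here are illustrative, not the paper's implementation).

```python
import numpy as np

def rml_objective(m, m_uc, d_uc, simulate, C_M_inv, C_D_inv):
    """Evaluate Eq. 1 with the RML substitution (m_pr -> m_uc, d_obs -> d_uc)."""
    dm = m - m_uc                     # model mismatch term
    dd = simulate(m) - d_uc           # data mismatch term, y(m) - d_uc
    return 0.5 * dm @ C_M_inv @ dm + 0.5 * dd @ C_D_inv @ dd

def perturb_observations(d_obs, C_D_sqrt, rng):
    """Draw d_uc = d_obs + C_D^{1/2} z with z ~ N(0, I)."""
    return d_obs + C_D_sqrt @ rng.standard_normal(d_obs.size)
```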

The gradient-based history matching method often requires the gradient and Hessian matrix of Eq. 1. Analytically,
the gradient is given by
\nabla O(\mathbf{m}) = C_{M}^{-1}\left(\mathbf{m}-\mathbf{m}_{pr}\right) + G^{T} C_{D}^{-1}\left(\mathbf{y}(\mathbf{m})-\mathbf{d}_{obs}\right),    (2)

where G is the sensitivity matrix, defined as the derivatives of all flow responses with respect to the model
parameters,

G(\mathbf{m}) = \left[\nabla y^{(1)}(\mathbf{m}), \nabla y^{(2)}(\mathbf{m}), \ldots, \nabla y^{(N_d)}(\mathbf{m})\right]^{T},    (3)

and the Hessian matrix is often simplified as the Gauss-Newton Hessian,

H(\mathbf{m}) = C_{M}^{-1} + G^{T}(\mathbf{m})\, C_{D}^{-1}\, G(\mathbf{m}).    (4)

When the response y^{(j)}(m) (for j = 1, 2, ..., N_d) is a nonlinear function of the model parameter vector m, both the
sensitivity matrix G(m) and the Hessian matrix H(m) depend on m as well. Letting m* be the best solution of the current
iteration, we want to find a new search point, denoted by m* + Δm, that has a larger chance of reducing the objective
function. If the response y^{(j)}(m) (for j = 1, 2, ..., N_d) is smooth, then the objective function O(m) can be
approximated by the following quadratic model in the neighborhood of m*,

O(\mathbf{m}^{*}+\Delta\mathbf{m}) \approx q(\Delta\mathbf{m}) = O(\mathbf{m}^{*}) + \left[\nabla O(\mathbf{m}^{*})\right]^{T}\Delta\mathbf{m} + \frac{1}{2}(\Delta\mathbf{m})^{T} H(\mathbf{m}^{*})\,\Delta\mathbf{m}.    (5)

The new search point m = m* + Δm can be determined by solving the trust-region subproblem, i.e., by finding the
global minimum of the quadratic model q(Δm) defined in Eq. 5 within a given trust region.
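The quantities in Eqs. 2, 4, and 5 translate directly into a few lines of linear algebra. The sketch below is a simplified illustration, assuming the sensitivity matrix G is already available; the step is merely scaled back to the trust-region boundary rather than solving the full trust-region subproblem used in DGN.

```python
import numpy as np

def gauss_newton_step(m, m_pr, y_m, d_obs, G, C_M_inv, C_D_inv, trust_radius):
    """Gradient (Eq. 2), Gauss-Newton Hessian (Eq. 4), and a crude trust-region
    limited Gauss-Newton step for the quadratic model of Eq. 5."""
    grad = C_M_inv @ (m - m_pr) + G.T @ C_D_inv @ (y_m - d_obs)   # Eq. 2
    hess = C_M_inv + G.T @ C_D_inv @ G                            # Eq. 4
    dm = np.linalg.solve(hess, -grad)                             # unconstrained minimizer of Eq. 5
    step_len = np.linalg.norm(dm)
    if step_len > trust_radius:       # crude surrogate for the trust-region subproblem
        dm *= trust_radius / step_len
    return m + dm
```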

A DGN iteration is briefly described as follows. First, the HPC assigns the simulations for an ensemble of base-case
search points (initially generated, or generated by DGN in later iterations) to different computational cores, which
complete the runs in parallel. Next, the simulated results are assembled into training data for building a set of SVR
proxies, which are used to compute the sensitivity matrix defined in Eq. 3. Finally, DGN generates a set of new
base-case search points using a modified Gauss-Newton trust-region search with the SVR proxies and passes them
to the HPC.
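Conceptually, one SVR-DGN iteration might be organized as in the following sketch, where run_simulation, train_svr_proxies, svr_sensitivity, and propose_new_point are hypothetical placeholders for the simulator call, the proxy training, Eq. 3, and the Gauss-Newton trust-region update; the process pool merely stands in for the HPC scheduler.

```python
from concurrent.futures import ProcessPoolExecutor

def svr_dgn_iteration(base_cases, training_set, run_simulation,
                      train_svr_proxies, svr_sensitivity, propose_new_point):
    """One SVR-DGN iteration (conceptual sketch; all callables are placeholders).

    base_cases   : list of model-parameter vectors (the current search points)
    training_set : list of (m, y(m)) pairs accumulated over previous iterations
    """
    # 1. Run all base-case simulations concurrently (stands in for the HPC step).
    with ProcessPoolExecutor() as pool:
        responses = list(pool.map(run_simulation, base_cases))

    # 2. Add the new simulation results to the training data and rebuild the
    #    N_d SVR proxies (one proxy per observed data point).
    training_set.extend(zip(base_cases, responses))
    proxies = train_svr_proxies(training_set)

    # 3. For each base case, form the sensitivity matrix analytically from the
    #    SVR proxies and propose a new search point (Gauss-Newton trust region).
    new_points = [propose_new_point(m, y, svr_sensitivity(proxies, m))
                  for m, y in zip(base_cases, responses)]
    return new_points, training_set
```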

In L-DGN, the sensitivity matrix at a base-case point m in Eq. 3 is estimated by linear interpolation, whereas in
SVR-DGN, N_d SVR proxy models, acting as surrogates of the N_d simulated flow responses, are built from the set
of base-case points (i.e., the vectors of model parameters and their simulated flow-response vectors) generated
during the DGN iterations. By using the analytical form of the SVR prediction function (introduced below), the
sensitivity matrix G at any model-parameter vector m can be easily obtained.

SVR Proxy

Two versions of the SVR proxy model are implemented in our program, namely ε-SVR and LS-SVR, with their
complete derivations given in Appendix A. We define N_s as the number of training points, x_k ∈ R^{N_m},
k = 1, 2, ..., N_s, as the kth training input vector, and y(x_k) as the scalar real data response corresponding to x_k. The
prediction function ŷ of ε-SVR, as the proxy of y at a given unseen point x, is given by

\hat{y}(\mathbf{x}) = \sum_{k=1}^{N_s} \beta_{k} K(\mathbf{x}_{k}, \mathbf{x}) + b,    (6)

where b and β_k, k = 1, 2, ..., N_s, are parameters determined by the training procedure, which solves an optimization
problem that minimizes the error between the predictions ŷ(x_k) and the real data y(x_k) for k = 1, 2, ..., N_s, plus a
regularization term (see Appendix A); K(x_k, x) is a kernel function, which for our applications is given by

K(\mathbf{x}_{k}, \mathbf{x}) = \exp\left\{-\|\mathbf{x}_{k}-\mathbf{x}\|^{2}/\sigma^{2}\right\},    (7)

where σ is the specified kernel bandwidth. Similar to ε-SVR, the prediction function of LS-SVR is given by

\hat{y}(\mathbf{x}) = \sum_{k=1}^{N_s} \alpha_{k} K(\mathbf{x}_{k}, \mathbf{x}) + b,    (8)

where b and α_k, k = 1, 2, ..., N_s, are parameters determined by solving a different optimization problem than for
ε-SVR (see Appendix A). For history matching applications, each training input point x_k is a vector of model
parameters (m in Eq. 1) that characterize the reservoir properties; the data response y is a single flow response
simulated at a given time step and a given well, which is matched with the corresponding observed datum (an entry
of d_obs in Eq. 1) measured at the same time and the same well. Typically, there are N_d observed data for a history
matching problem, as shown in Eq. 1, and therefore we need to build N_d SVR proxies corresponding to the N_d
flow responses associated with the observed data.
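The paper uses its own ε-SVR and LS-SVR implementations (Appendix A). Purely as an illustration of Eq. 6, one RBF-kernel ε-SVR per flow response could be fitted with scikit-learn, whose dual_coef_ and intercept_ then play the roles of β_k and b; note that scikit-learn parameterizes the RBF kernel with gamma = 1/σ².

```python
import numpy as np
from sklearn.svm import SVR

def fit_response_proxies(X_train, Y_train, sigma=0.5, C=10.0, eps=0.01):
    """Fit one RBF-kernel epsilon-SVR proxy per flow response (illustrative only;
    the paper uses its own epsilon-SVR/LS-SVR implementation from Appendix A).

    X_train : (N_s, N_m) training model-parameter vectors
    Y_train : (N_s, N_d) simulated responses, one column per observed datum
    """
    gamma = 1.0 / sigma**2          # sklearn kernel: exp(-gamma * ||x_k - x||^2)
    proxies = []
    for j in range(Y_train.shape[1]):
        svr = SVR(kernel="rbf", gamma=gamma, C=C, epsilon=eps)
        proxies.append(svr.fit(X_train, Y_train[:, j]))
    return proxies

# Each fitted proxy predicts y_j(m) via Eq. 6: sum_k beta_k K(x_k, m) + b,
# with beta_k stored in svr.dual_coef_ and b in svr.intercept_.
```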

Sensitivity Matrix

The major goal of building the SVR proxies is to provide an analytical function for computing the sensitivity matrix
defined in Eq. 3. Since both ε-SVR and LS-SVR have closed-form prediction formulations, as shown in Eqs. 6
and 8, respectively, we can analytically calculate the gradient of each response with respect to the input variables.
The sensitivity matrix is then constructed from the derivatives of all the flow responses with respect to the model
parameters. For ε-SVR, the gradient of ŷ in Eq. 6 with respect to x is given by

\nabla_{\mathbf{x}}\hat{y}(\mathbf{x}) = \sum_{k=1}^{N_s} \beta_{k} \nabla K(\mathbf{x}_{k}, \mathbf{x}).    (9)

The gradient of the kernel function is computed using the chain rule,

\nabla_{\mathbf{x}} K(\mathbf{x}_{k}, \mathbf{x}) = \frac{2}{\sigma^{2}} K(\mathbf{x}_{k}, \mathbf{x})\,(\mathbf{x}_{k}-\mathbf{x}).    (10)

Substituting Eq. 10 into Eq. 9 yields

\nabla_{\mathbf{x}}\hat{y}(\mathbf{x}) = \sum_{k=1}^{N_s} \frac{2\beta_{k}}{\sigma^{2}}\, e^{-\|\mathbf{x}_{k}-\mathbf{x}\|^{2}/\sigma^{2}}\,(\mathbf{x}_{k}-\mathbf{x}).    (11)

In a similar way, the gradient of the LS-SVR response with respect to the input variable vector x is given by

\nabla_{\mathbf{x}}\hat{y}(\mathbf{x}) = \sum_{k=1}^{N_s} \frac{2\alpha_{k}}{\sigma^{2}}\, e^{-\|\mathbf{x}_{k}-\mathbf{x}\|^{2}/\sigma^{2}}\,(\mathbf{x}_{k}-\mathbf{x}).    (12)

Assembling the gradients computed with Eq. 11 or Eq. 12 for the N_d responses yields the sensitivity matrix

G = \left[\nabla_{\mathbf{x}}\hat{y}_{1}(\mathbf{x}), \nabla_{\mathbf{x}}\hat{y}_{2}(\mathbf{x}), \ldots, \nabla_{\mathbf{x}}\hat{y}_{N_d}(\mathbf{x})\right]^{T}.    (13)
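Eqs. 11 through 13 are straightforward to implement once each proxy is stored as its training points, kernel coefficients, and bandwidth. Below is a minimal sketch, assuming each proxy is represented by the pair (support points, coefficients); names are illustrative.

```python
import numpy as np

def rbf_svr_gradient(x, support_points, coefs, sigma):
    """Gradient of one RBF-kernel SVR proxy at x (Eq. 11 for epsilon-SVR, Eq. 12 for LS-SVR).

    support_points : (N_s, N_m) training inputs x_k
    coefs          : (N_s,) beta_k (epsilon-SVR) or alpha_k (LS-SVR)
    """
    diff = support_points - x                                  # rows are x_k - x
    k_vals = np.exp(-np.sum(diff**2, axis=1) / sigma**2)       # K(x_k, x)
    return (2.0 / sigma**2) * (coefs * k_vals) @ diff          # sum_k 2 c_k/sigma^2 K (x_k - x)

def sensitivity_matrix(x, proxies, sigma):
    """Stack the N_d proxy gradients into the sensitivity matrix G of Eq. 13."""
    return np.vstack([rbf_svr_gradient(x, sp, c, sigma) for sp, c in proxies])
```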

Validation with a Synthetic Example

A two-dimensional synthetic example is presented here to validate that both ε-SVR and LS-SVR can build
proxies that accurately predict responses for a random input that is not included in the training set. In this example, the
response function is given by

g(x, y) = -\left(x\sin(20y) + y\sin(20x)\right)^{2}\cosh\!\left(x\sin(10x)\right) - \left(x\cos(10y) - y\sin(10x)\right)^{2}\cosh\!\left(y\cos(20y)\right),    (14)

where x and y are the two input parameters of this problem and g is the response function. Both x and y range
from -2 to 2. The surface map of the function is shown in Figure 1, where we can see that the function is highly
nonlinear with multiple local minima.
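A minimal sketch of Eq. 14 and of generating the noise-free training and blind-test points follows; uniform sampling on [-2, 2]² and the random seed are assumptions, since the paper does not state the sampling scheme.

```python
import numpy as np

def g(x, y):
    """Synthetic response function of Eq. 14."""
    return (-(x * np.sin(20 * y) + y * np.sin(20 * x))**2 * np.cosh(np.sin(10 * x) * x)
            - (x * np.cos(10 * y) - y * np.sin(10 * x))**2 * np.cosh(np.cos(20 * y) * y))

rng = np.random.default_rng(42)                   # seed is arbitrary
X_train = rng.uniform(-2.0, 2.0, size=(2000, 2))  # 2,000 noise-free training points
X_test = rng.uniform(-2.0, 2.0, size=(1000, 2))   # 1,000 blind-test points
y_train = g(X_train[:, 0], X_train[:, 1])
y_test = g(X_test[:, 0], X_test[:, 1])
```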

Figure 1: Surface map of the two-dimensional synthetic example

Based on the true response function defined in Eq. 14, 2,000 training points are generated without adding any
random noise. First, a second-order polynomial regression is fit to the 2,000 training points. The blind test
result for another 1,000 points is shown in Figure 2, which shows that the polynomial model has very poor
predictability: R² is only 0.05, and the predicted responses remain almost constant no matter how the true
responses vary.

Figure 2: The blind test result with polynomial regression. The horizontal axis represents the true response values of the blind test data set, and
the vertical axis represents the predicted responses for the blind data set.

Second, both the LS-SVR and ε-SVR approaches are applied to the same training data using the tuning-parameter
settings listed in Table 1 and Table 2. Blind test results of the LS-SVR proxy and the ε-SVR proxy are
shown in Figure 3 and Figure 4, respectively; both SVR proxies generate much better predictions
than the second-order polynomial regression.
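Reusing the training and blind-test points sketched above, the polynomial and SVR blind tests of Figures 2 through 4 could be reproduced along the following lines (scikit-learn models are used for illustration only; σ = 0.057, C = 10, ε = 0.01 are the Table 2 settings, mapped to gamma = 1/σ²).

```python
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Second-order polynomial proxy (the weak baseline of Figure 2).
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly.fit(X_train, y_train)
print("polynomial R2:", r2_score(y_test, poly.predict(X_test)))

# epsilon-SVR proxy with the Table 2 settings (sigma = 0.057 -> gamma = 1/sigma^2).
svr = SVR(kernel="rbf", gamma=1.0 / 0.057**2, C=10.0, epsilon=0.01)
svr.fit(X_train, y_train)
print("epsilon-SVR R2:", r2_score(y_test, svr.predict(X_test)))
```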

Table 1: Tuning parameter setting for LS-SVR

σ 0.037
γ 200

Table 2: Tuning parameter setting for ε-SVR

σ 0.057
C 10
ε 0.01

Figure 3: The blind test result with LS-SVR. The horizontal axis represents the true response values of the blind test data set, and the vertical
axis represents the predicted responses for the blind data set.

Figure 4: The blind test result with ε-SVR. The horizontal axis represents the true response values of the blind test data set, and the vertical axis
represents the predicted responses for the blind data set.

Application to a Liquid-Rich-Shale Example

In this example, a sector model that contains a horizontal well with multi-stage hydraulic fractures is cut from the
Permian reservoir. As shown in Figure 5, there are four fracturing stages for this horizontal well. Three types of
variables are included in the history matching procedure: well completion parameters, rock properties, and relative
permeabilities. The detailed parameters (Chen et al. 2016) are shown in Table 3. We only tune those parameters that
are sensitive to the historical production data. Parameters that affect the forecasting range, such as the gas-liquid relative
permeability parameters, are sampled randomly for the forecasting simulation period and are not included in Table 3.

Figure 5: Schematic illustration of hydraulic fractures in an unconventional reservoir (Image from
http://www.indianz.com/News/2012/006216.asp).

In this example, the total number of parameters is 13. The measured flow responses, including the oil rates and the
water-oil ratio (WOR), are obtained under bottom-hole pressure (BHP) control. The history matching period is
from the 60th day to the 200th day. The data in the first 60 days are ignored because this period is considered a
flowback period and the simulator cannot capture the flow behavior during it. The forecast period is from the
200th day to the 500th day, and the production forecasts in this period are used for uncertainty quantification of EUR
with the history-matched models.

In this example, 500 unconditional realizations are generated and used as the base cases in the first iteration, and
both SVR-DGN and L-DGN are applied to find conditional realizations. For comparison, the
traditional history matching approach using SPMI (Chen et al. 2012; Gao et al. 2015, 2016a) as the optimizer is also
applied to generate conditional realizations. Because it is very computationally expensive for SPMI to find one
conditional realization, only 100 realizations are generated by SPMI to save computational cost.

Table 3: Parameter Summary


Type of parameter       Parameter in simulation               Comment
Completion              CE                                    Indicator for how many fracture clusters open to flow.
                        H_factor                              Indicator of varying SRV fracture height.
                        Xf                                    Effective flowing fracture half-length.
Rock properties         PermSRV                               Effective permeability of the stimulated rock volume (SRV) zone.
                        PermMat                               Effective permeability of the matrix.
                        Alpha                                 Permeability reduction coefficient; the horizontal permeability may decrease as pressure is depleted.
                        FCD                                   Dimensionless fracture conductivity.
Relative permeability   Nw, Krwir, Krowcw, No, Sorw, Swcon    Water-oil relative permeability curves.

Performance Comparison
First, cumulative distribution function (CDF) profiles of the objective function are used as the performance metric,
and the performances of SVR-DGN and L-DGN are compared in Figure 6. In Figure 6, the horizontal axis denotes the
natural log of the objective function, whereas the vertical axis represents the CDF. The orange curve and the blue
curve shown in Figure 6 represent the CDF profiles generated by SVR-DGN and L-DGN, respectively, at the same
iteration number of 12. As Figure 6 shows, the objective function values obtained with SVR-DGN are much smaller than
those obtained with L-DGN at the same computational cost. For example, after 12 iterations, the P50 value of the objective
function generated with SVR-DGN is about exp(7.2) ≈ 1,340, which is about 1/3 of the value generated with L-DGN,
exp(8.2) ≈ 3,641. The other two curves show similar CDF profiles obtained with the two methods: SVR-DGN needs
only 8 iterations to achieve almost the same CDF profile as L-DGN achieves after 29 iterations,
which indicates that SVR-DGN is able to speed up the convergence rate by a factor of 3 compared to L-DGN
for this example.
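For reference, the CDF profiles and P50 values discussed here can be obtained from the ensemble of objective function values as in this small sketch (the array of per-realization objective values is assumed to be available).

```python
import numpy as np

def cdf_of_log_objective(obj_values):
    """Empirical CDF of ln(objective) over an ensemble of realizations (as in Figure 6)."""
    log_obj = np.sort(np.log(obj_values))
    cdf = np.arange(1, log_obj.size + 1) / log_obj.size
    return log_obj, cdf

def p50_objective(obj_values):
    """P50 (median) objective value, e.g. about exp(7.2) ~ 1,340 for SVR-DGN after 12 iterations."""
    return float(np.median(obj_values))
```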

Figure 6: Cumulative probability distribution function obtained with L-DGN and SVR-DGN at different iteration numbers.

Second, the convergence performance is characterized by the median of the objective function values for the 500
realizations evaluated at each iteration. Figure 7 compares convergence performances between SVR-DGN and L-
DGN based on the median value. As shown in Figure 7, at the same iteration number, SVR-DGN yields a much
lower median objective function value than is obtained with L-DGN.

Figure 7: Median of the objective function values versus the iteration number obtained with SVR-DGN and L-DGN. Red curve denotes the
result of L-DGN and blue curve denotes the result of SVR-DGN.

Third, the computational cost, measured by the number of simulation runs required to generate one acceptable history-
matched model or conditional realization, is used to quantify the performance of the three approaches. Table 4
quantitatively compares the history matching performance of the three methods, i.e., SPMI, L-DGN, and
SVR-DGN, each combined with the RML method (Chen et al. 2017) to generate realizations conditioned to production
data. In this example, a history-matched model is accepted as a conditional realization only when the objective
function is reduced to below 1,100. SVR-DGN shows the best efficiency in finishing the assisted history matching
task compared with the other two methods. To obtain one acceptable history-matched model or conditional
realization, SVR-DGN uses the fewest simulation runs, or equivalently the least computational cost:
about 1/30 of the simulation runs used by SPMI and 1/3 of those used by L-DGN.

Table 4: Comparison among SPMI, L-DGN and SVR-DGN


SPMI L-DGN SVR-DGN
Sample Size per Iteration 100 500 500
Conditional Realizations (OBJ<1,100) 98 249 357
Iterations to Converge 16 29 12
Simulation Runs/Conditional Realizations 570 58 18

Blind Test

Finally, we present results simulated from the history-matched models to illustrate that the history-matched models, or
conditional realizations, generate production forecasts that are compatible with the actual production data, in
both the history matching period (indicated by red dots in Fig. 8) and the blind test period (indicated by blue dots in
Fig. 8). With some minor corrections, these results can be used to quantify the uncertainty of the reservoir model
parameters and of the production forecasts.

Historical BHP data and water-oil ratio (WOR) data are shown as red open circles in Figure 8 and Figure 9,
respectively. Simulation results generated from all acceptable conditional realizations are shown as green curves in
both figures. The results in both figures indicate that production forecasts generated from all acceptable
conditional realizations agree well with the historical production data measured in both periods.

Figure 8: BHP after history matching. Red dots are observed data used for history matching; blue dots are observed data used for blind test; and
green lines are flow responses obtained from the history-matched realizations.

Figure 9: Water oil ratio after history matching. Red dots are observed data used for history matching; blue dots are observed data used for blind
test; and green lines are flow responses obtained from the history-matched realizations.

Uncertainty Quantification

To quantify the uncertainty of the production forecast, we also calculated the CDF profiles of the normalized cumulative oil,
water, and gas production reported at the end of the production period, as shown in Figure 10.
The three percentiles on the three subfigures of Figure 10 denote the estimated P10-P50-P90 values. The
percentiles obtained with L-DGN and SVR-DGN are compared in Table 5; their differences are negligible,
which confirms that both methods yield similar uncertainty ranges.
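The P10-P50-P90 values of Table 5 can be extracted from the ensemble of forecasted cumulative productions with a simple percentile call; normalizing by the P10 value (so that the reported P10 equals 1, as Table 5 suggests) is an assumption about the normalization used.

```python
import numpy as np

def p10_p50_p90(cum_production):
    """P10/P50/P90 of an ensemble of forecasted cumulative production values,
    normalized by the P10 value (assumed normalization; the P10 column of Table 5 is 1)."""
    p10, p50, p90 = np.percentile(cum_production, [10, 50, 90])
    return 1.0, p50 / p10, p90 / p10
```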

(a) CDF of cumulative oil production
(b) CDF of cumulative water production

(c) CDF of cumulative gas production

Figure 10: CDF of normalized cumulative oil production, cumulative water production and cumulative gas production. The three numbers on the
subplots denote P10, P50 and P90 of corresponding variables.

Table 5: A summary of P10-P50-P90 values of the normalized oil, water and gas production
Cum. Oil Cum. Water Cum. Gas
Conditional P10 (Linear DGN) 1 1 1
Conditional P50 (Linear DGN) 1.38 1.44 1.29
Conditional P90 (Linear DGN) 2.28 2.02 1.72
Conditional P10 (SVR DGN) 1 1 1
Conditional P50 (SVR DGN) 1.44 1.59 1.30
Conditional P90 (SVR DGN) 2.48 2.20 1.79

Conclusions

Based on our theoretical discussions and numerical experiments, we can draw the following conclusions:
1. The SVR proxy is more accurate than a polynomial proxy for highly nonlinear problems.
2. The proposed SVR-DGN history matching approach performs more efficiently than both the traditional
history matching approach (e.g., using the SPMI optimizer) and the linear version DGN approach.
Integration of SVR with DGN may reduce the computational cost by a factor of 3 when compared with the
original linear DGN.

3. The quality of the history-matched models (or conditional realizations) generated using the proposed SVR-
DGN approach is confirmed by the actual production data measured in both the history matching period and
the blind test period.

4. History-matched models (or conditional realizations) generated by the proposed SVR-DGN approach can
be used to quantify uncertainty of both model parameters and production forecasts by conditioning to
production data.

Acknowledgements

We want to thank Shell Global Solutions (US) Inc., Shell International Exploration & Production Inc., and Shell
Exploration & Production Co. for their support to publish the paper.

References

Chen, C., Jin, L., Gao, G., Weber, D., Vink, J. C., Hohl, D. F., Alpak, F. O., and Pirmes, C. 2012.
Assisted History Matching Using Three Derivative-Free Optimization Algorithms. Proceedings of the SPE
Europec/EAGE Annual Conference, Copenhagen, Denmark, 4-7 June. http://dx.doi.org/10.2118/154112-MS.
Chen, C., Gao, G., Li, R., Cao, R., Chen, T., Vink, J. and Gelderblom, P., Integration of Distributed Gauss-Newton
with Randomized Maximum Likelihood Method for Uncertainty Quantification of Reservoir Performance,
SPE-182639-MS, SPE Reservoir Simulation Conference 2017.
Chen, C., Li, R., Gao, G., Vink, J.C., and Cao, R., EUR Assessment of Unconventional Assets Using Parallelized
History Matching Workflow Together with RML method, URTeC 2429986, Unconventional Resources
Technology Conference, San Antonio, Texas, US, 1-3 August, 2016.
Cortes, C., and Vapnik, V. 1995. Support-vector networks. Machine learning, 20(3), 273-297.
De Brabanter, K. 2011. Least squares support vector regression with applications to large-scale data: a statistical
approach. Faculty of Engineering, KU Leuven, Katholieke Universiteit Leuven.
Emerick, A. A., and Reynolds, A. C. 2012. History Matching Time-lapse Seismic Data Using the Ensemble Kalman
Filter with Multiple Data Assimilations. Computational Geosciences, 16(3), 639-659.
Emerick, A. A., and Reynolds, A. C. 2013a. Ensemble Smoother with Multiple Data Assimilations. Computers &
Geosciences, Vol. 55, 3-15.
Emerick, A. A., and Reynolds, A. C. 2013b. History-Matching Production and Seismic Data in a Real Field Case
Using the Ensemble Smoother With Multiple Data Assimilation. Proceedings of the SPE Reservoir Simulation
Symposium, The Woodlands, Texas, USA, 18-20 February. http://dx.doi.org/10.2118/163675-MS.
Evensen, G. 1994. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo
Methods to forecast error statistics. Journal of Geophysical Research, 99(C5), 10143-10162.
Gao, G., and Reynolds, A. C. 2004. An Improved Implementation of the LBFGS Algorithm for Automatic History
Matching. Proceedings of the SPE Annual Technical Conference and Exhibition, Houston, Texas, 26-29
September, SPE-90058-MS. http://dx.doi.org/10.2118/90058-MS.
Gao, G., and Reynolds, A. C. 2006. An Improved Implementation of the LBFGS Algorithm for Automatic History
Matching. SPE Journal, 11(1), 5-17. http://dx.doi.org/10.2118/90058-PA.
Gao, G., Vink, J. C., Alpak, F. O., and Mo, W. 2015. An Efficient Optimization Work Flow for Field-Scale In-Situ
Upgrading Developments. SPE Journal, 20(04), 701-715. http://dx.doi.org/10.2118/163634-PA.
Gao, G., Vink, J. C., Chen, C., Alpak, F. O., and Du, K. 2016a. A Parallelized and Hybrid Data-Integration
Algorithm for History Matching of Geologically Complex Reservoirs. SPE Journal.
http://dx.doi.org/10.2118/175039-PA (in press; posted May 2016).
Gao, G., Vink, J. C., Chen, C., El Khamra, Y., and Tarrahi, M. In press. Distributed Gauss-Newton Method for
History Matching Problems with Multiple Best Matches. Computational Geosciences.
Gao, G., Vink, J. C., Chen, C., Tarrahi, M., and El Khamra, Y. 2016. Uncertainty Quantification for History
Matching Problems with Multiple Best Matches Using a Distributed Gauss-Newton Method. Paper SPE-
181611-MS at the SPE Annual Technical Conference and Exhibition held in Dubai, UAE, 26–28 September
2016.
Gao, G., Jiang, H., Van Hagan, P., Vink, J. C., and Wells, T. 2017. A Gauss-Newton Trust Region Solver for Large
Scale History Matching Problems. SPE Reservoir Simulation Conference, February 2017.
http://dx.doi.org/10.2118/182602-MS.

Guo, Z., Reynolds, A. C., and Zhao, H. 2017. A Physics-Based Data-Driven Model for History-Matching,
Prediction and Characterization of Waterflooding Performance. Proceedings of SPE Reservoir Simulation
Conference. https://doi.org/10.2118/182660-MS.

Kahrobaei, S., Van Essen, G., Van Doren, J., Van den Hof, P., and Jansen, J. D. 2013. Adjoint-Based History
Matching of Structural Models Using Production and Time-Lapse Seismic Data. proceedings of SPE Reservoir
Simulation Symposium. http://dx.doi.org/10.2118/163586-MS.
Le, D. H., and Reynolds, A. C. 2015. An Adaptive Ensemble Smoother for Assisted History Matching. TUPREP
Research Report, The University of Tulsa.
Le, D. H., Emerick, A. A., and Reynolds, A. C. 2015a. An Adaptive Ensemble Smoother with Multiple Data
Assimilation for Assisted History Matching. Proceedings of SPE Reservoir Simulation Symposium, Houston,
Texas, USA, 23-25 February. http://dx.doi.org/10.2118/173214-MS.
Le, D. H., Younis, R., and Reynolds, A. C. 2015b. A History Matching Procedure for Non-Gaussian Facies Based
on ES-MDA. proceedings of SPE Reservoir Simulation Symposium, Houston, Texas, USA, 23-25 February.
http://dx.doi.org/10.2118/173233-MS.
van Leeuwen, P. J., and Evensen, G. 1996. Data assimilation and inverse methods in terms of a probabilistic
formulation. Monthly Weather Review, Vol. 124, 2898-2913.
Li, R., Reynolds, A. C., and Oliver, D. S. 2003. Sensitivity Coefficients for Three-Phase Flow History Matching. J.
Canadian Pet. Tech., 42(4), 70-77. http://dx.doi.org/10.2118/03-04-04.
Liu, D. C., and Nocedal, J. 1989. On the limited memory BFGS method for large scale optimization. Mathematical
programming, 45(1-3), 503-528.
Mercer, J. 1909. Functions of positive and negative type, and their connection with the theory of integral equations.
Philosophical transactions of the royal society of London. Series A, containing papers of a mathematical or
physical character, Vol. 209, 415-446.
Micchelli, C. A. 1984. Interpolation of scattered data: distance matrices and conditionally positive definite functions.
In Approximation theory and spline functions, pp. 143-145. Netherlands: Springer.
Dyn, N. 1987. Interpolation of scattered data by radial functions. Topics in multivariate approximation, 47-61.
Nævdal, G., Mannseth, T., and Vefring, E. H. 2002. Near-Well Reservoir Monitoring Through Ensemble Kalman
Filter. Proceedings of the SPE/DOE Improved Oil Recovery Symposium, 13-17 April.
http://dx.doi.org/10.2118/75235-MS.
Nash, S. G. and Sofer, A. 1996. Linear and Nonlinear Programming. Blacklick, Ohio: McGraw-Hill
Science/Engineering/Math.
Platt, J. 1998. Sequential minimal optimization: A fast algorithm for training support vector machines.
Powell, M. J. D. 2004. Least Frobenius Norm Updating of Quadratic Models That Satisfy Interpolation Conditions.
Math. Program. B, 100(1), 183-215.
Reynolds, A. C., Li, R., and Oliver, D. S. 2004. Simultaneous estimation of absolute and relative permeability by
automatic history matching of three-phase flow production data. Journal of Canadian Petroleum Technology,
43(03), 37-46.
Smola, A. J., and Schölkopf, B. 2004. A tutorial on support vector regression. Statistics and Computing, 14(3), 199-
222.
Suykens, J. A., and Vandewalle, J. 1999. Least squares support vector machine classifiers. Neural processing letters,
9(3), 293-300.
Suykens, J. A., De Brabanter, J., Lukas, L., and Vandewalle, J. 2002. Weighted least squares support vector
machines: robustness and sparse approximation. Neurocomputing, 48(1), 85-105.

Appendix A: Formulations of SVR proxy

Here, the formulations of the two SVR proxies implemented in our program, ε-SVR and least-squares SVR,
are given as follows.

ε-SVR. The proxy model to estimate the response ŷ for the input x is defined as

\hat{y} = \mathbf{w}^{T}\varphi(\mathbf{x}) + b,    (A-1)

where φ(·) is a function that maps x ∈ R^{N_m} in the original input space, of dimension N_m, to φ(x) ∈ R^{N_w} in a
new feature space of dimension N_w, and the scalar b is the bias term. In Eq. A-1, the weighting vector w
is defined as

\mathbf{w} = \left[w_{1}, w_{2}, \ldots, w_{N_w}\right]^{T}.    (A-2)

Both b and w are determined by fitting N_s training data points, {x_k, y_k}, k = 1, 2, ..., N_s. Here, x_k ∈ R^{N_m} is the
kth input vector, and y_k is assumed to be a scalar for simplicity. Following the formulation stated in Cortes and Vapnik (1995), the
optimization problem in the primal space is defined as
\text{Minimize}\quad J = \frac{1}{2}\mathbf{w}^{T}\mathbf{w} + C\sum_{k=1}^{N_s}\left(\xi_{k}+\xi_{k}^{*}\right),    (A-3)

subject to

y_{k}-\mathbf{w}^{T}\varphi(\mathbf{x}_{k})-b \le \epsilon+\xi_{k},\qquad \mathbf{w}^{T}\varphi(\mathbf{x}_{k})+b-y_{k} \le \epsilon+\xi_{k}^{*},\qquad \xi_{k},\,\xi_{k}^{*} \ge 0,    (A-4)

where C is the trade-off factor that balances the flatness of the proxy model and the training error; ξ_k and ξ_k* are
defined by the ε-loss function as

\xi_{k} = \begin{cases} 0 & \text{if } y_{k}-\mathbf{w}^{T}\varphi(\mathbf{x}_{k})-b \le \epsilon,\\ y_{k}-\mathbf{w}^{T}\varphi(\mathbf{x}_{k})-b-\epsilon & \text{otherwise},\end{cases}    (A-5)

\xi_{k}^{*} = \begin{cases} 0 & \text{if } \mathbf{w}^{T}\varphi(\mathbf{x}_{k})+b-y_{k} \le \epsilon,\\ \mathbf{w}^{T}\varphi(\mathbf{x}_{k})+b-y_{k}-\epsilon & \text{otherwise}.\end{cases}    (A-6)
The Lagrange formulation is given by

L = J - \sum_{k=1}^{N_s}\alpha_{k}\left(\epsilon+\xi_{k}-y_{k}+\mathbf{w}^{T}\varphi(\mathbf{x}_{k})+b\right) - \sum_{k=1}^{N_s}\alpha_{k}^{*}\left(\epsilon+\xi_{k}^{*}+y_{k}-\mathbf{w}^{T}\varphi(\mathbf{x}_{k})-b\right) - \sum_{k=1}^{N_s}\left(\eta_{k}\xi_{k}+\eta_{k}^{*}\xi_{k}^{*}\right).    (A-7)

According to the Karush-Kuhn-Tucker (KKT) conditions, the derivatives of L with respect to the primal variables are
equal to 0, which yields

\partial_{b}L = \sum_{k=1}^{N_s}\left(\alpha_{k}^{*}-\alpha_{k}\right) = 0,    (A-8)

\nabla_{\mathbf{w}}L = \mathbf{w} - \sum_{k=1}^{N_s}\left(\alpha_{k}^{*}-\alpha_{k}\right)\varphi(\mathbf{x}_{k}) = 0,    (A-9)

and

\partial_{\xi_{k}^{(*)}}L = C - \alpha_{k}^{(*)} - \eta_{k}^{(*)} = 0.    (A-10)

Substituting Eqs. A-8 through A-10 into Eq. A-7 yields the following dual optimization problem:

\text{maximize}\quad -\frac{1}{2}\sum_{k=1}^{N_s}\sum_{l=1}^{N_s}\left(\alpha_{k}-\alpha_{k}^{*}\right)\left(\alpha_{l}-\alpha_{l}^{*}\right)\varphi(\mathbf{x}_{k})^{T}\varphi(\mathbf{x}_{l}) + \sum_{k=1}^{N_s}\left(\alpha_{k}-\alpha_{k}^{*}\right)y_{k} - \epsilon\sum_{k=1}^{N_s}\left(\alpha_{k}+\alpha_{k}^{*}\right),    (A-11)

subject to

\sum_{k=1}^{N_s}\left(\alpha_{k}-\alpha_{k}^{*}\right) = 0,\qquad \alpha_{k},\,\alpha_{k}^{*}\in[0, C].    (A-12)

In Eq. A-11, the inner product of φ(x_k) and φ(x_l) can be replaced by a proper kernel function satisfying Mercer's
condition (Mercer 1909),

K(\mathbf{x}_{k}, \mathbf{x}_{l}) = \varphi(\mathbf{x}_{k})^{T}\varphi(\mathbf{x}_{l}).    (A-13)

One of the commonly used kernel functions for nonlinear regression is the radial basis function (RBF) kernel
(Micchelli 1984; Dyn 1987), which is given by

K(\mathbf{x}_{k}, \mathbf{x}_{l}) = \exp\left\{-\|\mathbf{x}_{k}-\mathbf{x}_{l}\|^{2}/\sigma^{2}\right\},    (A-14)

where σ is the kernel bandwidth, a tuning parameter that controls the region of influence of the RBF
kernel: the bigger σ is, the more impact distant points have on the unseen point to be predicted. The
optimization problem defined in Eqs. A-11 and A-12 can be solved using sequential minimal optimization
(Platt 1998). Substituting Eq. A-9 into Eq. A-1 yields the prediction formulation of ε-SVR,

\hat{y}(\mathbf{x}) = \sum_{k=1}^{N_s}\left(\alpha_{k}^{*}-\alpha_{k}\right)K(\mathbf{x}_{k}, \mathbf{x}) + b.    (A-15)

Denoting α_k* − α_k by β_k, Eq. A-15 is rewritten as

\hat{y}(\mathbf{x}) = \sum_{k=1}^{N_s}\beta_{k}K(\mathbf{x}_{k}, \mathbf{x}) + b.    (A-16)
LS-SVR. Least-squares support vector regression (LS-SVR) (Suykens and Vandewalle 1999; Suykens et al. 2002;
Smola and Schölkopf 2004; De Brabanter 2011) is a variant of support vector regression. The basic idea of LS-SVR
is similar to that of ε-SVR: first map x ∈ R^{N_m} in the original input space to φ(x) ∈ R^{N_w} in a new feature space, and
then perform linear regression in the new feature space, as shown in Figure A-1.

Figure A-1: Mapping function and linear regression in feature space

Similar to ε-SVR, both b and w are determined by fitting N_s training data points. However, a different measure of
the error, the least-squares loss function defined in Eq. A-17, is used for LS-SVR,

\sum_{k=1}^{N_s}e_{k}^{2},    (A-17)

where e_k = y_k − ŷ_k. Subsequently, the objective function is written as

J(\mathbf{w}, \mathbf{e}) = \frac{1}{2}\mathbf{w}^{T}\mathbf{w} + \frac{1}{2}\gamma\sum_{k=1}^{N_s}e_{k}^{2},    (A-18)

where γ is the regularization factor that balances the complexity (flatness) of the model and the data mismatch term. The
optimization problem defined in the primal space is given by

\text{minimize}\quad J(\mathbf{w}, \mathbf{e}),    (A-19)

subject to

y_{k} = \mathbf{w}^{T}\varphi(\mathbf{x}_{k}) + b + e_{k},\qquad k = 1, 2, \ldots, N_s,    (A-20)

where e is the error vector defined by

\mathbf{e} = \left[e_{1}, e_{2}, \ldots, e_{N_s}\right]^{T}.
One can define the following Lagrangian to solve the optimization problem defined by Eqs. A-19 and A-20 in the
dual space,

L(\mathbf{w}, b, \mathbf{e}; \boldsymbol{\alpha}) = J(\mathbf{w}, \mathbf{e}) - \sum_{k=1}^{N_s}\alpha_{k}\left\{\mathbf{w}^{T}\varphi(\mathbf{x}_{k}) + b + e_{k} - y_{k}\right\},    (A-21)

where α = [α_1, α_2, ..., α_{N_s}] is the vector of Lagrange multipliers. According to the KKT conditions, Eq. A-21 is
minimized under the necessary condition that ∇L = 0, which is expressed as

\nabla_{\mathbf{w}}L = 0 \;\Rightarrow\; \mathbf{w} = \sum_{k=1}^{N_s}\alpha_{k}\varphi(\mathbf{x}_{k}),\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{k=1}^{N_s}\alpha_{k} = 0,\qquad
\nabla_{\mathbf{e}}L = 0 \;\Rightarrow\; \boldsymbol{\alpha} = \gamma\mathbf{e},\qquad
\frac{\partial L}{\partial \alpha_{k}} = 0 \;\Rightarrow\; \mathbf{w}^{T}\varphi(\mathbf{x}_{k}) + b + e_{k} - y_{k} = 0.    (A-22)

Eliminating w and e in Eq. A-22 yields the linear system

\begin{bmatrix} 0 & \mathbf{1}_{N_s}^{T} \\ \mathbf{1}_{N_s} & \Omega + \frac{1}{\gamma}I \end{bmatrix}\begin{bmatrix} b \\ \boldsymbol{\alpha} \end{bmatrix} = \begin{bmatrix} 0 \\ \mathbf{Y} \end{bmatrix},    (A-23)

where Y = [y_1, ..., y_{N_s}]^T; α = [α_1, ..., α_{N_s}]^T; Ω_kl = φ(x_k)^T φ(x_l) is the entry of Ω at the kth row and lth column;
and 1_{N_s} is the column vector whose every element is 1. The inner product of φ(x_k) and φ(x_l) can be replaced by a
kernel function that satisfies Mercer's condition as defined in Eq. A-13.

As for ε-SVR, we can use the RBF kernel defined in Eq. A-14 as the kernel function. Therefore, we have two
tuning parameters in LS-SVR, i.e., σ and γ. Typically, a large γ and a small σ make the trained model honor the
training data more closely and may even yield over-fitting, which causes a large generalization error for prediction. A good way to
select the two tuning parameters is to perform cross-validation. The derivation for computing the parameters α and
b is given as follows. From Eq. A-23,

b\left(\Omega+\frac{1}{\gamma}I\right)^{-1}\mathbf{1}_{N_s} + \boldsymbol{\alpha} = \left(\Omega+\frac{1}{\gamma}I\right)^{-1}\mathbf{Y}.    (A-24)

One may notice that Ω + (1/γ)I is positive definite, and therefore its inverse always exists. Pre-multiplying both
sides of Eq. A-24 by 1_{N_s}^T and using the constraint Σ_k α_k = 0 from Eq. A-22 yields

b\,\mathbf{1}_{N_s}^{T}\left(\Omega+\frac{1}{\gamma}I\right)^{-1}\mathbf{1}_{N_s} = \mathbf{1}_{N_s}^{T}\left(\Omega+\frac{1}{\gamma}I\right)^{-1}\mathbf{Y}.    (A-25)

Rearranging Eq. A-25 yields

b = \frac{\mathbf{1}_{N_s}^{T}\left(\Omega+\frac{1}{\gamma}I\right)^{-1}\mathbf{Y}}{\mathbf{1}_{N_s}^{T}\left(\Omega+\frac{1}{\gamma}I\right)^{-1}\mathbf{1}_{N_s}}.    (A-26)

Substituting Eq. A-26 into Eq. A-24 yields

\boldsymbol{\alpha} = \left(\Omega+\frac{1}{\gamma}I\right)^{-1}\mathbf{Y} - b\left(\Omega+\frac{1}{\gamma}I\right)^{-1}\mathbf{1}_{N_s}.

The resulting LS-SVR model to predict the response at an unseen point x is given by

\hat{y}(\mathbf{x}) = \sum_{k=1}^{N_s}\alpha_{k}K(\mathbf{x}_{k}, \mathbf{x}) + b.    (A-27)
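For moderate N_s, the system of Eq. A-23 can also be solved directly for b and α instead of using Eqs. A-24 through A-26; the following sketch assumes the RBF kernel of Eq. A-14 and NumPy's dense solver (names are illustrative).

```python
import numpy as np

def train_ls_svr(X, y, sigma, gamma):
    """Solve the LS-SVR linear system of Eq. A-23 for (b, alpha).

    X : (N_s, N_m) training inputs, y : (N_s,) training responses.
    """
    sq_dist = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    Omega = np.exp(-sq_dist / sigma**2)          # RBF kernel matrix (Eq. A-14)
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                               # top row: [0, 1^T]
    A[1:, 0] = 1.0                               # first column: [0, 1]
    A[1:, 1:] = Omega + np.eye(n) / gamma        # Omega + (1/gamma) I
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                       # b, alpha

def predict_ls_svr(X_train, b, alpha, sigma, x_new):
    """LS-SVR prediction of Eq. A-27 at a new point x_new."""
    k = np.exp(-np.sum((X_train - x_new)**2, axis=1) / sigma**2)
    return alpha @ k + b
```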
The major difference between ε-SVR and LS-SVR is that for LS-SVR the data mismatch term takes the form e_k²,
as shown in Eq. A-17, whereas for the traditional ε-SVR the data mismatch term, also called the loss function, is

\left|e_{k}\right|_{\epsilon} = \begin{cases} 0 & \text{if } \left|e_{k}\right| \le \epsilon,\\ \left|e_{k}\right|-\epsilon & \text{otherwise}.\end{cases}    (A-28)

As shown in Figure A-2, for ε-SVR the training points whose deviation between the observed and predicted responses is
less than ε do not participate in the training, and their loss function values are 0. Only the training points with
deviations larger than ε are included as supporting training points (support vectors). This feature allows ε-SVR to reduce the training
cost by reducing the number of effective training points compared with LS-SVR, which incorporates every training
point into the proxy model. However, solving the optimization problem defined in Eqs. A-11 and A-12 is more
computationally expensive, which offsets the efficiency gained by reducing the number of training points. For a field
problem, the training data size is usually smaller than 5,000, and we find that the overall computational cost of ε-
SVR is higher than that of LS-SVR. Therefore, LS-SVR is the better choice for building the response surface model at such a
medium scale.

Figure A-2: Loss function of ϵ-SVR
