Correlation Method for Variance Reduction of Monte
Carlo Integration in RS-HDMR
GENYUAN LI,1 HERSCHEL RABITZ,1 SHENG-WEI WANG,2 PANOS G. GEORGOPOULOS2
1
Department of Chemistry, Princeton University, Princeton, New Jersey 08544
Environmental and Occupational Health Sciences Institute, 170 Frelinghuysen Road,
Piscataway, New Jersey 08854
2
Received 5 June 2002; Accepted 24 July 2002
Abstract: The High Dimensional Model Representation (HDMR) technique is a procedure for efficiently representing
high-dimensional functions. A practical form of the technique, RS-HDMR, is based on randomly sampling the overall
function and utilizing orthonormal polynomial expansions. The determination of expansion coefficients employs Monte
Carlo integration, which controls the accuracy of RS-HDMR expansions. In this article, a correlation method is used to
reduce the Monte Carlo integration error. The determination of the expansion coefficients becomes an iteration
procedure, and the resultant RS-HDMR expansion has much better accuracy than that achieved by direct Monte Carlo
integration. For an illustration in four dimensions a few hundred random samples are sufficient to construct an
RS-HDMR expansion by the correlation method with an accuracy comparable to that obtained by direct Monte Carlo
integration with thousands of samples.
© 2003 Wiley Periodicals, Inc.
J Comput Chem 24: 277–283, 2003
Key words: HDMR; high dimensional systems; random sampling; correlation method with Monte Carlo integration;
atmospheric chemistry
Introduction
High-dimensional model representation (HDMR) is a general set
of quantitative model assessment and analysis tools for capturing
high-dimensional input– output system behavior.1– 4 The impact of
the multiple input variables on the output can be independent as
well as cooperative, and HDMR expresses the model output f(x) as
a finite hierarchical cooperative function expansion in terms of its
input variables:
冘
n
f共x兲 ⫽ f0 ⫹
i⫽1
⫹
fi 共xi 兲 ⫹
冘
冘
fij 共xi , xj 兲 ⫹ · · ·
term f 12. . .n ( x 1 , x 2 , . . . , x n ) contains any residual nth order
cooperative contribution of all input variables.
Distinct, but formally equivalent HDMR expansions, all of the
same structure as eq. (1), may be constructed. There are two
commonly used HDMR expansions: Cut- and RS (Random Sampling)-HDMR. Cut-HDMR expresses f(x) in reference to a specified reference point x in ⍀ of the n-dimensional input variable x
space while RS-HDMR depends on the average value of f(x) over
the whole domain ⍀.
For Cut-HDMR a reference point x is first chosen in ⍀. The
Cut-HDMR component functions with respect to reference point x
have the following forms1,2:
1ⱕi⬍jⱕn
fi 1i 2 . . . i l共xi 1, xi 2, . . . , xi l兲 ⫹ · · ·
f 0 ⫽ f共 x 兲,
(2)
f i共 x i兲 ⫽ f共 x i, x i兲 ⫺ f 0,
(3)
1ⱕi 1 ⬍· · ·⬍i l ⱕn
⫹ f12 . . . n 共x1 , x2 , . . . , xn 兲,
(1)
where x ⫽ ( x 1 , x 2 , . . . , x n ). The zeroth order component function f 0 is a constant representing the mean response to f(x), the
first-order component function f i ( x i ) gives the independent contribution to f(x) by the ith input variable acting alone, the secondorder component function f ij ( x i , x j ) gives the pair cooperative
contribution to f(x) by the input variables x i and x j , etc. The last
Correspondence to: H. Rabitz; e-mail: hrabitz@princeton.edu
Contract/grant sponsor: Petroleum Research Fund of the American
Chemical Society
Contract/grant sponsor: U.S. Environment Protection Agency; contract/
grant number: EPAR-827033 (to Environmental and Occupational
Health Sciences Institute)
© 2003 Wiley Periodicals, Inc.
Li et al.
278
•
Vol. 24, No. 3
f ij共 x i, x j兲 ⫽ f共 x i, x j, x ij兲 ⫺ f i共 x i兲 ⫺ f j共 x j兲 ⫺ f 0,
•
Journal of Computational Chemistry
(4)
f ijk共 x i, x j, x k兲 ⫽ f共 x i, x j, x k, x ijk兲 ⫺ f ij共 x i, x j兲 ⫺ f ik共 x i, x k兲
⫺ f jk共 x j, x k兲 ⫺ f i共 x i兲 ⫺ f j共 x j兲 ⫺ f k共 x k兲 ⫺ f 0,
. . .
(5)
ponent function of RS-HDMR satisfies the condition that the
integral with respect to any of its own variables is zero, i.e.,
冕
1
f i1i2 . . . il共 x i1, x i2, . . . , x il兲dx s ⫽ 0,
s 僆 兵i 1, i 2, . . . , i l其,
(12)
0
which defines the orthogonality relation between two RS-HDMR
component functions as
where
共 x i, x i兲 ⫽ 共 x 1, . . . , x i⫺1, x i, x i⫹1, . . . , x n兲,
(6)
共 x i, x j, x ij兲 ⫽
(7)
The last term f 12. . .n ( x 1 , x 2 , . . . , x n ) in eq. (1) is determined
by the difference between f(x) and all other Cut-HDMR component functions. The Cut-HDMR component functions f i ( x i ), f ij ( x i ,
x j ), . . . are typically attained numerically at discrete values of the
input variables x i , x j , . . . produced from sampling the output
function f(x) for employment on the R.H.S. of eqs. (2)–(5). Experience shows that often only low-order terms of Cut-HDMR are
needed to give a good approximation for f(x). Numerical data
tables can be constructed for these component functions, and the
values of f(x) for an arbitrary point x can be determined from these
tables by performing only low dimensional interpolation over
f i ( x i ), f ij ( x i , x j ), . . . .
When data are randomly sampled, this produces the RS(Random Sampling)-HDMR expansion. For RS-HDMR, we first
rescale the variables x i by some suitable transformations such that
0 ⱕ x i ⱕ 1 for all i. The output function f(x) is then defined in
the unit hypercube K n ⫽ {( x 1 , x 2 , . . . , x n )兩0 ⱕ x i ⱕ 1, i ⫽ 1,
2, . . . , n}. The component functions of RS-HDMR have the
following forms2:
冕
(8)
f共x兲dx,
冕
i
f共x兲dx ⫺ f0 ,
(9)
Kn⫺1
f ij共 x i, x j兲 ⫽
兵i 1, i 2, . . . , i l其 ⫽ 兵 j 1, j 2, . . . , j k其.
冕
(13)
A practical approach to determine the RS-HDMR component
functions has been developed5 based on approximating them in
terms of orthonormal polynomials
冘
k
f i共 x i兲 ⬇
␣ ri r共 x i兲,
(14)
r⫽1
冘冘
l
f ij共 x i, x j兲 ⬇
l⬘
ij
 pq
p共 x i兲 q共 x j兲,
(15)
p⫽1 q⫽1
冘冘冘
m
f ijk共 x i, x j, x k兲 ⬇
m⬘
m⬙
ijk
␥ pqr
p共 x i兲 q共 x j兲 r共 x k兲, . . . ,
(16)
p⫽1 q⫽1 r⫽1
where k, l, l⬘, m, m⬘, m⬙ are integers, and
1共 x兲 ⫽
Kn
f i共 x i兲 ⫽
f i1i2 . . . il共 x i1, x i2, . . . , x il兲 f j1j2· · ·jk共 x j1, x j2, . . . , x jk兲dx ⫽ 0,
Kn
共 x 1, . . . , x i⫺1, x i, x i⫹1, . . . , x j⫺1, x j, x j⫹1, . . . , x n兲, . . . .
f0 ⫽
冕
冑3共2x ⫺ 1兲,
(17)
2共 x兲 ⫽ 6 冑5共 x 2 ⫺ x ⫹ 61 兲,
(18)
1
3共 x兲 ⫽ 20 冑7共 x 3 ⫺ 23 x 2 ⫹ 53 x ⫺ 20
兲, . . . ,
(19)
which are obtained from the orthonormality conditions
f共x兲dxij ⫺ fi 共xi 兲 ⫺ fj 共xj 兲 ⫺ f0 ,
冕
(10)
Kn⫺2
1
r共 x兲dx ⫽ 0,
᭙r
(20)
r2共 x兲dx ⫽ 1,
᭙r
(21)
共 p ⫽ q兲.
(22)
0
f ijk共 x i, x j, x k兲 ⫽
冕
冕
f共x兲dxijk ⫺ fij 共xi , xj 兲 ⫺ fik 共xi , xk 兲
Kn⫺3
0
⫺ fjk 共xj , xk 兲 ⫺ fi 共xi 兲 ⫺ fj 共xj 兲 ⫺ fk 共xk 兲 ⫺ f0 ,
...,
1
(11)
where dxi , dxij and dxijk are the products dx 1 dx 2 . . .dx n without
dx i , dx i dx j and dx i dx j dx k , respectively. Finally, the last term
f 12 . . .n( x 1 , x 2 , . . . , x n ) is determined from the difference between
f(x) and all the other component functions in eq. (1). Each com-
冕
1
p共 x兲 q共 x兲dx ⫽ 0,
0
In the approximations of eqs. (14)–(16), the orthonormality
property of the set {} preserves the mutual orthogonality of the
Correlation Method for Variance Reduction of Monte Carlo Integration in RS-HDMR
RS-HDMR component functions given by eq. (13), and eq. (1) is
equivalent to
ijk
␥ pqr
⫽
冕
279
f共x兲p 共xi 兲q 共xj 兲r 共xk 兲dx
Kn
冘冘
n
f共x兲 ⬇ f0 ⫹
冘 冘冘
k
l
␣ri r 共xi 兲 ⫹
1
⬇
N
ij
pq
p 共xi 兲q 共xj 兲
1ⱕi⬍jⱕn p⫽1 q⫽1
i⫽1 r⫽1
冘 冘冘冘
m
⫹
l⬘
m⬘
m⬙
ijk
␥pqr
p 共xi 兲q 共xj 兲r 共xk 兲 ⫹ . . . , (23)
1ⱕi⬍j⬍kⱕn p⫽1 q⫽1 r⫽1
where the integers k, l, l⬘, m, m⬘, m⬙ are small (i.e., ⱕ3) for many
systems.
ijk
The coefficient { ␣ ri ,  ijpq , ␥ pqr
, . . .} may be determined by
minimization of the functional
min
ij
, · · ·其
僆兵 ␣ ri ,  pq
冕冋
n
l
共s兲
共s兲
f共x共s兲 兲p 共x共s兲
i 兲q 共xj 兲r 共xk 兲, . . . ,
k
␣ri r 共xi 兲
FN ⫽
l⬘
1
N
⫺
冘 冘冘冘
m⬘
冘
N
F共x 共s兲 兲
(29)
F共x兲dx,
(30)
1
var兵F共x兲其,
N
(31)
s⫽1
has the expectation
ij
pq
pq 共xi , xj 兲
1ⱕi⬍jⱕn p⫽1 q⫽1
m
(28)
s⫽1
(s)
(s)
where x(s) ⫽ ( x (s)
1 , x 2 , . . . , x n ) (s ⫽ 1, 2, . . . , N) is the sth
random sample point, and N is the total number of random samples. The accuracy of the resultant coefficients depends on the
Monte Carlo integration.
The theoretical foundation of Monte Carlo integration is the
following.6 If x ⫽ ( x 1 , x 2 , . . . , x n ) are independent random
variables with a uniform probability density function and F(x) is a
function of x, then the random variable
i⫽1 r⫽1
Kn
冘 冘冘
⫺
冘冘
f共x兲 ⫺ f0 ⫺
冘
N
册
m⬙
ijk
pqr
2
具F N典 ⫽
␥ p 共xi 兲q 共xj 兲r 共xk 兲 ⫺ · · · dx. (24)
1ⱕi⬍j⬍kⱕn p⫽1 q⫽1 r⫽1
冕
Kn
the variance
i
r
ij
pq
Then the coefficients { ␣ ,  , ␥
solving a linear algebraic equation
ijk
pqr
, . . .} can be obtained by
var兵FN 其 ⫽
Ay ⫽ b
(25)
and the standard deviation (standard error)
where A is a symmetric matrix, whose elements are integrals 兰
r ( x i ) r⬘ ( x i )dx i , 兰 r ( x i ) r⬘ ( x j )dx i dx j , 兰 r ( x i ) p ( x i ) p ( x j )
dx i dx j , . . . ; b is a vector whose elements are integrals for the
products of f(x) and the basis functions, for instance 兰
f(x) r ( x i )dx; and y is the vector of coefficients { ␣ ri ,  ijpq ,
ijk
␥ pqr
, . . .} in certain order. All the above integrals may be approximated by Monte Carlo integration, and then A and b can be
obtained. However, for a system with a high dimension n, A can
be a very large matrix. Solving a very large linear algebraic
equation is not computationally efficient. If the integrals of the
products of the orthonormal polynomials are treated analytically,
then A is an identity matrix result from the orthonormality property
of {}. In general, the integrals in b cannot be obtained analytically, but they may be approximated by Monte Carlo integration.
ijk
Hence, the expansion coefficients { ␣ ri ,  ijpq , ␥ pqr
, . . .} can be
readily determined by the following formulas:
␣ ri ⫽
冕
f共x兲r 共xi 兲dx ⬇
1
N
Kn
ij
⫽
 pq
冕
Kn
f共x兲p 共xi 兲q 共xj 兲dx ⬇
1
N
冘
N
f共x共s兲 兲r 共x共s兲
i 兲,
(26)
s⫽1
冘
N
s⫽1
共s兲
f共x共s兲 兲p 共x共s兲
i 兲q 共xj 兲,
(27)
兵F N其 ⫽ 共var兵FN 其兲1/ 2 ⫽
1
冑N
兵F共x兲其.
(32)
Therefore, F N can be used as an estimate of the integral 兰 K n
F(x)dx with a standard error {F N } proportional to the standard
error of the integrand random variable F(x). The error of Monte
Carlo integration can be reduced either by increasing the sample
size N or decreasing the variance of F(x) in K n . Monte Carlo
integration error becomes troublesome when the random data of
the integrand F(x) has a large variance, i.e., F(x) has rapid changes
in the desired domain, especially in sign. This behavior is expected
to arise when considering integrands with more basis functions
such as f(x) p ( x i ) q ( x j ) and f(x) p ( x i ) q ( x j ) r ( x k ) because the
functions {} are fixed regardless of the form of f(x). It was found
that the required sample size N to achieve a desired accuracy for
the coefficients in eqs. (26)–(28), and consequently, for the approximations of the RS-HDMR component functions, is mainly
dependent on the order of the component function (i.e., the number
of multiplicative functions in eqs. (26)–(28) in the RS-HDMR
expansion, but not much on the dimension n of the variable vector
x. Therefore, the determination of high-order RS-HDMR component functions generally requires additional samples. For example,
to determine f i ( x i ) a few hundred samples may give a good
accuracy, but for f ij ( x i , x j ) to achieve the same accuracy thousands
Li et al.
280
•
Vol. 24, No. 3
•
Journal of Computational Chemistry
samples may be needed. However, the sample size is often restricted by time and cost considerations. The other way to improve
the accuracy of Monte Carlo integration is to reduce the variance
of the integrand. This article will employ the correlation method
with Monte Carlo integration6 for this purpose such that for a
given sample size the accuracy of the RS-HDMR expansions will
be significantly improved.
The article is organized as follows. The next Section introduces
the correlation method with Monte Carlo integration. Then we
present an illustration of this method to an atmospheric chemical
kinetics model. Finally we conclude.
冕
f共x兲r 共xi 兲dx.
␣ ri ⫽
冕
关 f共x兲 ⫺ h共x兲兴r 共xi 兲dx ⫹
Kn
冕
h共x兲r 共xi 兲dx ⫽
(34)
ijk
␥ pqr
p 共xi 兲q 共xj 兲r 共xk 兲,
(38)
冕冋
冘 冘冘
l
⫹
冘冘
k
␣ ri r 共xi 兲
i⫽1 r⫽1
l⬘
冘 冘冘冘
m
ij
 pq
p 共xi 兲q 共xj 兲 ⫹
m⬘
m⬙
1ⱕi⬍j⬍kⱕn p⫽1 q⫽1 r⫽1
册
ijk
⫻ ␥ pqr
p 共xi 兲q 共xj 兲r 共xk 兲 r 共xi 兲dx ⫽ ␣ ri .
(39)
Then we have
␣ ri ⬇
1
N
冘
N
ri .
关 f共x 共s兲 兲 ⫺ h共x共s兲 兲兴r 共x共s兲
i 兲⫹␣
(40)
s⫽1
Similarly, we also have
Kn
ij
 pq
⬇
(2) the second integral is known analytically as
冕
h共x兲r 共xi 兲dx ⫽ cri .
(35)
Kn
Thus, the variance comes only from the first term in eq. (34).
As f(x) ⫺ h(x) is almost constant or zero everywhere, we expect
that
var兵关 f共x兲 ⫺ h共x兲兴r 共xi 兲其 ⬍ var兵 f共x兲r 共xi 兲其,
(36)
and ␣ ri may be approximated by Monte Carlo integration
␣ ri ⬇
m⬙
n
f0 ⫹
Kn
(33)
h共x兲r 共xi 兲dx.
m⬘
ijk
where the coefficients { ␣ ri ,  ijpq , ␥ pqr
} are determined by direct
Monte Carlo integration given in eqs. (26)–(28). The difference
f(x) ⫺ h(x) should be small if the truncated RS-HDMR expansion
is a good approximation of f(x). Moreover, using the orthonormality property of {} we have
Kn
To reduce the variance of the integrand f(x) r ( x i ) in K n , we
seek a reference function h(x) satisfying two conditions: (1)
f(x) ⫺ h(x) is almost constant or vanishes in the whole domain.
Introducing h(x) gives
ij
 pq
p 共xi 兲q 共xj 兲
1ⱕi⬍j⬍kⱕn p⫽1 q⫽1 r⫽1
1ⱕi⬍jⱕn p⫽1 q⫽1
␣ ri ⫽
l⬘
1ⱕi⬍jⱕn p⫽1 q⫽1
m
Kn
Consider an integral for any coefficient in eqs. (26)–(28), for
example
l
␣ ri r 共xi 兲 ⫹
冘 冘冘冘
⫹
冘 冘冘
k
i⫽1 r⫽1
冕
Correlation Method for Variance Reduction of
HDMR Monte Carlo Integration
冘冘
n
h共x兲 ⫽ f0 ⫹
1
N
冘
N
i
关 f共x 共s兲 兲 ⫺ h共x共s兲 兲兴r 共x共s兲
i 兲 ⫹ cr
(37)
s⫽1
with better accuracy than that given by eq. (26).
A truncated RS-HDMR expansion of eq. (23) can be used as
h(x), for instance
1
N
ijk
⬇
␥ pqr
1
N
冘
N
共s兲
ij
关 f共x 共s兲 兲 ⫺ h共x共s兲 兲兴p 共x共s兲
i 兲q 共xj 兲 ⫹ pq ,
(41)
s⫽1
冘
N
共s兲
共s兲
ijk
pqr
.
关 f共x 共s兲 兲 ⫺ h共x共s兲 兲兴p 共x共s兲
i 兲q 共xj 兲r 共xk 兲 ⫹ ␥
(42)
s⫽1
Equations (40)–(42) show that the first terms in these equations
ijk
are corrections for the initial values ␣ ri ,  ijpq and ␥ pqr
. The resulti
ij
ijk
ant ␣ r ,  pq and ␥ pqr may be reused as new initial values for the
construction of a new h(x) with even smaller values of f(x) ⫺ h(x)
to repeat the calculation again. Then, eqs. (40)–(42) become an
iteration procedure for a given set of random samples. We expect
that the iteration will be convergent if the initial h(x) is close to
f(x) and the sample size N is large enough.
Illustration: A Photochemical Box Model
A zero-dimensional photochemical box model designed to treat the
ozone chemistry in the background troposphere is being used to
study three-dimensional global chemical transport.7 This box
model consists of 63 reactions and 28 chemical species used to
calculate the rates of ozone production P, destruction D and the
Correlation Method for Variance Reduction of Monte Carlo Integration in RS-HDMR
Table 1. The Variable Ranges.
Table 2. The Relative Errors of the Different Order RS-HDMR
Expansions for the Output P Determined by
Direct Monte Carlo Integration
Range
Relative humidity, (%), x 1
CO (ppb), x 2
NOx (ppt), x 3
O3 (ppb), x 4
281
Lower
limit
Upper
limit
10
20
1
10
100
200
950
200
Data portion (%)a
Sample
size
(N)
300
500
tendency P ⫺ D for incorporation into the overall three-dimensional model. The details of this process are not relevant here, but
the box model provides a good testing ground for the determination of RS-HDMR component functions. The rates of ozone production P, destruction D and the tendency P ⫺ D are used as three
output variables of the box model. The input random variables x ⫽
{ x 1 , x 2 , x 3 , x 4 } are the concentrations of the four precursors:
H2O, CO, NOx , and O3. In the following, we will apply the
correlation method discussed above to this model. The first- to
third-order RS-HDMR component functions with respect to the
different outputs are approximated by third-order orthonormal
polynomial expansions (i.e., k, l, l⬘, m, m⬘, m⬙ ⫽ 3). The
expansion coefficients { ␣ ri ,  ijpq , ␥ ijk
pqr } were determined by Monte
Carlo integration with different random sample sizes. Then the
accuracy of the resultant different order RS-HDMR expansions
were examined by comparison with 53,312 exact data, which
uniformly cover the whole domain of x. The ranges of the four
variables are shown in Table 1.
Direct Monte Carlo Integration
Tables 2– 4 give the accuracy of different order RS-HDMR expansions for the three outputs whose component functions were
approximated by third-order orthonormal polynomial expansions
ijk
where expansion coefficients { ␣ ri ,  ijpq , ␥ pqr
} were determined by
direct Monte Carlo integration [i.e., eqs. (26)–(28)] with different
random sample sizes (300, 500, 1000, 3000, and 5000).
The results in Tables 2– 4 show that the sample size does not
have a significant influence on the first-order RS-HDMR expansion. This implies that the ␣ ri can be accurately determined with a
few hundred samples. The accuracy of second-order RS-HDMR
approximation does depend on sample size, and the accurate
determination of  ijpq may need thousands samples. In Tables 2– 4
the third-order RS-HDMR expansion is worse than the secondorder one, especially for small samples where it is even worse than
the first-order one. This behavior implies that even if thousands
samples are used, the direct Monte Carlo integration is still inacijk
curate for the determination of ␥ pqr
. The accuracy in Tables 2– 4
is not satisfactory at small sample sizes. The accuracy obtained
from 5000 samples is acceptable, but still not satisfactory. Only
⬃90% of the test data have relative errors less than 5%. Therefore,
direct Monte Carlo integration generally cannot give good accuracy for RS-HDMR, especially for high-order component functions.
1000
3000
5000
Relative
error
(%)
First
order
Second
order
Third
order
5
10
20
5
10
20
5
10
20
5
10
20
5
10
20
48.8
69.3
81.7
45.8
68.6
81.6
46.3
68.7
81.8
46.0
68.5
81.7
45.7
68.1
81.6
23.3
44.2
71.2
57.1
78.3
91.7
72.6
88.3
96.6
85.9
95.7
99.1
90.4
96.8
99.4
15.9
30.7
53.3
26.5
47.1
70.2
48.6
72.2
88.5
68.3
88.7
96.9
85.7
95.3
98.9
a
The percentage of 53,312 data with a relative error not larger than a given
value.
Correlation Method with Monte Carlo Integration
The correlation method with Monte Carlo integration given by eqs.
(40)–(42) was used to determine the coefficients ␣ ri ,  ijpq , and
ijk
␥ pqr
. First, h(x) was set by eq. (38) with k, l, l⬘, m, m⬘, m⬙ ⫽ 3.
Table 3. The Relative Errors of the Different Order RS-HDMR
Expansions for the Output D Determined by
Direct Monte Carlo Integration.
Data portion (%)a
Sample
size
(N)
300
500
1000
3000
5000
a
Relative
error
(%)
First
order
Second
order
Third
order
5
10
20
5
10
20
5
10
20
5
10
20
5
10
20
33.7
55.0
70.0
37.0
55.7
69.2
38.6
56.7
69.7
38.4
55.4
68.8
38.4
55.7
68.9
24.2
43.4
68.5
34.6
59.6
80.2
58.6
81.4
93.6
86.4
95.5
99.2
91.8
97.6
99.6
14.6
28.1
49.7
19.3
36.8
61.8
36.8
61.6
83.8
65.5
84.0
95.3
75.9
90.6
97.1
The percentage of 53,312 data with a relative error not larger than a given
value.
Li et al.
282
•
Vol. 24, No. 3
•
Journal of Computational Chemistry
Table 4. The Relative Errors of the Different Order RS-HDMR
Table 5. The Relative Errors of the Resultant Third-Order RS-HDMR
Expansions for the Output P ⫺ D Determined by
Direct Monte Carlo Integration
Expansions Determined by Correlation Method with Monte Carlo
Integration Using h(x) Given by Eq. (38).
Data portion (%)a
Sample
size
(N)
Relative
error
(%)
First
order
Second
order
Third
order
5
10
20
5
10
20
5
10
20
5
10
20
5
10
20
47.3
72.5
87.3
48.8
72.2
86.5
49.6
71.4
86.4
48.6
70.9
86.2
48.5
70.8
86.2
26.0
51.8
83.9
58.5
83.4
96.0
77.5
93.4
99.3
90.7
97.7
99.7
93.8
98.7
99.9
18.8
36.1
62.1
29.6
52.3
76.1
50.7
76.6
92.8
80.3
94.2
98.8
90.4
97.9
99.8
Sample
size
(N)
P
D
P ⫺ D
5
10
20
5
10
20
99.9
100
100
98.0
99.4
99.8
99.7
100
100
98.0
99.5
99.9
99.7
100
100
99.4
99.9
100
3000
300
500
1000
3000
5000
a
The percentage of 53,312 data with a relative error not larger than a given
value.
5000
a
The percentage of 53,312 data with a relative error not larger than a given
value.
5000 possibly due to the randomness of the data. When it is
convergent, correlation method with Monte Carlo integration significantly reduces the variance, and hence improves the accuracy.
For N ⱕ 1000, the divergence may come from the large error of
ijk
␥ pqr
. Therefore, we choose
冘冘
n
h共x兲 ⫽ f0 ⫹
冘 冘冘
k
i⫽1 r⫽1
The iteration was convergent for N ⫽ 3000, 5000, but divergent
when N ⱕ 1000. Figure 1 gives the iteration behavior of the
portion of the test 53,312 exact data with relative error less than
5% for output P. The third-order RS-HDMR expansion was used
with component functions expanded by first- to third-order orthonormal polynomials. The other outputs D and P ⫺ D have
similar behavior. The converged results for 3000 and 5000 samples
and the three outputs are given in Table 5.
Compared to Tables 2– 4, the results in Table 5 are excellent.
The accuracy for N ⫽ 3000 is a little better than that for N ⫽
Data portion (%)a
Relative
error
(%)
l
␣ ri r 共xi 兲 ⫹
l⬘
ij
 pq
p 共xi 兲q 共xj 兲.
1ⱕi⬍jⱕn p⫽1 q⫽1
(43)
In this case, the coefficients ␣ ri and  ijpq are determined by
iteration eqs. (40) and (41). The iteration was convergent even for
N ⫽ 300. The resultant converged values of ␣ ri and  ijpq were used
ijk
to determine the coefficients ␥ pqr
by the following equation without iteration
ijk
⬇
␥ pqr
1
N
冘
N
共s兲
共s兲
关 f共x 共s兲 兲 ⫺ h共x共s兲 兲兴p 共x共s兲
i 兲q 共xj 兲r 共xk 兲
(44)
s⫽1
because
ijk
␥ pqr
⫽
冕
h共x兲p 共xi 兲q 共xj 兲r 共xk 兲dx ⫽
Kn
Kn
冘 冘冘
l
⫹
l⬘
1ⱕi⬍jⱕn p⫽1 q⫽1
Figure 1. The iteration behavior of the portion of the test 53,312 exact
data with relative error less than 5% for output P in the iteration. The
third-order RS-HDMR expansion was used whose component functions are expanded by first- to third-order orthonormal polynomials.
冕冋
冘冘
n
f0 ⫹
k
␣ri r 共xi 兲
i⫽1 r⫽1
册
ij
pq
p 共xi 兲q 共xj 兲 p 共xi 兲q 共xj 兲r 共xk 兲dx ⫽ 0, (45)
where ␣ ri and  ijpq are converged values obtained from the above
iteration.
The second- and third-order RS-HDMR expansions for different outputs and sample sizes were constructed. The accuracy of the
second- and third-order RS-HDMR expansions for different sample sizes is shown in Tables 6 and 7. The results are satisfactory
even for small sample sizes. Compared to Table 5 for N ⫽ 3000
and 5000 using second-order RS-HDMR as h(x) gives almost the
same accuracy. Because the correlation method with Monte Carlo
Correlation Method for Variance Reduction of Monte Carlo Integration in RS-HDMR
Table 6. The Relative Errors of the Resultant Second-Order RS-HDMR
283
Table 8. The Accuracy of Second-Order Cut-HDMR Expansion.
Expansions [h(x) is the Second-Order RS-HDMR Expansion].
Sample
size
(N)
Sample
size
(N)
a
Relative
error
(%)
P
D
P ⫺ D
5
10
20
5
10
20
5
10
20
5
10
20
5
10
20
88.0
94.7
98.6
91.3
96.7
99.2
91.0
96.5
99.2
91.4
96.8
99.2
91.5
96.8
99.2
91.8
97.8
99.9
94.5
99.2
99.9
95.1
99.1
99.9
95.4
99.1
99.9
95.6
99.2
99.9
94.4
98.7
99.8
93.8
98.2
99.6
93.9
98.5
99.8
94.2
98.5
99.8
94.1
98.6
99.8
Data portion (%)
1183
300
500
1000
3000
5000
Data portion (%)a
Relative
error
(%)
P
D
P ⫺ D
5
10
20
93.27
97.48
99.46
97.76
99.25
99.79
86.84
92.01
95.54
a
The percentage of 53,312 data with a relative error not larger than a given
value.
1183 data. Its accuracy is worse than the third-order RS-HDMR
expansion in Table 7 constructed from 1000 random samples, and
even worse than that constructed from 500 samples. Therefore,
RS-HDMR can have better accuracy than Cut-HDMR for a given
sample size.
Conclusions
a
The percentage of 53,312 data with a relative error not larger than a given
value.
integration can accurately determine ␥ ijk
pqr with small sample sizes
and only one set of random samples are needed to determine all
order RS-HDMR component functions, it is possible to construct
truncated high-order RS-HDMR expansions from small random
samples. This is very beneficial for HDMR to treat high-dimensional systems, especially for those whose data cannot be obtained
easily.
For comparison, Table 8 gives the accuracy of the second-order
Cut-HDMR expansion whose look-up tables were composed of
Table 7. The Relative Errors of the Resultant Third Order RS-HDMR
Expansions [h(x) is the Second-Order RS-HDMR Expansion].
Sample
size
(N)
300
500
1000
3000
5000
Data portion (%)a
Relative
error
(%)
P
D
P ⫺ D
5
10
20
5
10
20
5
10
20
5
10
20
5
10
20
93.6
97.2
99.6
95.6
98.7
99.8
97.3
99.1
99.8
99.2
99.9
100
99.6
100
100
92.7
98.4
100
96.5
99.6
100
97.9
99.7
100
99.6
100
100
99.9
100
100
98.9
99.8
100
98.0
99.4
99.8
99.1
99.7
100
99.5
99.9
100
99.7
100
100
a
The percentage of 53,312 data with a relative error not larger than a given
value.
In practice, RS-HDMR component functions may be approximated
by orthonormal polynomial expansions whose expansion coefficients are determined by Monte Carlo integration. The Monte
Carlo integration error depends on the sample size N and the
variance of the integrand. As sample size is often restricted by time
and cost, one cannot always improve the accuracy by increasing
the sample size. The correlation method with Monte Carlo integration is a way to reduce the variance of the integrand. A secondor third-order RS-HDMR expansion can be used as a reference
function h(x) whose orthonormal polynomial expansion coefficients are first determined by direct Monte Carlo integration. The
resultant coefficients may be reused as new initial values for the
construction of a new h(x) with an even smaller value of f(x) ⫺
h(x) to repeat the calculation again. When N is not very large, this
iteration may be divergent if the third-order RS-HDMR expansion
is used as a reference function h(x). When the second-order
RS-HDMR expansion is used for h(x), the iteration was found
convergent even for N ⫽ 300 in an illustration. The resultant
second, and especially third-order, RS-HDMR expansion has very
good accuracy. The correlation method with Monte Carlo integration is capable of giving accurate RS-HDMR expansions from
small samples. This is very beneficial for practical applications of
HDMR in high-dimensional systems.
References
1. Rabitz, H.; Alis, O. F.; Shorter, J.; Shim, K. Comput Phys Commun
1998, 115, 1.
2. Alis, O. H.; Rabitz, H. J Math Chem 1999, 25, 197.
3. Shorter, J.; Rabitz, H. J Phys Chem A 1999, 103, 7192.
4. Alis, O. H.; Rabitz, H. Mathematical and Statistical Methods for Sensitivity Analysis; John Wiley and Sons: New York, 2001.
5. Li, G.; Wang, S. W.; Rabitz, H. J Phys Chem 2002, 106, 8721.
6. Kalos, M. H.; Whitlock, P. A. Monte Carlo Methods; John Wiley &
Sons: New York, 1986.
7. Wang, S. W.; Levy, II, H.; Li, G.; Rabitz, H. J Geophys Res 1999, 104,
D23, 30417–30426.