Correlation method for variance reduction of Monte Carlo integration in RS-HDMR

Genyuan Li; Herschel Rabitz; Sheng-Wei Wang; Panos G. Georgopoulos

Correlation Method for Variance Reduction of Monte Carlo Integration in RS-HDMR GENYUAN LI,1 HERSCHEL RABITZ,1 SHENG-WEI WANG,2 PANOS G. GEORGOPOULOS2 1 Department of Chemistry, Princeton University, Princeton, New Jersey 08544 Environmental and Occupational Health Sciences Institute, 170 Frelinghuysen Road, Piscataway, New Jersey 08854 2 Received 5 June 2002; Accepted 24 July 2002 Abstract: The High Dimensional Model Representation (HDMR) technique is a procedure for efﬁciently representing high-dimensional functions. A practical form of the technique, RS-HDMR, is based on randomly sampling the overall function and utilizing orthonormal polynomial expansions. The determination of expansion coefﬁcients employs Monte Carlo integration, which controls the accuracy of RS-HDMR expansions. In this article, a correlation method is used to reduce the Monte Carlo integration error. The determination of the expansion coefﬁcients becomes an iteration procedure, and the resultant RS-HDMR expansion has much better accuracy than that achieved by direct Monte Carlo integration. For an illustration in four dimensions a few hundred random samples are sufﬁcient to construct an RS-HDMR expansion by the correlation method with an accuracy comparable to that obtained by direct Monte Carlo integration with thousands of samples. © 2003 Wiley Periodicals, Inc. J Comput Chem 24: 277–283, 2003 Key words: HDMR; high dimensional systems; random sampling; correlation method with Monte Carlo integration; atmospheric chemistry Introduction High-dimensional model representation (HDMR) is a general set of quantitative model assessment and analysis tools for capturing high-dimensional input– output system behavior.1– 4 The impact of the multiple input variables on the output can be independent as well as cooperative, and HDMR expresses the model output f(x) as a ﬁnite hierarchical cooperative function expansion in terms of its input variables: 冘 n f共x兲 ⫽ f0 ⫹ i⫽1 ⫹ fi 共xi 兲 ⫹ 冘冘 fij 共xi , xj 兲 ⫹ · · · term f 12. . .n ( x 1 , x 2 , . . . , x n ) contains any residual nth order cooperative contribution of all input variables. Distinct, but formally equivalent HDMR expansions, all of the same structure as eq. (1), may be constructed. There are two commonly used HDMR expansions: Cut- and RS (Random Sampling)-HDMR. Cut-HDMR expresses f(x) in reference to a speciﬁed reference point x៮ in ⍀ of the n-dimensional input variable x space while RS-HDMR depends on the average value of f(x) over the whole domain ⍀. For Cut-HDMR a reference point x៮ is ﬁrst chosen in ⍀. The Cut-HDMR component functions with respect to reference point x៮ have the following forms1,2: 1ⱕi⬍jⱕn fi 1i 2 . . . i l共xi 1, xi 2, . . . , xi l兲 ⫹ · · · f 0 ⫽ f共 x៮ 兲, (2) f i共 x i兲 ⫽ f共 x i, x៮ i兲 ⫺ f 0, (3) 1ⱕi 1 ⬍· · ·⬍i l ⱕn ⫹ f12 . . . n 共x1 , x2 , . . . , xn 兲, (1) where x ⫽ ( x 1 , x 2 , . . . , x n ). The zeroth order component function f 0 is a constant representing the mean response to f(x), the ﬁrst-order component function f i ( x i ) gives the independent contribution to f(x) by the ith input variable acting alone, the secondorder component function f ij ( x i , x j ) gives the pair cooperative contribution to f(x) by the input variables x i and x j , etc. The last Correspondence to: H. Rabitz; e-mail: hrabitz@princeton.edu Contract/grant sponsor: Petroleum Research Fund of the American Chemical Society Contract/grant sponsor: U.S. Environment Protection Agency; contract/ grant number: EPAR-827033 (to Environmental and Occupational Health Sciences Institute) © 2003 Wiley Periodicals, Inc. Li et al. 278 • Vol. 24, No. 3 f ij共 x i, x j兲 ⫽ f共 x i, x j, x៮ ij兲 ⫺ f i共 x i兲 ⫺ f j共 x j兲 ⫺ f 0, • Journal of Computational Chemistry (4) f ijk共 x i, x j, x k兲 ⫽ f共 x i, x j, x k, x៮ ijk兲 ⫺ f ij共 x i, x j兲 ⫺ f ik共 x i, x k兲 ⫺ f jk共 x j, x k兲 ⫺ f i共 x i兲 ⫺ f j共 x j兲 ⫺ f k共 x k兲 ⫺ f 0, . . . (5) ponent function of RS-HDMR satisﬁes the condition that the integral with respect to any of its own variables is zero, i.e., 冕 1 f i1i2 . . . il共 x i1, x i2, . . . , x il兲dx s ⫽ 0, s 僆兵i 1, i 2, . . . , i l其, (12) 0 which deﬁnes the orthogonality relation between two RS-HDMR component functions as where 共 x i, x៮ i兲 ⫽ 共 x៮ 1, . . . , x៮ i⫺1, x i, x៮ i⫹1, . . . , x៮ n兲, (6) 共 x i, x j, x៮ ij兲 ⫽ (7) The last term f 12. . .n ( x 1 , x 2 , . . . , x n ) in eq. (1) is determined by the difference between f(x) and all other Cut-HDMR component functions. The Cut-HDMR component functions f i ( x i ), f ij ( x i , x j ), . . . are typically attained numerically at discrete values of the input variables x i , x j , . . . produced from sampling the output function f(x) for employment on the R.H.S. of eqs. (2)–(5). Experience shows that often only low-order terms of Cut-HDMR are needed to give a good approximation for f(x). Numerical data tables can be constructed for these component functions, and the values of f(x) for an arbitrary point x can be determined from these tables by performing only low dimensional interpolation over f i ( x i ), f ij ( x i , x j ), . . . . When data are randomly sampled, this produces the RS(Random Sampling)-HDMR expansion. For RS-HDMR, we ﬁrst rescale the variables x i by some suitable transformations such that 0 ⱕ x i ⱕ 1 for all i. The output function f(x) is then deﬁned in the unit hypercube K n ⫽ {( x 1 , x 2 , . . . , x n )兩0 ⱕ x i ⱕ 1, i ⫽ 1, 2, . . . , n}. The component functions of RS-HDMR have the following forms2: 冕 (8) f共x兲dx, 冕 i f共x兲dx ⫺ f0 , (9) Kn⫺1 f ij共 x i, x j兲 ⫽ 兵i 1, i 2, . . . , i l其 ⫽ 兵 j 1, j 2, . . . , j k其. 冕 (13) A practical approach to determine the RS-HDMR component functions has been developed5 based on approximating them in terms of orthonormal polynomials 冘 k f i共 x i兲 ⬇ ␣ ri␸ r共 x i兲, (14) r⫽1 冘冘 l f ij共 x i, x j兲 ⬇ l⬘ ij ␤ pq ␸ p共 x i兲 ␸ q共 x j兲, (15) p⫽1 q⫽1 冘冘冘 m f ijk共 x i, x j, x k兲 ⬇ m⬘ m⬙ ijk ␥ pqr ␸ p共 x i兲 ␸ q共 x j兲 ␸ r共 x k兲, . . . , (16) p⫽1 q⫽1 r⫽1 where k, l, l⬘, m, m⬘, m⬙ are integers, and ␸ 1共 x兲 ⫽ Kn f i共 x i兲 ⫽ f i1i2 . . . il共 x i1, x i2, . . . , x il兲 f j1j2· · ·jk共 x j1, x j2, . . . , x jk兲dx ⫽ 0, Kn 共 x៮ 1, . . . , x៮ i⫺1, x i, x៮ i⫹1, . . . , x៮ j⫺1, x j, x៮ j⫹1, . . . , x៮ n兲, . . . . f0 ⫽ 冕冑3共2x ⫺ 1兲, (17) ␸ 2共 x兲 ⫽ 6 冑5共 x 2 ⫺ x ⫹ 61 兲, (18) 1 ␸ 3共 x兲 ⫽ 20 冑7共 x 3 ⫺ 23 x 2 ⫹ 53 x ⫺ 20 兲, . . . , (19) which are obtained from the orthonormality conditions f共x兲dxij ⫺ fi 共xi 兲 ⫺ fj 共xj 兲 ⫺ f0 , 冕 (10) Kn⫺2 1 ␸ r共 x兲dx ⫽ 0, ᭙r (20) ␸ r2共 x兲dx ⫽ 1, ᭙r (21) 共 p ⫽ q兲. (22) 0 f ijk共 x i, x j, x k兲 ⫽ 冕冕 f共x兲dxijk ⫺ fij 共xi , xj 兲 ⫺ fik 共xi , xk 兲 Kn⫺3 0 ⫺ fjk 共xj , xk 兲 ⫺ fi 共xi 兲 ⫺ fj 共xj 兲 ⫺ fk 共xk 兲 ⫺ f0 , ..., 1 (11) where dxi , dxij and dxijk are the products dx 1 dx 2 . . .dx n without dx i , dx i dx j and dx i dx j dx k , respectively. Finally, the last term f 12 . . .n( x 1 , x 2 , . . . , x n ) is determined from the difference between f(x) and all the other component functions in eq. (1). Each com- 冕 1 ␸ p共 x兲 ␸ q共 x兲dx ⫽ 0, 0 In the approximations of eqs. (14)–(16), the orthonormality property of the set {␸} preserves the mutual orthogonality of the Correlation Method for Variance Reduction of Monte Carlo Integration in RS-HDMR RS-HDMR component functions given by eq. (13), and eq. (1) is equivalent to ijk ␥ pqr ⫽ 冕 279 f共x兲␸p 共xi 兲␸q 共xj 兲␸r 共xk 兲dx Kn 冘冘 n f共x兲 ⬇ f0 ⫹ 冘冘冘 k l ␣ri ␸r 共xi 兲 ⫹ 1 ⬇ N ij ␤pq ␸p 共xi 兲␸q 共xj 兲 1ⱕi⬍jⱕn p⫽1 q⫽1 i⫽1 r⫽1 冘冘冘冘 m ⫹ l⬘ m⬘ m⬙ ijk ␥pqr ␸p 共xi 兲␸q 共xj 兲␸r 共xk 兲 ⫹ . . . , (23) 1ⱕi⬍j⬍kⱕn p⫽1 q⫽1 r⫽1 where the integers k, l, l⬘, m, m⬘, m⬙ are small (i.e., ⱕ3) for many systems. ijk The coefﬁcient { ␣ ri , ␤ ijpq , ␥ pqr , . . .} may be determined by minimization of the functional min ij , · · ·其 ␰ 僆兵 ␣ ri , ␤ pq 冕冋 n l 共s兲共s兲 f共x共s兲兲␸p 共x共s兲 i 兲␸q 共xj 兲␸r 共xk 兲, . . . , k ␣ri ␸r 共xi 兲 FN ⫽ l⬘ 1 N ⫺ 冘冘冘冘 m⬘ 冘 N F共x 共s兲兲 (29) F共x兲dx, (30) 1 var兵F共x兲其, N (31) s⫽1 has the expectation ij ␤pq ␸pq 共xi , xj 兲 1ⱕi⬍jⱕn p⫽1 q⫽1 m (28) s⫽1 (s) (s) where x(s) ⫽ ( x (s) 1 , x 2 , . . . , x n ) (s ⫽ 1, 2, . . . , N) is the sth random sample point, and N is the total number of random samples. The accuracy of the resultant coefﬁcients depends on the Monte Carlo integration. The theoretical foundation of Monte Carlo integration is the following.6 If x ⫽ ( x 1 , x 2 , . . . , x n ) are independent random variables with a uniform probability density function and F(x) is a function of x, then the random variable i⫽1 r⫽1 Kn 冘冘冘 ⫺ 冘冘 f共x兲 ⫺ f0 ⫺ 冘 N 册 m⬙ ijk pqr 2 具F N典 ⫽ ␥ ␸p 共xi 兲␸q 共xj 兲␸r 共xk 兲 ⫺ · · · dx. (24) 1ⱕi⬍j⬍kⱕn p⫽1 q⫽1 r⫽1 冕 Kn the variance i r ij pq Then the coefﬁcients { ␣ , ␤ , ␥ solving a linear algebraic equation ijk pqr , . . .} can be obtained by var兵FN 其 ⫽ Ay ⫽ b (25) and the standard deviation (standard error) where A is a symmetric matrix, whose elements are integrals 兰 ␸ r ( x i ) ␸ r⬘ ( x i )dx i , 兰 ␸ r ( x i ) ␸ r⬘ ( x j )dx i dx j , 兰 ␸ r ( x i ) ␸ p ( x i ) ␸ p ( x j ) dx i dx j , . . . ; b is a vector whose elements are integrals for the products of f(x) and the basis functions, for instance 兰 f(x) ␸ r ( x i )dx; and y is the vector of coefﬁcients { ␣ ri , ␤ ijpq , ijk ␥ pqr , . . .} in certain order. All the above integrals may be approximated by Monte Carlo integration, and then A and b can be obtained. However, for a system with a high dimension n, A can be a very large matrix. Solving a very large linear algebraic equation is not computationally efﬁcient. If the integrals of the products of the orthonormal polynomials are treated analytically, then A is an identity matrix result from the orthonormality property of {␸}. In general, the integrals in b cannot be obtained analytically, but they may be approximated by Monte Carlo integration. ijk Hence, the expansion coefﬁcients { ␣ ri , ␤ ijpq , ␥ pqr , . . .} can be readily determined by the following formulas: ␣ ri ⫽ 冕 f共x兲␸r 共xi 兲dx ⬇ 1 N Kn ij ⫽ ␤ pq 冕 Kn f共x兲␸p 共xi 兲␸q 共xj 兲dx ⬇ 1 N 冘 N f共x共s兲兲␸r 共x共s兲 i 兲, (26) s⫽1 冘 N s⫽1 共s兲 f共x共s兲兲␸p 共x共s兲 i 兲␸q 共xj 兲, (27) ␴ 兵F N其 ⫽ 共var兵FN 其兲1/ 2 ⫽ 1 冑N ␴兵F共x兲其. (32) Therefore, F N can be used as an estimate of the integral 兰 K n F(x)dx with a standard error ␴ {F N } proportional to the standard error of the integrand random variable F(x). The error of Monte Carlo integration can be reduced either by increasing the sample size N or decreasing the variance of F(x) in K n . Monte Carlo integration error becomes troublesome when the random data of the integrand F(x) has a large variance, i.e., F(x) has rapid changes in the desired domain, especially in sign. This behavior is expected to arise when considering integrands with more basis functions such as f(x) ␸ p ( x i ) ␸ q ( x j ) and f(x) ␸ p ( x i ) ␸ q ( x j ) ␸ r ( x k ) because the functions {␸} are ﬁxed regardless of the form of f(x). It was found that the required sample size N to achieve a desired accuracy for the coefﬁcients in eqs. (26)–(28), and consequently, for the approximations of the RS-HDMR component functions, is mainly dependent on the order of the component function (i.e., the number of multiplicative ␸ functions in eqs. (26)–(28) in the RS-HDMR expansion, but not much on the dimension n of the variable vector x. Therefore, the determination of high-order RS-HDMR component functions generally requires additional samples. For example, to determine f i ( x i ) a few hundred samples may give a good accuracy, but for f ij ( x i , x j ) to achieve the same accuracy thousands Li et al. 280 • Vol. 24, No. 3 • Journal of Computational Chemistry samples may be needed. However, the sample size is often restricted by time and cost considerations. The other way to improve the accuracy of Monte Carlo integration is to reduce the variance of the integrand. This article will employ the correlation method with Monte Carlo integration6 for this purpose such that for a given sample size the accuracy of the RS-HDMR expansions will be signiﬁcantly improved. The article is organized as follows. The next Section introduces the correlation method with Monte Carlo integration. Then we present an illustration of this method to an atmospheric chemical kinetics model. Finally we conclude. 冕 f共x兲␸r 共xi 兲dx. ␣ ri ⫽ 冕关 f共x兲 ⫺ h共x兲兴␸r 共xi 兲dx ⫹ Kn 冕 h共x兲␸r 共xi 兲dx ⫽ (34) ijk ␥៮ pqr ␸p 共xi 兲␸q 共xj 兲␸r 共xk 兲, (38) 冕冋冘冘冘 l ⫹ 冘冘 k ␣៮ ri ␸r 共xi 兲 i⫽1 r⫽1 l⬘ 冘冘冘冘 m ij ␤៮ pq ␸p 共xi 兲␸q 共xj 兲 ⫹ m⬘ m⬙ 1ⱕi⬍j⬍kⱕn p⫽1 q⫽1 r⫽1 册 ijk ⫻ ␥៮ pqr ␸p 共xi 兲␸q 共xj 兲␸r 共xk 兲 ␸r 共xi 兲dx ⫽ ␣៮ ri . (39) Then we have ␣ ri ⬇ 1 N 冘 N ៮ ri . 关 f共x 共s兲兲 ⫺ h共x共s兲兲兴␸r 共x共s兲 i 兲⫹␣ (40) s⫽1 Similarly, we also have Kn ij ␤ pq ⬇ (2) the second integral is known analytically as 冕 h共x兲␸r 共xi 兲dx ⫽ cri . (35) Kn Thus, the variance comes only from the ﬁrst term in eq. (34). As f(x) ⫺ h(x) is almost constant or zero everywhere, we expect that var兵关 f共x兲 ⫺ h共x兲兴␸r 共xi 兲其 ⬍ var兵 f共x兲␸r 共xi 兲其, (36) and ␣ ri may be approximated by Monte Carlo integration ␣ ri ⬇ m⬙ n f0 ⫹ Kn (33) h共x兲␸r 共xi 兲dx. m⬘ ijk where the coefﬁcients { ␣៮ ri , ␤៮ ijpq , ␥៮ pqr } are determined by direct Monte Carlo integration given in eqs. (26)–(28). The difference f(x) ⫺ h(x) should be small if the truncated RS-HDMR expansion is a good approximation of f(x). Moreover, using the orthonormality property of {␸} we have Kn To reduce the variance of the integrand f(x) ␸ r ( x i ) in K n , we seek a reference function h(x) satisfying two conditions: (1) f(x) ⫺ h(x) is almost constant or vanishes in the whole domain. Introducing h(x) gives ij ␤៮ pq ␸p 共xi 兲␸q 共xj 兲 1ⱕi⬍j⬍kⱕn p⫽1 q⫽1 r⫽1 1ⱕi⬍jⱕn p⫽1 q⫽1 ␣ ri ⫽ l⬘ 1ⱕi⬍jⱕn p⫽1 q⫽1 m Kn Consider an integral for any coefﬁcient in eqs. (26)–(28), for example l ␣៮ ri ␸r 共xi 兲 ⫹ 冘冘冘冘 ⫹ 冘冘冘 k i⫽1 r⫽1 冕 Correlation Method for Variance Reduction of HDMR Monte Carlo Integration 冘冘 n h共x兲 ⫽ f0 ⫹ 1 N 冘 N i 关 f共x 共s兲兲 ⫺ h共x共s兲兲兴␸r 共x共s兲 i 兲 ⫹ cr (37) s⫽1 with better accuracy than that given by eq. (26). A truncated RS-HDMR expansion of eq. (23) can be used as h(x), for instance 1 N ijk ⬇ ␥ pqr 1 N 冘 N 共s兲 ៮ ij 关 f共x 共s兲兲 ⫺ h共x共s兲兲兴␸p 共x共s兲 i 兲␸q 共xj 兲 ⫹ ␤pq , (41) s⫽1 冘 N 共s兲共s兲 ijk ៮ pqr . 关 f共x 共s兲兲 ⫺ h共x共s兲兲兴␸p 共x共s兲 i 兲␸q 共xj 兲␸r 共xk 兲 ⫹ ␥ (42) s⫽1 Equations (40)–(42) show that the ﬁrst terms in these equations ijk are corrections for the initial values ␣៮ ri , ␤៮ ijpq and ␥៮ pqr . The resulti ij ijk ant ␣ r , ␤ pq and ␥ pqr may be reused as new initial values for the construction of a new h(x) with even smaller values of f(x) ⫺ h(x) to repeat the calculation again. Then, eqs. (40)–(42) become an iteration procedure for a given set of random samples. We expect that the iteration will be convergent if the initial h(x) is close to f(x) and the sample size N is large enough. Illustration: A Photochemical Box Model A zero-dimensional photochemical box model designed to treat the ozone chemistry in the background troposphere is being used to study three-dimensional global chemical transport.7 This box model consists of 63 reactions and 28 chemical species used to calculate the rates of ozone production P, destruction D and the Correlation Method for Variance Reduction of Monte Carlo Integration in RS-HDMR Table 1. The Variable Ranges. Table 2. The Relative Errors of the Different Order RS-HDMR Expansions for the Output P Determined by Direct Monte Carlo Integration Range Relative humidity, (%), x 1 CO (ppb), x 2 NOx (ppt), x 3 O3 (ppb), x 4 281 Lower limit Upper limit 10 20 1 10 100 200 950 200 Data portion (%)a Sample size (N) 300 500 tendency P ⫺ D for incorporation into the overall three-dimensional model. The details of this process are not relevant here, but the box model provides a good testing ground for the determination of RS-HDMR component functions. The rates of ozone production P, destruction D and the tendency P ⫺ D are used as three output variables of the box model. The input random variables x ⫽ { x 1 , x 2 , x 3 , x 4 } are the concentrations of the four precursors: H2O, CO, NOx , and O3. In the following, we will apply the correlation method discussed above to this model. The ﬁrst- to third-order RS-HDMR component functions with respect to the different outputs are approximated by third-order orthonormal polynomial expansions (i.e., k, l, l⬘, m, m⬘, m⬙ ⫽ 3). The expansion coefﬁcients { ␣ ri , ␤ ijpq , ␥ ijk pqr } were determined by Monte Carlo integration with different random sample sizes. Then the accuracy of the resultant different order RS-HDMR expansions were examined by comparison with 53,312 exact data, which uniformly cover the whole domain of x. The ranges of the four variables are shown in Table 1. Direct Monte Carlo Integration Tables 2– 4 give the accuracy of different order RS-HDMR expansions for the three outputs whose component functions were approximated by third-order orthonormal polynomial expansions ijk where expansion coefﬁcients { ␣ ri , ␤ ijpq , ␥ pqr } were determined by direct Monte Carlo integration [i.e., eqs. (26)–(28)] with different random sample sizes (300, 500, 1000, 3000, and 5000). The results in Tables 2– 4 show that the sample size does not have a signiﬁcant inﬂuence on the ﬁrst-order RS-HDMR expansion. This implies that the ␣ ri can be accurately determined with a few hundred samples. The accuracy of second-order RS-HDMR approximation does depend on sample size, and the accurate determination of ␤ ijpq may need thousands samples. In Tables 2– 4 the third-order RS-HDMR expansion is worse than the secondorder one, especially for small samples where it is even worse than the ﬁrst-order one. This behavior implies that even if thousands samples are used, the direct Monte Carlo integration is still inacijk curate for the determination of ␥ pqr . The accuracy in Tables 2– 4 is not satisfactory at small sample sizes. The accuracy obtained from 5000 samples is acceptable, but still not satisfactory. Only ⬃90% of the test data have relative errors less than 5%. Therefore, direct Monte Carlo integration generally cannot give good accuracy for RS-HDMR, especially for high-order component functions. 1000 3000 5000 Relative error (%) First order Second order Third order 5 10 20 5 10 20 5 10 20 5 10 20 5 10 20 48.8 69.3 81.7 45.8 68.6 81.6 46.3 68.7 81.8 46.0 68.5 81.7 45.7 68.1 81.6 23.3 44.2 71.2 57.1 78.3 91.7 72.6 88.3 96.6 85.9 95.7 99.1 90.4 96.8 99.4 15.9 30.7 53.3 26.5 47.1 70.2 48.6 72.2 88.5 68.3 88.7 96.9 85.7 95.3 98.9 a The percentage of 53,312 data with a relative error not larger than a given value. Correlation Method with Monte Carlo Integration The correlation method with Monte Carlo integration given by eqs. (40)–(42) was used to determine the coefﬁcients ␣ ri , ␤ ijpq , and ijk ␥ pqr . First, h(x) was set by eq. (38) with k, l, l⬘, m, m⬘, m⬙ ⫽ 3. Table 3. The Relative Errors of the Different Order RS-HDMR Expansions for the Output D Determined by Direct Monte Carlo Integration. Data portion (%)a Sample size (N) 300 500 1000 3000 5000 a Relative error (%) First order Second order Third order 5 10 20 5 10 20 5 10 20 5 10 20 5 10 20 33.7 55.0 70.0 37.0 55.7 69.2 38.6 56.7 69.7 38.4 55.4 68.8 38.4 55.7 68.9 24.2 43.4 68.5 34.6 59.6 80.2 58.6 81.4 93.6 86.4 95.5 99.2 91.8 97.6 99.6 14.6 28.1 49.7 19.3 36.8 61.8 36.8 61.6 83.8 65.5 84.0 95.3 75.9 90.6 97.1 The percentage of 53,312 data with a relative error not larger than a given value. Li et al. 282 • Vol. 24, No. 3 • Journal of Computational Chemistry Table 4. The Relative Errors of the Different Order RS-HDMR Table 5. The Relative Errors of the Resultant Third-Order RS-HDMR Expansions for the Output P ⫺ D Determined by Direct Monte Carlo Integration Expansions Determined by Correlation Method with Monte Carlo Integration Using h(x) Given by Eq. (38). Data portion (%)a Sample size (N) Relative error (%) First order Second order Third order 5 10 20 5 10 20 5 10 20 5 10 20 5 10 20 47.3 72.5 87.3 48.8 72.2 86.5 49.6 71.4 86.4 48.6 70.9 86.2 48.5 70.8 86.2 26.0 51.8 83.9 58.5 83.4 96.0 77.5 93.4 99.3 90.7 97.7 99.7 93.8 98.7 99.9 18.8 36.1 62.1 29.6 52.3 76.1 50.7 76.6 92.8 80.3 94.2 98.8 90.4 97.9 99.8 Sample size (N) P D P ⫺ D 5 10 20 5 10 20 99.9 100 100 98.0 99.4 99.8 99.7 100 100 98.0 99.5 99.9 99.7 100 100 99.4 99.9 100 3000 300 500 1000 3000 5000 a The percentage of 53,312 data with a relative error not larger than a given value. 5000 a The percentage of 53,312 data with a relative error not larger than a given value. 5000 possibly due to the randomness of the data. When it is convergent, correlation method with Monte Carlo integration signiﬁcantly reduces the variance, and hence improves the accuracy. For N ⱕ 1000, the divergence may come from the large error of ijk ␥៮ pqr . Therefore, we choose 冘冘 n h共x兲 ⫽ f0 ⫹ 冘冘冘 k i⫽1 r⫽1 The iteration was convergent for N ⫽ 3000, 5000, but divergent when N ⱕ 1000. Figure 1 gives the iteration behavior of the portion of the test 53,312 exact data with relative error less than 5% for output P. The third-order RS-HDMR expansion was used with component functions expanded by ﬁrst- to third-order orthonormal polynomials. The other outputs D and P ⫺ D have similar behavior. The converged results for 3000 and 5000 samples and the three outputs are given in Table 5. Compared to Tables 2– 4, the results in Table 5 are excellent. The accuracy for N ⫽ 3000 is a little better than that for N ⫽ Data portion (%)a Relative error (%) l ␣៮ ri ␸r 共xi 兲 ⫹ l⬘ ij ␤៮ pq ␸p 共xi 兲␸q 共xj 兲. 1ⱕi⬍jⱕn p⫽1 q⫽1 (43) In this case, the coefﬁcients ␣ ri and ␤ ijpq are determined by iteration eqs. (40) and (41). The iteration was convergent even for N ⫽ 300. The resultant converged values of ␣ ri and ␤ ijpq were used ijk to determine the coefﬁcients ␥ pqr by the following equation without iteration ijk ⬇ ␥ pqr 1 N 冘 N 共s兲共s兲关 f共x 共s兲兲 ⫺ h共x共s兲兲兴␸p 共x共s兲 i 兲␸q 共xj 兲␸r 共xk 兲 (44) s⫽1 because ijk ␥៮ pqr ⫽ 冕 h共x兲␸p 共xi 兲␸q 共xj 兲␸r 共xk 兲dx ⫽ Kn Kn 冘冘冘 l ⫹ l⬘ 1ⱕi⬍jⱕn p⫽1 q⫽1 Figure 1. The iteration behavior of the portion of the test 53,312 exact data with relative error less than 5% for output P in the iteration. The third-order RS-HDMR expansion was used whose component functions are expanded by ﬁrst- to third-order orthonormal polynomials. 冕冋冘冘 n f0 ⫹ k ␣ri ␸r 共xi 兲 i⫽1 r⫽1 册 ij ␤pq ␸p 共xi 兲␸q 共xj 兲 ␸p 共xi 兲␸q 共xj 兲␸r 共xk 兲dx ⫽ 0, (45) where ␣ ri and ␤ ijpq are converged values obtained from the above iteration. The second- and third-order RS-HDMR expansions for different outputs and sample sizes were constructed. The accuracy of the second- and third-order RS-HDMR expansions for different sample sizes is shown in Tables 6 and 7. The results are satisfactory even for small sample sizes. Compared to Table 5 for N ⫽ 3000 and 5000 using second-order RS-HDMR as h(x) gives almost the same accuracy. Because the correlation method with Monte Carlo Correlation Method for Variance Reduction of Monte Carlo Integration in RS-HDMR Table 6. The Relative Errors of the Resultant Second-Order RS-HDMR 283 Table 8. The Accuracy of Second-Order Cut-HDMR Expansion. Expansions [h(x) is the Second-Order RS-HDMR Expansion]. Sample size (N) Sample size (N) a Relative error (%) P D P ⫺ D 5 10 20 5 10 20 5 10 20 5 10 20 5 10 20 88.0 94.7 98.6 91.3 96.7 99.2 91.0 96.5 99.2 91.4 96.8 99.2 91.5 96.8 99.2 91.8 97.8 99.9 94.5 99.2 99.9 95.1 99.1 99.9 95.4 99.1 99.9 95.6 99.2 99.9 94.4 98.7 99.8 93.8 98.2 99.6 93.9 98.5 99.8 94.2 98.5 99.8 94.1 98.6 99.8 Data portion (%) 1183 300 500 1000 3000 5000 Data portion (%)a Relative error (%) P D P ⫺ D 5 10 20 93.27 97.48 99.46 97.76 99.25 99.79 86.84 92.01 95.54 a The percentage of 53,312 data with a relative error not larger than a given value. 1183 data. Its accuracy is worse than the third-order RS-HDMR expansion in Table 7 constructed from 1000 random samples, and even worse than that constructed from 500 samples. Therefore, RS-HDMR can have better accuracy than Cut-HDMR for a given sample size. Conclusions a The percentage of 53,312 data with a relative error not larger than a given value. integration can accurately determine ␥ ijk pqr with small sample sizes and only one set of random samples are needed to determine all order RS-HDMR component functions, it is possible to construct truncated high-order RS-HDMR expansions from small random samples. This is very beneﬁcial for HDMR to treat high-dimensional systems, especially for those whose data cannot be obtained easily. For comparison, Table 8 gives the accuracy of the second-order Cut-HDMR expansion whose look-up tables were composed of Table 7. The Relative Errors of the Resultant Third Order RS-HDMR Expansions [h(x) is the Second-Order RS-HDMR Expansion]. Sample size (N) 300 500 1000 3000 5000 Data portion (%)a Relative error (%) P D P ⫺ D 5 10 20 5 10 20 5 10 20 5 10 20 5 10 20 93.6 97.2 99.6 95.6 98.7 99.8 97.3 99.1 99.8 99.2 99.9 100 99.6 100 100 92.7 98.4 100 96.5 99.6 100 97.9 99.7 100 99.6 100 100 99.9 100 100 98.9 99.8 100 98.0 99.4 99.8 99.1 99.7 100 99.5 99.9 100 99.7 100 100 a The percentage of 53,312 data with a relative error not larger than a given value. In practice, RS-HDMR component functions may be approximated by orthonormal polynomial expansions whose expansion coefﬁcients are determined by Monte Carlo integration. The Monte Carlo integration error depends on the sample size N and the variance of the integrand. As sample size is often restricted by time and cost, one cannot always improve the accuracy by increasing the sample size. The correlation method with Monte Carlo integration is a way to reduce the variance of the integrand. A secondor third-order RS-HDMR expansion can be used as a reference function h(x) whose orthonormal polynomial expansion coefﬁcients are ﬁrst determined by direct Monte Carlo integration. The resultant coefﬁcients may be reused as new initial values for the construction of a new h(x) with an even smaller value of f(x) ⫺ h(x) to repeat the calculation again. When N is not very large, this iteration may be divergent if the third-order RS-HDMR expansion is used as a reference function h(x). When the second-order RS-HDMR expansion is used for h(x), the iteration was found convergent even for N ⫽ 300 in an illustration. The resultant second, and especially third-order, RS-HDMR expansion has very good accuracy. The correlation method with Monte Carlo integration is capable of giving accurate RS-HDMR expansions from small samples. This is very beneﬁcial for practical applications of HDMR in high-dimensional systems. References 1. Rabitz, H.; Alis, O. F.; Shorter, J.; Shim, K. Comput Phys Commun 1998, 115, 1. 2. Alis, O. H.; Rabitz, H. J Math Chem 1999, 25, 197. 3. Shorter, J.; Rabitz, H. J Phys Chem A 1999, 103, 7192. 4. Alis, O. H.; Rabitz, H. Mathematical and Statistical Methods for Sensitivity Analysis; John Wiley and Sons: New York, 2001. 5. Li, G.; Wang, S. W.; Rabitz, H. J Phys Chem 2002, 106, 8721. 6. Kalos, M. H.; Whitlock, P. A. Monte Carlo Methods; John Wiley & Sons: New York, 1986. 7. Wang, S. W.; Levy, II, H.; Li, G.; Rabitz, H. J Geophys Res 1999, 104, D23, 30417–30426.

RELATED PAPERS

RELATED TOPICS

Log In

Correlation method for variance reduction of Monte Carlo integration in RS-HDMR

Correlation method for variance reduction of Monte Carlo integration in RS-HDMR

Related Papers

RELATED PAPERS

RELATED TOPICS