Estimating a class of diffusions from discrete observations via approximate maximum likelihood method

Statistics, 2017
arXiv:1607.06699v2 [math.ST] 20 Aug 2018

Estimating a class of diffusions from discrete observations via approximate maximum likelihood method∗

Miljenko Huzak†

Abstract. An approximate maximum likelihood method for estimating the diffusion parameters $(\vartheta,\sigma)$, based on discrete observations of a diffusion $X$ along a fixed time interval $[0,T]$ and on the Euler approximation of integrals, is analyzed. We assume that $X$ satisfies an SDE of the form $dX_t = \mu(X_t,\vartheta)\,dt + \sqrt{\sigma}\,b(X_t)\,dW_t$ with a non-random initial condition. The SDE is, in general, nonlinear in $\vartheta$. Assuming that the maximum likelihood estimator $\hat\vartheta_T$ of the drift parameter based on continuous observation of a path over $[0,T]$ exists, we prove that a measurable estimator $(\hat\vartheta_{n,T},\hat\sigma_{n,T})$ of the parameters, obtained from discrete observations of $X$ along $[0,T]$ by maximizing the approximate log-likelihood function, exists, that $\hat\sigma_{n,T}$ is consistent and asymptotically normal, and that $\hat\vartheta_{n,T}-\hat\vartheta_T$ tends to zero in probability with rate $\sqrt{\delta_{n,T}}$ when $\delta_{n,T}=\max_{0\le i<n}(t_{i+1}-t_i)$ tends to zero with $T$ fixed. The same holds for an ergodic diffusion when $T$ goes to infinity in such a way that $T\delta_{n,T}$ goes to zero with equidistant sampling, and we apply these results to show consistency and asymptotic normality of $\hat\vartheta_{n,T}$ and $\hat\sigma_{n,T}$, and asymptotic efficiency of $\hat\vartheta_{n,T}$, in this case.

Key words. parameter estimation, diffusion processes, discrete observation

AMS subject classifications. 62M05, 62F12, 60J60

1 Introduction

Let $X=(X_t,\ t\ge 0)$ be a one-dimensional diffusion which satisfies Itô's stochastic differential equation (SDE) of the form
\[ X_t = x_0 + \int_0^t \mu(X_s,\vartheta)\,ds + \int_0^t \sqrt{\sigma}\,b(X_s)\,dW_s,\quad t>0. \tag{1} \]
Here, $W=(W_t,\ t\ge 0)$ is a one-dimensional standard Brownian motion, $\mu$ and $b$ are real functions that ensure uniqueness in law of a solution to (1), and $x_0$ is a given deterministic initial value of $X$ (see e.g. [25] as a reference for SDEs).
The problem is to estimate the unknown vector parameter $\theta=(\vartheta,\sigma)$ of $X$, given a discrete observation $(X_{t_i},\ 0\le i\le n)$ of a trajectory $(X_t,\ t\in[0,T])$ over a subdivision $0 =: t_0 < t_1 < \cdots < t_n := T$ of the time interval ($n$ is a positive integer) with diameter $\delta_{n,T} := \max_{0\le i<n}(t_{i+1}-t_i)$, $T>0$ being fixed. The component $\vartheta$ of $\theta$ is a (vector) drift parameter, and $\sigma$ is a diffusion coefficient parameter. We assume that $\vartheta$ belongs to the drift parameter space $\Theta$, which is an open and convex set in the Euclidean space $\mathbb{R}^d$, and that $\sigma$ is a positive real number. Hence, $\theta=(\vartheta,\sigma)$ is an element of the open and convex parameter space $\Psi := \Theta\times\langle 0,+\infty\rangle$.

∗ This work has been partially supported by Croatian Science Foundation under the project 3526, and by Ministry of Science, Education and Sports, Republic of Croatia, Grants 037-0372790-2800 and 037058.
† Department of Mathematics, Faculty of Science, University of Zagreb, Bijenička 30, HR-10002 Zagreb, Croatia (huzak@math.hr)

Diffusion parameter estimation problems based on discrete observations have been discussed by many authors (see [1, 2, 3, 4, 9, 10, 12, 19, 20, 22, 23, 28]). Although the maximum likelihood estimator (MLE) has the usual good properties (see [9]), it may not be possible to calculate it explicitly because the transition density of the process $X$ is generally unknown, and so the likelihood function (LF) of the discrete process is unknown as well. Hence, other methods of estimation have to be considered. The method of parameter estimation discussed in this paper and described in Section 3 below is based on a Gaussian approximation of the transition density, and it can also be interpreted as maximization of a discretized continuous-time log-likelihood function (LLF).
Such methods are usually called quasi-likelihood or approximate maximum likelihood (AML) methods, and the estimators obtained in this way we will briefly call approximate maximum likelihood estimators (AMLEs).

The motivation for analyzing the method described in Section 3 is that it can provide useful estimators of the parameters. It is well known that the AMLE of the diffusion coefficient parameter $\sigma$ obtained in this way is consistent and asymptotically normally distributed over a fixed observational time interval $[0,T]$ when $\delta_{n,T}\to 0$ (see [10] for the case where all drift parameters are known, and [14] for general cases). The same holds in ergodic diffusion cases when $T\to+\infty$ in such a way that $\delta_{n,T}=T/n\to 0$ for appropriate equidistant sampling (see e.g. [12] or [19]). Local asymptotic properties of the AMLE of drift parameters over a fixed interval $[0,T]$ when $\delta_{n,T}\to 0$ are less well known, especially in more general cases, and particularly when the drift is nonlinear in its parameters (see [5]). Although knowledge of local asymptotic properties of drift parameter AMLEs does not necessarily imply their consistency or asymptotic normality, it may help in further analysis of the AMLEs, which might include, for example, measuring the effects of discretization on the estimator's standard errors, with applications in simulation studies. In ergodic diffusion cases it is well known that the AMLE of the drift (vector) parameter is consistent, asymptotically normal and efficient when $T\to+\infty$ in such a way that $T\delta_{n,T}^2\to 0$ for equidistant sampling (see e.g. [12] for the one-dimensional case and [19] for vector and more general cases), but the rate of convergence of $\hat\vartheta_{n,T}-\hat\vartheta_T$ to zero is still less investigated. Let us stress that problems of statistical inference about diffusion drift parameters are very important, especially in biomedical modeling (see [16]).
For completeness we should also stress that local convergence of the AMLE of the whole vector parameter $\theta=(\vartheta,\sigma)$ to the MLE of $\theta$ based on discrete observations and equidistant sampling has been investigated (see [1, 3, 23]). Let $\tilde\theta_{n,\delta}$ denote the MLE of $\theta$ based on discrete observations with $\delta_{n,T}\equiv\delta=\text{const.}$, and let $\tilde\theta^{(k)}_{n,\delta}$ be the AMLE obtained from an approximate LF based on a closed-form $k$th order approximation of the transition densities. Then, for the Hermite-polynomial-based analytical expansion approach to approximation of the transition density, $\tilde\theta^{(k_n)}_{n,\delta}-\tilde\theta_{n,\delta}\to 0$ when $k_n\to\infty$, and the sequence $(k_n)$ can be chosen sufficiently large to deliver any rate of convergence (see [1]). Moreover, there exist sequences of regular matrices $(S_{n,\delta})$ and positive numbers $(\delta_n)$ such that $\delta_n\to 0$ and $S^{-1}_{n,\delta_n}\bigl(\tilde\theta^{(k)}_{n,\delta_n}-\tilde\theta_{n,\delta_n}\bigr)=O_P(1)$ (see [3]). For an alternative approach to approximation and analogous results, see [23].

In this paper we analyze the considered AMLE of drift parameters by studying the relation between the AMLE and the MLE obtained from continuously observed diffusion paths. We state general conditions under which we prove: (1.) existence and measurability of the AMLE, (2.) that $\hat\vartheta_{n,T}-\hat\vartheta_T$ converges to zero in probability with rate $\sqrt{\delta_{n,T}}$ when $\delta_{n,T}\to 0$ over a fixed bounded observational time interval $[0,T]$, and (3.) that $\hat\vartheta_{n,T}-\hat\vartheta_T$ converges to zero in probability with rate $\sqrt{\delta_{n,T}}$ when $T\to+\infty$ in such a way that $\delta_{n,T}=T/n\to 0$ in an ergodic diffusion case with equidistant sampling. We apply these findings in proving: (4.) measurability, consistency and asymptotic normality of diffusion coefficient parameter AMLEs when $\delta_{n,T}\to 0$ in both cases: when $T$ is fixed, and in an ergodic diffusion case when $T\to+\infty$ and $T\delta_{n,T}=T^2/n\to 0$ with equidistant sampling, and (5.)
consistency, asymptotic normality and efficiency of drift parameter AMLEs in an ergodic case when $T\to+\infty$ in such a way that $T\delta_{n,T}\to 0$ with equidistant sampling.

Properties (1.-2.) for drift parameter AMLEs were proved in [22] in cases where the drift depends linearly on its parameters. For a detailed review of the linear case see [5]. The first nonlinear case was covered by the author in his Ph.D. thesis [15]. The main assumption there was that the drift is an analytic function of its parameters with properly bounded derivatives of all orders. In this paper we only assume that the drift has at least $d+3$ continuous derivatives with respect to the drift parameters ($d$ is the dimension of the drift parameter vector). The main difficulty was in proving the core technical Theorem 6.1 of Section 6. Although facts (4.-5.) are already known, we include these alternative proofs for completeness and to illustrate the applicability of findings (1.-3.) and of the methods developed in this paper. We believe that other discretization schemes (for example, of higher order) can be analyzed similarly by using the techniques of this paper.

The paper is organized in the following way. In the next section we introduce the notation used throughout the paper. The discussed method of estimation is described in Section 3. The main results are presented in Section 4. Examples are provided in Section 5. The proofs of the main results are in the last section. Lemmas are proved in the Appendix.

2 Notations

Let $|\cdot|$ denote the Euclidean norm in $\mathbb{R}^d$ and its induced operator norm, and let $|\cdot|_\infty$ be the max-norm. If $f$ is a bounded real function, $\|f\|_\infty := \sup_z |f(z)|$ is the sup-norm of $f$. Let $L^p(\mathbb{P})$ be the Banach space of all random variables with finite $p$-th moment and let $\|\cdot\|_{L^p(\mathbb{P})}$ denote its norm. If $(x,\vartheta)\mapsto f(x,\vartheta)$ is a real function defined on an open subset of $\mathbb{R}\times\mathbb{R}^d$, then we denote by $D^m_\vartheta f(x,\vartheta)$ the $m$-th partial derivative with respect to $\vartheta$. Let
\[ |D^m_\vartheta f(x,\vartheta)|_\infty := \max_{j_1+\cdots+j_d=m}\Bigl|\frac{\partial^m f}{\partial\vartheta_1^{j_1}\cdots\partial\vartheta_d^{j_d}}(x,\vartheta)\Bigr|. \]
In this case we say that $D^m_\vartheta f(x,\vartheta)$ is bounded if all partial derivatives $\frac{\partial^m f}{\partial\vartheta_1^{j_1}\cdots\partial\vartheta_d^{j_d}}(x,\vartheta)$ are bounded, and
\[ \|D^m_\vartheta f\|_\infty := \max_{j_1+\cdots+j_d=m}\Bigl\|\frac{\partial^m f}{\partial\vartheta_1^{j_1}\cdots\partial\vartheta_d^{j_d}}\Bigr\|_\infty. \]
The notation $D^2_\vartheta f(x,\vartheta) < \mathbb{O}$ means that the Hessian $D^2_\vartheta f(x,\vartheta)$ is a negative definite matrix, and similarly for a positive definite matrix. $D^0_z f \equiv f$ by convention. The $m$-th derivative of $f$ at a point $z$ is simply denoted by $D^m f(z)$.

Let $K$ and $\Theta$ be open sets in $\mathbb{R}^d$. The closure and the boundary of $K$ will be denoted by $\overline{K}$ and $\partial K$ respectively, and the $\sigma$-algebra of Borel subsets of $\Theta$ by $\mathcal{B}(\Theta)$. If $K\subset\Theta$ is an open set such that $\overline{K}$ is compact in $\Theta$, then we say that $K$ is a relatively compact set in $\Theta$.

Let $(\gamma_n,\ n\ge 1)$ be a sequence of positive numbers and let $(Y_n,\ n\ge 1)$ be a sequence of random variables defined on some probability space. We say that $(Y_n,\ n\ge 1)$ is $O_P(\gamma_n)$, and write $Y_n = O_P(\gamma_n)$, if the sequence $(Y_n/\gamma_n,\ n\ge 1)$ is bounded in probability, i.e. if $\lim_{A\to+\infty}\limsup_n \mathbb{P}\{\gamma_n^{-1}|Y_n| > A\} = 0$.

3 Estimation method

Let $0 = t_0 < t_1 < \cdots < t_n = T$ be the discrete times at which the diffusion $X$ is observed, and let $\Delta$ denote the difference operator defined in the following way: if $F$ is a function defined on $[0,T]$ then $\Delta_i F := F(t_{i+1}) - F(t_i)$, $0\le i<n$. Let us discretize SDE (1) over the interval $[t_i,t_{i+1}]$ by using the Euler approximation of both types of integrals:
\[ X_{t_{i+1}} - X_{t_i} \approx \mu(X_{t_i},\vartheta)(t_{i+1}-t_i) + \sqrt{\sigma}\,b(X_{t_i})(W_{t_{i+1}}-W_{t_i}). \]
In this way the following stochastic difference equation is obtained:
\[ \Delta_i Z = \mu(Z_i,\vartheta)\,\Delta_i t + \sqrt{\sigma}\,b(Z_i)\,\Delta_i W \tag{2} \]
for $0\le i<n$, with $Z_0 = x_0$. The solution to (2) is a time-discrete process $Z=(Z_0,Z_1,\ldots,Z_n)$ that is an approximation of $X$ over $[0,T]$. Up to a constant not depending on the parameters, the LLF of the process $Z$ is
\[ -\frac12\sum_{i=0}^{n-1}\Bigl(\frac{(\Delta_i Z-\mu(Z_i,\vartheta)\,\Delta_i t)^2}{\sigma b^2(Z_i)\,\Delta_i t} + \log\sigma\Bigr). \tag{3} \]
The criterion function
\[ L_{n,T}(\theta) = L_{n,T}(\vartheta,\sigma) := -\frac12\sum_{i=0}^{n-1}\Bigl(\frac{(\Delta_i X-\mu(X_{t_i},\vartheta)\,\Delta_i t)^2}{\sigma b^2(X_{t_i})\,\Delta_i t} + \log\sigma\Bigr) \tag{4} \]
is obtained from (3) by substituting the discrete observations $(X_{t_i},\ 0\le i\le n)$ of the diffusion $X$ for $(Z_i,\ 0\le i\le n)$. Notice that
\[ L_{n,T}(\vartheta,\sigma) = -\frac{1}{2\sigma}\sum_{i=0}^{n-1}\frac{(\Delta_i X)^2}{b^2(X_{t_i})\,\Delta_i t} - \frac{n}{2}\log\sigma + \frac{1}{\sigma}\,\ell_{n,T}(\vartheta), \]
where
\[ \ell_{n,T}(\vartheta) = \sum_{i=0}^{n-1}\frac{\mu(X_{t_i},\vartheta)}{b^2(X_{t_i})}\,\Delta_i X - \frac12\sum_{i=0}^{n-1}\frac{\mu^2(X_{t_i},\vartheta)}{b^2(X_{t_i})}\,\Delta_i t \tag{5} \]
depends only on the drift parameter $\vartheta$. A point of maximum $\hat\theta_{n,T}=(\hat\vartheta_{n,T},\hat\sigma_{n,T})$ of function (4) in $\Psi$ is an AMLE of the vector parameter $\theta$, if it exists. Notice that if the AMLE exists then necessarily
\[ DL_{n,T}(\hat\vartheta_{n,T},\hat\sigma_{n,T}) = 0 \iff \begin{cases} D\ell_{n,T}(\hat\vartheta_{n,T}) = 0 \\ \hat\sigma_{n,T} = \dfrac1n\displaystyle\sum_{i=0}^{n-1}\dfrac{(\Delta_i X-\mu(X_{t_i},\hat\vartheta_{n,T})\,\Delta_i t)^2}{b^2(X_{t_i})\,\Delta_i t}. \end{cases} \tag{6} \]
Hence every stationary point $\hat\vartheta_{n,T}$ of the function $\ell_{n,T}$ uniquely determines the second component $\hat\sigma_{n,T}$ of the stationary point $\hat\theta_{n,T}=(\hat\vartheta_{n,T},\hat\sigma_{n,T})$ of the function $L_{n,T}$ through the expression
\[ \hat\sigma_{n,T} = \frac1n\sum_{i=0}^{n-1}\frac{(\Delta_i X-\mu(X_{t_i},\hat\vartheta_{n,T})\,\Delta_i t)^2}{b^2(X_{t_i})\,\Delta_i t}. \tag{7} \]
Moreover, if $\hat\vartheta_{n,T}$ is the unique point of global maximum of the function $\ell_{n,T}$, then $\hat\theta_{n,T}$ is the unique point of global maximum of the function $L_{n,T}$. Hence, to prove existence of a measurable AMLE $\hat\theta_{n,T}$ it is sufficient to prove that there exists a measurable point of maximum of the function $\ell_{n,T}$.

4 Main results

4.1 Fixed maximal observational time case

Let the following assumptions be satisfied.

(H1a): For all $\theta=(\vartheta,\sigma)\in\Psi$, there exists a strong solution $(X,W)$ of the SDE (1) on the time interval $[0,+\infty\rangle$ with values in an open interval $E\subseteq\mathbb{R}$.

(H2a): For all $\vartheta\in\Theta$, $\mu(\cdot,\vartheta)\in C^2(E)$ and $b\in C^3(E)$. Moreover, for all $x\in E$, $b(x)\neq 0$ and $\operatorname{sign} b = \text{const}$.

For example, by Theorem 5.2.2 in [13], (H1a) will be satisfied if in addition to (H2a) we assume that for all $\vartheta\in\Theta$ SDE (1) satisfies the so-called bounded linear growth assumption, i.e.
that there exists a positive constant $C$ such that for all $x\in E$, $|\mu(x,\vartheta)| + |b(x)| \le C(1+|x|)$. More precisely, (H2a) states that the functions $x\mapsto b(x)$ and $x\mapsto\mu(x,\vartheta)$, $\vartheta\in\Theta$, are continuously differentiable in $E$ and hence locally Lipschitz. In this case there exists a strong, continuous and pathwise unique solution to SDE (1) on the time interval $[0,+\infty\rangle$. However, there are SDEs which satisfy (H1a) and (H2a) but do not satisfy the linear growth assumption (see e.g. Example 5.1 of Section 5).

(H3a): For all $(x,\vartheta)\in E\times\Theta$ and all $1\le m\le d+3$, there exist partial derivatives $D^m_\vartheta\mu(x,\vartheta)$, $\frac{\partial}{\partial x}D^m_\vartheta\mu(x,\vartheta)$, and $\frac{\partial^2}{\partial x^2}D^m_\vartheta\mu(x,\vartheta)$ of the drift function $\mu$. Moreover, for all $0\le m\le d+3$, $D^m_\vartheta\mu,\ \frac{\partial}{\partial x}D^m_\vartheta\mu,\ \frac{\partial^2}{\partial x^2}D^m_\vartheta\mu \in C(E\times\Theta)$.

Let $P_\theta$ denote the law of $X$ for $\theta\in\Psi$. We assume that the probabilities $P_\theta$, $\theta\in\Psi$, are defined on the filtered space $(\Omega,(\mathcal{F}^0_T,\ T\ge 0))$, where $\Omega$ is the set of continuous functions $\omega:[0,+\infty\rangle\to E$ such that $\omega(0)=x_0$, $\mathcal{F}^0_T$ is the $\sigma$-algebra generated by the coordinate functions up to time $T$, and the filtration is augmented in the so-called usual way (see e.g. I.4 in [25]). On this space, the coordinate process $(\omega\mapsto\omega(t),\ t\ge 0)$ is a canonical version of $X$ (see [25], I.§3). Hence, for each $T>0$ we assume that $X$ is defined on the measurable space $(\Omega,\mathcal{F}^0_T)$ as a canonical process with law $P_\theta$.

For the moment, let us assume that we are able to observe the process $(X_t,\ 0\le t\le T)$ continuously. The diffusion coefficient parameter $\sigma$ can be uniquely determined through the equation
\[ \sigma = \lim_n \frac{\sum_{j=1}^{2^n}\bigl(X_{jT2^{-n}}-X_{(j-1)T2^{-n}}\bigr)^2}{\int_0^T b^2(X_t)\,dt} \quad (\text{a.s. } P_\theta) \tag{8} \]
(see [8]), since $b^2>0$ by (H2a). Hence the estimation problem for a continuously observed process can be reduced to an estimation problem for the drift parameter $\vartheta\in\Theta$.
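As a minimal numerical sketch of the criterion of Section 3 (not the author's code), the following Python fragment evaluates the drift part $\ell_{n,T}$ from (5), the closed-form $\hat\sigma_{n,T}$ from (7) and the full criterion (4) on an arbitrary observed path. The function names `ell_n`, `sigma_hat` and `crit_L`, the toy path and the linear drift are hypothetical choices made only for illustration.

```python
import numpy as np

def ell_n(theta, x, t, mu, b):
    """Drift part of the criterion, eq. (5)."""
    dx, dt = np.diff(x), np.diff(t)
    m, b2 = mu(x[:-1], theta), b(x[:-1]) ** 2
    return np.sum(m / b2 * dx) - 0.5 * np.sum(m ** 2 / b2 * dt)

def sigma_hat(theta, x, t, mu, b):
    """Closed-form diffusion-parameter estimate for a given theta, eq. (7)."""
    dx, dt = np.diff(x), np.diff(t)
    res = dx - mu(x[:-1], theta) * dt
    return np.mean(res ** 2 / (b(x[:-1]) ** 2 * dt))

def crit_L(theta, sigma, x, t, mu, b):
    """Full criterion function, eq. (4)."""
    dx, dt = np.diff(x), np.diff(t)
    res = dx - mu(x[:-1], theta) * dt
    return -0.5 * np.sum(res ** 2 / (sigma * b(x[:-1]) ** 2 * dt) + np.log(sigma))

# Toy observations (hypothetical): a scaled random walk on an equidistant grid.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 201)
x = 1.0 + 0.1 * np.sqrt(t[1]) * np.cumsum(np.r_[0.0, rng.normal(size=200)])

mu = lambda x, th: th[0] - th[1] * x   # hypothetical drift, linear in theta
b = lambda x: np.ones_like(x)          # hypothetical diffusion coefficient function
theta = np.array([0.3, 0.7])
s_hat = sigma_hat(theta, x, t, mu, b)
print("sigma_hat for the given theta:", s_hat)
```

For a fixed $\vartheta$, `s_hat` maximizes `crit_L` over $\sigma$, which is the second relation in (6); maximizing `ell_n` over $\vartheta$ would supply the first.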
In this case, for every fixed diffusion parameter $\sigma$ assumed to be known, and every two different $\vartheta_1,\vartheta_2\in\Theta$, the probability measures $P_{(\vartheta_1,\sigma)}$ and $P_{(\vartheta_2,\sigma)}$ are equivalent on $\mathcal{F}^0_T$, and
\[ \log\frac{dP_{(\vartheta_2,\sigma)}}{dP_{(\vartheta_1,\sigma)}} = \frac1\sigma\Bigl(\int_0^T\frac{\mu(X_t,\vartheta_2)-\mu(X_t,\vartheta_1)}{b^2(X_t)}\,dX_t - \frac12\int_0^T\frac{\mu^2(X_t,\vartheta_2)-\mu^2(X_t,\vartheta_1)}{b^2(X_t)}\,dt\Bigr), \]
where $\frac{dP_{(\vartheta_2,\sigma)}}{dP_{(\vartheta_1,\sigma)}}$ denotes the Radon-Nikodym derivative of $P_{(\vartheta_2,\sigma)}$ with respect to $P_{(\vartheta_1,\sigma)}$ on $\mathcal{F}^0_T$ (see [11]). If we fix some $\vartheta_*\in\Theta$, a continuous-time LLF is $\vartheta\mapsto\log\frac{dP_{(\vartheta,\sigma)}}{dP_{(\vartheta_*,\sigma)}}$. Up to a constant and a factor not depending on $\vartheta$, the function
\[ \ell_T(\vartheta) := \int_0^T\frac{\mu(X_t,\vartheta)}{b^2(X_t)}\,dX_t - \frac12\int_0^T\frac{\mu^2(X_t,\vartheta)}{b^2(X_t)}\,dt \tag{9} \]
is equal to the LLF. Hence, $\ell_T$ will be called a continuous-time LLF (see [21]). Assumption (H3a) implies that $\ell_T$ is an at least three times continuously differentiable function on $\Theta$, and for $1\le m\le d+3$ its derivatives are equal to (see [21] for $m\le 2$)
\[ D^m\ell_T(\vartheta) = \int_0^T\frac{1}{b^2(X_t)}D^m_\vartheta\mu(X_t,\vartheta)\,dX_t - \frac12\int_0^T\frac{1}{b^2(X_t)}D^m_\vartheta\mu^2(X_t,\vartheta)\,dt. \tag{10} \]

(H4a): For all $\omega\in\Omega$, the function $\vartheta\mapsto\ell_T(\vartheta)=\ell_T(\vartheta,\omega)$ has a unique point of global maximum $\hat\vartheta_T=\hat\vartheta_T(\omega)$ in $\Theta$. Moreover, $D^2_\vartheta\ell_T(\hat\vartheta_T) < \mathbb{O}$.

Assumption (H4a) enables property (ii) in Theorem 4.1 below to be proved. If (H3a) and (H4a) hold, then Lemma 4.1 from [17] implies that $(\omega,\vartheta)\mapsto\ell_T(\vartheta)(\omega)$ is an $\mathcal{F}^0_T\otimes\mathcal{B}(\Theta)$-measurable function, and the continuous-time MLE $\hat\vartheta_T$ is an $\mathcal{F}^0_T$-measurable random variable.

Let $\mathcal{F}_{n,T}$ be the $\sigma$-subalgebra of $\mathcal{F}^0_T$ generated by the discrete observation $(X_{t_i},\ 0\le i\le n)$ of the process $(X_t,\ 0\le t\le T)$. Notice that if (H3a) holds then $(\omega,\vartheta)\mapsto\ell_{n,T}(\vartheta,\omega)$ (given by (5)) is an $\mathcal{F}_{n,T}\otimes\mathcal{B}(\Theta)$-measurable function by Lemma 4.1 in [17]. If $\ell_{n,T}$ is a concave function on $\Theta$, then a stationary point $\hat\vartheta_{n,T}$ is a unique point of maximum of $\ell_{n,T}$ on $\Theta$ and hence it is $\mathcal{F}_{n,T}$-measurable by e.g. Lemma 4.1 in [17].
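For orientation, the first-order conditions of the discrete and the continuous-time criteria can be written side by side. This short derivation is not part of the original text; it only uses $D_\vartheta\mu^2 = 2\mu\,D_\vartheta\mu$ in (10) with $m=1$, and the same differentiation applied to (5):

\[
D\ell_{n,T}(\vartheta) = \sum_{i=0}^{n-1}\frac{D_\vartheta\mu(X_{t_i},\vartheta)}{b^2(X_{t_i})}\bigl(\Delta_i X - \mu(X_{t_i},\vartheta)\,\Delta_i t\bigr),
\qquad
D\ell_T(\vartheta) = \int_0^T\frac{D_\vartheta\mu(X_t,\vartheta)}{b^2(X_t)}\bigl(dX_t - \mu(X_t,\vartheta)\,dt\bigr).
\]

In this form the stationarity equation $D\ell_{n,T}(\vartheta)=0$ is the left-endpoint (Euler) discretization of $D\ell_T(\vartheta)=0$, which is the relation between the AMLE and the continuous-time MLE that the results below quantify.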
If $\ell_{n,T}$ is not a concave function on $\Theta$, then for proving $\mathcal{F}_{n,T}$-measurability of the estimators $\hat\vartheta_{n,T}$ (and so $\hat\theta_{n,T}$) introduced in Section 3 we need additional assumptions:

(H5a): $\Theta$ is a relatively compact set in $\mathbb{R}^d$, and for each $0\le m\le d+3$, $D^m_\vartheta\mu,\ \frac{\partial}{\partial x}D^m_\vartheta\mu,\ \frac{\partial^2}{\partial x^2}D^m_\vartheta\mu \in C(E\times\overline{\Theta})$.

(H6a): For all $\omega\in\Omega$ and some $r>0$, $\ell_T(\hat\vartheta_T(\omega),\omega) > \sup_{|x|\ge r}\ell_T(\hat\vartheta_T(\omega)+x,\omega)$.

Assumption (H6a) holds if (H5a) holds and $\hat\vartheta_T$ is the unique point of maximum of $\ell_T$ on the compact set $\overline{\Theta}$.

Theorem 4.1 Let us assume that (H1a-4a) hold and let $T>0$ be fixed. Then there exists a sequence $(\hat\vartheta_{n,T},\ n\ge 1)$ of $\mathcal{F}^0_T$-measurable random vectors such that for all $\theta=(\vartheta,\sigma)\in\Psi$ and when $\delta_{n,T}\downarrow 0$,
(i) $\lim_n P_\theta(D\ell_{n,T}(\hat\vartheta_{n,T}) = \mathbb{O}) = 1$,
(ii) $(P_\theta)\lim_n\hat\vartheta_{n,T} = \hat\vartheta_T$,
(iii) $\hat\vartheta_{n,T}-\hat\vartheta_T = O_{P_\theta}(\sqrt{\delta_{n,T}})$, $n\to+\infty$,
(iv) if $(\tilde\vartheta_{n,T},\ n\ge 1)$ is an $\mathcal{F}^0_T$-measurable sequence in $\Theta$ that satisfies (i-ii), then $\lim_n P_\theta(\tilde\vartheta_{n,T}=\hat\vartheta_{n,T}) = 1$.
If either for $n\ge 1$ and almost all $\omega\in\Omega$ the function $\vartheta\mapsto\ell_{n,T}(\vartheta,\omega)$ has a unique point of local maximum which is a point of global maximum as well, or the hypotheses (H5a-6a) are satisfied, then $\hat\theta_{n,T}$ can be chosen to be $\mathcal{F}_{n,T}$-measurable.

Corollary 4.2 Let (H1a-4a) hold, let $T>0$ be fixed, and let $(\hat\sigma_{n,T},\ n\ge 1)$ be given by (7). Then
(i) $(P_\theta)\lim_n\hat\sigma_{n,T} = \sigma$;
(ii) $\bigl(\sqrt{n}\,\frac{1}{\sigma\sqrt2}(\hat\sigma_{n,T}-\sigma),\ n\ge 1\bigr)$ converges in law w.r.t. $P_\theta$ to the standard normal distribution $N(0,1)$ when $n\to+\infty$.
Moreover, if $\hat\vartheta_{n,T}$ is $\mathcal{F}_{n,T}$-measurable then $\hat\sigma_{n,T}$ is $\mathcal{F}_{n,T}$-measurable too.

Remark 4.3 Theorem 4.1 still holds if we replace (H1a) with the assumption that $T<\xi$ a.s., where $\xi$ is a maximal random time such that SDE (1) has a solution on $[\![0,\xi[\![\ = \{(\omega,t)\in\Omega\times[0,+\infty\rangle : 0\le t<\xi(\omega)\}$. Such a $\xi$ exists by assumption (H2a) and the existence and uniqueness theorem for SDEs (see e.g. [13] or [25]).
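Corollary 4.2 (ii) can be illustrated by a small Monte Carlo experiment in the simplest special case $\mu\equiv 0$, $b\equiv 1$, i.e. $X_t = x_0+\sqrt{\sigma}\,W_t$, where no drift parameter has to be estimated and (7) reduces to the normalized realized quadratic variation. This is a sketch under these simplifying assumptions, not the simulation study of the paper; the sample sizes, the seed and the variable names are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(12345)
sigma, T, n, M = 0.5, 1.0, 200, 5000   # hypothetical settings
dt = T / n

# Exact increments of X_t = x0 + sqrt(sigma) * W_t on an equidistant grid,
# one row per simulated path.
dX = np.sqrt(sigma * dt) * rng.standard_normal((M, n))

# Eq. (7) with mu = 0 and b = 1: sigma_hat = (1/n) * sum_i (dX_i)^2 / dt.
sigma_hat = np.mean(dX ** 2 / dt, axis=1)

# Standardized statistic from Corollary 4.2 (ii).
z = np.sqrt(n) * (sigma_hat - sigma) / (sigma * np.sqrt(2.0))

print("sample mean of z:", z.mean())
print("sample variance of z:", z.var())
```

In this special case $\hat\sigma_{n,T}$ is distributed as $\sigma\chi^2_n/n$, so the statistic has mean 0 and variance 1 for every $n$ and approaches $N(0,1)$ as $n$ grows; the sample mean and variance printed above should therefore be close to 0 and 1.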
Remark 4.4 Theorem 4.1 still holds if the drift and diffusion coefficient functions depend on the time variable too (the non-autonomous case: $(t,x)\mapsto\mu(t,x,\vartheta),\ \sqrt{\sigma}\,b(t,x)$), provided that assumptions (H2a) and (H3a) hold for $\mu$ and $b$ with $x$ and $E$ replaced by $(t,x)$ and $\tilde E = [0,+\infty\rangle\times E$ respectively.

4.2 Ergodic diffusions case

Let the diffusion coefficient parameter $\sigma>0$ be fixed. We need the following assumptions.

(H1b): (H1a) holds, and $X$ is an ergodic diffusion with stationary distribution $\pi_\vartheta(dx)$, $\vartheta\in\Theta$.

(H2b): (H2a) holds, and for all $\vartheta\in\Theta$ the functions $\mu(\cdot,\vartheta)b'/b,\ (b')^2,\ b''b \in L^{16}(\pi_\vartheta)$, $b^2b'''\in L^8(\pi_\vartheta)$, and there exist a function $c\in L^1(\pi_\vartheta)$ and a number $h_0>0$ such that
\[ \sup_{0<h\le h_0} E_{(\vartheta,\sigma)}\exp\Bigl(8\int_0^h\Bigl(2\,\frac{\mu(\cdot,\vartheta)b'}{b} + \sigma\bigl(b''b+15b'^2\bigr)\Bigr)(X_s)\,ds\Bigr) \le c(x_0). \]

(H3b): (H3a) and (H5a) hold, and there exist nonnegative functions $g_0,g_1,g_2:E\to\mathbb{R}$ such that for all $\vartheta_0\in\Theta$, $g_0\in L^{32}(\pi_{\vartheta_0})\cap C^1(E)$ with $g_0'b\in L^{16}(\pi_{\vartheta_0})$, $g_1\in L^{16}(\pi_{\vartheta_0})\cap C(E)$, $g_2\in L^8(\pi_{\vartheta_0})\cap C(E)$, and for all $x\in E$ and $0\le m\le d+3$,
\[ \sup_{\vartheta\in\Theta}|D^m_\vartheta\mu(x,\vartheta)/b(x)|_\infty \le g_0(x),\qquad \sup_{\vartheta\in\Theta}\Bigl|\frac{\partial}{\partial x}D^m_\vartheta\mu(x,\vartheta)\Bigr|_\infty \le g_1(x),\qquad \sup_{\vartheta\in\Theta}\Bigl|\frac{\partial^2}{\partial x^2}D^m_\vartheta\mu(x,\vartheta)\,b(x)\Bigr|_\infty \le g_2(x). \]

(H4b): For all $\vartheta\in\Theta$,
\[ (\forall\vartheta'\in\Theta)\quad \vartheta'\neq\vartheta \ \Rightarrow\ \int_E\frac{(\mu(x,\vartheta)-\mu(x,\vartheta'))^2}{b^2(x)}\,\pi_\vartheta(dx) > 0. \tag{11} \]

(H5b): For all $\vartheta\in\Theta$, the functions $\frac{\partial\mu}{\partial\vartheta_i}(\cdot,\vartheta)/b$, $1\le i\le d$, are linearly independent in $L^2(\pi_\vartheta)$.

$\Theta$ is a relatively compact set in $\mathbb{R}^d$ by assumption (H5a), since (H3b) holds. Assumptions (H1b-3b) imply that for all $\vartheta_0\in\Theta$ and $\vartheta\in\Theta$, $P_{(\vartheta_0,\sigma)}$-a.s.
\[ \lim_{T\to+\infty}\frac1T\,\ell_T(\vartheta) = \frac12\int_E\frac{\mu(x,\vartheta_0)^2-(\mu(x,\vartheta_0)-\mu(x,\vartheta))^2}{b^2(x)}\,\pi_{\vartheta_0}(dx) =: \ell_{\vartheta_0}(\vartheta) \tag{12} \]
by the ergodic property of the diffusion and the law of large numbers for continuous martingales (see e.g. [25], Chapters V and X). The function $\ell_{\vartheta_0}$ defined for every $\vartheta_0\in\Theta$ by formula (12) is at least three times continuously differentiable on the compact set $\overline{\Theta}$ by (H3b), and
\[ D\ell_{\vartheta_0}(\vartheta) = \int_E\frac{\mu(x,\vartheta_0)-\mu(x,\vartheta)}{b^2(x)}\,D_\vartheta\mu(x,\vartheta)\,\pi_{\vartheta_0}(dx), \]
\[ D^2\ell_{\vartheta_0}(\vartheta) = \int_E\Bigl(\frac{\mu(x,\vartheta_0)-\mu(x,\vartheta)}{b^2(x)}\,D^2_\vartheta\mu(x,\vartheta) - \frac{1}{b^2(x)}\,(D^\tau_\vartheta\mu\,D_\vartheta\mu)(x,\vartheta)\Bigr)\pi_{\vartheta_0}(dx). \]
Hence, by the same argument as for (12), for any fixed $\vartheta\in\Theta$, $P_{(\vartheta_0,\sigma)}$-a.s.
\[ \lim_{T\to+\infty}\frac1T\,D\ell_T(\vartheta) = D\ell_{\vartheta_0}(\vartheta),\qquad \lim_{T\to+\infty}\frac1T\,D^2\ell_T(\vartheta) = D^2\ell_{\vartheta_0}(\vartheta). \tag{13} \]
If $\vartheta\neq\vartheta_0$ then $\ell_{\vartheta_0}(\vartheta)<\ell_{\vartheta_0}(\vartheta_0)$ by (12) and (H4b). Hence $\vartheta_0$ is the unique point of maximum of $\ell_{\vartheta_0}$ on $\Theta$. This implies the identifiability property of the model: let $\vartheta_1,\vartheta_2\in\Theta$ be such that $P_{(\vartheta_1,\sigma)}=P_{(\vartheta_2,\sigma)}$. Then $\pi_{\vartheta_1}=\pi_{\vartheta_2}$ and so $\ell_{\vartheta_1}\equiv\ell_{\vartheta_2}$ by (12). Hence $\vartheta_1=\vartheta_2$. Moreover, (H5b) implies that the Fisher information matrix is positive definite, i.e.
\[ I(\vartheta_0) = -D^2\ell_{\vartheta_0}(\vartheta_0) = \int_E\frac{1}{b^2(x)}\,(D^\tau_\vartheta\mu\,D_\vartheta\mu)(x,\vartheta_0)\,\pi_{\vartheta_0}(dx) > \mathbb{O}. \]

The next theorem states that the continuous-time MLE of the drift parameters exists, is consistent and asymptotically efficient, and satisfies assumptions (H4a) and (H6a) a.s. for almost all observational times. These are generally well-known facts (see e.g. [8] or [11]), but we state them here for completeness and in the form appropriate for proving Theorem 4.6 below.

Theorem 4.5 Let us assume that (H1b-5b) hold. Then there exists an $(\mathcal{F}^0_T,\ T>0)$-adapted process $(\hat\vartheta_T,\ T>0)$ of random vectors such that for every $\theta=(\vartheta,\sigma)\in\Psi$ the following holds:
(i) $P_\theta$-a.s. there exists $T_0>0$ such that for all $T\ge T_0$, $\hat\vartheta_T\in\Theta$ is the unique point of maximum of $\ell_T$ on $\Theta$, and $D^2\ell_T(\hat\vartheta_T)<\mathbb{O}$ in such a way that $\min_{|y|=1}y^\tau\bigl(-\frac1T D^2\ell_T(\hat\vartheta_T)\bigr)y \ge \frac12\min_{|y|=1}y^\tau I(\vartheta)y$.
(ii) $\lim_{T\to+\infty}\hat\vartheta_T = \vartheta$ $P_\theta$-a.s.
(iii) $(\sqrt{T}(\hat\vartheta_T-\vartheta),\ T>0)$ converges in law w.r.t. $P_\theta$ to the normal law $N(\mathbb{O},\sigma I(\vartheta)^{-1})$ with expectation $\mathbb{O}$ and covariance matrix $\sigma I(\vartheta)^{-1}$.

The following theorem is a version of Theorem 4.1 for ergodic diffusions. In addition, it states that AMLEs are consistent and asymptotically efficient when both the maximal observational time and the number of discrete observational time points tend to infinity for appropriate sampling schemes. Hence in its statement '$\lim_{n,T}$' denotes the limit when both $T\to+\infty$ and $n\to+\infty$.
Theorem 4.6 Let us assume that (H1b-5b) hold. Then there exists a process $(\hat\vartheta_{n,T};\ n\ge 1,\ T>0)$ of $\mathcal{F}_{n,T}$-measurable random vectors $\hat\vartheta_{n,T}$ such that for all $\theta=(\vartheta,\sigma)\in\Psi$ and $\pi_\vartheta$-a.s. nonrandom initial conditions, and all equidistant samplings such that $\delta_{n,T}=T/n\to 0$, the following holds.
(i) $\lim_{n,T}P_\theta(D\ell_{n,T}(\hat\vartheta_{n,T})=\mathbb{O}) = 1$.
(ii) $(P_\theta)\lim_{n,T}(\hat\vartheta_{n,T}-\hat\vartheta_T) = \mathbb{O}$.
(iii) $\hat\vartheta_{n,T}-\hat\vartheta_T = O_{P_\theta}(\sqrt{\delta_{n,T}})$, $n\to+\infty$, $T\to+\infty$.
(iv) If $(\tilde\vartheta_{n,T};\ n\ge 1,\ T>0)$ is a process of random vectors in $\Theta$ that satisfies (i-ii), then $\lim_{n,T}P_\theta(\tilde\vartheta_{n,T}=\hat\vartheta_{n,T}) = 1$.
(v) $(P_\theta)\lim_{n,T}\hat\vartheta_{n,T} = \vartheta$, and if in addition $\lim_{n,T}T\delta_{n,T}=0$ then
\[ \sqrt{T}\,(\hat\vartheta_{n,T}-\vartheta) \xrightarrow{\ \mathcal{L}-P_\theta\ } N(\mathbb{O},\sigma I(\vartheta)^{-1}),\quad T\to+\infty,\ n\to+\infty. \]
(vi) $(P_\theta)\lim_{n,T}\hat\sigma_{n,T} = \sigma$, and if in addition $\lim_{n,T}T\delta_{n,T}=0$ then
\[ \sqrt{n}\,\frac{1}{\sigma\sqrt2}\,(\hat\sigma_{n,T}-\sigma) \xrightarrow{\ \mathcal{L}-P_\theta\ } N(0,1),\quad T\to+\infty,\ n\to+\infty. \]

5 Examples

Example 5.1 Generalized logistic model. Let the stochastic generalized logistic model be given by the following SDE:
\[ dX_t = (\alpha-\beta X_t^\gamma)X_t\,dt + \sqrt{\sigma}\,X_t\,dW_t,\quad X_0=x_0>0, \tag{14} \]
where $\vartheta=(\alpha,\beta,\gamma)$ ($\gamma>0$) is the drift vector parameter. By using the methods of stochastic calculus it is possible to solve (14) explicitly, which proves that there exists a pathwise unique, continuous and strong solution to this SDE with $X$ defined on $\Omega\times[0,+\infty\rangle$ and values in $E=\langle 0,+\infty\rangle$. Moreover, it turns out that for drift parameters such that $\alpha>\sigma/2$, $\beta>0$ and $\gamma>0$, the generalized logistic process $X$ is positive recurrent and ergodic with a stationary distribution $\pi_\vartheta$ such that, for stationary $X$, $X_t^\gamma$ follows the $\Gamma$-distribution with parameters $A := 2(\alpha-\sigma/2)/(\gamma\sigma)$ and $B := \gamma\sigma/(2\beta)$ (i.e. $EX_t^\gamma = AB$, $E(X_t^\gamma)^2 = A(A+1)B^2$) by e.g. Theorem 7.1, pp. 219-220 in [13]. Hence, assumption (H1b) holds. In the generalized logistic model the drift function is $\mu(x,\vartheta)=(\alpha-\beta x^\gamma)x$, and up to the diffusion parameter $\sigma>0$ the diffusion coefficient function is $b(x)=x>0$ on $E$.
Hence $b'\equiv 1$ and $bb''=b^2b'''\equiv 0$, which are trivially integrable with respect to any probability law. Let $f(x,\vartheta)=\mu(x,\vartheta)/b(x)=\alpha-\beta x^\gamma$. Notice that any partial derivatives of $f$ with respect to $\vartheta$ are of the form $-\beta^n x^\gamma\log^m x$ where $n\in\{0,1\}$, $m\in\mathbb{N}_0$. The components of $b^k\frac{\partial^k}{\partial x^k}D^m_\vartheta f$ for $k=1,2$ are of the same form. Finally, any $p$-th power of their absolute values ($p$ is a positive integer) is of the form $x^c|\log x|^m$ up to a constant, where $c>0$ is a real number and $m$ is a nonnegative integer. These functions are integrable with respect to $\pi_\vartheta$. If we choose a relatively compact subset $\Theta$ of the drift parameter set $\langle\frac{\sigma}{2},+\infty\rangle\times\langle 0,+\infty\rangle^2$, then there exist $\alpha_0>\sigma/2$, $\beta_0>0$ and $\gamma_0>0$ such that for all $\vartheta\in\Theta$, $x>0$, and all $0\le m\le 6$, $k\in\{0,1,2\}$ and integers $j_\alpha,j_\beta,j_\gamma$ such that $j_\alpha+j_\beta+j_\gamma=m$,
\[ \Bigl|b^k(x)\,\frac{\partial^{m+k}}{\partial x^k\,\partial\alpha^{j_\alpha}\partial\beta^{j_\beta}\partial\gamma^{j_\gamma}}f(x,\vartheta)\Bigr| \le g(x) := \alpha_0+\beta_0x^{\gamma_0}\bigl(1+\log^2x+\log^4x+\log^6x\bigr). \]
Then $g\in L^p(\pi_\vartheta)\cap C^1(E)$ for all $p\ge 1$ and $\vartheta\in\Theta$, which partially implies (H2b) and (H3b) by a simple calculation (see the proof of Corollary 6.13 below). To finish the proof of (H2b), notice that for all $h_0>0$ and all $0<h\le h_0$,
\[ \exp\Bigl(16\int_0^h\Bigl(\alpha-\beta X_t^\gamma+\frac{15\sigma}{2}\Bigr)dt\Bigr) \le \exp\bigl((16\alpha_0+120\sigma)h_0\bigr) = c(x_0) = \text{constant}, \]
since $X_t>0$ for all $t\ge 0$ and $\beta>0$. This implies the same inequalities for expectations with respect to any initial condition $X_0=x_0$. Hence (H2b) is proved. To show that (H4b) holds, let us assume that
\[ \int_E\frac{(\mu(x,\vartheta_1)-\mu(x,\vartheta_2))^2}{b^2(x)}\,\pi_{\vartheta_1}(dx) = 0 \]
for some $\vartheta_1\in\Theta$ and $\vartheta_2\in\Theta$. Since $\pi_{\vartheta_1}$ is absolutely continuous w.r.t. the Lebesgue measure $\lambda$ on $E$, this implies that $\mu(x,\vartheta_1)=\mu(x,\vartheta_2)$ for $\lambda$-a.e. $x>0$. Hence, the smooth function $u(x) := \beta_1x^{\gamma_1}-\beta_2x^{\gamma_2}$ must be constant for $\lambda$-a.e. $x>0$. This implies that $\gamma_1=\gamma_2$ and hence $\vartheta_1=\vartheta_2$. This proves (H4b). Finally, (H5b) holds since $\frac{\partial\mu}{\partial\alpha}(x,\vartheta)/b(x)=1$, $\frac{\partial\mu}{\partial\beta}(x,\vartheta)/b(x)=-x^\gamma$, and $\frac{\partial\mu}{\partial\gamma}(x,\vartheta)/b(x)=-\beta x^\gamma\log x$ are obviously linearly independent functions in $L^2(\pi_\vartheta)$.
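The Gamma law stated for $X_t^\gamma$ under stationarity in Example 5.1 can be checked numerically. The sketch below is not from the paper: it assumes the standard formula for the stationary density of a one-dimensional diffusion, which for (14) is proportional to $x^{2\alpha/\sigma-2}\exp(-2\beta x^\gamma/(\gamma\sigma))$, and compares numerically integrated moments of $X^\gamma$ with the moments $AB$ and $A(A+1)B^2$ of a Gamma law with shape $A$ and scale $B$. The parameter values, the integration grid and all names are hypothetical.

```python
import numpy as np

alpha, beta, gamma, sigma = 1.0, 1.0, 2.0, 0.5   # hypothetical values, alpha > sigma/2

# Gamma parameters of X^gamma as given in Example 5.1.
A = 2.0 * (alpha - sigma / 2.0) / (gamma * sigma)
B = gamma * sigma / (2.0 * beta)

def trapezoid(y, x):
    """Composite trapezoidal rule on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# Unnormalized stationary density of X on a grid covering its effective support.
x = np.linspace(1e-8, 10.0, 200_001)
dens = x ** (2.0 * alpha / sigma - 2.0) * np.exp(-2.0 * beta * x ** gamma / (gamma * sigma))
norm = trapezoid(dens, x)

m1 = trapezoid(x ** gamma * dens, x) / norm            # E X^gamma
m2 = trapezoid(x ** (2.0 * gamma) * dens, x) / norm    # E (X^gamma)^2

print("E X^gamma     :", m1, "vs A*B         =", A * B)
print("E (X^gamma)^2 :", m2, "vs A*(A+1)*B^2 =", A * (A + 1.0) * B ** 2)
```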
Example 5.2 Cox-Ingersoll-Ross (CIR) model. The CIR model (or Feller's square root model) is given by the SDE
\[ dX_t = (\beta - \alpha X_t)\,dt + \sqrt{\sigma|X_t|}\,dW_t, \quad X_0 = x_0 > 0. \tag{15} \]
The vector of drift parameters is $\vartheta = (\alpha, \beta)$, the drift function $\mu(x, \vartheta) = \beta - \alpha x$ is linear in its parameters, and $b(x) = \sqrt{|x|}$. It is known (see e.g. [18]) that if $\alpha > 0$ and $\beta > 0$ are such that $2\beta > \sigma$, and $x_0 > 0$, then SDE (15) has a strong, positive recurrent and ergodic solution in the state space $E = \langle 0, +\infty\rangle$ with stationary distribution $\pi_\vartheta$ which is a $\Gamma$-law with expectation $\beta/\alpha$ and variance $\beta\sigma/(2\alpha^2)$. Hence (H1b) and (H2a-3a) hold for any open, relatively compact and convex set $\Theta$ in $\langle 0, +\infty\rangle^2 \cap \{(\alpha, \beta) : 2\beta > \sigma\}$ that contains the true drift parameter value. Additionally, let us assume that if $\vartheta = (\alpha, \beta) \in \Theta$ then $2\beta/\sigma > 16$. Then the function $x \mapsto 1/x$ is in $L^{16}(\pi_\vartheta)$, which implies (H3b) and part of (H2b). Since the inequality in (H2b) is used only for proving the statement of Lemma 6.5, it is sufficient to prove this lemma directly (instead of this inequality), i.e. for each $\vartheta \in \Theta$ we want to find a function $c_0 \in L^1(\pi_\vartheta)$ and $h_0 > 0$ such that the following inequality holds for any $t \ge 0$:
\[ \sup_{0 < h \le h_0} \mathbb{E}_{(\vartheta,\sigma)}\bigl(b(X_{t+h})/b(X_t)\bigr)^8 \le \mathbb{E}_{(\vartheta,\sigma)}\,c_0(X_t). \tag{16} \]
Let $x > 0$ and $h > 0$ be arbitrary, and let $X$ be such that (15) holds with $X_0 = x$. Let $\mathbb{E} \equiv \mathbb{E}_{(\vartheta,\sigma)}$, and let us calculate
\[ \mathbb{E}\bigl(b(X_h)/b(x)\bigr)^8 = (1/x)^4 \int_0^{+\infty} y^4\,p(h, x, y)\,dy, \]
where
\[ p(h, x, y) = C e^{-u-v}(v/u)^{q/2} I_q(2\sqrt{uv}) \]
is the transition density of $X_h$ given $X_0 = x$ (see [18]). Here $u = Cxe^{-\alpha h}$, $v = Cy$, $C = (2\alpha)/(\sigma(1 - e^{-\alpha h}))$, $q = (2\beta/\sigma) - 1$, and $I_q$ is the modified Bessel function of the first kind of order $q$. Since $q > 15$ and $e^{-\alpha h} < 1$ it turns out that
\[ \mathbb{E}\bigl(b(X_h)/b(x)\bigr)^8 = (1/x)^4 \int_0^{+\infty} y^4\,p(h, x, y)\,dy \le 8q^4\Bigl(\frac{3}{x^4} + \frac{12}{x^3} + \frac{9}{x^2} + \frac{2}{x} + 1\Bigr) =: c_0(x). \]
Then
\[ \mathbb{E}\bigl(b(X_{t+h})/b(X_t)\bigr)^8 = \mathbb{E}\bigl[\mathbb{E}[(b(X_{t+h})/b(X_t))^8 \mid \mathcal{F}_t^0]\bigr] = \mathbb{E}\bigl[\mathbb{E}_{X_t}[(b(X_h)/b(X_0))^8]\bigr] \le \mathbb{E}\,c_0(X_t) \]
by the Markov property and the above inequality.
Hence (16) holds for any $h_0 > 0$, and $c_0 \in L^1(\pi_\vartheta)$ since $x \mapsto 1/x \in L^{16}(\pi_\vartheta)$. Finally, (H4b-5b) follow easily since the functions $\sqrt{x}$ and $1/\sqrt{x}$ are linearly independent, and $\pi_\vartheta$ is dominated by Lebesgue measure on $\langle 0, +\infty\rangle$. Hence, if $2\beta/\sigma > 16$ then Theorem 4.6 can be applied to the CIR model (15).

Since in the CIR model the drift function is linear in its parameters, the ALF $\ell_{n,T}(\vartheta)$ and the LF $\ell_T(\vartheta)$ ($\vartheta = (\alpha, \beta)$) are quadratic functions. Hence there exist unique explicit solutions to the stationary equations $D\ell_{n,T}(\vartheta) = \mathbf{0}$ and $D\ell_T(\vartheta) = \mathbf{0}$, and properties of the AMLE can easily be investigated by simulation. For this purpose we simulate $M = 1000$ paths of the process $X$ over the time interval $[0, T]$ for true parameter values $\vartheta_0 = (\alpha_0, \beta_0) = (0.5, 0.03)$ and $\sigma_0 = 0.06^2$, and several different values of $T$, namely $T = 3, 4, \ldots, 11$. The drift parameter values $\vartheta_0$ have been borrowed from similar examples in [1] or [23], and $\sigma_0$ has been chosen such that $2\beta_0/\sigma_0 \approx 16.7 > 16$. Each path starts at $x_0 = 1$ and has been simulated by the Milstein scheme based on a discretization of $[0, T]$ with $2^{16}$ equidistant points. Using the same discretization of $[0, T]$, each Riemann integral in $\ell_T(\vartheta)$ has been approximated by the trapezoidal rule, and each Itô integral by the Euler approximation.

log(n)   3         4         5         6         7        8        9         10        11
SW       0.0000    0.0000    0.0000    0.0001    0.0190   0.0959   0.3887    0.5968    0.6537
Lillie   0.0010    < 0.0010  < 0.0010  0.0136    0.0382   0.1104   > 0.5000  > 0.5000  > 0.5000
JB       0.0010    < 0.0010  < 0.0010  < 0.0010  0.0240   0.0926   0.2146    > 0.5000  > 0.5000
KS       0.0000    0.0000    0.0000    0.0000    0.0000   0.0000   0.0000    0.0010    0.0037

Table 1: P-values of Shapiro-Wilk (SW), Lilliefors (Lillie), Jarque-Bera (JB) and Kolmogorov-Smirnov (KS) tests of normality applied to samples of the statistic $(\sqrt{n}/(\sigma_0\sqrt{2}))(\hat\sigma_{n,T} - \sigma_0)$ (of length $M = 1000$) with respect to different sampling sizes $n$ with fixed $T = 7$.
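The simulation step described above can be sketched as follows. This is a minimal illustration and not the author's code: the function names `simulate_cir_milstein` and `amle_cir` are hypothetical, only one path (instead of $M = 1000$) is generated, and the estimator formulas are assumptions read off from this text, namely $\ell_{n,T}(\vartheta) = \sum_i \frac{\mu(X_{t_i},\vartheta)}{b^2(X_{t_i})}\Delta_i X - \frac12\sum_i \frac{\mu^2(X_{t_i},\vartheta)}{b^2(X_{t_i})}\Delta_i t$ (cf. (29)) and $\hat\sigma_{n,T} = \frac1n\sum_i \frac{(\Delta_i X - \mu(X_{t_i},\hat\vartheta_{n,T})\Delta_i t)^2}{b^2(X_{t_i})\Delta_i t}$ (cf. (32)-(33)). The value $\sigma_0 = 0.06^2$ is the one consistent with $2\beta_0/\sigma_0 \approx 16.7$.

```python
import numpy as np

def simulate_cir_milstein(alpha, beta, sigma, x0, T, n_fine, rng):
    """Milstein scheme for dX = (beta - alpha*X) dt + sqrt(sigma*X) dW on [0, T]."""
    h = T / n_fine
    dw = rng.normal(0.0, np.sqrt(h), size=n_fine)
    x = np.empty(n_fine + 1)
    x[0] = x0
    for k in range(n_fine):
        xk = max(x[k], 0.0)  # defensive guard; the path stays positive for these parameters
        # Milstein correction: 0.5 * g * g' * (dW^2 - h) with g(x) = sqrt(sigma*x), g*g' = sigma/2
        x[k + 1] = (xk + (beta - alpha * xk) * h + np.sqrt(sigma * xk) * dw[k]
                    + 0.25 * sigma * (dw[k] ** 2 - h))
    return x

def amle_cir(x, dt):
    """Closed-form maximizer of the (assumed) approximate log-likelihood for CIR."""
    x_left, dx = x[:-1], np.diff(x)
    T = dx.size * dt
    s1 = np.sum(dt / x_left)          # sum of Delta_i t / X_{t_i}
    s2 = np.sum(x_left * dt)          # sum of X_{t_i} * Delta_i t
    a = np.sum(dx / x_left)           # sum of Delta_i X / X_{t_i}
    b = x[-1] - x[0]                  # sum of Delta_i X
    # Stationarity equations: beta*s1 - alpha*T = a  and  beta*T - alpha*s2 = b
    beta_hat, alpha_hat = np.linalg.solve(np.array([[s1, -T], [T, -s2]]),
                                          np.array([a, b]))
    resid = dx - (beta_hat - alpha_hat * x_left) * dt
    sigma_hat = np.mean(resid ** 2 / (x_left * dt))
    return alpha_hat, beta_hat, sigma_hat

# One path with the parameter values quoted in the text, observed at n = 2^11 points.
rng = np.random.default_rng(0)
T, n_fine, n = 7.0, 2 ** 16, 2 ** 11
path = simulate_cir_milstein(0.5, 0.03, 0.06 ** 2, 1.0, T, n_fine, rng)
obs = path[:: n_fine // n]            # equidistant subsample, n + 1 observations
print(amle_cir(obs, T / n))
```

Repeating the last four lines over many seeds and several values of $n$ would give samples of the kind summarized in Table 1; the single-path run only illustrates the mechanics.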
Any estimate $\hat\theta_{n,T} = (\hat\vartheta_{n,T}, \hat\sigma_{n,T})$ for varying $n$ has been calculated from the same path as the estimate $\hat\vartheta_T$. The results of analyzing the asymptotic behavior of the deviances $\hat\vartheta_{n,T} - \hat\vartheta_T$ and $\hat\sigma_{n,T} - \sigma_0$ are presented in Figure 1. Subfigures A and C show mean deviances relative to the true parameter values, and subfigures B and D show standard deviations of deviances standardized with $\sqrt{\delta_{n,T}}/\sqrt{T} = 1/\sqrt{n}$, also relative to the true parameter values. In subfigures A and B, $T = 7$ is fixed, while the number $n$ of equidistant sampling time-points varies from $2^3$ to $2^{11}$ in such a way that $\log(n) = k$, $k = 3, 4, \ldots, 11$, where 'log(·)' denotes the logarithm with base 2. Subfigure A shows the expected asymptotic behavior, $\lim_n \hat\vartheta_{n,T} = \hat\vartheta_T$ and $\lim_n \hat\sigma_{n,T} = \sigma_0$ for fixed $T$ and $\delta_{n,T} = T/n \to 0$, but also that the AMLE underestimates the MLE, and similarly for $\hat\sigma_{n,T}$. The rate of convergence can be seen from subfigure B. Namely, the convergence of the empirical standard deviations (estd) of the components of $\sqrt{T}(\hat\vartheta_{n,T} - \hat\vartheta_T)/\sqrt{\delta_{n,T}}$ (relative to $\vartheta_0$) shows that these statistics are bounded in probability, while the convergence of the estd of $\sqrt{n}(\hat\sigma_{n,T} - \sigma_0)/\sigma_0$ to a neighborhood of $\sqrt{2} \approx 1.41$ is also expected, by the convergence in law of $\sqrt{n}(\hat\sigma_{n,T} - \sigma_0)$ to the normal distribution with standard deviation $\sigma_0\sqrt{2}$. Table 1 shows p-values of three tests of normality, Shapiro-Wilk (SW), Lilliefors Kolmogorov-Smirnov (Lillie) and Jarque-Bera (JB), and of the Kolmogorov-Smirnov test (KS) of standard normality of the simulated statistic $(\sqrt{n}/(\sigma_0\sqrt{2}))(\hat\sigma_{n,T} - \sigma_0)$ with respect to $n$ (and fixed $T = 7$). Obviously, the statistic converges to normality, but only slowly to the specific limiting normal distribution.

The same behavior of the deviances $\hat\vartheta_{n,T} - \hat\vartheta_T$ and $\hat\sigma_{n,T} - \sigma_0$ when $T \to +\infty$ in such a way that $\delta_{n,T} \to 0$ can be seen from subfigures C and D. In these subfigures the relative mean deviances and the relative standard deviations of standardized deviances are presented with respect to $\delta = \delta_{n,T} = T/2^T = \log(n)/n$ for $T = \log(n) = 3, 4, \ldots, 11$. The normal q-q plot of the sample of $(\sqrt{n}/(\sigma_0\sqrt{2}))(\hat\sigma_{n,T} - \sigma_0)$ for $T = 11$ and $n = 2^{11}$ is presented in subfigure B of Figure 2.

Asymptotic properties of the deviances $\hat\vartheta_{n,T} - \vartheta_0$ when $T \to +\infty$ in such a way that $\delta_{n,T} \to 0$ are presented in Figure 2. Subfigure A presents the relative mean deviances with respect to $\delta = T/2^T = \log(n)/n$ for $T = \log(n) = 3, 4, \ldots, 11$. We notice the concave shape of both curves, tending to zero when $\delta \to 0$. The convergence to normality is very slow, as illustrated by the q-q plots of the standardized components of the AMLEs (with respect to the limiting normal laws and at $T = 11$, $n = 2^{11}$) in subfigures C and D.

[Figure 1: four panels. A, C: relative mean deviance against log(n) and against δ; B, D: relative standard deviation of standardized deviance against log(n) and against δ; curves for α, β, σ.]

Figure 1: (A) Relative means of components of the statistics $\hat\vartheta_{n,T} - \hat\vartheta_T$ and $\hat\sigma_{n,T} - \sigma_0$ (relative to $\vartheta_0$ and $\sigma_0$ respectively) with respect to different sampling sizes $n$ and fixed $T = 7$. (B) Relative standard deviations of the standardized deviances $\sqrt{T}(\hat\vartheta_{n,T} - \hat\vartheta_T)/\sqrt{\delta_{n,T}}$ and $\sqrt{n}(\hat\sigma_{n,T} - \sigma_0)/\sigma_0$ (relative to the true parameter values) with respect to different sampling sizes $n$ and fixed $T = 7$. (C) Relative means of components of the same statistics as in A but with respect to $\delta = \delta_{n,T} = T/2^T = \log(n)/n$ for different $T$s. (D) Relative standard deviations of the same standardized deviances as in B but with respect to $\delta = \delta_{n,T} = T/2^T = \log(n)/n$ for different $T$s. In all cases means and standard deviations are estimated from simulated samples of length $M = 1000$.
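The slow approach to normality reported in Table 1 can be made plausible with an idealized experiment. The sketch below is an assumption-laden simplification: it ignores drift estimation and discretization bias and looks only at the leading term $\frac{1}{\sqrt{2n}}\sum_i ((\Delta_i W)^2 - \Delta_i t)/\Delta_i t$ appearing in the proof of Corollary 4.2, which is a standardized $\chi^2_n$ variable with skewness $\sqrt{8/n}$. The function names are hypothetical.

```python
import numpy as np

def idealized_sigma_statistic(n, m, rng):
    """m draws of (1/sqrt(2n)) * sum_i ((dW_i)^2 - dt)/dt, using dW_i/sqrt(dt) ~ N(0, 1)."""
    z = rng.standard_normal(size=(m, n))
    return (np.sum(z ** 2, axis=1) - n) / np.sqrt(2.0 * n)

def sample_skewness(x):
    """Moment estimator of skewness."""
    c = x - x.mean()
    return np.mean(c ** 3) / np.mean(c ** 2) ** 1.5

rng = np.random.default_rng(1)
for k in (3, 7, 11):
    n = 2 ** k
    s = idealized_sigma_statistic(n, 1000, rng)
    # columns: log2(n), sample mean, sample std, sample skewness, theoretical skewness
    print(k, s.mean(), s.std(), sample_skewness(s), np.sqrt(8.0 / n))
```

For small $n$ the theoretical skewness is far from zero ($\sqrt{8/8} = 1$ at $n = 2^3$), which is in line with the rejections of normality at small $\log(n)$ in Table 1; the additional bias of $\hat\sigma_{n,T}$ seen in Figure 1 is not captured by this idealization.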
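6 Proofs

The proof of Theorem 4.1 is based on the so-called general theorem on approximate maximum likelihood estimation and its corollary, which are stated and proved in [17] as Theorem 3.1 and Corollary 3.2. The proof of Theorem 4.6 is a modification of the proof of the same theorem, based on Theorem 4.5. But first we need to state and prove Theorem 6.1 and its Corollaries 6.11 and 6.13, which are needed for applying the general theorem in this context. Proofs of some technical lemmas are in the Appendix.

Let us suppose that $X = (X_t, t \ge 0)$ is a diffusion satisfying (H1a-2a) with true parameter $\theta_0 = (\vartheta_0, \sigma) \in \Psi$, and such that $\mathbb{P}(X_0 = x_0) = 1$ for $x_0 \in E$. Here $\mathbb{P} \equiv \mathbb{P}_{\theta_0}$, $\mathbb{E} \equiv \mathbb{E}_{\theta_0}$ and $L^2 \equiv L^2(\mathbb{P})$. Denote $\mu_0 = \mu(\cdot, \vartheta_0)$, $\nu = \sqrt{\sigma}\,b$, $\mathcal{A}a := a'\mu_0 + a''\nu^2/2$, and $\bar a := |\mathcal{A}a| + |a'\nu|$ for $a \in C^2(E)$.

Theorem 6.1 Let $\Theta \subset \mathbb{R}^d$ be an open convex set, and let $f : E \times \Theta \to \mathbb{R}$, $a : E \to \mathbb{R}$ be functions. Let $0 = t_0 < t_1 < \cdots < t_n = T$ be subdivisions of the intervals $[0, T]$, $T > 0$, such that $\delta_{n,T} \downarrow 0$. Assume the following:

(B1): $a \in C^2(E)$ and there exist constants $C_a > 0$, $T_a \ge 0$, and $n_a \in \mathbb{N}$ such that
\[ (\forall\, T > T_a)(\forall n \ge n_a)\quad \frac{1}{T}\,\mathbb{E}\Bigl(\int_0^T (a^4 + \bar a^4)(X_t)\,dt + \sum_{i=0}^{n-1} a^4(X_{t_i})\Delta_i t\Bigr) \le C_a. \]

(B2): For all $\vartheta \in \Theta$, $f(\cdot, \vartheta) \in C^2(E)$, and for all $(x, \vartheta) \in E \times \Theta$ and $1 \le m \le d + 1$ there exist partial derivatives $D_\vartheta^m f(x, \vartheta)$, $\frac{\partial}{\partial x} D_\vartheta^m f(x, \vartheta)$, and $\frac{\partial^2}{\partial x^2} D_\vartheta^m f(x, \vartheta)$. Moreover, $(\forall\, 0 \le m \le d + 1)$ $D_\vartheta^m f,\ \frac{\partial}{\partial x} D_\vartheta^m f,\ \frac{\partial^2}{\partial x^2} D_\vartheta^m f \in C(E \times \Theta)$.

(B3): For any relatively compact set $K$ in $\Theta$ there exist a positive measurable function $g : E \to \mathbb{R}$ such that for all $0 \le m \le d + 1$,
\[ \sup_{\vartheta \in K}\Bigl( |D_\vartheta^m f(\cdot, \vartheta)|_\infty + \bigl|\tfrac{\partial}{\partial x} D_\vartheta^m f(\cdot, \vartheta)\bigr|_\infty (|\mu_0| + |\nu|) + \bigl|\tfrac{\partial^2}{\partial x^2} D_\vartheta^m f(\cdot, \vartheta)\,\nu^2\bigr|_\infty \Bigr) \le g, \]
and constants $C_g > 0$, $T_g \ge 0$, and $n_g \in \mathbb{N}$, such that $(\forall\, T > T_g)(\forall n \ge n_g)$
\[ \frac{1}{T}\,\mathbb{E}\Bigl(\int_0^T g^4(X_t)\,dt + \sum_{i=0}^{n-1} g^4(X_{t_i})\Delta_i t\Bigr) \le C_g \quad\&\quad \frac{1}{T}\,\mathbb{E}\sum_{i=0}^{n-1}\Bigl((ga)^4(X_{t_i})\Delta_i t + g^4(X_{t_i})\int_{t_i}^{t_{i+1}} (a^4 + \bar a^4)(X_t)\,dt\Bigr) \le C_g. \]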
(B4): There exist a measurable function $c : E \to \mathbb{R}$ and constants $h_0 > 0$, $C_c > 0$, $T_c \ge 0$, and $n_c \in \mathbb{N}$ such that for $r := |\mu_0 b'/b| + |b''b| + |b'|$,
\[ \sup_{0 < h \le h_0} \mathbb{E}\exp\Bigl(8\int_0^h \Bigl(2\,\frac{\mu_0 b'}{b} + \sigma(b''b + 15b'^2)\Bigr)(X_s)\,ds\Bigr) \le c(x_0), \]
\[ (\forall\, T > T_c)(\forall n \ge n_c)\quad \frac{1}{T}\,\mathbb{E}\Bigl(\sum_{i=0}^{n-1} c(X_{t_i})\Delta_i t + \int_0^T r^8(X_t)\,dt\Bigr) \le C_c. \]

Then there exist constants $C_1 > 0$, $C_2 > 0$, $T_0 \ge 0$, and $n_0 \in \mathbb{N}$, possibly dependent on $K$, $d$, and $a$, such that for all $T > T_0$ and $n \ge n_0$,
\[ \mathbb{E}\sup_{\vartheta \in K}\Bigl(\frac{1}{\sqrt{T\delta_{n,T}}}\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (f(X_t, \vartheta) - f(X_{t_i}, \vartheta))\,a(X_t)\,dt\Bigr)^2 \le C_1 \tag{17} \]
\[ \mathbb{E}\sup_{\vartheta \in K}\Bigl(\frac{1}{\sqrt{T\delta_{n,T}}}\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (f(X_t, \vartheta) - f(X_{t_i}, \vartheta))\,a(X_t)\,dW_t\Bigr)^2 \le C_1 \tag{18} \]
\[ \mathbb{E}\sup_{\vartheta \in K}\Bigl(\frac{1}{\sqrt{T\delta_{n,T}}}\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} f(X_{t_i}, \vartheta)\Bigl(\frac{b(X_t)}{b(X_{t_i})} - 1\Bigr)a(X_t)\,dt\Bigr)^2 \le C_2 \tag{19} \]
\[ \mathbb{E}\sup_{\vartheta \in K}\Bigl(\frac{1}{\sqrt{T\delta_{n,T}}}\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} f(X_{t_i}, \vartheta)\Bigl(\frac{b(X_t)}{b(X_{t_i})} - 1\Bigr)a(X_t)\,dW_t\Bigr)^2 \le C_2. \tag{20} \]

[Figure 2: four panels. A: relative mean deviance against δ, curves for α and β; B, C, D: normal q-q plots (quantiles of the standardized AMLE of σ, α, β against standard normal quantiles).]

Figure 2: (A) Relative means of components of the statistics $\hat\vartheta_{n,T} - \vartheta_0$ (relative to $\vartheta_0$) with respect to $\delta = \delta_{n,T} = T/2^T = \log(n)/n$ for different $T$s. (B) Normal q-q plot of the standardized statistics $\sqrt{n}(\hat\sigma_{n,T} - \sigma_0)$ with respect to the limiting normal law (with $T = 11$, $n = 2^{11}$). (C-D) Normal q-q plots of the standardized components of the statistics $\sqrt{T}(\hat\vartheta_{n,T} - \vartheta_0)$ with respect to the limiting normal law (with $T = 11$, $n = 2^{11}$) (C for the $\alpha$ and D for the $\beta$ component). In all cases estimations are based on simulated samples of length $M = 1000$.

Remark 6.2 If a function $f$ that satisfies (B2) and all its partial derivatives from (B2) are bounded on $E \times K$, then (B3) holds if $a$ is bounded too.
Similarly, if $a \in C^2(E)$ is bounded then it satisfies (B1). If in addition $\mu_0$, $b$, $b'$, and $b''$ are bounded then (B4) holds for the constant function $c \equiv \exp(\gamma h_0)$, where $\gamma > 0$ and $h_0 > 0$ are constants. In this case the statements of Theorem 6.1 hold for $T_0 = 0$, and hence for all $T > 0$, as is obvious from the proof of Theorem 6.1.

For a moment let us assume that $K = \prod_{i=1}^d \langle a_i, b_i\rangle$ is an open and bounded $d$-dimensional rectangle in $\Theta$. Then there exists $\varepsilon > 0$ such that $K_\varepsilon := \prod_{i=1}^d \langle a_i - \varepsilon, b_i + \varepsilon\rangle$ is an open and bounded $d$-dimensional rectangle in $\Theta$ too. Let $\varphi : \mathbb{R}^d \to \mathbb{R}$ be a $C^\infty$-function such that $\varphi \equiv 1$ on $K$ and $\varphi \equiv 0$ on $K_\varepsilon^c$. Such a function exists (see e.g. [6], Lemma IV.4.4, p. 176). Then the function $(x, \vartheta) \mapsto \tilde f(x, \vartheta) := f(x, \vartheta)\cdot\varphi(\vartheta)$ satisfies (B2-3) if $f$ satisfies the same assumptions (with a rescaled function $g$). Namely, $\tilde f \equiv f$ on $E \times K$ and $\tilde f \equiv 0$ on $\partial K_\varepsilon$. The same holds for all partial derivatives of $\tilde f$ that exist, and $\tilde f$ obviously satisfies (B2). Since $\varphi$ and all of its derivatives are bounded, $\tilde f$ satisfies (B3) too, with $Cg$ in place of $g$ for a constant $C$ depending on $\varphi$. Obviously, statements (17-20) hold for a function $f$ that satisfies (B2-3) and a rectangle $K$ if (17-20) hold for $\tilde f$ and the rectangle $K_\varepsilon$. Moreover, notice that if (17-20) hold for an arbitrary open and bounded $d$-dimensional rectangle $K$, then the same statements hold for every relatively compact set in $\Theta$. Hence it is sufficient to prove (17-20) for an open and bounded $d$-dimensional rectangle $K \subset \Theta$, and a function $f$ satisfying (B2-3) and the following additional assumption.

(B K): For all $x \in E$ and all $0 \le m \le d + 1$, $D_\vartheta^m f(x, \cdot) \equiv \mathbf{0}$, $\frac{\partial}{\partial x} D_\vartheta^m f(x, \cdot) \equiv \mathbf{0}$ and $\frac{\partial^2}{\partial x^2} D_\vartheta^m f(x, \cdot) \equiv \mathbf{0}$ on $\partial K$.

Moreover, let $A$ be an invertible affine mapping of $\mathbb{R}^d$, and let $f$ be a function on $E \times \Theta$ that satisfies (B2-3) and (B K). Then the function $\bar f$ defined on $E \times A(\Theta)$ by the rule $\bar f(x, \eta) := f(x, A^{-1}\eta)$ satisfies (B2-3) and (B AK) too.
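Since the left-hand sides of (17-20) do not change under the change of variable $\vartheta \mapsto \eta = A\vartheta$, it is sufficient to prove (17-20) for $K_0 := \langle -\pi, \pi\rangle^d$ and a function $f$ that satisfies (B2-3) and (B K$_0$).

Now, let $f$ be a function satisfying (B2-3) and (B K$_0$). For $x \in E$, $k = (k_1, \ldots, k_d) \in \mathbb{Z}^d$, $\vartheta = (\vartheta_1, \ldots, \vartheta_d) \in \Theta$, and $j = (j_1, \ldots, j_d)$, where $j_1, \ldots, j_d$ are nonnegative integers such that $m := j_1 + \cdots + j_d \le d + 1$, let us define the Fourier coefficients of $f$ by
\[ C_k(x) := \frac{1}{(2\pi)^d}\int_{K_0} f(x, \vartheta)\,e^{-i\langle k|\vartheta\rangle}\,d\vartheta, \qquad C_k^{(j)}(x) := \frac{1}{(2\pi)^d}\int_{K_0} \frac{\partial^m f}{\partial\vartheta_1^{j_1}\cdots\partial\vartheta_d^{j_d}}(x, \vartheta)\,e^{-i\langle k|\vartheta\rangle}\,d\vartheta. \]
Let $k^j := k_1^{j_1}\cdots k_d^{j_d}$. Since (B K$_0$) holds, it is well known that $C_k^{(j)}(x) = i^m k^j C_k(x)$ for each fixed $x \in E$ (see e.g. [27], pp. 177-178). This relation is used in the proofs of the next few lemmas (see Appendix).

Lemma 6.3 Let $x, y \in E$. Then for all $k \in \mathbb{Z}^d$,
\[ |C_k(x)| \le g(x)\Bigl(\frac{d+1}{1 + |k_1| + \cdots + |k_d|}\Bigr)^{d+1}. \]

Lemma 6.4 Let $f \in C(E)$. Then for all $0 \le t_0 < t$,
\[ \mathbb{E}\Bigl(\int_{t_0}^t f(X_s)\,dW_s\Bigr)^4 \le 3e^{3(t-t_0)}\,\mathbb{E}\int_{t_0}^t f^4(X_s)\,ds, \]
and also
\[ \mathbb{E}\Bigl(\int_{t_0}^t f(X_s)\,dW_s\Bigr)^4 \le 24\Bigl(e^{3(t-t_0)}\,\mathbb{E}\int_{t_0}^t (f(X_s) - f(X_{t_0}))^4\,ds + \mathbb{E}[f^4(X_{t_0})](t - t_0)^2\Bigr). \]

Lemma 6.5 Let (B4) hold. If $c_0 := (1 + c)/2$ then
\[ (\forall t \ge 0)\quad \sup_{0 < h \le h_0} \mathbb{E}\Bigl(\frac{b(X_{t+h})}{b(X_t)}\Bigr)^8 \le \mathbb{E}\,c_0(X_t). \tag{21} \]

Lemma 6.6 There exist constants $K_1 > 0$, $K_2 > 0$, $T_0 \ge 0$, and $n_0 \in \mathbb{N}$, depending on $K_0$, $g$ and $a$, such that for all $k \in \mathbb{Z}^d$, $T > T_0$, $n \ge n_0$ and subdivisions $0 = t_0 < t_1 < \cdots < t_n = T$ (with $\delta_{n,T} \downarrow 0$) the following hold:
\[ \Bigl\|\frac{1}{\sqrt{T\delta_{n,T}}}\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (C_k(X_t) - C_k(X_{t_i}))\,a(X_t)\,dt\Bigr\|_{L^2} \le K_1 \cdot K_k \tag{22} \]
\[ \Bigl\|\frac{1}{\sqrt{T\delta_{n,T}}}\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (C_k(X_t) - C_k(X_{t_i}))\,a(X_t)\,dW_t\Bigr\|_{L^2} \le K_1 \cdot K_k \tag{23} \]
\[ \Bigl\|\frac{1}{\sqrt{T\delta_{n,T}}}\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} C_k(X_{t_i})\Bigl(\frac{b(X_t)}{b(X_{t_i})} - 1\Bigr)a(X_t)\,dt\Bigr\|_{L^2} \le K_2 \cdot K_k \tag{24} \]
\[ \Bigl\|\frac{1}{\sqrt{T\delta_{n,T}}}\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} C_k(X_{t_i})\Bigl(\frac{b(X_t)}{b(X_{t_i})} - 1\Bigr)a(X_t)\,dW_t\Bigr\|_{L^2} \le K_2 \cdot K_k, \tag{25} \]
where $K_k := ((d + 1)/(1 + |k_1| + \cdots + |k_d|))^{d+1}$.

Let $S_N(x, \vartheta) := \sum_{|k| \le N} C_k(x)\,e^{i\langle k|\vartheta\rangle}$ for $x \in E$, $\vartheta \in K_0$, and let $N$ be a positive integer.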
Then it can be proved that $\lim_N |S_N(x, \vartheta) - f(x, \vartheta)| = 0$ uniformly in $\vartheta \in K_0$ by the methods of Fourier analysis (see e.g. [27], pp. 180-183).

Lemma 6.7 $\sum_{k \in \mathbb{Z}^d} |C_k(x)| \le K g(x)$, and $\sup_{N,\,\vartheta \in K_0} |S_N(x, \vartheta) - f(x, \vartheta)| \le K g(x)$ for a positive and finite constant
\[ K = \sum_{k \in \mathbb{Z}^d}\Bigl(\frac{d+1}{1 + |k_1| + \cdots + |k_d|}\Bigr)^{d+1}. \tag{26} \]

Lemma 6.8 Let $a \in C^1(E)$ and let $f$ be a function that satisfies (B2). Then for a.e. $\omega \in \Omega$, the function $\vartheta \mapsto \int_0^T f(X_t, \vartheta)\,a(X_t)\,dW_t(\omega)$ is continuous on $\Theta$.

Proof of Theorem 6.1. Let us prove (18) and (20). The proofs of (17) and (19) go in the same way, but we have to obtain expressions of the form (27) below with respect to the Lebesgue instead of the Wiener integral, and to apply Lemma 6.6, (22) and (24). Without loss of generality let us assume that $K = K_0 = \prod_{i=1}^d \langle -\pi, \pi\rangle$ and let $f$ satisfy (B2-3) and (B K$_0$). For fixed $\vartheta \in K_0$, $T > 0$ and a subdivision $0 = t_0 < t_1 < \cdots < t_n = T$ we define the following processes:
\[ U_t := \sum_{i=0}^{n-1} (f(X_t, \vartheta) - f(X_{t_i}, \vartheta))\,a(X_t)\,1_{\langle t_i, t_{i+1}]}(t), \quad t \in [0, T], \]
\[ U_t^{(N)} := \sum_{i=0}^{n-1} (S_N(X_t, \vartheta) - S_N(X_{t_i}, \vartheta))\,a(X_t)\,1_{\langle t_i, t_{i+1}]}(t), \quad t \in [0, T],\ N \in \mathbb{N}, \]
and
\[ V_t := \sum_{i=0}^{n-1} f(X_{t_i}, \vartheta)\Bigl(\frac{b(X_t)}{b(X_{t_i})} - 1\Bigr)a(X_t)\,1_{\langle t_i, t_{i+1}]}(t), \quad t \in [0, T], \]
\[ V_t^{(N)} := \sum_{i=0}^{n-1} S_N(X_{t_i}, \vartheta)\Bigl(\frac{b(X_t)}{b(X_{t_i})} - 1\Bigr)a(X_t)\,1_{\langle t_i, t_{i+1}]}(t), \quad t \in [0, T],\ N \in \mathbb{N}. \]
Then $\lim_N |U_t^{(N)} - U_t| = 0$, $\lim_N |V_t^{(N)} - V_t| = 0$, and
\[ \sup_N |U_t^{(N)} - U_t| \le K^2(g^2(X_t) + g^2(X_{t_i})) + a^2(X_t)/2, \qquad \sup_N |V_t^{(N)} - V_t| \le (K/2)\,g^2(X_{t_i})\,a^2(X_t) + 2(b(X_t)/b(X_{t_i}))^2 + 2, \]
for $t \in \langle t_i, t_{i+1}]$ by Lemma 6.7.
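Since (B1-4) hold, and hence Lemma 6.5 holds, there exist $T_1 \ge 0$ and $n_1 \in \mathbb{N}$ such that for all $T > T_1$, $n \ge n_1$ the integrals $\int_0^T g^2(X_t)\,dW_t$, $\sum_{i=0}^{n-1} g^2(X_{t_i})\int_{t_i}^{t_{i+1}} a^2(X_t)\,dW_t$, $\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (b(X_t)/b(X_{t_i}))^2\,dW_t$, $\sum_{i=0}^{n-1} g^2(X_{t_i})\Delta_i W$, and $\int_0^T a^2(X_t)\,dW_t$ are well defined, and so
\[ I_N(\vartheta) := \int_0^T U_t^{(N)}\,dW_t \xrightarrow{\ \mathbb{P}\ } \int_0^T U_t\,dW_t =: I(\vartheta), \quad N \to +\infty, \]
\[ J_N(\vartheta) := \int_0^T V_t^{(N)}\,dW_t \xrightarrow{\ \mathbb{P}\ } \int_0^T V_t\,dW_t =: J(\vartheta), \quad N \to +\infty, \]
by the dominated convergence theorem for stochastic integrals (see e.g. [25], Theorem (2.12), pp. 134-135).

First, let us consider the sequence $(I_N(\vartheta))$. For every $\vartheta \in K_0 \cap \mathbb{Q}^d$ there exist a subsequence $(N_p) \equiv (N_p(\vartheta))$ and an event $A(\vartheta)$ of probability 1 such that for all $\omega \in A(\vartheta)$, $\lim_p I_{N_p}(\vartheta)(\omega) = I(\vartheta)(\omega)$. Let us recall that
\[ I_N(\vartheta) = \sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (S_N(X_t, \vartheta) - S_N(X_{t_i}, \vartheta))\,a(X_t)\,dW_t,\ N \in \mathbb{N}, \qquad I(\vartheta) = \sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (f(X_t, \vartheta) - f(X_{t_i}, \vartheta))\,a(X_t)\,dW_t. \]
Let $\Omega_0 := \cap_{\vartheta \in K_0 \cap \mathbb{Q}^d} A(\vartheta)$. Then on this event of probability 1, for all $\vartheta \in K_0 \cap \mathbb{Q}^d$, the following holds:
\[ |I(\vartheta)| \le |I(\vartheta) - I_{N_p(\vartheta)}(\vartheta)| + |I_{N_p(\vartheta)}(\vartheta)| \le |I(\vartheta) - I_{N_p(\vartheta)}(\vartheta)| + \sum_{k \in \mathbb{Z}^d}\Bigl|\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (C_k(X_t) - C_k(X_{t_i}))\,a(X_t)\,dW_t\Bigr|. \]
By taking the limit when $p \to +\infty$, we get the following inequality:
\[ |I(\vartheta)| \le \sum_{k \in \mathbb{Z}^d}\Bigl|\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (C_k(X_t) - C_k(X_{t_i}))\,a(X_t)\,dW_t\Bigr|. \]
Since $\vartheta \mapsto I(\vartheta)$ is a continuous function by Lemma 6.8, it turns out that $\sup_{\vartheta \in K_0} |I(\vartheta)| = \sup_{\vartheta \in K_0 \cap \mathbb{Q}^d} |I(\vartheta)|$, and so $\sup_{\vartheta \in K_0} |I(\vartheta)|$ is a random variable. Hence
\[ \sup_{\vartheta \in K_0} |I(\vartheta)| \le \sum_{k \in \mathbb{Z}^d}\Bigl|\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (C_k(X_t) - C_k(X_{t_i}))\,a(X_t)\,dW_t\Bigr| \quad \text{a.s.} \tag{27} \]
Since there exist $T_0 \ge T_1$ and $n_0 \ge n_1$ such that for all $T > T_0$, $n \ge n_0$ and subdivisions of $[0, T]$ with $\delta_{n,T} \downarrow 0$,
\[ \sum_{k \in \mathbb{Z}^d}\Bigl\|\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (C_k(X_t) - C_k(X_{t_i}))\,a(X_t)\,dW_t\Bigr\|_{L^2} \le K_1 K\sqrt{T\delta_{n,T}}, \]
by Lemma 6.6 and (26), the series on the right-hand side of (27) converges a.s. and in $L^2$-norm to a.s. equal limits (see Proposition 2.10.1 in [7], p. 68). Hence $\|\sup_{\vartheta \in K_0} |I(\vartheta)|\|_{L^2} \le C_1\sqrt{T\delta_{n,T}}$ for $C_1 := K_1 K$.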
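That proves (18). The proof of (20) goes in a similar way, considering the sequence $(J_N(\vartheta))$.

We need the following lemma for proving consistency and asymptotic normality of the estimator of the diffusion coefficient parameter.

Lemma 6.9 Let (B4) hold, and let $b \in C^3(E)$. Moreover, let there exist constants $C_b > 0$ and $T_b \ge 0$ such that
\[ (\forall T > T_b)\quad \frac{1}{T}\,\mathbb{E}\Bigl(\int_0^T ((b^2b''')^2 + r^{16})(X_t)\,dt + \sum_{i=0}^{n-1} r^4(X_{t_i})\Delta_i t\Bigr) \le C_b. \]
Then there exist constants $C > 0$, $T_0 \ge 0$, and $n_0 \in \mathbb{N}$, such that for all $T > T_0$ and $n \ge n_0$,
\[ \frac{1}{T}\,\mathbb{E}\Bigl|\sum_{i=0}^{n-1}\frac{1}{\Delta_i t}\Bigl(\Bigl(\int_{t_i}^{t_{i+1}}\frac{b(X_t)}{b(X_{t_i})}\,dW_t\Bigr)^2 - (\Delta_i W)^2\Bigr)\Bigr| \le C. \]

Remark 6.10 If $b$ and its derivatives up to the third order are bounded then the statement of Lemma 6.9 holds for all $T > T_0 = 0$ by the same arguments as in Remark 6.2.

6.1 Fixed maximal observational time case

Let $T > 0$ be fixed, and let $0 = t_0 < \cdots < t_n = T$, $n \in \mathbb{N}$, be subdivisions of $[0, T]$ such that $\delta_{n,T} = \max_{0 \le i \le n-1}\Delta_i t \downarrow 0$ when $n \to +\infty$. We need the next corollary to Theorem 6.1.

Corollary 6.11 Let $X$ be a diffusion such that (H1a-4a) hold and let $K \subset \Theta$ be a relatively compact set. Then for all $\theta_0 = (\vartheta_0, \sigma) \in \Psi$, $T > 0$, and $r = 0, 1, 2$,
\[ \sup_{\vartheta \in K} |D^r\ell_{n,T}(\vartheta) - D^r\ell_T(\vartheta)| = O_{\mathbb{P}_{\theta_0}}(\sqrt{\delta_{n,T}}), \quad n \to +\infty. \tag{28} \]

Proof of Corollary 6.11. We prove (28) for $r = 0$. Statement (28) for the cases $r = 1$ and $r = 2$ can be proved similarly. Let $\theta_0 = (\vartheta_0, \sigma) \in \Psi$ be arbitrary, and let $\mu_0 := \mu(\cdot, \vartheta_0)$. Moreover, let $f(\cdot, \vartheta) := \mu(\cdot, \vartheta)/b$, $\vartheta \in K$, and $f_0 := \mu_0/b$. Then for any $n$,
\begin{align*}
\ell_{n,T}(\vartheta) - \ell_T(\vartheta) &= \sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}}\Bigl(\frac{\mu(X_{t_i}, \vartheta)}{b^2(X_{t_i})} - \frac{\mu(X_t, \vartheta)}{b^2(X_t)}\Bigr)dX_t - \frac12\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}}\Bigl(\frac{\mu^2(X_{t_i}, \vartheta)}{b^2(X_{t_i})} - \frac{\mu^2(X_t, \vartheta)}{b^2(X_t)}\Bigr)dt \\
&= \sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}}\Bigl(-(f(X_t, \vartheta) - f(X_{t_i}, \vartheta))f_0(X_t) + f(X_{t_i}, \vartheta)\Bigl(\frac{b(X_t)}{b(X_{t_i})} - 1\Bigr)f_0(X_t)\Bigr)dt \\
&\quad + \sqrt{\sigma}\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}}\Bigl(-(f(X_t, \vartheta) - f(X_{t_i}, \vartheta)) + f(X_{t_i}, \vartheta)\Bigl(\frac{b(X_t)}{b(X_{t_i})} - 1\Bigr)\Bigr)dW_t \\
&\quad + \frac12\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (f^2(X_t, \vartheta) - f^2(X_{t_i}, \vartheta))\,dt \tag{29}
\end{align*}
by the definitions of $\ell_T$ and $\ell_{n,T}$, and (1).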
Let us assume for a moment that the functions $f_0$, $b$, $b'$, $b''$ are bounded on $E$, and that $f$ and its partial derivatives $D_\vartheta^m f$, $\frac{\partial}{\partial x} D_\vartheta^m f$, and $\frac{\partial^2}{\partial x^2} D_\vartheta^m f$ are bounded on $E \times K$ for $0 \le m \le d + 1$. Then $f$ and $f^2$ satisfy condition (B2) from Theorem 6.1, and $f_0$ and the constant function 1 satisfy (B1), since (H2a-3a) hold. Hence, by Remark 6.2 the statements of Theorem 6.1 hold for these functions and any $T > 0$. By applying this conclusion to (29), the following holds:
\[ \bigl\|\sup_{\vartheta \in K} |\ell_{n,T}(\vartheta) - \ell_T(\vartheta)|\bigr\|_{L^2(\mathbb{P}_{\theta_0})} \le C\sqrt{\delta_{n,T}}, \tag{30} \]
for any $T > 0$ and subdivisions of $[0, T]$ with $\delta_{n,T} \le h_0$, and a constant $C > 0$ which depends on $T$, $X$ and $K$.

Now, let $X$, $\mu$ and $b$ satisfy assumptions (H1a-3a), and let $x_0$ be the initial state of $X$. Moreover, let $(E_m, m \ge 1)$ be a sequence of open and bounded subintervals of $E$ such that for all $m$, $\overline{E}_m \subset E_{m+1}$, $x_0 \in E_1$, and $\bigcup_{m=1}^{+\infty} E_m = E$, and let $(\varphi_m, m \ge 1)$ be a sequence of $C^\infty$-functions on $E$ such that for all $m$, $0 \le \varphi_m \le 1$, $\varphi_m(x) = 1$ for $x \in \overline{E}_m$ and $\varphi_m \equiv 0$ on $E_{m+1}^c$. Let us define the following bounded functions for each $m$:
\[ \mu_m(x, \vartheta) := \varphi_m(x)\mu(x, \vartheta),\ (x, \vartheta) \in E \times \Theta, \qquad b_m(x) := \varphi_m(x)b(x) + c_m(1 - \varphi_m(x)),\ x \in E, \]
where $c_m := \operatorname{sign} b \cdot \max_{x \in E_{m+1}} |b(x)|$. Since $\mu$ and $b$ satisfy (H2a-3a), $b_m \in C^2(E)$, the functions $b_m$, $b_m'$, $b_m''$ are bounded on $E$, and $(x, \vartheta) \mapsto \mu_m(x, \vartheta)/b(x),\ \mu_m^2(x, \vartheta)/b^2(x)$ satisfy (B2) and are bounded on $E \times K$, and hence satisfy (B3) too, for each $m$. Moreover, let $\tau_m := \inf\{t \ge 0 : X_t \in E_m^c\}$, $m \ge 1$. Since $X$ is a continuous process, $(\tau_m, m \ge 1)$ is an increasing sequence of stopping times (see [25]) such that $\tau_m \uparrow +\infty$ a.s. when $m \to +\infty$. Let $m$ be fixed and let the diffusion $X^m = (X_t^m;\ t \ge 0)$ be defined as the solution to the SDE
\[ X_t^m = x_0 + \int_0^t \mu_m(X_s^m, \vartheta_0)\,ds + \sqrt{\sigma}\int_0^t b_m(X_s^m)\,dW_s, \quad t > 0. \]
By Theorem V.11.2 in [26] (Vol. 2, p. 128) such a diffusion exists and is a.s. unique. Moreover, for almost all $\omega \in \Omega$ and $t \in [0, \tau_m(\omega)]$, $X_t(\omega) = X_t^m(\omega)$ by Corollary V.11.10 in [26] (Vol.
2, p. 131). This implies (see [29]) that for an arbitrary number $A > 0$,
\[ \mathbb{P}_{\theta_0}\bigl\{\sup_{\vartheta \in K} |\ell_{n,T}(\vartheta) - \ell_T(\vartheta)| > A\sqrt{\delta_{n,T}}\bigr\} \le \mathbb{P}_{\theta_0}\{\tau_m \le T\} + \frac{1}{A\sqrt{\delta_{n,T}}}\bigl\|\sup_{\vartheta \in K} |\ell_{n,T}^m(\vartheta) - \ell_T^m(\vartheta)|\bigr\|_{L^2(\mathbb{P}_{\theta_0})}, \tag{31} \]
where $\ell_T^m$ and $\ell_{n,T}^m$ are the LLF (9) and its Euler approximation (5) respectively, both based on the diffusion $X^m$ with drift $\mu_m(\cdot, \vartheta_0)$ and diffusion coefficient function $\sqrt{\sigma}\,b_m$. Now, (30) holds for the functions $\ell_T^m$ and $\ell_{n,T}^m$ with constant $C = C_m$. Hence the right-hand side of (31) is dominated by the expression $\mathbb{P}_{\theta_0}\{\tau_m \le T\} + \frac{1}{A}C_m$. First, let us take the limit when $n \to +\infty$, and then when $A \to +\infty$. Next, we take the limit when $m \to +\infty$, and hence we prove (28).

Proof of Theorem 4.1. We need to show that the model and the random functions $\ell_T$ and $\ell_{n,T}$, $n \ge 1$, for fixed $T > 0$, satisfy conditions (A1-5) of Theorem 3.1 of [17]. Let $\mathcal{F}_{n,T}$ be the $\sigma$-subalgebras of $\mathcal{F}_T^0$ introduced in Section 4. We recall from the same section that $\ell_T$ is an $\mathcal{F}_T^0 \otimes \mathcal{B}(\Theta)$-measurable function. In the same way, $\ell_{n,T}$ is $\mathcal{F}_{n,T} \otimes \mathcal{B}(\Theta)$-measurable for each $n$. Hence (A1) is satisfied. Corollary 6.11 implies that the functions $\ell_T$ and $\ell_{n,T}$, $n \ge 1$, satisfy (A3). The same corollary and (H5a) imply (A4) and (A5). Condition (A2) is the same as assumption (H4a). Hence by Theorem 3.1 of [17] there exists a sequence of $\mathcal{F}_T^0$-measurable random vectors $(\hat\vartheta_{n,T}, n \ge 1)$ such that the statements of Theorem 4.1 hold.

For proving Corollary 4.2 we need the following lemma.

Lemma 6.12 Let (H1a-2a) hold, and let $T > 0$ be fixed. Then for $\theta = (\vartheta, \sigma) \in \Psi$,
\[ \sum_{i=0}^{n-1}\frac{(\Delta_i X - \mu(X_{t_i}, \vartheta)\Delta_i t)^2}{b^2(X_{t_i})\Delta_i t} - \sigma\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2}{\Delta_i t} = O_{\mathbb{P}_\theta}(1), \quad n \to +\infty. \tag{32} \]

Proof of Corollary 4.2. Notice that (ii) implies the consistency (i.e. (i)) of $\hat\sigma_n$. Let us prove (ii).
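Since
\[ \sqrt{n}(\hat\sigma_{n,T} - \sigma) = \sqrt{n}\Bigl(\hat\sigma_{n,T} - \frac{\sigma}{n}\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2}{\Delta_i t}\Bigr) + \sigma\sqrt{2}\cdot\frac{1}{\sqrt{2n}}\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2 - \Delta_i t}{\Delta_i t} \tag{33} \]
and $(\mathcal{L})\lim_n \frac{1}{\sqrt{2n}}\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2 - \Delta_i t}{\Delta_i t} = N(0, 1)$, for (ii) to hold it is sufficient to prove that for all $\epsilon > 0$,
\[ \lim_n \mathbb{P}_\theta\Bigl\{\sqrt{n}\,\Bigl|\hat\sigma_{n,T} - \frac{\sigma}{n}\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2}{\Delta_i t}\Bigr| \ge \epsilon\Bigr\} = 0. \tag{34} \]
Let $\epsilon > 0$ and $\eta > 0$ be any numbers and let $K$ be a relatively compact set in $\Theta$. If $K + \eta := \{\vartheta \in \Theta : (\exists\vartheta' \in K)\ |\vartheta - \vartheta'| < \eta\}$ then on the event
\begin{align*}
A = {} & \Bigl\{\Bigl|\frac{1}{\sqrt{n}}\Bigl(\sum_{i=0}^{n-1}\frac{(\Delta_i X - \mu(X_{t_i}, \vartheta)\Delta_i t)^2}{b^2(X_{t_i})\Delta_i t} - \sigma\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2}{\Delta_i t}\Bigr)\Bigr| < \frac{\epsilon}{5},\ \hat\vartheta_T \in K\Bigr\} \\
& \cap \Bigl\{|\hat\vartheta_{n,T} - \hat\vartheta_T| < \eta,\ |\ell_T(\vartheta) - \ell_T(\hat\vartheta_T)| < \sqrt{n}\,\frac{\epsilon}{10},\ |\ell_{n,T}(\vartheta) - \ell_T(\vartheta)| < \sqrt{n}\,\frac{\epsilon}{10}\Bigr\} \\
& \cap \Bigl\{\sup_{\vartheta' \in K+\eta} |D\ell_T(\vartheta')| < \frac{\sqrt{n}}{\eta}\,\frac{\epsilon}{10},\ \sup_{\vartheta' \in K+\eta} |\ell_{n,T}(\vartheta') - \ell_T(\vartheta')| < \sqrt{n}\,\frac{\epsilon}{10}\Bigr\},
\end{align*}
the following holds: $\sqrt{n}\,\bigl|\hat\sigma_{n,T} - \frac{\sigma}{n}\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2}{\Delta_i t}\bigr| < \epsilon$. This implies that $A \subseteq \bigl\{\sqrt{n}\,\bigl|\hat\sigma_{n,T} - \frac{\sigma}{n}\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2}{\Delta_i t}\bigr| < \epsilon\bigr\}$. Hence
\begin{align*}
\mathbb{P}_\theta\Bigl\{\sqrt{n}\,\Bigl|\hat\sigma_{n,T} - \frac{\sigma}{n}\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2}{\Delta_i t}\Bigr| \ge \epsilon\Bigr\} \le {} & \mathbb{P}_\theta\Bigl\{\Bigl|\frac{1}{\sqrt{n}}\Bigl(\sum_{i=0}^{n-1}\frac{(\Delta_i X - \mu(X_{t_i}, \vartheta)\Delta_i t)^2}{b^2(X_{t_i})\Delta_i t} - \sigma\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2}{\Delta_i t}\Bigr)\Bigr| \ge \frac{\epsilon}{5}\Bigr\} \\
& + \mathbb{P}_\theta\{\hat\vartheta_T \in K^c\} + \mathbb{P}_\theta\Bigl\{|\ell_T(\vartheta) - \ell_T(\hat\vartheta_T)| \ge \sqrt{n}\,\frac{\epsilon}{10}\Bigr\} + \mathbb{P}_\theta\{|\hat\vartheta_{n,T} - \hat\vartheta_T| \ge \eta\} \\
& + \mathbb{P}_\theta\Bigl\{\sup_{\vartheta' \in K+\eta} |D\ell_T(\vartheta')| \ge \frac{\sqrt{n}}{\eta}\,\frac{\epsilon}{10}\Bigr\} + \mathbb{P}_\theta\Bigl\{|\ell_{n,T}(\vartheta) - \ell_T(\vartheta)| \ge \sqrt{n}\,\frac{\epsilon}{10}\Bigr\} \\
& + \mathbb{P}_\theta\Bigl\{\sup_{\vartheta' \in K+\eta} |\ell_{n,T}(\vartheta') - \ell_T(\vartheta')| \ge \sqrt{n}\,\frac{\epsilon}{10}\Bigr\}.
\end{align*}
By Lemma 6.12, Corollary 6.11, property (ii) of $\hat\vartheta_{n,T}$ from Theorem 4.1, and the arbitrariness of $K$, (34) follows.

6.2 Ergodic case

For all $T > 0$ let $0 = t_0 < \cdots < t_n = T$, $n \in \mathbb{N}$, be equidistant subdivisions of $[0, T]$ such that $\delta_{n,T} = T/n \to 0$ when $T \to +\infty$ and $n \to +\infty$. We need the following corollary to Theorem 6.1.

Corollary 6.13 Let $X$ be a diffusion such that (H1b-3b) hold. Then for all $\theta_0 = (\vartheta_0, \sigma) \in \Psi$, $\pi_{\vartheta_0}$-a.s. nonrandom initial conditions, and $r = 0, 1, 2$,
\[ \sup_{\vartheta \in \Theta}\Bigl|\frac{1}{T}D^r\ell_{n,T}(\vartheta) - \frac{1}{T}D^r\ell_T(\vartheta)\Bigr| = O_{\mathbb{P}_{\theta_0}}(\sqrt{\delta_{n,T}}), \quad T \to +\infty,\ n \to +\infty. \tag{35} \]

Proof of Corollary 6.13.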
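Similarly to the proof of Corollary 6.11, it is sufficient to prove (35) for $r = 0$, since the statement of the corollary for the cases $r = 1$ and $r = 2$ can be proved in the same way. Let $\theta_0 = (\vartheta_0, \sigma) \in \Psi$ be arbitrary, and let $\mu_0 := \mu(\cdot, \vartheta_0)$, $\nu := \sqrt{\sigma}\,b$, and $\mathbb{P} \equiv \mathbb{P}_{\theta_0}$, $\mathbb{E} \equiv \mathbb{E}_{\theta_0}$. Let us recall expression (29) from the proof of Corollary 6.11, where $f(\cdot, \vartheta) = \mu(\cdot, \vartheta)/b$, $\vartheta \in \Theta$, and $f_0 = \mu_0/b$. Notice that $f$ and $f^2$ satisfy (B2) since (H2a-3a) hold by (H2b-3b). Let us show that $f_0$ satisfies (B1) and $f$ satisfies (B3) with respect to $a \equiv f_0$ and the compact $\overline\Theta$, and that $f^2$ satisfies (B3) with respect to the constant function $a \equiv 1$ and the same compact (notice that a constant function trivially satisfies (B1)). If we fix $\vartheta \in \Theta$, $m$ such that $0 \le m \le d + 1$, and nonnegative integers $j_1, \ldots, j_d$ such that $j_1 + \cdots + j_d = m$, then let
\[ \tilde f := \frac{\partial^m}{\partial\vartheta_1^{j_1}\cdots\partial\vartheta_d^{j_d}} f(\cdot, \vartheta), \qquad \tilde\mu := \frac{\partial^m}{\partial\vartheta_1^{j_1}\cdots\partial\vartheta_d^{j_d}} \mu(\cdot, \vartheta). \]
By (H3a), $\tilde f, \tilde\mu \in C^2(E)$. Since (H3b) holds it follows that
\begin{align*}
|\tilde f| &\le g_0 \in L^{32}(\pi_{\vartheta_0}) \subset L^8(\pi_{\vartheta_0}), \\
|\tilde f'b| &= |\tilde\mu' - \tilde f b'| \le g_1 + g_0|b'| =: g_{01} \in L^{16}(\pi_{\vartheta_0}) \subset L^8(\pi_{\vartheta_0}), \\
|\tilde f'\mu_0| &= |(\tilde f'b)f_0| \le g_{01}g_1 =: g_{02} \in L^8(\pi_{\vartheta_0}), \\
|\tilde f''b^2| &= |\tilde\mu''b - 2(\tilde f'b)b' - \tilde f(b''b)| \le g_2 + 2g_{01}|b'| + g_0|b''b| =: g_{03} \in L^8(\pi_{\vartheta_0})
\end{align*}
by (H2b-3b). Then the function $g_{00} := g_0 + \sqrt{\sigma}\,g_{01} + g_{02} + \sigma g_{03}$ is such that $g_{00} \in L^8(\pi_{\vartheta_0}) \subset L^4(\pi_{\vartheta_0})$ and
\[ \sup_{\vartheta \in \Theta}\Bigl( |D_\vartheta^m f(\cdot, \vartheta)|_\infty + \bigl|\tfrac{\partial}{\partial x} D_\vartheta^m f(\cdot, \vartheta)\bigr|_\infty (|\mu_0| + |\nu|) + \bigl|\tfrac{\partial^2}{\partial x^2} D_\vartheta^m f(\cdot, \vartheta)\,\nu^2\bigr|_\infty \Bigr) \le g_{00} \]
for all $0 \le m \le d + 1$. This implies that $f$ satisfies the first part of (B3) with $g \equiv g_{00}$. This also implies that $|f_0| + |\bar f_0| \le g_{00}$ and hence $f_0, \bar f_0 \in L^8(\pi_{\vartheta_0})$. By the Chacon-Ornstein theorem, the ergodic theorem for additive functionals and its corollary (e.g. Theorem (A.5.2) on p. 504, Theorem (X.3.12) on p. 397, and Exercise (X.3.18) on p. 399 in [25]), for $\pi_{\vartheta_0}$-a.s.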
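initial values $x_0 \in E$,
\begin{align*}
\lim_{T \to +\infty} \mathbb{E}\Bigl(\frac{1}{T}\int_0^T f_0^8(X_t)\,dt\Bigr) &= \lim_{n,T} \mathbb{E}\Bigl(\frac{1}{T}\sum_{i=0}^{n-1} f_0^8(X_{t_i})\Delta_i t\Bigr) = \lim_n \mathbb{E}\Bigl(\frac{1}{n}\sum_{i=0}^{n-1} f_0^8(X_{t_i})\Bigr) \\
&= \int_E f_0^8(x)\,\pi_{\vartheta_0}(dx) \le \int_E g_{00}^8(x)\,\pi_{\vartheta_0}(dx) < +\infty \tag{36}
\end{align*}
since (H1b) holds and the subdivisions are equidistant ($\Delta_i t = T/n$ for each $i$). Moreover, since $f_0 \in L^4(\pi_{\vartheta_0})$ too, the same holds for 4th powers of $f_0$, i.e. if we substitute $f_0^4$ for $f_0^8$ in (36). Finally, both conclusions hold for $\bar f_0$ too. Hence $f_0$ satisfies (B1). It remains to show that $g_{00}$ satisfies the limiting properties from (B3). Using the same arguments as in proving (36), it follows that (36) holds for the 8th and hence for the 4th power of $g_{00}$. Moreover, since $f_0, g_{00} \in L^8(\pi_{\vartheta_0})$ implies $f_0g_{00} \in L^4(\pi_{\vartheta_0})$, and (36) (with respect to $\bar f_0$ and $g_{00}$ too) holds, it follows that
\[ \lim_{n,T} \mathbb{E}\Bigl(\frac{1}{T}\sum_{i=0}^{n-1} (f_0g_{00})^4(X_{t_i})\Delta_i t\Bigr) = \int_E (f_0g_{00})^4(x)\,\pi_{\vartheta_0}(dx) < +\infty, \]
\begin{align*}
\lim_{n,T} \mathbb{E}\Bigl(\frac{1}{T}\sum_{i=0}^{n-1} g_{00}^4(X_{t_i})\int_{t_i}^{t_{i+1}} (f_0^4 + \bar f_0^4)(X_t)\,dt\Bigr) \le {} & \frac12\lim_{n,T} \mathbb{E}\Bigl(\frac{1}{T}\sum_{i=0}^{n-1} g_{00}^8(X_{t_i})\Delta_i t\Bigr) \\
& + \lim_{T \to +\infty} \mathbb{E}\Bigl(\frac{1}{T}\int_0^T (f_0^8 + \bar f_0^8)(X_t)\,dt\Bigr) < +\infty.
\end{align*}
Hence $f$ satisfies (B3) for $\pi_{\vartheta_0}$-a.s. nonrandom initial conditions. It remains to show that $f^2$ satisfies (B3) with respect to the function $a \equiv 1$. Let $g := 7 \cdot 2^{d+1}g_{00}^2 \in L^4(\pi_{\vartheta_0})$. Notice that uniformly with respect to $\vartheta \in \Theta$,
\[ |f^2| + \bigl|\tfrac{\partial}{\partial x}(f^2)\bigr| + \bigl|\tfrac{\partial^2}{\partial x^2}(f^2)\bigr| \le |f|^2 + 2\bigl|f\,\tfrac{\partial}{\partial x}f\bigr| + 2\bigl|\bigl(\tfrac{\partial}{\partial x}f\bigr)^2 + f\,\tfrac{\partial^2}{\partial x^2}f\bigr| \le 7g_{00}^2 \le g. \]
Let us put $\hat f := \frac{\partial^m}{\partial\vartheta_1^{j_1}\cdots\partial\vartheta_d^{j_d}}(f^2)(\cdot, \vartheta)$ for fixed $\vartheta \in \Theta$, $m$ such that $0 \le m \le d + 1$, and nonnegative integers $j_1, \ldots, j_d$ such that $j_1 + \cdots + j_d = m$. Then by induction
\[ |\hat f| + \bigl|\tfrac{\partial}{\partial x}\hat f\bigr| + \bigl|\tfrac{\partial^2}{\partial x^2}\hat f\bigr| \le 7 \cdot 2^m g_{00}^2 \le g. \]
Then (36) (for 4th powers of $g_{00}$) implies that $f^2$ satisfies (B3) with respect to $a \equiv 1$, for $\pi_{\vartheta_0}$-a.s. nonrandom initial conditions. Finally, (B4) holds for $\pi_{\vartheta_0}$-a.s. nonrandom initial conditions since (H1b-H2b) hold.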
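Hence we can apply Theorem 6.1 to (29) to conclude that there exist constants $C > 0$, $T_0 \ge 0$, and $n_0 \in \mathbb{N}$, such that for all $T > T_0$ and $n \ge n_0$, and arbitrary $A > 0$,
\[ \mathbb{P}_{\theta_0}\Bigl\{\frac{1}{\sqrt{\delta_{n,T}}}\sup_{\vartheta \in \Theta}\Bigl|\frac{1}{T}\ell_{n,T}(\vartheta) - \frac{1}{T}\ell_T(\vartheta)\Bigr| \ge A\Bigr\} \le \frac{1}{A^2}\,\mathbb{E}\Bigl(\frac{1}{\sqrt{\delta_{n,T}}}\sup_{\vartheta \in \Theta}\Bigl|\frac{1}{T}\ell_{n,T}(\vartheta) - \frac{1}{T}\ell_T(\vartheta)\Bigr|\Bigr)^2 \le \frac{C}{A^2}. \]
Hence
\[ \lim_{A \to +\infty}\lim_{n,T} \mathbb{P}_{\theta_0}\Bigl\{\frac{1}{\sqrt{\delta_{n,T}}}\sup_{\vartheta \in \Theta}\Bigl|\frac{1}{T}\ell_{n,T}(\vartheta) - \frac{1}{T}\ell_T(\vartheta)\Bigr| \ge A\Bigr\} = 0, \]
which proves the corollary.

In order to prove Theorems 4.5-4.6 we need the following lemmas.

Lemma 6.14 Let (H1b-3b) hold. Then for all $\theta_0 = (\vartheta_0, \sigma) \in \Psi$ there exist constants $C_r > 0$ ($r = 0, 1, 2$) such that $\mathbb{P}_{\theta_0}$-a.s. there exists $T_0 > 0$ such that for all $\vartheta_1, \vartheta_2 \in \Theta$ and all $T \ge T_0$,
\[ \Bigl|\frac{1}{T}D^r\ell_T(\vartheta_1) - \frac{1}{T}D^r\ell_T(\vartheta_2)\Bigr| \le C_r|\vartheta_1 - \vartheta_2|, \quad r = 0, 1, 2, \]
\[ \Bigl|\frac{1}{T}D\ell_T(\vartheta_1) - \frac{1}{T}D\ell_T(\vartheta_2) - \frac{1}{T}D^2\ell_T(\vartheta_2)(\vartheta_1 - \vartheta_2)\Bigr| \le \frac12 C_2|\vartheta_1 - \vartheta_2|^2, \quad\text{and}\quad \sup_{\vartheta \in \Theta}\frac{1}{T}|D^3\ell_T(\vartheta)| \le C_2. \]

Lemma 6.15 Let (H1b-3b) hold. Then for all $\theta_0 = (\vartheta_0, \sigma) \in \Psi$, $\mathbb{P}_{\theta_0}$-a.s.
\[ \lim_{T \to +\infty}\sup_{\vartheta \in \Theta}\Bigl|\frac{1}{T}\ell_T(\vartheta) - \ell_{\vartheta_0}(\vartheta)\Bigr| = 0. \]

Proof of Theorem 4.5. Let $\theta_0 = (\vartheta_0, \sigma) \in \Psi$ be arbitrary. Since $\Theta$ is an open set there exists $\varepsilon_0 > 0$ such that $K(\vartheta_0, \varepsilon_0) \subset \Theta$. Let $\ell_{\vartheta_0}$ be the function (12) and let $\lambda_0 := \min_{|y|=1} y^\tau I(\vartheta_0)y = -\max_{|y|=1} y^\tau D^2\ell_{\vartheta_0}(\vartheta_0)y > 0$ be the minimal eigenvalue of the Fisher information matrix $I(\vartheta_0)$, which is positive since $I(\vartheta_0)$ is positive definite by (H5b). Moreover, let $C_r > 0$ ($r = 0, 1, 2$) be the constants from Lemma 6.14, and let $\Omega_0$ be the intersection of the events from Lemmas 6.14-6.15 and the events on which (12) and (13) hold for $\vartheta_0$. Hence $\mathbb{P}_{\theta_0}(\Omega_0) = 1$, and for $\omega \in \Omega_0$, let $T_0 \equiv T_0(\omega) > 0$ be such that the statements of Lemma 6.14 hold for $T \ge T_0$. Let $\varepsilon > 0$ be such that $\varepsilon \le \varepsilon_0 \wedge \lambda_0/(4C_2)$. Then $K(\vartheta_0, \varepsilon) \subset \Theta$. Let $\omega \in \Omega_0$ be fixed. Since (13) holds, there exists $T_1 \ge T_0$ such that for all $T \ge T_1$, $|\frac{1}{T}D^2\ell_T(\vartheta_0) - D^2\ell_{\vartheta_0}(\vartheta_0)| < \frac{\lambda_0}{4}$ and $|\frac{1}{T}D\ell_T(\vartheta_0) - D\ell_{\vartheta_0}(\vartheta_0)| < \frac{\lambda_0}{4}\varepsilon$.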
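Then for all $y \in \mathbb{R}^d$, $|y| = 1$, $T \ge T_1$, and $\vartheta \in K(\vartheta_0, \varepsilon)$,
\begin{align*}
y^\tau\Bigl(\frac{1}{T}D^2\ell_T(\vartheta)\Bigr)y &\le \Bigl|\frac{1}{T}D^2\ell_T(\vartheta) - \frac{1}{T}D^2\ell_T(\vartheta_0)\Bigr| + \Bigl|\frac{1}{T}D^2\ell_T(\vartheta_0) - D^2\ell_{\vartheta_0}(\vartheta_0)\Bigr| + y^\tau D^2\ell_{\vartheta_0}(\vartheta_0)y \\
&< C_2|\vartheta - \vartheta_0| + \frac{\lambda_0}{4} - \lambda_0 \le C_2\frac{\lambda_0}{4C_2} + \frac{\lambda_0}{4} - \lambda_0 = -\frac{\lambda_0}{2}.
\end{align*}
Hence $\vartheta \mapsto \frac{1}{T}\ell_T(\vartheta)$ is a strictly concave function on $K(\vartheta_0, \varepsilon)$. Moreover, if $z \in \mathbb{R}^d$ is such that $|z| = \varepsilon$, then for $y := z/|z|$ and $T \ge T_1$,
\begin{align*}
\frac{1}{T}D\ell_T(\vartheta_0 + z)z &= \frac{1}{T}D\ell_T(\vartheta_0)z + z^\tau\Bigl(\frac{1}{T}\int_0^1 D^2\ell_T(\vartheta_0 + tz)\,dt\Bigr)z \\
&\le \Bigl|\frac{1}{T}D\ell_T(\vartheta_0) - D\ell_{\vartheta_0}(\vartheta_0)\Bigr|\varepsilon + y^\tau\Bigl(\frac{1}{T}\int_0^1 D^2\ell_T(\vartheta_0 + tz)\,dt\Bigr)y\,\varepsilon^2 \\
&\le \frac{\lambda_0}{4}\varepsilon^2 - \frac{\lambda_0}{2}\varepsilon^2 = -\frac{\lambda_0}{4}\varepsilon^2 < 0.
\end{align*}
Then there exists $\hat\vartheta_T \in K(\vartheta_0, \varepsilon)$ such that $D\ell_T(\hat\vartheta_T) = \mathbf{0}$ (see e.g. Lemma 4.3 in [17]), and $D^2\ell_T(\hat\vartheta_T) < \mathbf{0}$ since obviously $\min_{|y|=1} y^\tau(-\frac{1}{T}D^2\ell_T(\vartheta))y \ge \frac{\lambda_0}{2} = \frac12\min_{|y|=1} y^\tau I(\vartheta_0)y$ for all $\vartheta \in K(\vartheta_0, \varepsilon)$. Since $\varepsilon > 0$ is an arbitrarily small number, these imply statement (ii) of the theorem. Notice that $\hat\vartheta_T$ is the unique point of maximum of the function $\ell_T$ on $K(\vartheta_0, \varepsilon)$ since $\ell_T$ is strictly concave on this set. To finish the proof of statement (i) we have to prove that there exists $T_2 \ge T_1$ such that $\hat\vartheta_T$ is the unique point of global maximum of $\ell_T$ on $\Theta$. Since for all $\vartheta \in \overline\Theta \setminus \{\vartheta_0\}$, $\ell_{\vartheta_0}(\vartheta_0) > \ell_{\vartheta_0}(\vartheta)$, $\ell_{\vartheta_0} \in C(\overline\Theta)$, and $\overline\Theta \setminus K(\vartheta_0, \varepsilon)$ is a compact set, it follows that $\ell_{\vartheta_0}(\vartheta_0) > \sup_{|y| \ge \varepsilon}\ell_{\vartheta_0}(\vartheta_0 + y)$. By Lemma 4.4 in [17] there exists a number $0 < s(\varepsilon) < \varepsilon$ such that
\[ \Delta(\vartheta_0, \varepsilon) := \inf_{|x| \le s(\varepsilon)}\ell_{\vartheta_0}(\vartheta_0 + x) - \sup_{|y| \ge \varepsilon}\ell_{\vartheta_0}(\vartheta_0 + y) > 0. \]
Since Lemma 6.15 holds there exists $T_2 \ge T_1$ such that for $T \ge T_2$,
\[ \sup_{\vartheta \in \Theta}\Bigl|\frac{1}{T}\ell_T(\vartheta) - \ell_{\vartheta_0}(\vartheta)\Bigr| < \frac{\Delta(\vartheta_0, \varepsilon)}{4}. \]
If $x, y \in \mathbb{R}^d$ are such that $|x| \le s(\varepsilon)$ and $|y| \ge \varepsilon$ then
\begin{align*}
\frac{1}{T}\ell_T(\vartheta_0 + x) &= \frac{1}{T}\ell_T(\vartheta_0 + x) - \ell_{\vartheta_0}(\vartheta_0 + x) + \ell_{\vartheta_0}(\vartheta_0 + x) - \ell_{\vartheta_0}(\vartheta_0 + y) + \ell_{\vartheta_0}(\vartheta_0 + y) - \frac{1}{T}\ell_T(\vartheta_0 + y) + \frac{1}{T}\ell_T(\vartheta_0 + y) \\
&\ge -\frac{\Delta(\vartheta_0, \varepsilon)}{4} + \inf_{|x| \le s(\varepsilon)}\ell_{\vartheta_0}(\vartheta_0 + x) - \sup_{|y| \ge \varepsilon}\ell_{\vartheta_0}(\vartheta_0 + y) - \frac{\Delta(\vartheta_0, \varepsilon)}{4} + \frac{1}{T}\ell_T(\vartheta_0 + y) \\
&\ge \frac{\Delta(\vartheta_0, \varepsilon)}{2} + \frac{1}{T}\ell_T(\vartheta_0 + y),
\end{align*}
implying that
\[ \inf_{|x| \le s(\varepsilon)}\frac{1}{T}\ell_T(\vartheta_0 + x) - \sup_{|y| \ge \varepsilon}\frac{1}{T}\ell_T(\vartheta_0 + y) \ge \frac{\Delta(\vartheta_0, \varepsilon)}{2} > 0 \tag{37} \]
and hence $\ell_T(\vartheta_0) > \sup_{|y| \ge \varepsilon}\ell_T(\vartheta_0 + y)$. Finally, (i) follows.

To prove statement (iii), first notice that
\[ \frac{1}{\sqrt{T}}D\ell_T(\vartheta_0) = \frac{\sqrt{\sigma}}{\sqrt{T}}\int_0^T \frac{D\mu_0(X_t)}{b(X_t)}\,dW_t \xrightarrow{\ \mathcal{L}-\mathbb{P}_{\theta_0}\ } N(\mathbf{0}, \sigma I(\vartheta_0)), \quad T \to +\infty, \tag{38} \]
by Theorem 1 in [8] since (H1b-5b) hold, and second notice that for $\bar\vartheta(s) := s\hat\vartheta_T + (1 - s)\vartheta_0$,
\[ D\ell_T(\hat\vartheta_T) = D\ell_T(\vartheta_0) + D^2\ell_T(\vartheta_0)(\hat\vartheta_T - \vartheta_0) + \int_0^1\!\!\int_0^1 D^3\ell_T(\bar\vartheta(st))\,ds\,t\,dt\,(\hat\vartheta_T - \vartheta_0)^2. \tag{39} \]
Let $H_T(\vartheta_0) := \frac{1}{T}D^2\ell_T(\vartheta_0) + \frac{1}{T}\int_0^1\!\int_0^1 D^3\ell_T(\bar\vartheta(st))\,ds\,t\,dt\,(\hat\vartheta_T - \vartheta_0)$, and let us recall $\omega \in \Omega_0$ and $T_1 = T_1(\omega)$ from the first part of the proof. Notice that $H_T(\vartheta_0)$ is a symmetric matrix. Then from Lemma 6.14, for $T \ge T_1$,
\[ \Bigl|H_T(\vartheta_0) - \frac{1}{T}D^2\ell_T(\vartheta_0)\Bigr| \le \sup_{\vartheta \in \Theta}\Bigl|\frac{1}{2T}D^3\ell_T(\vartheta)\Bigr|\,|\hat\vartheta_T - \vartheta_0| \le \frac{C_2}{2}|\hat\vartheta_T - \vartheta_0| \]
and hence, for $y \in \mathbb{R}^d$ such that $|y| = 1$,
\[ y^\tau H_T(\vartheta_0)y \le \Bigl|H_T(\vartheta_0) - \frac{1}{T}D^2\ell_T(\vartheta_0)\Bigr| + y^\tau\Bigl(\frac{1}{T}D^2\ell_T(\vartheta_0)\Bigr)y \le -\frac{3\lambda_0}{8}, \]
implying that $H_T(\vartheta_0)$ is a negative definite matrix and $|H_T(\vartheta_0)^{-1}| \le \frac{8}{3\lambda_0}$. Since $|I(\vartheta_0)^{-1}| = 1/\lambda_0$,
\[ |H_T(\vartheta_0)^{-1} + I(\vartheta_0)^{-1}| \le |H_T(\vartheta_0)^{-1}| \cdot |H_T(\vartheta_0) + I(\vartheta_0)| \cdot |I(\vartheta_0)^{-1}| \le \frac{8}{3\lambda_0^2}\Bigl(\frac{C_2}{2}|\hat\vartheta_T - \vartheta_0| + \Bigl|\frac{1}{T}D^2\ell_T(\vartheta_0) - D^2\ell_{\vartheta_0}(\vartheta_0)\Bigr|\Bigr), \]
and (ii) and (13) hold, it follows that $\mathbb{P}_{\theta_0}$-a.s.
\[ \lim_{T \to +\infty} H_T(\vartheta_0)^{-1} = -I(\vartheta_0)^{-1}. \tag{40} \]
Finally, since $D\ell_T(\hat\vartheta_T) = \mathbf{0}$ and $I(\vartheta_0)$ is nonrandom, (38-40) imply that
\[ \sqrt{T}(\hat\vartheta_T - \vartheta_0) = -H_T(\vartheta_0)^{-1}\frac{1}{\sqrt{T}}D\ell_T(\vartheta_0) \xrightarrow{\ \mathcal{L}-\mathbb{P}_{\theta_0}\ } N(\mathbf{0}, \sigma I(\vartheta_0)^{-1}), \quad T \to +\infty. \]

Proof of Theorem 4.6. Let $\theta_0 = (\vartheta_0, \sigma) \in \Psi$ be arbitrary, and let $C_r > 0$ ($r = 0, 1, 2$) be the constants from Lemma 6.14.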
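Moreover, let $\Omega_0$ be a $\mathbb{P}_{\theta_0}$-probability one event from Lemmas 6.14-6.15 and Theorem 4.5 (i-ii). Let $\omega \in \Omega_0$ be fixed. Let $\varepsilon_0 > 0$ be a number such that $K(\vartheta_0, \varepsilon_0) \subset \Theta$, and let $\lambda_0 > 0$ be the minimal eigenvalue of the Fisher matrix $I(\vartheta_0)$. Then there exists $T_0 = T_0(\omega) \ge 0$ such that for all $T > T_0$, $\hat\vartheta_T \in K(\vartheta_0, \varepsilon_0/2)$ and
\[
\lambda_T := \min_{|y|=1} y^{\tau}\big( -\tfrac{1}{T} D^2\ell_T(\hat\vartheta_T) \big) y \ge \lambda_0/2 > 0,
\]
and the statements of Lemma 6.14 hold. Let $\varepsilon > 0$ be an arbitrarily small number such that $\varepsilon < \frac{\varepsilon_0}{2} \wedge \frac{\lambda_0}{8C_2}$. Then $K(\hat\vartheta_T, \varepsilon) \subset K(\hat\vartheta_T, \varepsilon_0/2) \subset K(\vartheta_0, \varepsilon_0) \subset \Theta$. Moreover, on the event
\[
\Omega_{n,T} := \Big\{ \sup_{\vartheta\in\Theta} \big| \tfrac{1}{T} D^r\ell_{n,T}(\vartheta) - \tfrac{1}{T} D^r\ell_T(\vartheta) \big| \le \frac{\lambda_0}{8}\Big( 1 \wedge \frac{\lambda_0}{8C_2} \Big),\ r = 1, 2 \Big\},
\]
for $\vartheta \in K(\hat\vartheta_T, \varepsilon)$ and $z \in \mathbb{R}^d$ such that $|z| = \varepsilon$, and $y := z/|z|$, the following holds:
\[
\begin{aligned}
y^{\tau} D^2\ell_{n,T}(\vartheta) y
&\le |D^2\ell_{n,T}(\vartheta) - D^2\ell_T(\vartheta)| + |D^2\ell_T(\vartheta) - D^2\ell_T(\hat\vartheta_T)| + y^{\tau} D^2\ell_T(\hat\vartheta_T) y \\
&< \Big( \frac{\lambda_0}{8} + C_2 \frac{\lambda_0}{8C_2} - \frac{\lambda_0}{2} \Big) T = -\frac{\lambda_0}{4} T < 0, \\
D\ell_{n,T}(\hat\vartheta_T + z) z
&= D\ell_{n,T}(\hat\vartheta_T) z + z^{\tau}\Big( \int_0^1 D^2\ell_{n,T}(\hat\vartheta_T + tz)\, dt \Big) z \\
&\le |D\ell_{n,T}(\hat\vartheta_T) - D\ell_T(\hat\vartheta_T)|\, \varepsilon + y^{\tau}\Big( \int_0^1 D^2\ell_{n,T}(\hat\vartheta_T + tz)\, dt \Big) y\, \varepsilon^2 \\
&\le \varepsilon \frac{\lambda_0}{8C_2} \Big( \frac{\lambda_0}{8} - \frac{\lambda_0}{4} \Big) T = -\varepsilon \frac{\lambda_0}{8C_2} \frac{\lambda_0}{8} T < 0.
\end{aligned}
\]
Hence $\vartheta \mapsto \ell_{n,T}(\vartheta)$ is a strictly concave function on $K(\hat\vartheta_T, \varepsilon)$, and there exists $\hat\vartheta_{n,T} \in K(\hat\vartheta_T, \varepsilon)$ such that $D\ell_{n,T}(\hat\vartheta_{n,T}) = \mathbf{0}$, and $\hat\vartheta_{n,T}$ is the unique stationary point and a point of maximum of $\ell_{n,T}$ on $K(\hat\vartheta_T, \varepsilon)$. These imply that $\hat\vartheta_{n,T}$ is a random vector. Since $\lim_{n,T} \mathbb{P}_{\theta_0}(\Omega_{n,T}^c) = 0$ by Corollary 6.13, and $\Omega_{n,T} \subset \{ D\ell_{n,T}(\hat\vartheta_{n,T}) = \mathbf{0} \} \cap \{ |\hat\vartheta_{n,T} - \hat\vartheta_T| < \varepsilon \}$, statements (i) and (ii) of the theorem follow. Moreover, if a process $(\tilde\vartheta_{n,T})$ satisfies (i) and (ii) then statement (iv) follows since
\[
\Omega_{n,T} \cap \{ D\ell_{n,T}(\tilde\vartheta_{n,T}) = \mathbf{0} \} \cap \{ |\tilde\vartheta_{n,T} - \hat\vartheta_T| < \varepsilon \} \subseteq \{ \hat\vartheta_{n,T} = \tilde\vartheta_{n,T} \}
\]
by uniqueness of a stationary point of $\ell_{n,T}$ on $K(\hat\vartheta_T, \varepsilon)$. To prove (iii), let $A > 0$ be an arbitrary number, and let
\[
\Omega_{n,T}(A) := \Big\{ \sup_{\vartheta\in\Theta} \big| \tfrac{1}{T} D\ell_{n,T}(\vartheta) - \tfrac{1}{T} D\ell_T(\vartheta) \big| \le \frac{\lambda_0}{4} A \sqrt{\delta_{n,T}} \Big\}.
\]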
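Then on the event $\Omega_{n,T}(A) \cap \Omega_{n,T}$,
\[
\begin{aligned}
|\hat\vartheta_{n,T} - \hat\vartheta_T|
&\le |(D^2\ell_T(\hat\vartheta_T))^{-1}| \cdot |D^2\ell_T(\hat\vartheta_T)(\hat\vartheta_{n,T} - \hat\vartheta_T)| \\
&\le |(D^2\ell_T(\hat\vartheta_T))^{-1}| \cdot |D\ell_T(\hat\vartheta_{n,T}) - D\ell_T(\hat\vartheta_T) - D^2\ell_T(\hat\vartheta_T)(\hat\vartheta_{n,T} - \hat\vartheta_T)| \\
&\quad + |(D^2\ell_T(\hat\vartheta_T))^{-1}| \cdot |D\ell_{n,T}(\hat\vartheta_{n,T}) - D\ell_T(\hat\vartheta_{n,T})| \\
&\le \frac{2}{\lambda_0 T} \frac{C_2 T}{2} \frac{\lambda_0}{2C_2} |\hat\vartheta_{n,T} - \hat\vartheta_T| + \frac{2}{\lambda_0 T} \frac{\lambda_0 T}{4} A \sqrt{\delta_{n,T}}
= \frac{1}{2} |\hat\vartheta_{n,T} - \hat\vartheta_T| + \frac{1}{2} A \sqrt{\delta_{n,T}},
\end{aligned}
\]
which implies $|\hat\vartheta_{n,T} - \hat\vartheta_T| \le A \sqrt{\delta_{n,T}}$, by Lemma 6.14 and since $\varepsilon \le \frac{\lambda_0}{2C_2}$. Hence $\Omega_{n,T}(A) \cap \Omega_{n,T} \subseteq \{ |\hat\vartheta_{n,T} - \hat\vartheta_T| \le A \sqrt{\delta_{n,T}} \}$, and
\[
0 \le \lim_{A\to+\infty} \lim_{n,T} \mathbb{P}_{\theta_0}\{ |\hat\vartheta_{n,T} - \hat\vartheta_T| > A \sqrt{\delta_{n,T}} \}
\le \lim_{A\to+\infty} \lim_{n,T} \mathbb{P}_{\theta_0}(\Omega_{n,T}(A)^c) + \lim_{n,T} \mathbb{P}_{\theta_0}(\Omega_{n,T}^c) = 0
\]
by Corollary 6.13, and (iii) follows.

Consistency of $\hat\vartheta_{n,T}$ (the first part of statement (v)) follows directly from (ii) and Theorem 4.5 (ii). To prove its asymptotic normality (the second part of (v)) notice that
\[
\big| \sqrt{T}(\hat\vartheta_{n,T} - \vartheta_0) - \sqrt{T}(\hat\vartheta_T - \vartheta_0) \big| = \sqrt{T\delta_{n,T}}\, \frac{1}{\sqrt{\delta_{n,T}}} |\hat\vartheta_{n,T} - \hat\vartheta_T| \xrightarrow{\;\mathbb{P}_{\theta_0}\;} 0
\]
when $\lim_{n,T} T\delta_{n,T} = 0$, since (iii) holds. Then the second part of (v) follows by the Slutsky theorem since Theorem 4.5 (iii) holds.

To prove statement (vi), first we need to prove that
\[
\frac{1}{T} \Big( \sum_{i=0}^{n-1} \frac{(\Delta_i X - \mu(X_{t_i}, \vartheta)\Delta_i t)^2}{b^2(X_{t_i})\Delta_i t} - \sigma \sum_{i=0}^{n-1} \frac{(\Delta_i W)^2}{\Delta_i t} \Big) = O_{\mathbb{P}_\theta}(1), \quad T \to +\infty,\ n \to +\infty \tag{41}
\]
for $\pi_{\vartheta_0}$-a.s. initial conditions. This follows from Lemma 6.9, the proof of Lemma 6.12, and the fact that the functions $f := \mu(\cdot, \vartheta)/b$ and $b$ satisfy (B1-4), which is proved in Corollary 6.13. The proof of asymptotic normality of $\hat\sigma_{n,T}$ is the same as in the proof of Corollary 4.2 since
\[
\frac{1}{\sqrt{n}} \Big( \sum_{i=0}^{n-1} \frac{(\Delta_i X - \mu(X_{t_i}, \vartheta)\Delta_i t)^2}{b^2(X_{t_i})\Delta_i t} - \sigma \sum_{i=0}^{n-1} \frac{(\Delta_i W)^2}{\Delta_i t} \Big)
= \sqrt{T\delta_{n,T}}\, \frac{1}{T} \Big( \sum_{i=0}^{n-1} \frac{(\Delta_i X - \mu(X_{t_i}, \vartheta)\Delta_i t)^2}{b^2(X_{t_i})\Delta_i t} - \sigma \sum_{i=0}^{n-1} \frac{(\Delta_i W)^2}{\Delta_i t} \Big) \to 0
\]
when $T \to +\infty$ such that $T\delta_{n,T} \to 0$, and since (i-v), Corollary 6.13, and Lemma 6.15 hold.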
Similarly, consistency of $\hat\sigma_{n,T}$ follows from decomposition (33) in the proof of Corollary 4.2 (but without the factor "$\sqrt{n}$") by using (41), which appears with the factor "$\delta_{n,T}$" (notice that $\delta_{n,T}/T = 1/n$), and by the strong law of large numbers instead of the CLT. In this case it is sufficient to assume that $\delta_{n,T} \to 0$ when $T \to +\infty$. Finally, for proving $\mathcal{F}^0_{n,T}$-measurability of $\hat\vartheta_{n,T}$ (and hence of $\hat\sigma_{n,T}$ too) it is sufficient to prove that $\hat\vartheta_{n,T}$ is the unique point of maximum of $\ell_{n,T}$ on $\Theta$. This proof follows in a similar way as the proof of uniqueness of $\hat\vartheta_T$ as the global point of maximum of $\ell_T$ on $\Theta$, by replacing $\ell_T$ with $\ell_{n,T}$ and $\ell_{\vartheta_0}$ with $\ell_T$.

References

[1] Aït-Sahalia, Y. (2002). Maximum likelihood estimation of discretely sampled diffusions: A closed-form approximation approach. Econometrica, 70(1), 223-262.
[2] Aït-Sahalia, Y., & Mykland, P. A. (2004). Estimators of diffusions with randomly spaced discrete observations: a general theory. The Annals of Statistics, 32(5), 2186-2222.
[3] Aït-Sahalia, Y. (2008). Closed-form likelihood expansions for multivariate diffusions. The Annals of Statistics, 36(2), 906-937.
[4] Bibby, B. M., & Sørensen, M. (1995). Martingale estimation functions for discretely observed diffusion processes. Bernoulli, 1(1/2), 17-39.
[5] Bishwal, J. P. N. (2008). Parameter Estimation in Stochastic Differential Equations, Lecture Notes in Mathematics 1923, Berlin: Springer-Verlag.
[6] Borisovich, Yu., Bliznyakov, N., Izrailevich, Ya., & Fomenko, T. (1985). Introduction to Topology, Moscow: Mir Publishers.
[7] Brockwell, P. J., & Davis, R. A. (1991). Time Series: Theory and Methods, 2nd ed., New York: Springer-Verlag.
[8] Brown, B. M., & Hewitt, J. I. (1975). Asymptotic likelihood theory for diffusion processes. J. Appl. Prob., 12, 228-238.
[9] Dacunha-Castelle, D., & Florens-Zmirou, D. (1986). Estimation of the coefficients of a diffusion from discrete observations. Stochastics, 19, 263-284.
[10] Dohnal, G. (1987). On estimating the diffusion coefficient. J. Appl. Prob., 24, 105-114.
[11] Feigin, P. D. (1976). Maximum likelihood estimation for continuous-time stochastic processes. Adv. Appl. Prob., 8, 712-736.
[12] Florens-Zmirou, D. (1989). Approximate discrete time schemes for statistics of diffusion processes. Statistics: A Journal of Theoretical and Applied Statistics, 20, 547-557.
[13] Friedman, A. (1975). Stochastic Differential Equations and Applications, Vol. 1-2, New York: Academic Press.
[14] Genon-Catalot, V., & Jacod, J. (1993). On the estimation of the diffusion coefficient for multidimensional diffusion processes. Ann. Inst. H. Poincaré Probab. Statist., 29, 119-151.
[15] Huzak, M. (1997). Selection of diffusion growth process and parameter estimation from discrete observation, Ph.D. thesis, University of Zagreb (in Croatian).
[16] Huzak, M. (1998). Parameter estimation of diffusion models. Mathematical Communications, 3, 129-134.
[17] Huzak, M. (2001). A general theorem on approximate maximum likelihood estimation. Glasnik matematički, 36(56), 139-153. (hrcak.srce.hr/file/7900).
[18] Jin, P., Mandrekar, V., Rüdiger, B., & Trabelsi, C. (2013). Positive Harris recurrence of the CIR process and its applications. Communications on Stochastic Analysis, 7(3), 409-424.
[19] Kessler, M. (1997). Estimation of an Ergodic Diffusion from Discrete Observations. Scand. J. Statist., 24, 211-229.
[20] Kloeden, P. E., Platen, E., Schurz, H., & Sørensen, M. (1996). On Effects of Discretization on Estimators of Drift Parameters for Diffusion Processes. J. Applied Prob., 33(4), 1061-1076.
[21] Lanska, V. (1979). Minimum contrast estimation in diffusion processes. J. Appl. Prob., 16, 65-75.
[22] LeBreton, A. (1976). On continuous and discrete sampling for parameter estimation in diffusion type processes. Math. Prog. Study, 5, 124-144.
[23] Li, C. (2013). Maximum-likelihood estimation for diffusion processes via closed-form density expansions. The Annals of Statistics, 41(3), 1350-1380.
[24] Liptser, R. S., & Shiryayev, A. N. (1977). Statistics of random processes I, General Theory, New York: Springer-Verlag.
[25] Revuz, D., & Yor, M. (1991). Continuous martingales and Brownian motion, Berlin: Springer-Verlag.
[26] Rogers, L. C. G., & Williams, D. (1987). Diffusion, Markov Processes, and Martingales, Vol. 1-2, Chichester: Wiley.
[27] Taylor, M. E. (1996). Partial Differential Equations. Basic Theory, New York: Springer.
[28] Yoshida, N. (1992). Estimation for diffusion processes from discrete observations. J. Multivar. Anal., 41, 220-242.
[29] Yor, M. (1975/76). Sur quelques approximations d'intégrales stochastiques. Lecture notes in probability, 528, 518-528.

APPENDIX

Proof of Lemma 6.3. Let $k^j = k_1^{j_1} \cdots k_d^{j_d}$ for nonnegative integers $j_1, \ldots, j_d$ such that $m := j_1 + \cdots + j_d \le d + 1$. Then for $x \in E$,
\[
|k|^j |C_k(x)| = |C_k^{(j)}(x)| \le \frac{1}{(2\pi)^d} \int_{K_0} \Big| \frac{\partial^m}{\partial\vartheta_1^{j_1} \cdots \partial\vartheta_d^{j_d}} f(x, \vartheta) \Big|\, d\vartheta \le g(x)
\]
by the definition of Fourier coefficients, the monotonicity of the integral, and (B3). Hence
\[
(1 + |k_1| + \cdots + |k_d|)^{d+1} |C_k(x)| = \sum_{j_0 + j_1 + \cdots + j_d = d+1} \frac{(d+1)!}{j_0!\, j_1! \cdots j_d!} |k|^j |C_k(x)| \le (d+1)^{d+1} g(x)
\]
by the multinomial theorem, which implies the statement of the lemma.

Proof of Lemma 6.4. At first, let us suppose that $f$ is bounded on $E$. If $M_t := \big( \int_{t_0}^t f(X_s)\, dW_s \big)^2 - \int_{t_0}^t f^2(X_s)\, ds$, then the Itô formula and isometry imply
\[
\mathbb{E}(M_t)^2 = 4\, \mathbb{E}\Big( \int_{t_0}^t \Big( \int_{t_0}^s f(X_u)\, dW_u \Big) f(X_s)\, dW_s \Big)^2 \le 2 \|f\|_\infty^4 (t - t_0)^2.
\]
Hence, if $N_t := \int_{t_0}^t f(X_s)\, dW_s$ then
\[
\mathbb{E}(N_t)^4 \le 2\, \mathbb{E}(M_t)^2 + 2\, \mathbb{E}\Big( \int_{t_0}^t f^2(X_s)\, ds \Big)^2 \le 6 \|f\|_\infty^4 (t - t_0)^2 < +\infty.
\]
Similarly, if $t_0 \le s < s + h \le t$ then
\[
\mathbb{E}(N_{s+h}^2 - N_s^2)^2 \le 2 \|f\|_\infty^4 (4(s - t_0) + 3h) h \to 0, \quad h \to 0.
\]
In addition, $\mathbb{E}(f^2(X_{s+h}) - f^2(X_s))^2 \to 0$ and $\mathbb{E} f^4(X_{s+h}) \to \mathbb{E} f^4(X_s)$ when $h \to 0$ by the dominated convergence theorem.
Hence
\[
\begin{aligned}
\big| \mathbb{E}(N_{s+h}^2 f^2(X_{s+h})) - \mathbb{E}(N_s^2 f^2(X_s)) \big|
&\le \|f\|_\infty^2\, \mathbb{E}|N_{s+h}^2 - N_s^2| + \mathbb{E}\big( N_s^2 |f^2(X_{s+h}) - f^2(X_s)| \big) \\
&\le \|f\|_\infty^2 \sqrt{\mathbb{E}(N_{s+h}^2 - N_s^2)^2} + \sqrt{\mathbb{E}(N_s^4)\, \mathbb{E}(f^2(X_{s+h}) - f^2(X_s))^2} \to 0, \quad h \to 0,
\end{aligned}
\]
implying that $s \mapsto \mathbb{E}(N_s^2 f^2(X_s))$ and $s \mapsto \mathbb{E} f^4(X_s)$ are continuous functions on $[t_0, t]$. Let $x(t) := \mathbb{E} N_t^4$. Since for $t_0 \le s < s + h \le t$,
\[
x(s + h) - x(s) = \mathbb{E}(N_{s+h}^4 - N_s^4) = 6 \int_s^{s+h} \mathbb{E}(N_u^2 f^2(X_u))\, du
\]
by the Itô formula, $s \mapsto x(s)$ is a differentiable function for $s > t_0$, and
\[
x(s + h) - x(s) \le 3 \int_s^{s+h} x(u)\, du + 3 \int_s^{s+h} \mathbb{E} f^4(X_u)\, du.
\]
Hence $\dot{x}(s) \le 3x(s) + 3\, \mathbb{E} f^4(X_s)$ for $s > t_0$, and $x(t_0) = 0$, implying that
\[
\mathbb{E}\Big( \int_{t_0}^t f(X_s)\, dW_s \Big)^4 = x(t) \le 3 e^{3t} \int_{t_0}^t e^{-3s}\, \mathbb{E} f^4(X_s)\, ds \le 3 e^{3(t - t_0)}\, \mathbb{E}\Big( \int_{t_0}^t f^4(X_s)\, ds \Big).
\]
Now, let $f \in C(E)$ be unbounded in general. Then there exists a sequence $(f_m)$ of bounded functions such that for all $m$, $f_m \in C(E)$, $|f_m| \uparrow |f|$, and $f_m \to f$ (see the proof of Corollary 6.11). Since $\lim_m f_m(X_s) = f(X_s)$ and $|f_m(X_s) - f(X_s)| \le 2|f(X_s)|$ for all $m$ and $s$, it follows that
\[
\int_{t_0}^t f_m(X_s)\, dW_s \xrightarrow{\;\mathbb{P}\;} \int_{t_0}^t f(X_s)\, dW_s
\]
by the dominated convergence theorem for stochastic integrals. Then there exists a subsequence such that
\[
\int_{t_0}^t f_{m_k}(X_s)\, dW_s \xrightarrow{\;a.s.\;} \int_{t_0}^t f(X_s)\, dW_s
\;\Rightarrow\;
\Big( \int_{t_0}^t f_{m_k}(X_s)\, dW_s \Big)^4 \xrightarrow{\;a.s.\;} \Big( \int_{t_0}^t f(X_s)\, dW_s \Big)^4,
\]
and hence by Fatou's lemma and the monotone convergence theorem
\[
\mathbb{E}\Big( \int_{t_0}^t f(X_s)\, dW_s \Big)^4 \le \liminf_k \mathbb{E}\Big( \int_{t_0}^t f_{m_k}(X_s)\, dW_s \Big)^4
\le 3 e^{3(t - t_0)} \lim_k \mathbb{E}\Big( \int_{t_0}^t f_{m_k}^4(X_s)\, ds \Big) = 3 e^{3(t - t_0)}\, \mathbb{E}\Big( \int_{t_0}^t f^4(X_s)\, ds \Big).
\]
The last inequality follows trivially from the first one.

Proof of Lemma 6.5.
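By applying the Itô formula to the logarithm of $b^{16}$ over the time interval $[t, t + h]$ it follows that $(b(X_{t+h})/b(X_t))^{16} = M_h Z_h$, where
\[
M_h = \exp\Big( 16 \sqrt{\sigma} \int_t^{t+h} b'(X_s)\, dW_s - \frac{\sigma}{2} 16^2 \int_t^{t+h} b'^2(X_s)\, ds \Big)
\]
is a positive supermartingale (see [24], Lemma 6.1, p. 207) since
\[
\mathbb{E} \int_t^{t+h} b'^2(X_s)\, ds \le \mathbb{E} \int_t^{t+h} r^2(X_s)\, ds \le \Big( 3T + \mathbb{E} \int_0^T r^8(X_s)\, ds \Big) / 4 < +\infty
\]
by assumption (B4) (and so $\mathbb{E} M_h \le \mathbb{E} M_0 = 1$), and
\[
Z_h = \exp\Big( 8 \int_t^{t+h} \Big( 2 \frac{\mu_0 b'}{b} + \sigma (b'' b + 15 b'^2) \Big)(X_s)\, ds \Big).
\]
By the Markov property and assumption (B4), for $0 < h \le h_0$,
\[
\mathbb{E} Z_h = \mathbb{E}\big[ \mathbb{E}[Z_h \mid \mathcal{F}_t^0] \big]
= \mathbb{E}\Big[ \mathbb{E}_{X_t}\Big[ \exp\Big( 8 \int_0^h \Big( 2 \frac{\mu_0 b'}{b} + \sigma (b'' b + 15 b'^2) \Big)(X_s)\, ds \Big) \Big] \Big] \le \mathbb{E}[c(X_t)].
\]
Hence, for $0 < h \le h_0$,
\[
\mathbb{E}\Big( \frac{b(X_{t+h})}{b(X_t)} \Big)^8 = \mathbb{E}\sqrt{M_h Z_h} \le \frac{1}{2}(\mathbb{E} M_h + \mathbb{E} Z_h) \le \frac{1}{2}(1 + \mathbb{E}\, c(X_t)) = \mathbb{E}\, c_0(X_t).
\]

Proof of Lemma 6.6. First, let us show that (23) implies (22). In the same way it can be shown that (25) implies (24). Let $\delta := \delta_{n,T}$, and $I_i := \langle t_i, t_{i+1}]$. Then the Cauchy-Schwarz inequality and isometry imply
\[
\begin{aligned}
\mathbb{E}\Big| \frac{1}{T\sqrt{\delta}} \sum_{i=0}^{n-1} \int_{I_i} (C_k(X_t) - C_k(X_{t_i})) a(X_t)\, dt \Big|^2
&= \frac{1}{T^2\delta}\, \mathbb{E}\Big| \int_0^T \Big( \sum_{i=0}^{n-1} (C_k(X_t) - C_k(X_{t_i})) 1_{I_i}(t) \Big) a(X_t)\, dt \Big|^2 \\
&\le \frac{1}{T\delta}\, \mathbb{E} \int_0^T \Big| \sum_{i=0}^{n-1} (C_k(X_t) - C_k(X_{t_i})) 1_{I_i}(t) \Big|^2 a^2(X_t)\, dt \qquad (42) \\
&= \frac{1}{T\delta}\, \mathbb{E}\Big| \int_0^T \Big( \sum_{i=0}^{n-1} (C_k(X_t) - C_k(X_{t_i})) 1_{I_i}(t) \Big) a(X_t)\, dW_t \Big|^2 \\
&= \mathbb{E}\Big| \frac{1}{\sqrt{T\delta}} \sum_{i=0}^{n-1} \int_{I_i} (C_k(X_t) - C_k(X_{t_i})) a(X_t)\, dW_t \Big|^2.
\end{aligned}
\]
Hence it is sufficient to prove that there exist constants $K_1 > 0$, $T_1 \ge 0$, and $n_1$ such that
\[
\frac{1}{T\delta} \sum_{i=0}^{n-1} \int_{I_i} \mathbb{E}\big( |C_k(X_t) - C_k(X_{t_i})|^2 a^2(X_t) \big)\, dt \le K_1^2 \cdot K_k^2 \tag{43}
\]
for $T > T_1$ and $n \ge n_1$, since the left hand side of (43) is equal to (42). Similarly, to prove (24) and (25) it is sufficient to prove that there exist constants $K_2 > 0$, $T_0 \ge T_1$, and $n_0 \ge n_1$ such that for $T > T_0$ and $n \ge n_0$,
\[
\frac{1}{T\delta} \sum_{i=0}^{n-1} \int_{I_i} \mathbb{E}\Big( |C_k(X_{t_i})|^2 \Big( \frac{b(X_t)}{b(X_{t_i})} - 1 \Big)^2 a^2(X_t) \Big)\, dt \le K_2^2 \cdot K_k^2. \tag{44}
\]
Let $j_1, \ldots, j_d$ be nonnegative integers such that $m := j_1 + \cdots + j_d \le d + 1$, and let $\vartheta \in K_0$ be fixed. Then the function $\tilde{f} := \frac{\partial^m}{\partial\vartheta_1^{j_1} \cdots \partial\vartheta_d^{j_d}} f(\cdot, \vartheta) \in C^2(E)$ by (B2). If $A\tilde{f} := \tilde{f}' \mu_0 + \frac{1}{2} \tilde{f}'' \nu^2$, then $|A\tilde{f}| \le g$ and $|\tilde{f}' \nu| \le g$ by (B3). Hence by applying the Itô formula, Jensen's inequality, and Lemma 6.4 it follows that
\[
\begin{aligned}
\mathbb{E}(\tilde{f}(X_t) - \tilde{f}(X_{t_i}))^4
&= \mathbb{E}\Big( \int_{t_i}^t A\tilde{f}(X_s)\, ds + \int_{t_i}^t (\tilde{f}'\nu)(X_s)\, dW_s \Big)^4 \\
&\le 8 \Big( \mathbb{E}\Big( \int_{t_i}^t |A\tilde{f}|(X_s)\, ds \Big)^4 + \mathbb{E}\Big( \int_{t_i}^t (\tilde{f}'\nu)(X_s)\, dW_s \Big)^4 \Big) \\
&\le 8 \Big( \delta^3\, \mathbb{E} \int_{t_i}^t (A\tilde{f})^4(X_s)\, ds + 3 e^{3\delta}\, \mathbb{E} \int_{t_i}^t (\tilde{f}'\nu)^4(X_s)\, ds \Big)
\le 24 e^3\, \mathbb{E} \int_{t_i}^t g^4(X_s)\, ds,
\end{aligned}
\]
and $\mathbb{E}(a(X_t) - a(X_{t_i}))^4 \le 24 e^3\, \mathbb{E} \int_{t_i}^t \bar{a}^4(X_s)\, ds$ by analogy, since we can assume that $\delta \le 1$. Similarly,
\[
\begin{aligned}
\mathbb{E}\big( a^2(X_{t_i}) (\tilde{f}(X_t) - \tilde{f}(X_{t_i}))^2 \big)
&= \mathbb{E}\big[ a^2(X_{t_i})\, \mathbb{E}[ (\tilde{f}(X_t) - \tilde{f}(X_{t_i}))^2 \mid \mathcal{F}_{t_i}^0 ] \big] \\
&\le 2\, \mathbb{E}\Big[ a^2(X_{t_i})\, \mathbb{E}\Big[ \delta \int_{t_i}^t (A\tilde{f})^2(X_s)\, ds + \Big( \int_{t_i}^t (\tilde{f}'\nu)(X_s)\, dW_s \Big)^2 \,\Big|\, \mathcal{F}_{t_i}^0 \Big] \Big] \\
&\le 4\, \mathbb{E}\Big[ a^2(X_{t_i}) \int_{t_i}^t g^2(X_s)\, ds \Big]
\le 2(t - t_i)\, \mathbb{E} a^4(X_{t_i}) + 2\, \mathbb{E} \int_{t_i}^t g^4(X_s)\, ds. \qquad (45)
\end{aligned}
\]
Hence
\[
\begin{aligned}
\mathbb{E}\big( (\tilde{f}(X_t) - \tilde{f}(X_{t_i}))^2 a^2(X_t) \big)
&\le \mathbb{E}(\tilde{f}(X_t) - \tilde{f}(X_{t_i}))^4 + \mathbb{E}(a(X_t) - a(X_{t_i}))^4 + 2\, \mathbb{E}\big( a^2(X_{t_i}) (\tilde{f}(X_t) - \tilde{f}(X_{t_i}))^2 \big) \\
&\le 25 e^3\, \mathbb{E} \int_{t_i}^t g^4(X_s)\, ds + 24 e^3\, \mathbb{E} \int_{t_i}^t \bar{a}^4(X_s)\, ds + 4(t - t_i)\, \mathbb{E} a^4(X_{t_i}). \qquad (46)
\end{aligned}
\]
Now, let $k^j = k_1^{j_1} \cdots k_d^{j_d}$. Then
\[
\begin{aligned}
(|k|^j)^2\, \mathbb{E}\big( |C_k(X_t) - C_k(X_{t_i})|^2 a^2(X_t) \big)
&= \mathbb{E}\big( |C_k^{(j)}(X_t) - C_k^{(j)}(X_{t_i})|^2 a^2(X_t) \big) \\
&\le \frac{1}{(2\pi)^d} \int_{K_0} \mathbb{E}\Big( \Big( \frac{\partial^m f}{\partial\vartheta_1^{j_1} \cdots \partial\vartheta_d^{j_d}}(X_t, \vartheta) - \frac{\partial^m f}{\partial\vartheta_1^{j_1} \cdots \partial\vartheta_d^{j_d}}(X_{t_i}, \vartheta) \Big)^2 a^2(X_t) \Big)\, d\vartheta \\
&\le 25 e^3\, \mathbb{E} \int_{t_i}^t g^4(X_s)\, ds + 24 e^3\, \mathbb{E} \int_{t_i}^t \bar{a}^4(X_s)\, ds + 4(t - t_i)\, \mathbb{E} a^4(X_{t_i}) \qquad (47)
\end{aligned}
\]
by the definition of Fourier coefficients, Jensen's inequality, Fubini's theorem, and (46). Hence
\[
\begin{aligned}
(1 + |k_1| + \cdots + |k_d|)^{2(d+1)}\, &\mathbb{E}\big( |C_k(X_t) - C_k(X_{t_i})|^2 a^2(X_t) \big) \\
&\le (d+1)^{d+1} (1 + |k_1|^2 + \cdots + |k_d|^2)^{d+1}\, \mathbb{E}\big( |C_k(X_t) - C_k(X_{t_i})|^2 a^2(X_t) \big) \\
&= (d+1)^{d+1} \sum_{j_0 + \cdots + j_d = d+1} \frac{(d+1)!}{j_0!\, j_1! \cdots j_d!} (|k|^j)^2\, \mathbb{E}\big( |C_k(X_t) - C_k(X_{t_i})|^2 a^2(X_t) \big) \\
&\le (d+1)^{2(d+1)} \Big( 25 e^3\, \mathbb{E} \int_{t_i}^t g^4(X_s)\, ds + 24 e^3\, \mathbb{E} \int_{t_i}^t \bar{a}^4(X_s)\, ds + 4(t - t_i)\, \mathbb{E} a^4(X_{t_i}) \Big),
\end{aligned}
\]
implying that
\[
\mathbb{E}\big( |C_k(X_t) - C_k(X_{t_i})|^2 a^2(X_t) \big) \le K_k^2 \Big( 25 e^3\, \mathbb{E} \int_{t_i}^t g^4(X_s)\, ds + 24 e^3\, \mathbb{E} \int_{t_i}^t \bar{a}^4(X_s)\, ds + 4(t - t_i)\, \mathbb{E} a^4(X_{t_i}) \Big).
\]
Finally, if $K' = 5 e^{3/2}$ then
\[
\begin{aligned}
\frac{1}{T\delta} \sum_{i=0}^{n-1} \int_{I_i} \mathbb{E}\big( |C_k(X_t) - C_k(X_{t_i})|^2 a^2(X_t) \big)\, dt
&\le K_k^2 K'^2\, \mathbb{E}\Big( \frac{1}{T\delta} \sum_{i=0}^{n-1} \int_{I_i} \int_{t_i}^t (g^4 + \bar{a}^4)(X_s)\, ds\, dt + \frac{1}{T} \sum_{i=0}^{n-1} a^4(X_{t_i}) \Delta_i t \Big) \\
&\le K_k^2 K'^2 \Big( \frac{1}{T}\, \mathbb{E} \int_0^T g^4(X_t)\, dt + \frac{1}{T}\, \mathbb{E}\Big( \int_0^T \bar{a}^4(X_t)\, dt + \sum_{i=0}^{n-1} a^4(X_{t_i}) \Delta_i t \Big) \Big).
\end{aligned}
\]
Assumptions (B1-3) imply that there exist $T_1 \ge 0$ and $n_1 \in \mathbb{N}$ such that the expression in the parentheses on the right hand side of the above inequality is bounded by a constant $K''^2 = C_g + C_a > 0$ for $T > T_1$ and $n \ge n_1$. Hence $K_1 := K' K'' > 0$ in (43), and the statements (22-23) are proved.

To prove (44), and hence statements (24-25), notice that
\[
\frac{b(X_t)}{b(X_{t_0})} = \exp\Big( \sqrt{\sigma} \int_{t_0}^t b'(X_s)\, dW_s + \int_{t_0}^t \Big( \frac{\mu_0 b'}{b} + \frac{\sigma}{2} (b'' b - b'^2) \Big)(X_s)\, ds \Big)
\]
from the proof of Lemma 6.5. It follows that
\[
\frac{b(X_t)}{b(X_{t_0})} - 1 = \int_{t_0}^t \frac{b(X_s)}{b(X_{t_0})} \Big( \sqrt{\sigma}\, b'(X_s)\, dW_s + \Big( \frac{\mu_0 b'}{b} + \frac{\sigma}{2} b'' b \Big)(X_s)\, ds \Big) \tag{48}
\]
by the Itô formula applied to the exponential function. Now, in the same way as equations (45) and (46) have been derived we obtain the following: for subdivisions such that $\delta_{n,T} \le h_0$, and $\tilde{f} := \frac{\partial^m}{\partial\vartheta_1^{j_1} \cdots \partial\vartheta_d^{j_d}} f(\cdot, \vartheta)$, $\vartheta \in K_0$, and some constants $C', C'' > 0$,
\[
\begin{aligned}
\mathbb{E}\Big( a^2(X_{t_i}) \tilde{f}^2(X_{t_i}) \Big( \frac{b(X_t)}{b(X_{t_i})} - 1 \Big)^2 \Big)
&\le \mathbb{E}\Big( a^2(X_{t_i}) g^2(X_{t_i}) \Big( \frac{b(X_t)}{b(X_{t_i})} - 1 \Big)^2 \Big) \\
&\le C' \Big( 2(t - t_i)\, \mathbb{E}(a^4(X_{t_i}) g^4(X_{t_i})) + 2\, \mathbb{E} \int_{t_i}^t \Big( \frac{b(X_s)}{b(X_{t_i})} \Big)^4 r^4(X_s)\, ds \Big) \\
&\le C' \Big( 2(t - t_i)\, \mathbb{E}(a^4(X_{t_i}) g^4(X_{t_i})) + (t - t_i)\, \mathbb{E}\, c_0(X_{t_i}) + \mathbb{E} \int_{t_i}^t r^8(X_s)\, ds \Big),
\end{aligned}
\]
and
\[
\begin{aligned}
\mathbb{E}\Big( \tilde{f}^2(X_{t_i}) \Big( \frac{b(X_t)}{b(X_{t_i})} - 1 \Big)^2 a^2(X_t) \Big)
&\le \mathbb{E}\Big( g^2(X_{t_i}) \Big( \frac{b(X_t)}{b(X_{t_i})} - 1 \Big)^2 a^2(X_t) \Big) \\
&\le \mathbb{E}\big( g^4(X_{t_i}) (a(X_t) - a(X_{t_i}))^4 \big) + \mathbb{E}\Big( \frac{b(X_t)}{b(X_{t_i})} - 1 \Big)^4 + 2\, \mathbb{E}\Big( a^2(X_{t_i}) g^2(X_{t_i}) \Big( \frac{b(X_t)}{b(X_{t_i})} - 1 \Big)^2 \Big) \\
&\le C'' \Big( \mathbb{E}\Big( g^4(X_{t_i}) \int_{t_i}^t \bar{a}^4(X_s)\, ds \Big) + (t - t_i)\, \mathbb{E}(a^4(X_{t_i}) g^4(X_{t_i})) + (t - t_i)\, \mathbb{E}\, c_0(X_{t_i}) + \mathbb{E} \int_{t_i}^t r^8(X_s)\, ds \Big)
\end{aligned}
\]
by Lemma 6.5, (B1) and (B4). Hence there exist $K_2 > 0$, $T_0 \ge T_1$, and $n_0 \ge n_1$ such that for all $T > T_0$ and $n \ge n_0$, (44) follows in the same way as (43) followed from (45) and (46), by using (B1-4).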
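The fourth-moment bound of Lemma 6.4, on which the estimates above rely, can be sanity-checked numerically. The following is only an illustrative sketch and not part of the paper: it assumes the simplest possible case $X = W$ (standard Brownian motion, $t_0 = 0$) and $f(x) = x$, for which $\int_0^t W_s\, dW_s = (W_t^2 - t)/2$ and $\mathbb{E}\int_0^t W_s^4\, ds = t^3$. The function name `lemma_6_4_sides` is hypothetical.

```python
import numpy as np

def lemma_6_4_sides(t=1.0, n_paths=200_000, seed=0):
    """Compare both sides of E(int f dW)^4 <= 3 e^{3t} E int f^4 ds for X = W, f(x) = x."""
    rng = np.random.default_rng(seed)
    w_t = rng.normal(0.0, np.sqrt(t), size=n_paths)
    # Closed form of the stochastic integral int_0^t W dW (assumed special case).
    stoch_int = 0.5 * (w_t**2 - t)
    # Monte Carlo estimate of the left-hand side; the exact value is 3.75 for t = 1.
    lhs = np.mean(stoch_int**4)
    # Right-hand side in closed form, using E W_s^4 = 3 s^2.
    rhs = 3.0 * np.exp(3.0 * t) * t**3
    return lhs, rhs

lhs, rhs = lemma_6_4_sides()
print(f"lhs ~ {lhs:.3f}, rhs = {rhs:.3f}")
```

In this special case the bound is far from tight, which is consistent with its use in the proofs only as a crude moment estimate.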
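Proof of Lemma 6.7. By Lemma 6.3,
\[
\sum_{k\in\mathbb{Z}^d} |C_k(x)| \le g(x) (d+1)^{d+1} \sum_{k\in\mathbb{Z}^d} \frac{1}{(1 + |k_1| + \cdots + |k_d|)^{d+1}},
\]
and
\[
\begin{aligned}
\sum_{k\in\mathbb{Z}^d} \frac{1}{(1 + |k_1| + \cdots + |k_d|)^{d+1}}
&= 1 + \sum_{r=1}^d \binom{d}{r} 2^r \sum_{k_1=1}^\infty \cdots \sum_{k_r=1}^\infty \frac{1}{(1 + k_1 + \cdots + k_r)^{d+1}} \\
&\le 1 + \sum_{r=1}^d \binom{d}{r} \frac{2^r}{r^{d+1}} \sum_{k_1=1}^\infty \cdots \sum_{k_r=1}^\infty \Big( \frac{r}{k_1 + \cdots + k_r} \Big)^{d+1} \\
&\le 1 + \sum_{r=1}^d \binom{d}{r} \frac{2^r}{r^{d+1}} \sum_{k_1=1}^\infty \cdots \sum_{k_r=1}^\infty \frac{1}{k_1^{\frac{d+1}{r}} \cdots k_r^{\frac{d+1}{r}}} \\
&= 1 + \sum_{r=1}^d \binom{d}{r} \frac{2^r}{r^{d+1}} \Big( \sum_{k=1}^\infty \frac{1}{k^{\frac{d+1}{r}}} \Big)^r.
\end{aligned}
\]
Since $\sum_k (1/k^{\frac{d+1}{r}}) < +\infty$ for all $r \le d$, it follows that $K < +\infty$. Moreover, for any $N$, $\vartheta \in K_0$, and $x \in E$, $|S_N(x, \vartheta) - f(x, \vartheta)| \le \sum_{|k|>N} |C_k(x)| \le K g(x)$, implying the statements of the lemma.

Proof of Lemma 6.8. Let $x_0 \in E$ be fixed and $F(x, \vartheta) := \int_{x_0}^x \frac{f(y, \vartheta)}{\nu(y)} a(y)\, dy$. Then $F$ is a continuous function on $E \times \Theta$. By the Itô formula applied to $F$,
\[
\int_0^T f(X_t, \vartheta) a(X_t)\, dW_t = F(X_T, \vartheta) - F(X_0, \vartheta) - \int_0^T \Big( f(\cdot, \vartheta) \Big( \frac{\mu_0}{\nu} - \frac{1}{2}\nu' \Big) a + \frac{1}{2} (f(\cdot, \vartheta) a)' \nu \Big)(X_t)\, dt,
\]
which is a continuous function on $\Theta$.

Proof of Lemma 6.9. Let $I_i := \langle t_i, t_{i+1}]$. Notice that
\[
\frac{1}{T} \sum_{i=0}^{n-1} \frac{1}{\Delta_i t} \Big( \Big( \int_{t_i}^{t_{i+1}} \frac{b(X_t)}{b(X_{t_i})}\, dW_t \Big)^2 - (\Delta_i W)^2 \Big)
\le \frac{1}{T} \sum_{i=0}^{n-1} \frac{1}{\Delta_i t} \Big( \int_{I_i} \Big( \frac{b(X_t)}{b(X_{t_i})} - 1 \Big) dW_t \Big)^2 + \frac{2}{T} \Big| \sum_{i=0}^{n-1} \frac{1}{\Delta_i t} \Delta_i W \int_{I_i} \Big( \frac{b(X_t)}{b(X_{t_i})} - 1 \Big) dW_t \Big|.
\]
Since
\[
\frac{1}{T}\, \mathbb{E} \sum_{i=0}^{n-1} \frac{1}{\Delta_i t} \Big( \int_{I_i} \Big( \frac{b(X_t)}{b(X_{t_i})} - 1 \Big) dW_t \Big)^2 = \frac{1}{T} \sum_{i=0}^{n-1} \frac{1}{\Delta_i t} \int_{I_i} \mathbb{E}\Big( \frac{b(X_t)}{b(X_{t_i})} - 1 \Big)^2 dt
\]
by the isometry, it follows that this expression is bounded by a constant for all $T > T_1$ and $n \ge n_1$, for some $T_1 \ge 0$ and $n_1$, in the same way as in the proof of Lemma 6.6 since (B4) holds. It remains to prove the same for the second expression on the right hand side of the above inequality. By applying the Itô formula and (48) the following holds:
\[
\begin{aligned}
\Delta_i W \int_{I_i} \Big( \frac{b(X_t)}{b(X_{t_i})} - 1 \Big) dW_t
&= \int_{I_i} \Big( \int_{t_i}^t \Big( \frac{b(X_s)}{b(X_{t_i})} - 1 \Big) dW_s + (W_t - W_{t_i}) \Big( \frac{b(X_t)}{b(X_{t_i})} - 1 \Big) \Big) dW_t + \int_{I_i} \Big( \frac{b(X_t)}{b(X_{t_i})} - 1 \Big) dt \\
&= \int_{I_i} \Big( \int_{t_i}^t \Big( \frac{b(X_s)}{b(X_{t_i})} - 1 \Big) dW_s + (W_t - W_{t_i}) \Big( \frac{b(X_t)}{b(X_{t_i})} - 1 \Big) + \sqrt{\sigma}\, \Delta_i t\, \frac{b(X_t)}{b(X_{t_i})} b'(X_t) \\
&\qquad + \sqrt{\sigma}\, (t - t_i) \int_{t_i}^t \frac{b(X_s)}{b(X_{t_i})} b'(X_s)\, dW_s \Big) dW_t + \int_{I_i} \int_{t_i}^t \frac{b(X_s)}{b(X_{t_i})} v(X_s)\, ds\, dt,
\end{aligned}
\]
where $v := (\mu(\cdot, \vartheta)/b) b' + (\sigma/2) b b''$.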
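Then by applying the isometry and the Cauchy inequality, and by assuming that $T \ge 1$,
\[
\begin{aligned}
\mathbb{E}\Big( \frac{1}{T} \Big| \sum_{i=0}^{n-1} \frac{1}{\Delta_i t} \Delta_i W \int_{I_i} \Big( \frac{b(X_t)}{b(X_{t_i})} - 1 \Big) dW_t \Big| \Big)^2
&\le \frac{1}{T} \sum_{i=0}^{n-1} \Big( \frac{2}{(\Delta_i t)^2} \int_{I_i} \int_{t_i}^t \mathbb{E}\Big( \frac{b(X_s)}{b(X_{t_i})} - 1 \Big)^2 ds\, dt + \frac{1}{(\Delta_i t)^2} \int_{I_i} \mathbb{E}\Big( \frac{b(X_t)}{b(X_{t_i})} - 1 \Big)^4 dt + 2\Delta_i t \\
&\qquad + \sigma \int_{I_i} \mathbb{E}\Big( \frac{b(X_t)}{b(X_{t_i})} \Big)^2 r^2(X_t)\, dt + \sigma \int_{I_i} \int_{t_i}^t \mathbb{E}\Big( \frac{b(X_s)}{b(X_{t_i})} \Big)^2 r^2(X_s)\, ds\, dt \\
&\qquad + (1 + \sigma/2) \frac{1}{\Delta_i t} \int_{I_i} \int_{t_i}^t \mathbb{E}\Big( \frac{b(X_s)}{b(X_{t_i})} \Big)^2 r^2(X_s)\, ds\, dt \Big)
\end{aligned}
\]
since $|v| \le (1 + \sigma/2) r$ and $|b'| \le r$ for the function $r$ from (B4). For all terms on the right hand side of the above inequality we can prove boundedness in the same way as in the proof of Lemma 6.6 by using (B4), except for the following one, for which we have to use the additional assumptions of the lemma to obtain the boundedness. First by using (48), then Lemma 6.4 and the Itô formula, we obtain the following:
\[
\frac{1}{T} \sum_{i=0}^{n-1} \frac{1}{(\Delta_i t)^2} \int_{I_i} \mathbb{E}\Big( \frac{b(X_t)}{b(X_{t_i})} - 1 \Big)^4 dt
\le K' \Big( 1 + \frac{1}{T}\, \mathbb{E}\Big( \int_0^T (r^8 + r^{16} + (b^2 b''')^8)(X_t)\, dt + \sum_{i=0}^{n-1} (c_0 + r^4)(X_{t_i}) \Delta_i t \Big) \Big)
\]
for some constant $K' > 0$. Now, the statement of the lemma follows.

Proof of Lemma 6.12. By applying (1) it follows that:
\[
\begin{aligned}
\sum_{i=0}^{n-1} \frac{(\Delta_i X - \mu(X_{t_i}, \vartheta)\Delta_i t)^2}{b^2(X_{t_i})\Delta_i t} - \sigma \sum_{i=0}^{n-1} \frac{(\Delta_i W)^2}{\Delta_i t}
&= \sum_{i=0}^{n-1} \frac{1}{\Delta_i t} \Big( \int_{t_i}^{t_{i+1}} \frac{\mu(X_t, \vartheta) - \mu(X_{t_i}, \vartheta)}{b(X_{t_i})}\, dt \Big)^2 && \text{(E1)} \\
&\quad + 2 \sum_{i=0}^{n-1} \sqrt{\sigma} \int_{t_i}^{t_{i+1}} \frac{b(X_t)}{b(X_{t_i})}\, dW_t \cdot \frac{1}{\Delta_i t} \int_{t_i}^{t_{i+1}} \frac{\mu(X_t, \vartheta) - \mu(X_{t_i}, \vartheta)}{b(X_{t_i})}\, dt && \text{(E2)} \\
&\quad + \sigma \sum_{i=0}^{n-1} \frac{1}{\Delta_i t} \Big( \Big( \int_{t_i}^{t_{i+1}} \frac{b(X_t)}{b(X_{t_i})}\, dW_t \Big)^2 - (\Delta_i W)^2 \Big). && \text{(E3)}
\end{aligned}
\]
First we will prove that the expression on the left hand side of the above equation is bounded in $L^1$-norm by a constant for all $n \ge n_0$ (for some $n_0$) in the case when all functions $\mu(\cdot, \vartheta)$, $b$ and their appropriate partial derivatives are bounded on $E$; the statement of the lemma will then follow by using local compactness of $E$ and Markov's inequality in just the same way as in the proof of Corollary 6.11. Let $f := \mu(\cdot, \vartheta)/b$ and $I_i := \langle t_i, t_{i+1}]$, and let $n$ be such that $\delta_{n,T} \le 1$.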
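Then the expectation of (E1) is dominated by
\[
\begin{aligned}
\mathbb{E} \sum_{i=0}^{n-1} \frac{1}{\Delta_i t} \Big( \int_{I_i} \frac{\mu(X_t, \vartheta) - \mu(X_{t_i}, \vartheta)}{b(X_{t_i})}\, dt \Big)^2
&\le \mathbb{E} \sum_{i=0}^{n-1} \frac{1}{(\Delta_i t)^2} \Big( \int_{I_i} \frac{\mu(X_t, \vartheta) - \mu(X_{t_i}, \vartheta)}{b(X_{t_i})}\, dt \Big)^2 \\
&\le 2T \Big( \frac{1}{T} \sum_{i=0}^{n-1} \frac{1}{\Delta_i t} \int_{I_i} \mathbb{E}(f(X_t) - f(X_{t_i}))^2\, dt + \frac{1}{T} \sum_{i=0}^{n-1} \frac{1}{\Delta_i t} \int_{I_i} \mathbb{E} f^2(X_t) \Big( \frac{b(X_t)}{b(X_{t_i})} - 1 \Big)^2 dt \Big) \le T C'.
\end{aligned}
\]
The existence of a constant $C' > 0$ follows in the same way as in the proof of Lemma 6.6 since (B1-4) hold for bounded functions by Remark 6.2. The $L^1$-norm of (E2) is dominated by
\[
\mathbb{E} \sum_{i=0}^{n-1} \Big( \int_{t_i}^{t_{i+1}} \frac{b(X_t)}{b(X_{t_i})}\, dW_t \Big)^2 + \mathbb{E} \sum_{i=0}^{n-1} \frac{1}{(\Delta_i t)^2} \Big( \int_{I_i} \frac{\mu(X_t, \vartheta) - \mu(X_{t_i}, \vartheta)}{b(X_{t_i})}\, dt \Big)^2
\le T K \Big( 1 + \frac{1}{n} \sum_{i=0}^{n-1} \mathbb{E}\, c_0(X_{t_i}) \Big) + T C'
\]
for some constant $K > 0$ by the isometry and Lemma 6.5. Now, boundedness of the $L^1$-norm of (E2) follows from (B4). The $L^1$-norm of (E3) is bounded by Lemma 6.9 and Remark 6.10.

Proof of Lemma 6.14. Let $f := \mu/b : E \times \Theta \to \mathbb{R}$, and let $f_0 := \mu(\cdot, \vartheta_0)/b$. For nonnegative integers $j_1, \ldots, j_d$ such that $j_1 + \cdots + j_d = 3$, let $\tilde{f} := \frac{\partial^3}{\partial\vartheta_1^{j_1} \cdots \partial\vartheta_d^{j_d}} f$ and $\hat{f} := \frac{\partial^3}{\partial\vartheta_1^{j_1} \cdots \partial\vartheta_d^{j_d}} (f^2)$. Then for $T > 0$ and $\vartheta \in \Theta$,
\[
\frac{\partial^3}{\partial\vartheta_1^{j_1} \cdots \partial\vartheta_d^{j_d}} \ell_T(\vartheta) = \int_0^T \big( \tilde{f}(\cdot, \vartheta) f_0 - \tfrac{1}{2} \hat{f}(\cdot, \vartheta) \big)(X_t)\, dt + \sqrt{\sigma} \int_0^T \tilde{f}(X_t, \vartheta)\, dW_t
\]
by (10) and (1). Since (H2b-3b) hold, from the proof of Corollary 6.13 it follows that
\[
\sup_{\vartheta\in\Theta} \frac{1}{T} \Big| \int_0^T \big( \tilde{f}(\cdot, \vartheta) f_0 - \tfrac{1}{2} \hat{f}(\cdot, \vartheta) \big)(X_t)\, dt \Big| \le C\, \frac{1}{T} \int_0^T g_{00}^2(X_t)\, dt,
\]
where $C := 1 + 7 \cdot 2^3$. The right hand side of the above inequality $\mathbb{P}_{\theta_0}$-a.s. converges to a finite nonrandom limit $L_0 = L_0(\vartheta_0)$ by the ergodic property of $X$. Hence on a $\mathbb{P}_{\theta_0}$-a.s. event there exists $T_0' \ge 0$ such that for all $T > T_0'$,
\[
\frac{1}{T} \int_0^T g_{00}^2(X_t)\, dt \le \Big| \frac{1}{T} \int_0^T g_{00}^2(X_t)\, dt - L_0 \Big| + L_0 \le 1 + L_0.
\]
Let us suppose that $b > 0$. The case when $b < 0$ can be analyzed in the same way.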
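By applying the Itô formula twice, first to the function $x \mapsto \int_{x_0}^x \frac{g_0(y)}{b(y)}\, dy$ and then to $x \mapsto \int_{x_0}^x \frac{\tilde{f}(y, \vartheta)}{b(y)}\, dy$, we get the following:
\[
\begin{aligned}
\frac{\sqrt{\sigma}}{T} \Big| \int_0^T \tilde{f}(X_t, \vartheta)\, dW_t \Big|
&= \Big| \frac{1}{T} \int_{x_0}^{X_T} \frac{\tilde{f}(y, \vartheta)}{b(y)}\, dy - \frac{1}{T} \int_0^T \big( \tilde{f} f_0 - \tfrac{\sigma}{2} (\tilde{f} b' - \tilde{f}' b) \big)(X_t)\, dt \Big| \\
&\le \frac{1}{T} \Big| \int_{x_0}^{X_T} \frac{g_0(y)}{b(y)}\, dy \Big| + \frac{1}{T} \int_0^T \big( g_0^2 + \tfrac{\sigma}{2} (g_1 + 2 g_0 |b'|) \big)(X_t)\, dt \\
&\le \frac{\sqrt{\sigma}}{T} \Big| \int_0^T g_0(X_t)\, dW_t \Big| + \frac{1}{T} \int_0^T \big( 2 g_0^2 + \tfrac{\sigma}{2} (g_0 + |g_0' b| + g_1 + 2 g_0 |b'|) \big)(X_t)\, dt.
\end{aligned}
\]
Since the right hand side of the above inequality $\mathbb{P}_{\theta_0}$-a.s. converges to a finite nonrandom limit (by the ergodic property and the law of large numbers for continuous martingales, since (H2b-3b) hold), and since it is also an upper bound for the left hand side uniformly for all $\vartheta \in \Theta$ and all partial derivatives of the third order, there exists a constant $C'' > 0$ such that $\mathbb{P}_{\theta_0}$-a.s. there exists $T_0'' \ge T_0'$ such that for all $T > T_0''$, $\sup_{\vartheta\in\Theta} \frac{1}{T} |D^3\ell_T(\vartheta)|_\infty \le C''$. From the definition of the operator norm, for the same $T$,
\[
\sup_{\vartheta\in\Theta} \frac{1}{T} |D^3\ell_T(\vartheta)| \le \sup_{\vartheta\in\Theta} \frac{1}{T} d^{3/2} |D^3\ell_T(\vartheta)|_\infty \le d^{3/2} C'' =: C_2.
\]
By the same arguments we can prove that there exist constants $C_0 > 0$, $C_1 > 0$ such that $\mathbb{P}_{\theta_0}$-a.s. there exists $T_0 \ge T_0''$ such that for all $T > T_0$, $\sup_{\vartheta\in\Theta} \frac{1}{T} |D^{r+1}\ell_T(\vartheta)| \le C_r$ for $r = 0, 1$. Finally, the statements of the lemma follow from the mean value theorem and the Taylor expansion (39) from the proof of Theorem 4.5, where $\hat\vartheta_T$ and $\vartheta_0$ are replaced with $\vartheta_1$ and $\vartheta_2$ respectively.

Proof of Lemma 6.15. Let $C_0 > 0$ and $\Omega_0$ be the constant and the event from Lemma 6.14, such that $\mathbb{P}_{\theta_0}(\Omega_0) = 1$ and on $\Omega_0$, for all $T \ge T_0$ and all $\vartheta_1, \vartheta_2 \in \Theta$, $|\ell_T(\vartheta_1) - \ell_T(\vartheta_2)| \le T C_0 |\vartheta_1 - \vartheta_2|$. Let $K_0 > 0$ be a Lipschitz constant of the function $\ell_{\vartheta_0}$, and let $\varepsilon > 0$ be an arbitrary number. Let $\delta := \varepsilon/(2(C_0 + K_0))$. Since $\{ K(\vartheta, \delta) : \vartheta \in \Theta \}$ is an open cover of the compact set $\Theta$, there exists a finite subcover $\{ K(\vartheta_i, \delta) : i = 1, \ldots, K_\varepsilon \}$. Let $\Omega_1$ be a $\mathbb{P}_{\theta_0}$-a.s. event such that on this event there exists $T_\varepsilon \ge T_0$ such that for all $T \ge T_\varepsilon$ and $1 \le j \le K_\varepsilon$, $|\frac{1}{T}\ell_T(\vartheta_j) - \ell_{\vartheta_0}(\vartheta_j)| < \varepsilon/(2K_\varepsilon)$.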
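Then on $\Omega_0 \cap \Omega_1$, for all $\vartheta \in \Theta$ there exists $i = i(\vartheta) \le K_\varepsilon$ such that $\vartheta \in K(\vartheta_i, \delta)$, and
\[
\begin{aligned}
\big| \tfrac{1}{T}\ell_T(\vartheta) - \ell_{\vartheta_0}(\vartheta) \big|
&= \big| \tfrac{1}{T}\ell_T(\vartheta) - \tfrac{1}{T}\ell_T(\vartheta_i) + \tfrac{1}{T}\ell_T(\vartheta_i) - \ell_{\vartheta_0}(\vartheta_i) + \ell_{\vartheta_0}(\vartheta_i) - \ell_{\vartheta_0}(\vartheta) \big| \\
&\le C_0 |\vartheta - \vartheta_i| + \sum_{j=1}^{K_\varepsilon} \big| \tfrac{1}{T}\ell_T(\vartheta_j) - \ell_{\vartheta_0}(\vartheta_j) \big| + K_0 |\vartheta - \vartheta_i| \\
&< (C_0 + K_0)\delta + K_\varepsilon \frac{\varepsilon}{2K_\varepsilon} = \varepsilon.
\end{aligned}
\]
Hence $\sup_{\vartheta\in\Theta} |\tfrac{1}{T}\ell_T(\vartheta) - \ell_{\vartheta_0}(\vartheta)| < \varepsilon$, which proves the lemma.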
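To make the estimators of Theorems 4.5-4.6 concrete, the following is a minimal simulation sketch; it is not the paper's own example. It assumes an Ornstein-Uhlenbeck model $\mu(x, \vartheta) = -\vartheta x$, $b \equiv 1$ (so that $dX_t = -\vartheta X_t\, dt + \sqrt{\sigma}\, dW_t$), and it assumes that the approximate log-likelihood is the Euler discretization of $\int (\mu/b^2)\, dX - \frac{1}{2}\int (\mu/b)^2\, dt$, which is quadratic in $\vartheta$ for this model. The variance estimator is taken to be the normalized sum appearing in (41). A fine grid serves as a proxy for continuous observation; the function names `simulate_ou` and `approx_mle` are hypothetical.

```python
import numpy as np

def simulate_ou(theta, sigma, x0, T, n, rng):
    """Euler-Maruyama path of dX = -theta*X dt + sqrt(sigma) dW on n equal steps."""
    dt = T / n
    dw = rng.normal(0.0, np.sqrt(dt), size=n)
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        x[i + 1] = x[i] - theta * x[i] * dt + np.sqrt(sigma) * dw[i]
    return x

def approx_mle(x, T):
    """Maximizer of the Euler-discretized log-likelihood and the variance estimator."""
    n = len(x) - 1
    dt = T / n
    dx = np.diff(x)
    xi = x[:-1]
    # For mu = -theta*x, b = 1 the discretized log-likelihood is
    # -theta*sum(xi*dx) - 0.5*theta^2*sum(xi^2)*dt, maximized in closed form.
    theta_hat = -np.sum(xi * dx) / (np.sum(xi**2) * dt)
    # (1/n) * sum of squared Euler residuals divided by the step length.
    resid = dx + theta_hat * xi * dt
    sigma_hat = np.mean(resid**2 / dt)
    return theta_hat, sigma_hat

rng = np.random.default_rng(1)
T, n_fine = 100.0, 100_000
path = simulate_ou(theta=1.0, sigma=0.25, x0=0.0, T=T, n=n_fine, rng=rng)
theta_fine, sigma_fine = approx_mle(path, T)            # proxy for continuous observation
theta_coarse, sigma_coarse = approx_mle(path[::10], T)  # ten times coarser sampling
print(theta_fine, theta_coarse, sigma_fine, sigma_coarse)
```

Comparing `theta_coarse` with `theta_fine` on the same path gives a rough feel for the discretization error $\hat\vartheta_{n,T} - \hat\vartheta_T$ controlled in statement (iii) of Theorem 4.6, while both drift estimates still carry the sampling error of order $1/\sqrt{T}$ described in Theorem 4.5 (iii).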