arXiv:1607.06699v2 [math.ST] 20 Aug 2018
Estimating a class of diffusions from discrete
observations via approximate maximum likelihood
method∗
Miljenko Huzak†
Abstract. An approximate maximum likelihood method for estimating the diffusion parameters
(ϑ, σ), based on discrete observations of a diffusion X along a fixed time interval [0, T ] and on the Euler
approximation of integrals, is analyzed. We assume that X satisfies an SDE of the form
dXt = µ(Xt , ϑ) dt + √σ b(Xt ) dWt with a non-random initial condition. In general, the SDE is nonlinear
in ϑ. Under the assumption that the maximum likelihood estimator ϑ̂T of the drift parameter based on
continuous observation of a path over [0, T ] exists, we prove that a measurable estimator (ϑ̂n,T , σ̂n,T )
of the parameters, obtained from discrete observations of X along [0, T ] by maximization of the
approximate log-likelihood function, exists, that σ̂n,T is consistent and asymptotically normal, and that
ϑ̂n,T − ϑ̂T tends to zero with rate √δn,T in probability when δn,T = max0≤i<n (ti+1 − ti ) tends to zero
with T fixed. The same holds in the case of an ergodic diffusion when T goes to infinity in such a way
that T δn goes to zero with equidistant sampling, and we apply these results to show consistency and
asymptotic normality of ϑ̂n,T and σ̂n,T , and asymptotic efficiency of ϑ̂n,T , in this case.
Key words. parameter estimation, diffusion processes, discrete observation
AMS subject classifications. 62M05, 62F12, 60J60
1 Introduction
Let X = (Xt , t ≥ 0) be a one-dimensional diffusion which satisfies Itô's stochastic
differential equation (SDE) of the form
\[ X_t = x_0 + \int_0^t \mu(X_s,\vartheta)\,ds + \int_0^t \sqrt{\sigma}\,b(X_s)\,dW_s, \qquad t > 0. \tag{1} \]
Here, W = (Wt , t ≥ 0) is a one-dimensional standard Brownian motion, µ and b are
real functions that ensure uniqueness in law of a solution to (1), and x0 is
a given deterministic initial value of X (see e.g. [25] as a reference for SDEs).
The problem is to estimate the unknown vector parameter θ = (ϑ, σ) of X, given a
discrete observation (Xti , 0 ≤ i ≤ n) of a trajectory (Xt , t ∈ [0, T ]) over a time interval
subdivision 0 =: t0 < t1 < · · · < tn := T (n is a positive integer) with diameter
δn,T := max0≤i<n (ti+1 − ti ), T > 0 being fixed. The component ϑ of θ is a (vector) drift
parameter, and σ is a diffusion coefficient parameter. We assume that ϑ belongs to the drift
parameter space Θ, which is an open and convex set in Euclidean space Rd , and that σ
is a positive real number. Hence, θ = (ϑ, σ) is an element of the open and convex parameter
space Ψ := Θ × ⟨0, +∞⟩.
∗ This work has been partially supported by Croatian Science Foundation under the project 3526,
and by Ministry of Science, Education and Sports, Republic of Croatia, Grants 037-0372790-2800 and
037058.
† Department of Mathematics, Faculty of Science, University of Zagreb, Bijenička 30, HR-10002
Zagreb, Croatia (huzak@math.hr)
ESTIMATING A CLASS OF DIFFUSIONS VIA APPROXIMATE ML METHOD 2
Diffusion parameter estimation problems based on discrete observations have been
discussed by many authors (see [1, 2, 3, 4, 9, 10, 12, 19, 20, 22, 23, 28]). Although
the maximum likelihood estimator (MLE) has the usual good properties (see [9]), it
may not be possible to calculate it explicitly because the transition density of process
X is generally unknown and so the likelihood function (LF) of the discrete process is
unknown as well. Hence, other methods of estimation have to be considered.
The method of parameter estimation discussed in this paper and described in
Section 3 below is based on a Gaussian approximation of the transition density, and it can
also be interpreted as maximization of a discretized continuous-time log-likelihood
function (LLF). Such methods are usually called quasi-likelihood or approximate
maximum likelihood (AML) methods, and we will briefly call the estimators obtained in this
way approximate maximum likelihood estimators (AMLEs).
The motivation for analyzing the method described in Section 3 is that it can
provide us with useful estimators of the parameters. It is well known that the AMLE of
the diffusion coefficient parameter σ obtained in this way is consistent and asymptotically
normally distributed over a fixed observational time interval [0, T ] when δn,T → 0 (see
[10] for the case where all drift parameters are known, and [14] for the general case). The
same holds in ergodic diffusion cases when T → +∞ in such a way that δn,T = T /n → 0 for
appropriate equidistant sampling (see e.g. [12] or [19]). Local asymptotic properties of
the AMLE of drift parameters over a fixed interval [0, T ] when δn,T → 0 are less well known,
especially in more general cases and particularly when the drift is nonlinear in its parameters
(see [5]). Although knowledge of local asymptotic properties of drift parameter AMLEs
does not necessarily imply their consistency or asymptotic normality, it may help in
further analysis of the AMLEs, which might include, for example, measuring the effects of
discretization on the estimators' standard errors, with applications in simulation studies.
In ergodic diffusion cases it is well known that the AMLE of the drift (vector) parameter
is consistent, asymptotically normal and efficient when T → +∞ in such a way that
$T\delta_{n,T}^2 \to 0$ for equidistant sampling (see e.g. [12] for the one-dimensional case and [19] for
vector and more general cases), but the rate of convergence of ϑ̂n,T − ϑ̂T to zero is still
less investigated. Let us stress that problems of statistical inference about diffusion
drift parameters are very important, especially in biomedical modeling (see [16]).
For completeness we should also stress that local convergence of the AMLE of
both vector parameters θ = (ϑ, σ) to the MLE of θ based on discrete observations
and equidistant sampling has been investigated (see [1, 3, 23]). Let $\tilde\theta_{n,\delta}$ denote the MLE
of θ based on discrete observations with δn,T ≡ δ = const., and let $\tilde\theta^{(k)}_{n,\delta}$ be the AMLE
obtained from an approximate LF based on a closed-form kth order approximation of
the transition densities. Then, for the Hermite-polynomial-based analytical expansion
approach to approximating the transition density, $\tilde\theta^{(k_n)}_{n,\delta} - \tilde\theta_{n,\delta} \to 0$ when $k_n \to \infty$, and
the sequence $(k_n)$ can be chosen sufficiently large to deliver any rate of convergence (see
[1]), and there exist sequences of regular matrices $(S_{n,\delta})$ and positive numbers $(\delta_n)$ such
that $\delta_n \to 0$ and $S^{-1}_{n,\delta_n}(\tilde\theta^{(k)}_{n,\delta_n} - \tilde\theta_{n,\delta_n}) = O_P(1)$ (see [3]). For an alternative approach to
approximation and analogous results, see [23].
In this paper we analyze the considered AMLE of drift parameters by studying
the relation between the AMLE and the MLE obtained from continuously observed
diffusion paths. We state general conditions under which we prove: (1.) existence and
measurability of the AMLE, (2.) that ϑ̂n,T − ϑ̂T converges to zero with rate √δn,T in
probability when δn,T → 0 over a fixed bounded observational time interval [0, T ], and
(3.) that ϑ̂n,T − ϑ̂T converges to zero with rate √δn,T in probability when T → +∞
in such a way that δn,T = T /n → 0 in an ergodic diffusion case with equidistant sampling.
We apply these findings in proving: (4.) measurability, consistency and asymptotic
normality of diffusion coefficient parameter AMLEs when δn,T → 0 in both cases: when
T is fixed, and in an ergodic diffusion case when T → +∞ and T δn,T = T²/n → 0 with
equidistant sampling, and (5.) consistency, asymptotic normality and efficiency of
drift parameter AMLEs in an ergodic case when T → +∞ in such a way that T δn,T → 0
with equidistant sampling.
Properties (1.-2.) for drift parameter AMLEs were proved in [22] for cases where the drift
depends linearly on its parameters. For a detailed review of the linear case see [5]. The first
nonlinear case was covered by the author in his Ph.D. thesis [15]. The main assumption
there was that the drift is an analytic function of its parameters with properly bounded
derivatives of all orders. In this paper we only assume that the drift has at least d + 3
continuous derivatives with respect to the drift parameters (d is the dimension of the drift
parameter vector). The main difficulty was in proving the core technical Theorem 6.1 of
Section 6. Although facts (4.-5.) are already known, we include these alternative
proofs for completeness and to illustrate the applicability of findings
(1.-3.) and of the methods developed in this paper. We believe that other discretization schemes
(for example, of higher order) can be analyzed similarly by using the techniques of this
paper.
The paper is organized in the following way. In the next section we introduce the notation
used throughout the paper. The estimation method is described in Section 3.
The main results are presented in Section 4. Examples are provided in Section 5. The
proofs of the main results are in the last section. Lemmas are proved in the Appendix.
2 Notations
Let | · | denote the Euclidean norm in Rd and its induced operator norm, and let | · |∞ be the
max-norm. If f is a bounded real function, ‖f‖∞ := supz |f (z)| is the sup-norm of f .
Let Lp (P) be the Banach space of all random variables with finite p-th moment and let
‖ · ‖Lp (P) denote its norm.
If (x, ϑ) ↦ f (x, ϑ) is a real function defined on an open subset of R × Rd , then we
denote by $D_\vartheta^m f(x,\vartheta)$ the m-th partial derivative with respect to ϑ. Let
\[ |D_\vartheta^m f(x,\vartheta)|_\infty := \max_{j_1+\cdots+j_d=m} \left| \frac{\partial^m f}{\partial\vartheta_1^{j_1}\cdots\partial\vartheta_d^{j_d}}(x,\vartheta) \right|. \]
In this case we say that $D_\vartheta^m f(x,\vartheta)$ is bounded if all partial derivatives
$\frac{\partial^m f}{\partial\vartheta_1^{j_1}\cdots\partial\vartheta_d^{j_d}}(x,\vartheta)$ are bounded, and
\[ \|D_\vartheta^m f\|_\infty := \max_{j_1+\cdots+j_d=m} \left\| \frac{\partial^m f}{\partial\vartheta_1^{j_1}\cdots\partial\vartheta_d^{j_d}} \right\|_\infty. \]
The notation $D_\vartheta^2 f(x,\vartheta) <$ Ø means that the Hessian $D_\vartheta^2 f(x,\vartheta)$ is a negative definite
matrix, and similarly for a positive definite matrix. By convention, $D_z^0 f \equiv f$. The m-th
derivative of f at a point z is simply denoted by $D^m f(z)$.
Let K and Θ be open sets in Rd . The closure and the boundary of K will be denoted
by K̄ and ∂K respectively, and the σ-algebra of Borel subsets of Θ by B(Θ). If K ⊂ Θ is
an open set such that K̄ is compact in Θ then we will say that K is a relatively compact
set in Θ.
Let (γn , n ≥ 1) be a sequence of positive numbers and let (Yn , n ≥ 1) be a sequence
of random variables defined on some probability space. We will say that (Yn , n ≥ 1) is
OP (γn ), and write Yn = OP (γn ), if the sequence (Yn /γn , n ≥ 1) is bounded in probability,
i.e. if
\[ \lim_{A\to+\infty}\,\lim_{n} P\{\gamma_n^{-1}|Y_n| > A\} = 0. \]
3 Estimation method
Let 0 = t0 < t1 < · · · < tn = T be the discrete times at which the diffusion X is observed,
and let ∆ denote the difference operator defined in the following way: if F is a
function defined on [0, T ] then ∆i F := F (ti+1 ) − F (ti ), 0 ≤ i < n.
Let us discretize SDE (1) over the interval [ti , ti+1 ] by using the Euler approximation of
both types of integrals:
\[ X_{t_{i+1}} - X_{t_i} \approx \mu(X_{t_i},\vartheta)(t_{i+1}-t_i) + \sqrt{\sigma}\,b(X_{t_i})(W_{t_{i+1}} - W_{t_i}). \]
In this way the following stochastic difference equation is obtained:
\[ \Delta_i Z = \mu(Z_i,\vartheta)\,\Delta_i t + \sqrt{\sigma}\,b(Z_i)\,\Delta_i W \tag{2} \]
for 0 ≤ i < n, and Z0 = x0 . The solution to (2) is a time-discrete process Z = (Z0 , Z1 , . . . , Zn )
that is an approximation of X over [0, T ]. Up to a constant not depending on the
parameters, an LLF of the process Z is
\[ -\frac{1}{2}\sum_{i=0}^{n-1}\left( \frac{(\Delta_i Z - \mu(Z_i,\vartheta)\,\Delta_i t)^2}{\sigma\,b^2(Z_i)\,\Delta_i t} + \log\sigma \right). \tag{3} \]
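As an illustration of (2) and (3), the following Python sketch simulates the Euler chain Z and evaluates its Gaussian log-likelihood up to an additive constant. The function names, the example drift µ(x, ϑ) = −ϑx with b ≡ 1, and all numerical values are assumptions made only for this example; they are not taken from the paper.

```python
import math
import random

def euler_path(mu, b, theta, sigma, x0, times, rng):
    """Simulate Z_0, ..., Z_n from the stochastic difference equation (2)."""
    z = [x0]
    for t0, t1 in zip(times[:-1], times[1:]):
        dt = t1 - t0
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over [t_i, t_{i+1}]
        z.append(z[-1] + mu(z[-1], theta) * dt + math.sqrt(sigma) * b(z[-1]) * dw)
    return z

def gaussian_llf(path, times, mu, b, theta, sigma):
    """Log-likelihood (3) of the Euler chain, up to an additive constant."""
    total = 0.0
    for i in range(len(path) - 1):
        dt = times[i + 1] - times[i]
        resid = path[i + 1] - path[i] - mu(path[i], theta) * dt
        total += resid ** 2 / (sigma * b(path[i]) ** 2 * dt) + math.log(sigma)
    return -0.5 * total

# Hypothetical example: mean-reverting drift and constant diffusion shape.
mu = lambda x, theta: -theta * x
b = lambda x: 1.0
times = [i / 100 for i in range(101)]  # equidistant grid on [0, 1]
path = euler_path(mu, b, 2.0, 0.25, 1.0, times, random.Random(0))
llf_true = gaussian_llf(path, times, mu, b, 2.0, 0.25)
```

Replacing the simulated chain by observations of X in `gaussian_llf` gives the criterion function introduced next.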
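Criterion function
\[ L_{n,T}(\theta) = L_{n,T}(\vartheta,\sigma) := -\frac{1}{2}\sum_{i=0}^{n-1}\left( \frac{(\Delta_i X - \mu(X_{t_i},\vartheta)\,\Delta_i t)^2}{\sigma\,b^2(X_{t_i})\,\Delta_i t} + \log\sigma \right) \tag{4} \]
is obtained from (3) by substituting (Zi , 0 ≤ i ≤ n) with discrete observations (Xti , 0 ≤
i ≤ n) of diffusion X. Notice that
\[ L_{n,T}(\vartheta,\sigma) = -\frac{1}{2\sigma}\sum_{i=0}^{n-1}\frac{(\Delta_i X)^2}{b^2(X_{t_i})\,\Delta_i t} - \frac{n}{2}\log\sigma + \frac{1}{\sigma}\,\ell_{n,T}(\vartheta), \]
where
\[ \ell_{n,T}(\vartheta) = \sum_{i=0}^{n-1}\frac{\mu(X_{t_i},\vartheta)}{b^2(X_{t_i})}\,\Delta_i X - \frac{1}{2}\sum_{i=0}^{n-1}\frac{\mu^2(X_{t_i},\vartheta)}{b^2(X_{t_i})}\,\Delta_i t \tag{5} \]
depends only on drift parameter ϑ.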
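The decomposition of Ln,T into a σ-dependent part and the drift-only function ℓn,T can be checked numerically. The sketch below evaluates both sides on arbitrary positive data with a non-equidistant grid; the helper names, the logistic-type drift, and the test data are assumptions chosen for illustration only.

```python
import math
import random

def criterion_L(x, t, mu, b, theta, sigma):
    """Criterion function (4)."""
    s = 0.0
    for i in range(len(x) - 1):
        dt, dx = t[i + 1] - t[i], x[i + 1] - x[i]
        s += (dx - mu(x[i], theta) * dt) ** 2 / (sigma * b(x[i]) ** 2 * dt) + math.log(sigma)
    return -0.5 * s

def drift_part(x, t, mu, b, theta):
    """Drift-only function l_{n,T} from (5)."""
    s = 0.0
    for i in range(len(x) - 1):
        dt, dx = t[i + 1] - t[i], x[i + 1] - x[i]
        m, b2 = mu(x[i], theta), b(x[i]) ** 2
        s += m / b2 * dx - 0.5 * m ** 2 / b2 * dt
    return s

def decomposed_L(x, t, mu, b, theta, sigma):
    """Right-hand side of the decomposition of L_{n,T} stated with (5)."""
    n = len(x) - 1
    q = sum((x[i + 1] - x[i]) ** 2 / (b(x[i]) ** 2 * (t[i + 1] - t[i])) for i in range(n))
    return -q / (2 * sigma) - 0.5 * n * math.log(sigma) + drift_part(x, t, mu, b, theta) / sigma

# Arbitrary positive test data (not a simulated diffusion) on a non-equidistant grid.
rng = random.Random(1)
t = [0.0]
for _ in range(50):
    t.append(t[-1] + rng.uniform(0.01, 0.05))
x = [1.0 + 0.5 * rng.random() for _ in t]
mu = lambda v, th: (th[0] - th[1] * v) * v   # logistic-type drift (assumed example)
b = lambda v: v
theta, sigma = (1.0, 0.5), 0.3
gap = abs(criterion_L(x, t, mu, b, theta, sigma) - decomposed_L(x, t, mu, b, theta, sigma))
```

The value `gap` should vanish up to floating-point rounding, since the two expressions are algebraically identical.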
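A point of maximum θ̂n,T = (ϑ̂n,T , σ̂n,T ) of function (4) in Ψ is an AMLE of the vector
parameter θ if it exists. Notice that if the AMLE exists then necessarily
\[ DL_{n,T}(\hat\vartheta_{n,T},\hat\sigma_{n,T}) = 0 \iff \begin{cases} D\ell_{n,T}(\hat\vartheta_{n,T}) = 0, \\[4pt] \hat\sigma_{n,T} = \dfrac{1}{n}\displaystyle\sum_{i=0}^{n-1}\frac{(\Delta_i X - \mu(X_{t_i},\hat\vartheta_{n,T})\,\Delta_i t)^2}{b^2(X_{t_i})\,\Delta_i t}. \end{cases} \tag{6} \]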
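For convenience, the computation behind the second equation in (6) can be spelled out as follows. The symbol Q_n(ϑ) is a shorthand introduced only for this remark; everything else follows from (4) and (5).

```latex
\begin{align*}
Q_n(\vartheta) &:= \sum_{i=0}^{n-1}
   \frac{(\Delta_i X - \mu(X_{t_i},\vartheta)\,\Delta_i t)^2}{b^2(X_{t_i})\,\Delta_i t},
 \qquad
 L_{n,T}(\vartheta,\sigma) = -\frac{1}{2\sigma}\,Q_n(\vartheta) - \frac{n}{2}\log\sigma, \\
\frac{\partial L_{n,T}}{\partial\sigma}(\vartheta,\sigma)
 &= \frac{1}{2\sigma^2}\,Q_n(\vartheta) - \frac{n}{2\sigma} = 0
 \iff \sigma = \frac{1}{n}\,Q_n(\vartheta), \\
\frac{\partial^2 L_{n,T}}{\partial\sigma^2}(\vartheta,\sigma)\Big|_{\sigma = Q_n(\vartheta)/n}
 &= -\frac{Q_n(\vartheta)}{\sigma^3} + \frac{n}{2\sigma^2}
  = -\frac{n}{2\sigma^2} < 0, \\
D_\vartheta L_{n,T}(\vartheta,\sigma) &= \frac{1}{\sigma}\,D\ell_{n,T}(\vartheta).
\end{align*}
```

So, for fixed ϑ with Q_n(ϑ) > 0, the map σ ↦ Ln,T(ϑ, σ) has its unique stationary point at Q_n(ϑ)/n, which is a maximum, and the ϑ-gradient of Ln,T vanishes exactly when that of ℓn,T does.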
Hence every stationary point ϑ̂n,T of function ℓn,T uniquely determines the second component σ̂n,T of the stationary point θ̂n,T = (ϑ̂n,T , σ̂n,T ) of function Ln,T by the following
expression:
\[ \hat\sigma_{n,T} = \frac{1}{n}\sum_{i=0}^{n-1}\frac{(\Delta_i X - \mu(X_{t_i},\hat\vartheta_{n,T})\,\Delta_i t)^2}{b^2(X_{t_i})\,\Delta_i t}. \tag{7} \]
Moreover, if ϑ̂n,T is a unique point of the global maximum of function ℓn,T then θ̂n,T is
a unique point of the global maximum of function Ln,T . Hence to prove existence of a
measurable AMLE θ̂n,T it is sufficient to prove that there exists a measurable point of
maximum of function ℓn,T .
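A minimal sketch of the two-step computation (maximize ℓn,T, then plug into (7)) is given below for a one-parameter model chosen only for illustration: µ(x, ϑ) = −ϑx and b ≡ 1. In this assumed model ℓn,T is a concave parabola in ϑ, so its stationary point is available in closed form. The function name and all numerical values are hypothetical.

```python
import math
import random

def amle_ou(x, t):
    """AMLE for the assumed model dX = -theta*X dt + sqrt(sigma) dW (b = 1).

    Here l_{n,T}(theta) = -theta*sum(X dX) - 0.5*theta^2*sum(X^2 dt), so its
    stationary point is the global maximum; sigma_hat then follows from (7).
    """
    n = len(x) - 1
    sxdx = sum(x[i] * (x[i + 1] - x[i]) for i in range(n))
    sx2dt = sum(x[i] ** 2 * (t[i + 1] - t[i]) for i in range(n))
    theta_hat = -sxdx / sx2dt
    sigma_hat = sum(
        (x[i + 1] - x[i] + theta_hat * x[i] * (t[i + 1] - t[i])) ** 2 / (t[i + 1] - t[i])
        for i in range(n)
    ) / n
    return theta_hat, sigma_hat

# Synthetic data generated by the Euler chain (2) itself (an assumption of this demo).
rng = random.Random(42)
theta0, sigma0, T, n = 1.5, 0.4, 5.0, 2000
dt = T / n
t = [i * dt for i in range(n + 1)]
x = [1.0]
for _ in range(n):
    x.append(x[-1] - theta0 * x[-1] * dt + math.sqrt(sigma0 * dt) * rng.gauss(0.0, 1.0))
theta_hat, sigma_hat = amle_ou(x, t)
```

For a drift that is nonlinear in ϑ the first step would require a numerical optimizer instead of the closed-form expression used here.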
4 Main results
4.1 Fixed maximal observational time case
Let the following assumptions be satisfied.
(H1a): For all θ = (ϑ, σ) ∈ Ψ, there exists a strong solution (X, W ) of the SDE (1)
on the time interval [0, +∞⟩ with values in an open interval E ⊆ R.
(H2a): For all ϑ ∈ Θ, µ(·, ϑ) ∈ C 2 (E) and b ∈ C 3 (E). Moreover, for all x ∈ E,
b(x) ≠ 0 and sign b = const.
For example, by Theorem 5.2.2 in [13], (H1a) will be satisfied if in addition to
(H2a) we assume that for all ϑ ∈ Θ SDE (1) satisfies the so-called bounded linear
growth assumption, i.e. that there exists a positive constant C such that for all x ∈ E,
|µ(x, ϑ)| + |b(x)| ≤ C(1 + |x|). More precisely, (H2a) states that the functions x ↦ b(x)
and x ↦ µ(x, ϑ), ϑ ∈ Θ, are continuously differentiable in E and hence locally Lipschitz.
In this case there exists a strong, continuous and pathwise unique solution to SDE (1) on the
time interval [0, +∞⟩. However, there are some SDEs which satisfy (H1a) and (H2a)
but do not satisfy the linear growth assumption (see e.g. Example 5.1 of Section 5).
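(H3a): For all (x, ϑ) ∈ E × Θ and all 1 ≤ m ≤ d + 3, there exist partial derivatives
$D_\vartheta^m \mu(x,\vartheta)$, $\frac{\partial}{\partial x} D_\vartheta^m \mu(x,\vartheta)$, and $\frac{\partial^2}{\partial x^2} D_\vartheta^m \mu(x,\vartheta)$ of the drift function µ. Moreover, for all
0 ≤ m ≤ d + 3, $D_\vartheta^m \mu,\ \frac{\partial}{\partial x} D_\vartheta^m \mu,\ \frac{\partial^2}{\partial x^2} D_\vartheta^m \mu \in C(E \times \Theta)$.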
Let Pθ denote the law of X for θ ∈ Ψ. We assume that the probabilities Pθ , θ ∈ Ψ,
are defined on the filtered space (Ω, (F_T^0 , T ≥ 0)), where Ω is the set of continuous functions
ω : [0, +∞⟩ → E such that ω(0) = x0 , F_T^0 is the σ-algebra generated by the coordinate
functions up to time T , and the filtration is augmented in the so-called usual way
(see e.g. I.4 in [25]). On this space, the coordinate process (ω ↦ ω(t), t ≥ 0) is a canonical
version of X (see [25], I.§3). Hence, for each T > 0 we assume that X is defined on the
measurable space (Ω, F_T^0 ) as a canonical process with law Pθ .
For the moment, let us assume that we are able to observe the process (Xt , 0 ≤ t ≤ T )
continuously. Because diffusion coefficient parameter σ can be uniquely determined
through equation
\[ \sigma = \frac{\lim_n \sum_{j=1}^{2^n} \left(X_{jT2^{-n}} - X_{(j-1)T2^{-n}}\right)^2}{\int_0^T b^2(X_t)\,dt} \qquad (\text{a.s. } P_\theta) \tag{8} \]
(see [8]) since b2 > 0 by (H2a), the estimation problem from continuously observed
process can be reduced to an estimation problem for drift parameter ϑ ∈ Θ. In this
case for every fixed diffusion parameter σ assumed to be known, and every two different
ϑ1 , ϑ2 ∈ Θ, probability measures P(ϑ1 ,σ) and P(ϑ2 ,σ) are equivalent on FT0 , and
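\[ \log\frac{dP_{(\vartheta_2,\sigma)}}{dP_{(\vartheta_1,\sigma)}} = \frac{1}{\sigma}\left( \int_0^T \frac{\mu(X_t,\vartheta_2)-\mu(X_t,\vartheta_1)}{b^2(X_t)}\,dX_t - \frac{1}{2}\int_0^T \frac{\mu^2(X_t,\vartheta_2)-\mu^2(X_t,\vartheta_1)}{b^2(X_t)}\,dt \right), \]
where $\frac{dP_{(\vartheta_2,\sigma)}}{dP_{(\vartheta_1,\sigma)}}$ denotes the Radon-Nikodym derivative of P(ϑ2 ,σ) with respect to P(ϑ1 ,σ) on
F_T^0 (see [11]). If we fix some ϑ∗ ∈ Θ, a continuous-time LLF is $\vartheta \mapsto \log\frac{dP_{(\vartheta,\sigma)}}{dP_{(\vartheta_*,\sigma)}}$. Up to
a constant and a factor not depending on ϑ, function
\[ \ell_T(\vartheta) := \int_0^T \frac{\mu(X_t,\vartheta)}{b^2(X_t)}\,dX_t - \frac{1}{2}\int_0^T \frac{\mu^2(X_t,\vartheta)}{b^2(X_t)}\,dt \tag{9} \]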
is equal to the LLF. Hence, ℓT will be called a continuous-time LLF (see [21]). Assumption (H3a) implies that ℓT is at least three-times continuously differentiable function
on Θ, and for 1 ≤ m ≤ d + 3, its derivatives are equal to (see [21] for m ≤ 2)
\[ D^m\ell_T(\vartheta) = \int_0^T \frac{1}{b^2(X_t)}\,D_\vartheta^m\mu(X_t,\vartheta)\,dX_t - \frac{1}{2}\int_0^T \frac{1}{b^2(X_t)}\,D_\vartheta^m\mu^2(X_t,\vartheta)\,dt. \tag{10} \]
(H4a): For all ω ∈ Ω, function ϑ ↦ ℓT (ϑ) = ℓT (ϑ, ω) has a unique point of global
maximum ϑ̂T = ϑ̂T (ω) in Θ. Moreover, $D_\vartheta^2\ell_T(\hat\vartheta_T) <$ Ø.
Assumption (H4a) enables property (ii) in Theorem 4.1 below to be proved. If
(H3a) and (H4a) hold then Lemma 4.1 from [17] implies that (ω, ϑ) ↦ ℓT (ϑ)(ω) is
an F_T^0 ⊗ B(Θ)-measurable function, and the continuous-time MLE ϑ̂T is an F_T^0 -measurable
random variable.
Let Fn,T be the σ-subalgebra of F_T^0 generated by the discrete observation (Xti , 0 ≤ i ≤ n)
of process (Xt , 0 ≤ t ≤ T ). Notice that if (H3a) holds then (ω, ϑ) ↦ ℓn,T (ϑ, ω) (given
by (5)) is an Fn,T ⊗ B(Θ)-measurable function by Lemma 4.1 in [17].
If ℓn,T is a concave function on Θ then a stationary point ϑ̂n,T is a unique point of
maximum of ℓn,T on Θ and hence it is Fn,T -measurable by e.g. Lemma 4.1 in [17]. If
ℓn,T is not a concave function on Θ, then to prove Fn,T -measurability of the estimators ϑ̂n,T
(and so θ̂n,T ) introduced in Section 3 we need additional assumptions:
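(H5a): Θ is a relatively compact set in Rd , and for each 0 ≤ m ≤ d + 3,
$D_\vartheta^m\mu,\ \frac{\partial}{\partial x}D_\vartheta^m\mu,\ \frac{\partial^2}{\partial x^2}D_\vartheta^m\mu \in C(E\times\overline{\Theta})$.
(H6a): For all ω ∈ Ω and some r > 0,
\[ \ell_T(\hat\vartheta(\omega),\omega) > \sup_{|x|\ge r} \ell_T(\hat\vartheta(\omega)+x,\omega). \]
Assumption (H6a) holds if (H5a) holds and ϑ̂T is the unique point of maximum of ℓT
on the compact set Θ̄.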
Theorem 4.1 Let us assume that (H1a-4a) hold and let T > 0 be fixed. Then there exists
a sequence (ϑ̂n,T , n ≥ 1) of F_T^0 -measurable random vectors such that for all θ = (ϑ, σ) ∈
Ψ and when δn,T ↓ 0,
(i) limn Pθ (Dℓn,T (ϑ̂n,T ) = Ø) = 1;
(ii) (Pθ ) limn ϑ̂n,T = ϑ̂T ;
(iii) ϑ̂n,T − ϑ̂T = OPθ (√δn,T ), n → +∞;
(iv) if (ϑ̃n,T , n ≥ 1) is an F_T^0 -measurable sequence in Θ that satisfies (i)-(ii) then
limn Pθ (ϑ̃n,T = ϑ̂n,T ) = 1.
If either for n ≥ 1 and almost all ω ∈ Ω function ϑ ↦ ℓn,T (ϑ, ω) has a unique point
of local maximum which is a point of the global maximum as well, or the hypotheses
(H5a-6a) are satisfied, then θ̂n,T can be chosen to be Fn,T -measurable.
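Statement (iii) can be explored by a small simulation. The sketch below uses an assumed one-parameter model (µ(x, ϑ) = −ϑx, b ≡ 1), simulates paths on a fine grid with the Euler scheme, and compares the AMLE computed from a coarse subsample with the same estimator computed on the full fine grid. The fine-grid value is only a numerical stand-in for the continuous-time MLE ϑ̂T, and all names and parameter values are hypothetical.

```python
import math
import random

def drift_estimate(x, dt):
    """Maximizer of l_{n,T} for the assumed drift mu(x, theta) = -theta*x, b = 1."""
    num = sum(x[i] * (x[i + 1] - x[i]) for i in range(len(x) - 1))
    den = sum(xi ** 2 for xi in x[:-1]) * dt
    return -num / den

def rms_gap(n_coarse, n_fine=4096, T=1.0, theta0=2.0, sigma0=1.0, reps=40, seed=0):
    """RMS of (AMLE on n_coarse points) minus (fine-grid proxy for the continuous MLE)."""
    rng = random.Random(seed)          # same seed, hence same paths for every n_coarse
    h = T / n_fine
    noise_sd = math.sqrt(sigma0 * h)
    step = n_fine // n_coarse
    acc = 0.0
    for _ in range(reps):
        x = [1.0]
        for _ in range(n_fine):
            x.append(x[-1] - theta0 * x[-1] * h + noise_sd * rng.gauss(0.0, 1.0))
        proxy = drift_estimate(x, h)                      # stands in for theta_hat_T
        coarse = drift_estimate(x[::step], T / n_coarse)  # theta_hat_{n,T}
        acc += (coarse - proxy) ** 2
    return math.sqrt(acc / reps)

gaps = {n: rms_gap(n) for n in (16, 64, 256, 1024)}
```

If the rate in (iii) is visible at these sample sizes, the entries of `gaps` should decrease as n grows, roughly in proportion to √δn,T = √(T/n); the sketch makes no claim about the constants involved.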
Corollary 4.2 Let (H1a-4a) hold, T > 0 be fixed, and (σ̂n,T , n ≥ 1) be given by (7).
Then
(i) (Pθ ) limn σ̂n,T = σ;
(ii) $\left(\sqrt{n}\,\frac{1}{\sigma\sqrt{2}}\,(\hat\sigma_{n,T}-\sigma),\ n\ge 1\right)$ converges in law w.r.t. Pθ to the standard normal
distribution N (0, 1) when n → +∞.
Moreover, if ϑ̂n,T is Fn,T -measurable then σ̂n,T is Fn,T -measurable too.
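The standardization in (ii) can be examined by Monte Carlo. The following sketch again assumes the illustrative model µ(x, ϑ) = −ϑx, b ≡ 1 (not a model from the paper), simulates each path on a grid finer than the observation grid, and collects draws of √n (σ̂n,T − σ)/(σ√2). The function name, the refinement factor `sub`, and the parameter values are assumptions.

```python
import math
import random

def standardized_sigma_stat(n, T, theta0, sigma0, rng, sub=4):
    """One draw of sqrt(n) * (sigma_hat - sigma0) / (sigma0 * sqrt(2)).

    Assumed model: dX = -theta*X dt + sqrt(sigma) dW, X_0 = 1, simulated by the
    Euler scheme on a grid `sub` times finer than the observation grid.
    """
    h = T / (n * sub)
    noise_sd = math.sqrt(sigma0 * h)
    x, obs = 1.0, [1.0]
    for k in range(1, n * sub + 1):
        x += -theta0 * x * h + noise_sd * rng.gauss(0.0, 1.0)
        if k % sub == 0:
            obs.append(x)                      # keep every sub-th point as an observation
    dt = T / n
    theta_hat = -sum(obs[i] * (obs[i + 1] - obs[i]) for i in range(n)) / (
        sum(v ** 2 for v in obs[:-1]) * dt)
    sigma_hat = sum((obs[i + 1] - obs[i] + theta_hat * obs[i] * dt) ** 2
                    for i in range(n)) / (n * dt)   # formula (7) with b = 1
    return math.sqrt(n) * (sigma_hat - sigma0) / (sigma0 * math.sqrt(2.0))

rng = random.Random(7)
draws = [standardized_sigma_stat(256, 1.0, 1.0, 0.5, rng) for _ in range(400)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
```

Under the corollary one would expect `mean` to be near 0 and `var` near 1 for large n, with some finite-sample bias; a formal normality test could be applied to `draws` in the same way as in Table 1.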
Remark 4.3 Theorem 4.1 still holds if we replace (H1a) with the assumption that
T < ξ a.s. where ξ is a maximal random time such that SDE (1) has a solution on
[[0, ξ[[ = {(ω, t) ∈ Ω × [0, +∞⟩ : 0 ≤ t < ξ(ω)}. Such ξ exists by assumption (H2a) and the
existence and uniqueness theorem for SDEs (see e.g. [13] or [25]).
Remark 4.4 Theorem 4.1 still holds if the drift and diffusion coefficient functions depend on the time variable too (non-autonomous case: (t, x) ↦ µ(t, x, ϑ), σb(t, x)) in such a way
that assumptions (H2a) and (H3a) hold for µ and b with x and E replaced with (t, x)
and Ẽ = [0, +∞⟩ × E respectively.
4.2 Ergodic diffusions case
Let the diffusion coefficient parameter σ > 0 be fixed. We need the following
assumptions.
(H1b): (H1a) holds, and X is an ergodic diffusion with stationary distribution
πϑ (dx), ϑ ∈ Θ.
(H2b): (H2a) holds, and for all ϑ ∈ Θ functions $\mu(\cdot,\vartheta)b'/b,\ (b')^2,\ b''b \in L^{16}(\pi_\vartheta)$,
$b^2 b''' \in L^8(\pi_\vartheta)$, and there exist a function $c \in L^1(\pi_\vartheta)$ and a number h0 > 0 such that
\[ \sup_{0<h\le h_0} E_{(\vartheta,\sigma)} \exp\left( 8\int_0^h \left( 2\,\frac{\mu(\cdot,\vartheta)\,b'}{b} + \sigma\left(b''b + 15\,b'^2\right) \right)(X_s)\,ds \right) \le c(x_0). \]
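(H3b): (H3a) and (H5a) hold, and there exist nonnegative functions g0 , g1 , g2 :
E → R such that for all ϑ0 ∈ Θ, $g_0 \in L^{32}(\pi_{\vartheta_0}) \cap C^1(E)$ with $g_0' b \in L^{16}(\pi_{\vartheta_0})$,
$g_1 \in L^{16}(\pi_{\vartheta_0}) \cap C(E)$, $g_2 \in L^8(\pi_{\vartheta_0}) \cap C(E)$, and for all x ∈ E and 0 ≤ m ≤ d + 3,
\begin{align*}
\sup_{\vartheta\in\Theta} \left|D_\vartheta^m\mu(x,\cdot)/b(x)\right|_\infty &\le g_0(x), \\
\sup_{\vartheta\in\Theta} \left|\tfrac{\partial}{\partial x} D_\vartheta^m\mu(x,\cdot)\right|_\infty &\le g_1(x), \\
\sup_{\vartheta\in\Theta} \left|\tfrac{\partial^2}{\partial x^2} D_\vartheta^m\mu(x,\cdot)\,b(x)\right|_\infty &\le g_2(x).
\end{align*}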
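(H4b): For all ϑ ∈ Θ,
\[ (\forall\vartheta'\in\Theta)\quad \vartheta'\ne\vartheta \;\Rightarrow\; \int_E \frac{(\mu(x,\vartheta)-\mu(x,\vartheta'))^2}{b^2(x)}\,\pi_\vartheta(dx) > 0. \tag{11} \]
(H5b): For all ϑ ∈ Θ, functions $\frac{\partial\mu}{\partial\vartheta_i}(\cdot,\vartheta)/b$, 1 ≤ i ≤ d, are linearly independent in
$L^2(\pi_\vartheta)$.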
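Θ is a relatively compact set in Rd by assumption (H5a) since (H3b) holds. Assumptions (H1b-3b) imply that for all ϑ0 ∈ Θ and ϑ ∈ Θ, P(ϑ0 ,σ) -a.s.
\[ \lim_{T\to+\infty}\frac{1}{T}\,\ell_T(\vartheta) = \frac{1}{2}\int_E \frac{\mu(x,\vartheta_0)^2 - (\mu(x,\vartheta_0)-\mu(x,\vartheta))^2}{b^2(x)}\,\pi_{\vartheta_0}(dx) =: \ell_{\vartheta_0}(\vartheta) \tag{12} \]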
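by the ergodic property of the diffusion and the law of large numbers for continuous martingales (see e.g. [25], Chapters V and X). Function ℓϑ0 : Θ → R defined for every
ϑ0 ∈ Θ by formula (12) is at least three times continuously differentiable on the compact set Θ̄
by (H3b), and
\begin{align*}
D\ell_{\vartheta_0}(\vartheta) &= \int_E \frac{\mu(x,\vartheta_0)-\mu(x,\vartheta)}{b^2(x)}\,D_\vartheta\mu(x,\vartheta)\,\pi_{\vartheta_0}(dx), \\
D^2\ell_{\vartheta_0}(\vartheta) &= \int_E \left( \frac{\mu(x,\vartheta_0)-\mu(x,\vartheta)}{b^2(x)}\,D_\vartheta^2\mu(x,\vartheta) - \frac{1}{b^2(x)}\,(D_\vartheta^\tau\mu\,D_\vartheta\mu)(x,\vartheta) \right)\pi_{\vartheta_0}(dx).
\end{align*}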
Hence, by the same argument as for (12), for any fixed ϑ ∈ Θ, P(ϑ0 ,σ) -a.s.
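\[ \begin{aligned} \lim_{T\to+\infty}\frac{1}{T}\,D\ell_T(\vartheta) &= D\ell_{\vartheta_0}(\vartheta), \\ \lim_{T\to+\infty}\frac{1}{T}\,D^2\ell_T(\vartheta) &= D^2\ell_{\vartheta_0}(\vartheta). \end{aligned} \tag{13} \]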
If ϑ ≠ ϑ0 then ℓϑ0 (ϑ) < ℓϑ0 (ϑ0 ) by (12) and (H4b). Hence ϑ0 is the unique point of
maximum of ℓϑ0 on Θ. This implies the identifiability property of the model: let ϑ1 , ϑ2 ∈ Θ
be such that P(ϑ1 ,σ) = P(ϑ2 ,σ) . Then πϑ1 = πϑ2 and so ℓϑ1 ≡ ℓϑ2 by (12). Hence ϑ1 = ϑ2 .
Moreover, (H5b) implies that the Fisher information matrix is positive definite, i.e.
\[ I(\vartheta_0) = -D^2\ell_{\vartheta_0}(\vartheta_0) = \int_E \frac{1}{b^2(x)}\,(D_\vartheta^\tau\mu\,D_\vartheta\mu)(x,\vartheta_0)\,\pi_{\vartheta_0}(dx) > \text{Ø}. \]
The next theorem states that the continuous-time MLE of drift parameters exists, is
consistent and asymptotically efficient, and satisfies assumptions (H4a) and (H6a) a.s.
for almost all observational times. Generally these are well-known facts (see e.g. [8]
or [11]), but we state them here for completeness, and in the form appropriate for
proving Theorem 4.6 below.
Theorem 4.5 Let us assume that (H1b-5b) hold. Then there exists an (F_T^0 , T > 0)-adapted process (ϑ̂T , T > 0) of random vectors such that for every θ = (ϑ, σ) ∈ Ψ the
following holds:
(i) Pθ -a.s. there exists T0 > 0 such that for all T ≥ T0 , ϑ̂T ∈ Θ is the unique point of
maximum of ℓT on Θ, and D²ℓT (ϑ̂T ) < Ø in such a way that
\[ \min_{|y|=1} y^\tau\left(-\tfrac{1}{T}\,D^2\ell_T(\hat\vartheta_T)\right)y \;\ge\; \tfrac{1}{2}\min_{|y|=1} y^\tau I(\vartheta)\,y. \]
(ii) limT →+∞ ϑ̂T = ϑ Pθ -a.s.
(iii) (√T (ϑ̂T − ϑ), T > 0) converges in law w.r.t. Pθ to the normal law N (Ø, σI(ϑ)⁻¹)
with expectation Ø and covariance matrix σI(ϑ)⁻¹.
The following theorem is a version of Theorem 4.1 for ergodic diffusions. In addition
it states that AMLEs are consistent and asymptotically efficient when both maximal
observational time and number of discrete observational time points tend to infinity for
appropriate sampling schemes. Hence in its statement ’limn,T ’ denotes the limit when
both T → +∞ and n → +∞.
Theorem 4.6 Let us assume that (H1b-5b) hold. Then there exists a process (ϑ̂n,T ; n ≥
1, T > 0) of Fn,T -measurable random vectors ϑ̂n,T such that for all θ = (ϑ, σ) ∈ Ψ and
πϑ -a.s. nonrandom initial conditions, and all equidistant samplings such that δn,T =
T /n → 0, the following holds.
(i) limn,T Pθ (Dℓn,T (ϑ̂n,T ) = Ø) = 1.
(ii) (Pθ ) limn,T (ϑ̂n,T − ϑ̂T ) = Ø.
(iii) ϑ̂n,T − ϑ̂T = OPθ (√δn,T ), n → +∞, T → +∞.
(iv) If (ϑ̃n,T ; n ≥ 1, T > 0) is a process of random vectors in Θ that satisfies (i)-(ii)
then limn,T Pθ (ϑ̃n,T = ϑ̂n,T ) = 1.
(v) (Pθ ) limn,T ϑ̂n,T = ϑ, and if in addition limn,T T δn,T = 0 then
\[ \sqrt{T}\,(\hat\vartheta_{n,T}-\vartheta) \xrightarrow{\;\mathcal{L}-P_\theta\;} N(\text{Ø},\,\sigma I(\vartheta)^{-1}), \qquad T\to+\infty,\ n\to+\infty. \]
(vi) (Pθ ) limn,T σ̂n,T = σ, and if in addition limn,T T δn,T = 0 then
\[ \sqrt{n}\,\frac{1}{\sigma\sqrt{2}}\,(\hat\sigma_{n,T}-\sigma) \xrightarrow{\;\mathcal{L}-P_\theta\;} N(0,1), \qquad T\to+\infty,\ n\to+\infty. \]
5 Examples
Example 5.1 Generalized logistic model. Let the stochastic generalized logistic model
be given by the following SDE:
\[ dX_t = (\alpha - \beta X_t^\gamma)\,X_t\,dt + \sqrt{\sigma}\,X_t\,dW_t, \qquad X_0 = x_0 > 0, \tag{14} \]
where ϑ = (α, β, γ) (γ > 0) is the drift vector parameter. By using the methods of
stochastic calculus it is possible to solve (14) explicitly, which proves that there exists
a pathwise unique, continuous and strong solution to this SDE with X defined on Ω ×
[0, +∞⟩ and values in E = ⟨0, +∞⟩. Moreover, it turns out that for drift parameters
such that α > σ/2, β > 0 and γ > 0, the generalized logistic process X is positive recurrent
and ergodic with a stationary distribution πϑ such that, for stationary X, $X_t^\gamma$ follows a Γ-distribution with parameters A := 2(α − σ/2)/(γσ) and B := γσ/(2β) (i.e. $EX_t^\gamma = AB$,
$E(X_t^\gamma)^2 = AB^2(A+1)$) by e.g. Theorem 7.1, pp. 219-220 in [13]. Hence, assumption
(H1b) holds.
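The parameters A and B of the stationary law are simple functions of (α, β, γ, σ). The helper below (a hypothetical function written for this illustration) evaluates them and the stationary mean AB of X_t^γ, after checking the ergodicity condition stated above; the numerical parameter values are arbitrary.

```python
def stationary_gamma_params(alpha, beta, gamma, sigma):
    """Parameters A and B of the Gamma law of X_t^gamma in the stationary regime."""
    if not (alpha > sigma / 2 and beta > 0 and gamma > 0 and sigma > 0):
        raise ValueError("need alpha > sigma/2, beta > 0, gamma > 0, sigma > 0")
    A = 2.0 * (alpha - sigma / 2.0) / (gamma * sigma)
    B = gamma * sigma / (2.0 * beta)
    return A, B

# Arbitrary admissible values chosen for the example.
A, B = stationary_gamma_params(alpha=1.0, beta=2.0, gamma=1.0, sigma=0.5)
mean_x_gamma = A * B   # stationary mean of X_t^gamma; algebraically (alpha - sigma/2)/beta
```

Note that the product AB = (α − σ/2)/β does not depend on γ, which gives a quick sanity check on the two formulas.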
In the generalized logistic model, the drift function is equal to $\mu(x,\vartheta) = (\alpha - \beta x^\gamma)x$, and up
to the diffusion parameter σ > 0, the diffusion coefficient function is b(x) = x > 0 on E.
Hence b′ ≡ 1 and bb′′ = b²b′′′ ≡ 0, which are trivially integrable with respect to any probability
law. Let $f(x,\vartheta) = \mu(x,\vartheta)/b(x) = \alpha - \beta x^\gamma$. Notice that any partial derivatives of f
with respect to ϑ are of the form $-\beta^n x^\gamma \log^m x$ where n ∈ {0, 1}, m ∈ N0 . The
components of $b^k \frac{\partial^k}{\partial x^k} D_\vartheta^m f$ for k = 1, 2 are of the same form. Finally, any p-th power of their
absolute values (p is a positive integer) is of the form $x^c |\log x|^m$ up to a constant,
where c > 0 is a real number and m is a nonnegative integer. These functions are
integrable with respect to πϑ . If we choose a relatively compact subset Θ of the drift parameter
set ⟨σ/2, +∞⟩ × ⟨0, +∞⟩², then there exist α0 > σ/2, β0 > 0 and γ0 > 0 such that for
all ϑ ∈ Θ, x > 0, and all 0 ≤ m ≤ 6, k ∈ {0, 1, 2} and integers jα , jβ , jγ such that
jα + jβ + jγ = m,
\[ \left| b^k(x)\,\frac{\partial^{m+k}}{\partial x^k\,\partial\alpha^{j_\alpha}\,\partial\beta^{j_\beta}\,\partial\gamma^{j_\gamma}}\,f(x,\vartheta) \right| \le g(x) := \alpha_0 + \beta_0\,x^{\gamma_0}\left(1+\log^2 x+\log^4 x+\log^6 x\right). \]
Then g ∈ Lp (πϑ ) ∩ C 1 (E) for all p ≥ 1 and ϑ ∈ Θ, which partially implies (H2b) and
(H3b) by a simple calculation (see the proof of Corollary 6.13 below). To finish the proof
of (H2b) notice that for all h0 > 0, and all 0 < h ≤ h0 ,
\[ \exp\left(16\int_0^h \left(\alpha - \beta X_t^\gamma + \tfrac{15\sigma}{2}\right)dt\right) \le \exp\left((16\alpha_0 + 120\sigma)\,h_0\right) = c(x_0) = \text{constant} \]
since Xt > 0 for all t ≥ 0 and β > 0. This implies the same inequality for expectations
with respect to any initial condition X0 = x0 . Hence (H2b) is proved.
To show that (H4b) holds, let us assume that
\[ \int_E \frac{(\mu(x,\vartheta_1)-\mu(x,\vartheta_2))^2}{b^2(x)}\,\pi_{\vartheta_1}(dx) = 0 \]
for some ϑ1 ∈ Θ and ϑ2 ∈ Θ. Since πϑ1 is absolutely continuous w.r.t. Lebesgue
measure λ on E, this implies that µ(x, ϑ1 ) = µ(x, ϑ2 ) for λ-a.e. x > 0. Hence, the
smooth function $u(x) := \beta_1 x^{\gamma_1} - \beta_2 x^{\gamma_2}$ must be constant for λ-a.e. x > 0.
This implies that γ1 = γ2 and hence ϑ1 = ϑ2 . This proves (H4b).
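Finally, (H5b) holds since the functions $\frac{\partial\mu}{\partial\alpha}(x,\vartheta)/b(x) = 1$, $\frac{\partial\mu}{\partial\beta}(x,\vartheta)/b(x) = -x^\gamma$, and
$\frac{\partial\mu}{\partial\gamma}(x,\vartheta)/b(x) = -\beta x^\gamma \log x$ are obviously linearly independent in $L^2(\pi_\vartheta)$.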
Example 5.2 Cox-Ingersoll-Ross (CIR) model. The CIR model (or Feller's square root
model) is given by the SDE
\[ dX_t = (\beta - \alpha X_t)\,dt + \sqrt{\sigma|X_t|}\,dW_t, \qquad X_0 = x_0 > 0. \tag{15} \]
The vector of drift parameters is ϑ = (α, β), the drift function µ(x, ϑ) = β − αx is linear in
its parameters, and b(x) = √|x|. It has been known (see e.g. [18]) that if α > 0 and
β > 0 are such that 2β > σ, and x0 > 0, then SDE (15) has a strong, positive recurrent
and ergodic solution in the state space E = ⟨0, +∞⟩ with stationary distribution πϑ which
is a Γ-law with expectation β/α and variance βσ/(2α²). Hence (H1b) and (H2a-3a)
hold for any open, relatively compact and convex set Θ in ⟨0, +∞⟩² ∩ {(α, β) : 2β > σ}
that contains the true drift parameter value. Additionally, let us assume that if ϑ =
(α, β) ∈ Θ then 2β/σ > 16. Then the function x ↦ 1/x is in $L^{16}(\pi_\vartheta)$, which implies (H3b)
and partially (H2b). Since the inequality in (H2b) is used only for proving the statement
of Lemma 6.5, it is sufficient to prove this lemma directly (instead of this inequality),
i.e. for each ϑ ∈ Θ we want to find a function $c_0 \in L^1(\pi_\vartheta)$ and h0 > 0 such that the
following inequality holds for any t ≥ 0:
\[ \sup_{0<h\le h_0} E_{(\vartheta,\sigma)}\left(\frac{b(X_{t+h})}{b(X_t)}\right)^8 \le E_{(\vartheta,\sigma)}\,c_0(X_t). \tag{16} \]
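Let x > 0 and h > 0 be arbitrary, and let X be such that (15) holds with X0 = x. Let
E ≡ E(ϑ,σ) , and let us calculate
\[ E\left(\frac{b(X_h)}{b(x)}\right)^8 = \left(\frac{1}{x}\right)^4 \int_0^{+\infty} y^4\,p(h,x,y)\,dy, \]
where
\[ p(h,x,y) = C\,e^{-u-v}\,(v/u)^{q/2}\,I_q(2\sqrt{uv}) \]
is the transition density of Xh given X0 = x (see [18]). Here $u = Cxe^{-\alpha h}$, $v = Cy$,
$C = (2\alpha)/(\sigma(1-e^{-\alpha h}))$, $q = (2\beta/\sigma) - 1$, and $I_q$ is the modified Bessel function of the
first kind of order q. Since q > 15 and $e^{-\alpha h} < 1$ it turns out that
\[ E\left(\frac{b(X_h)}{b(x)}\right)^8 = \left(\frac{1}{x}\right)^4 \int_0^{+\infty} y^4\,p(h,x,y)\,dy \le 8q^4\left(\frac{3}{x^4}+\frac{12}{x^3}+\frac{9}{x^2}+\frac{2}{x}+1\right) =: c_0(x). \]
Then
\[ E\left(\frac{b(X_{t+h})}{b(X_t)}\right)^8 = E\left[E\left[\left(\frac{b(X_{t+h})}{b(X_t)}\right)^8 \,\Big|\, F_t^0\right]\right] = E\left[E_{X_t}\left[\left(\frac{b(X_h)}{b(X_0)}\right)^8\right]\right] \le E\,c_0(X_t) \]
by the Markov property and the above inequality. Hence (16) holds for any h0 > 0, and
$c_0 \in L^1(\pi_\vartheta)$ since x ↦ 1/x is in $L^{16}(\pi_\vartheta)$.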
Finally, (H4b-5b) follow easily since the functions √x and 1/√x are linearly independent, and πϑ is dominated by Lebesgue measure on ⟨0, +∞⟩. Hence, if 2β/σ > 16 then
Theorem 4.6 can be applied to the CIR model (15).
Since in the CIR model the drift function is linear in its parameters, the ALF ℓn,T (ϑ) and the LF
ℓT (ϑ) (ϑ = (α, β)) are quadratic functions. Hence there exist unique explicit solutions
to the stationary equations Dℓn,T (ϑ) = 0 and DℓT (ϑ) = 0, and properties of the AMLE can
easily be investigated by simulation techniques. For this purpose we simulate M = 1000
paths of the process X over the time interval [0, T ] for true parameter values ϑ0 = (α0 , β0 ) =
(0.5, 0.03) and σ0 = 0.06², and several different values of T , precisely for T = 3, 4, . . . , 11.
The drift parameter values ϑ0 have been borrowed from similar examples in [1] or [23], and
σ0 has been chosen such that 2β0 /σ0 ≈ 16.7 > 16. Each path starts
at x0 = 1 and has been simulated by using the Milstein scheme based on a discretization
of [0, T ] into 2^16 equidistant points. Using the same discretization of [0, T ], each Riemann
integral in ℓT (ϑ) has been approximated by the trapezoidal rule, and each Itô integral by the Euler
approximation. Any estimate θ̂n,T = (ϑ̂n,T , σ̂n,T ) for varying n has been calculated
from the same path as the estimate ϑ̂T .

log(n)    SW        Lillie      JB          KS
3         0.0000    0.0010      0.0010      0.0000
4         0.0000    < 0.0010    < 0.0010    0.0000
5         0.0000    < 0.0010    < 0.0010    0.0000
6         0.0001    0.0136      < 0.0010    0.0000
7         0.0190    0.0382      0.0240      0.0000
8         0.0959    0.1104      0.0926      0.0000
9         0.3887    > 0.5000    0.2146      0.0000
10        0.5968    > 0.5000    > 0.5000    0.0010
11        0.6537    > 0.5000    > 0.5000    0.0037

Table 1: P-values of Shapiro-Wilk (SW), Lilliefors (Lillie), Jarque-Bera (JB)
and Kolmogorov-Smirnov (KS) tests of normality applied on samples of the statistic
(√n/(σ0 √2))(σ̂n,T − σ0 ) (of length M = 1000) with respect to different sampling sizes n
with fixed T = 7.
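Because ℓn,T is quadratic in (α, β) for the CIR model, the stationarity equations Dℓn,T (ϑ) = 0 reduce to a 2×2 linear system. The sketch below solves that system and then applies (7). It is an illustration only: the function name is hypothetical, the test path is generated with a plain Euler scheme on the observation grid (the study above uses the Milstein scheme on a finer grid), and σ0 is taken as 0.06², consistent with 2β0/σ0 ≈ 16.7.

```python
import math
import random

def cir_amle(x, dt):
    """Closed-form AMLE (alpha, beta, sigma) for dX = (beta - alpha*X) dt + sqrt(sigma*X) dW.

    Here b^2(x) = x, so D l_{n,T} = 0 is the linear system
        beta*s2 - alpha*T  = s1,
        beta*T  - alpha*s3 = X_T - X_0.
    Assumes an equidistant grid and a strictly positive path.
    """
    n = len(x) - 1
    T = n * dt
    s1 = sum((x[i + 1] - x[i]) / x[i] for i in range(n))  # sum of dX_i / X_i
    s2 = sum(dt / x[i] for i in range(n))                  # sum of dt / X_i
    s3 = sum(x[i] * dt for i in range(n))                  # sum of X_i dt
    d = x[-1] - x[0]
    det = T * T - s2 * s3   # nonpositive by Cauchy-Schwarz, zero only for a constant path
    alpha = (s2 * d - T * s1) / det
    beta = (T * d - s1 * s3) / det
    sigma = sum((x[i + 1] - x[i] - (beta - alpha * x[i]) * dt) ** 2 / (x[i] * dt)
                for i in range(n)) / n                     # formula (7)
    return alpha, beta, sigma

# Euler-simulated test path with the parameter values quoted in the text.
rng = random.Random(3)
alpha0, beta0, sigma0, T, n = 0.5, 0.03, 0.06 ** 2, 7.0, 2 ** 11
dt = T / n
x = [1.0]
for _ in range(n):
    xi = x[-1]
    x.append(xi + (beta0 - alpha0 * xi) * dt + math.sqrt(sigma0 * xi * dt) * rng.gauss(0.0, 1.0))
alpha_hat, beta_hat, sigma_hat = cir_amle(x, dt)
```

On a noise-free Euler path the routine recovers (α0, β0) exactly up to rounding, which is a convenient correctness check; on noisy paths with fixed T only σ̂n,T can be expected to be close to its true value.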
The results of analyzing the asymptotic behavior of deviances ϑ̂n,T − ϑ̂T and σ̂n,T − σ0
are presented in Figure 1. Subfigures A and C represent mean deviances relative to
the true parameter values, and subfigures B and D represent standard deviations of
deviances standardized with √δn,T /√T = 1/√n and relative to the true parameter
values too.
In the case of subfigures A and B, T = 7 is fixed, but the number n of equidistant sampling
time points varies from 2^3 to 2^11 in such a way that log(n) = k, k = 3, 4, . . . , 11, where 'log(·)'
represents the logarithm with base 2. Subfigure A shows the expected asymptotic behavior
that limn ϑ̂n,T = ϑ̂T and limn σ̂n,T = σ0 in the case of fixed T and δn,T = T /n → 0, but
also that the AMLE underestimates the MLE, and similarly for σ̂n,T . The rate of convergence
can be seen from subfigure B. Namely, the convergence of the empirical standard deviations
(estd) of the components of √T (ϑ̂n,T − ϑ̂T )/√δn,T (relative to ϑ0 ) shows that these statistics
are bounded in probability, while the convergence of the estd of √n(σ̂n,T − σ0 )/σ0 to a
neighborhood of √2 ≈ 1.41 is also expected by the convergence in law of √n(σ̂n,T − σ0 )
to the normal distribution with standard deviation σ0 √2. Table 1 shows p-values of
three tests of normality: Shapiro-Wilk (SW), Lilliefors Kolmogorov-Smirnov (Lillie) and
Jarque-Bera (JB), and of the Kolmogorov-Smirnov test (KS) of standard normality of the simulated
statistic (√n/(σ0 √2))(σ̂n,T − σ0 ) with respect to n (and fixed T = 7). Obviously, the
statistic converges to normality, but slowly to the specific limiting normal distribution.
The same behavior of deviances ϑ̂n,T − ϑ̂T and σ̂n,T − σ0 when T → +∞ in a way that δn,T → 0 can be seen from subfigures C and D. In case of these subfigures the relative mean deviances and the relative standard deviations of standardized deviances are presented with respect to δ = δn,T = T/2^T = log(n)/n for T = log(n) = 3, 4, . . . , 11. The normal q-q plot of the sample of (√n/(σ0√2))(σ̂n,T − σ0) in case T = 11 and n = 2^11 is presented at subfigure B of Figure 2.
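The sampling design used for subfigures C and D couples the observation horizon and the sample size through n = 2^T. A small sketch that tabulates the resulting mesh δ = T/2^T together with the product Tδ (the variable names are illustrative only):

```python
# Sampling design for growing T: n = 2**T equidistant points on [0, T].
rows = []
for T in range(3, 12):
    n = 2 ** T
    delta = T / n            # mesh delta_{n,T} = T / n = log2(n) / n
    rows.append((T, n, delta, T * delta))

for T, n, delta, t_delta in rows:
    print(f"T={T:2d}  n={n:4d}  delta={delta:.5f}  T*delta={t_delta:.5f}")
```

Along this design both δ and Tδ decrease monotonically, so the grid is compatible with the requirement that Tδn,T tends to zero as T grows.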
Asymptotic properties of deviances ϑ̂n,T − ϑ0 when T → +∞ in a way that δn,T → 0 are presented in Figure 2. Subfigure A presents the relative mean deviances with respect to δ = T/2^T = log(n)/n for T = log(n) = 3, 4, . . . , 11. We notice the concave shape of both curves tending to zero when δ → 0. The convergence to normality is very slow, as illustrated with q-q plots of the standardized components of AMLEs (with respect to the limiting normal laws and at T = 11, n = 2^11) at subfigures C and D.

[Figure 1: four panels. Panels A and B are plotted against log(n), panels C and D against δ; vertical axes: relative mean deviance (A, C) and rel. stdd. of stand. deviance (B, D); curves for α, β and σ.]
Figure 1: (A) Relative means of components of statistics ϑ̂n,T − ϑ̂T and σ̂n,T − σ0 (relative to ϑ0 and σ0 respectively) with respect to different sampling sizes n and fixed T = 7. (B) Relative standard deviations of standardized deviances √T(ϑ̂n,T − ϑ̂T)/√δn,T and √n(σ̂n,T − σ0)/σ0 (relative to the true parameter values) with respect to different sampling sizes n and fixed T = 7. (C) Relative means of components of the same statistics as in A but with respect to δ = δn,T = T/2^T = log(n)/n for different T's. (D) Relative standard deviations of the same standardized deviances as in B but with respect to δ = δn,T = T/2^T = log(n)/n for different T's. In all cases means and standard deviations are estimated based on simulated samples with length M = 1000.
6 Proofs
Basically the proof of Theorem 4.1 is based on the so-called general theorem on approximate maximum likelihood estimation and its corollary that are stated and proved in
[17] as Theorem 3.1 and Corollary 3.2. The proof of Theorem 4.6 is a modification of
the proof of the same theorem based on Theorem 4.5. But first we need to state and
prove Theorem 6.1, and its Corollaries 6.11 and 6.13 that are needed in applying the
general theorem in this context. Proofs of some technical lemmas are in Appendix.
Let us suppose that X = (Xt , t ≥ 0) is a diffusion satisfying (H1a-2a) with true parameter θ0 = (ϑ0 , σ) ∈ Θ, and such that P(X0 = x0 ) = 1 for x0 ∈ E. Here P ≡ Pθ0 , E ≡ Eθ0 and L2 ≡ L2 (P). Denote µ0 = µ(·, ϑ0 ), ν = √σ b, Aa := a′µ0 + a′′ν²/2, and ā := |Aa| + |a′ν| for a ∈ C²(E).
Theorem 6.1 Let Θ ⊂ R^d be an open convex set, and let f : E × Θ → R, a : E → R be functions. Let 0 = t0 < t1 < · · · < tn = T be subdivisions of intervals [0, T ], T > 0, such that δn,T ↓ 0. Assume the following:
(B1): a ∈ C²(E) and there exist constants Ca > 0, Ta ≥ 0, and na ∈ N such that
\[
(\forall\, T > T_a)(\forall\, n \ge n_a)\quad \frac{1}{T}\,\mathbb{E}\Big(\int_0^T (a^4 + \bar a^4)(X_t)\,dt + \sum_{i=0}^{n-1} a^4(X_{t_i})\,\Delta_i t\Big) \le C_a .
\]
(B2): For all ϑ ∈ Θ, f(·, ϑ) ∈ C²(E), and for all (x, ϑ) ∈ E × Θ and 1 ≤ m ≤ d + 1 there exist partial derivatives $D_\vartheta^m f(x,\vartheta)$, $\frac{\partial}{\partial x} D_\vartheta^m f(x,\vartheta)$, and $\frac{\partial^2}{\partial x^2} D_\vartheta^m f(x,\vartheta)$. Moreover,
\[
(\forall\, 0 \le m \le d+1)\quad D_\vartheta^m f,\ \frac{\partial}{\partial x} D_\vartheta^m f,\ \frac{\partial^2}{\partial x^2} D_\vartheta^m f \in C(E \times \Theta).
\]
(B3): For any relatively compact set K in Θ there exist: a positive measurable function g : E → R such that for all 0 ≤ m ≤ d + 1,
\[
\sup_{\vartheta \in K} |D_\vartheta^m f(\cdot,\vartheta)|_\infty + \Big|\frac{\partial}{\partial x} D_\vartheta^m f(\cdot,\vartheta)\Big|_\infty (|\mu_0| + |\nu|) + \Big|\frac{\partial^2}{\partial x^2} D_\vartheta^m f(\cdot,\vartheta)\,\nu^2\Big|_\infty \le g,
\]
and constants Cg > 0, Tg ≥ 0, and ng ∈ N, such that for all T > Tg and n ≥ ng,
\[
\frac{1}{T}\,\mathbb{E}\Big(\int_0^T g^4(X_t)\,dt + \sum_{i=0}^{n-1} g^4(X_{t_i})\,\Delta_i t\Big) \le C_g
\quad\&\quad
\frac{1}{T}\,\mathbb{E}\sum_{i=0}^{n-1}\Big((ga)^4(X_{t_i})\,\Delta_i t + g^4(X_{t_i})\int_{t_i}^{t_{i+1}} (a^4 + \bar a^4)(X_t)\,dt\Big) \le C_g .
\]
(B4): There exist: a measurable function c : E → R and constants h0 > 0, Cc > 0, Tc ≥ 0, and nc ∈ N such that for r := |µ0 b′/b| + |b′′b| + |b′|,
\[
\sup_{0 < h \le h_0} \mathbb{E}\exp\Big(8\int_0^h \Big(2\,\frac{\mu_0 b'}{b} + \sigma\,(b''b + 15\,b'^2)\Big)(X_s)\,ds\Big) \le c(x_0),
\]
\[
(\forall\, T > T_c)(\forall\, n \ge n_c)\quad \frac{1}{T}\,\mathbb{E}\Big(\sum_{i=0}^{n-1} c(X_{t_i})\,\Delta_i t + \int_0^T r^8(X_t)\,dt\Big) \le C_c .
\]
Then there exist constants C1 > 0, C2 > 0, T0 ≥ 0, and n0 ∈ N, possibly dependent on K, d, and a, such that for all T > T0 and n ≥ n0,
\[
\mathbb{E}\Big(\sup_{\vartheta\in K}\Big|\frac{1}{\sqrt{T\delta_{n,T}}}\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (f(X_t,\vartheta) - f(X_{t_i},\vartheta))\,a(X_t)\,dt\Big|\Big)^2 \le C_1 \tag{17}
\]
\[
\mathbb{E}\Big(\sup_{\vartheta\in K}\Big|\frac{1}{\sqrt{T\delta_{n,T}}}\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (f(X_t,\vartheta) - f(X_{t_i},\vartheta))\,a(X_t)\,dW_t\Big|\Big)^2 \le C_1 \tag{18}
\]
\[
\mathbb{E}\Big(\sup_{\vartheta\in K}\Big|\frac{1}{\sqrt{T\delta_{n,T}}}\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} f(X_{t_i},\vartheta)\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)a(X_t)\,dt\Big|\Big)^2 \le C_2 \tag{19}
\]
\[
\mathbb{E}\Big(\sup_{\vartheta\in K}\Big|\frac{1}{\sqrt{T\delta_{n,T}}}\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} f(X_{t_i},\vartheta)\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)a(X_t)\,dW_t\Big|\Big)^2 \le C_2 . \tag{20}
\]

[Figure 2: four panels. Panel A: relative mean deviance against δ (curves for α and β); panels B, C, D: quantiles of the standardized AMLE of σ, α and β respectively against standard normal quantiles.]
Figure 2: (A) Relative means of components of statistics ϑ̂n,T − ϑ0 (relative to ϑ0 ) with respect to δ = δn,T = T/2^T = log(n)/n for different T's. (B) Normal q-q plot of the standardized statistics √n(σ̂n,T − σ0 ) with respect to the limiting normal law (with T = 11, n = 2^11). (C-D) Normal q-q plots of the standardized components of statistics √T(ϑ̂n,T − ϑ0 ) with respect to the limiting normal law (with T = 11, n = 2^11) (C for α and D for β components). In all cases estimations are based on simulated samples with length M = 1000.
Remark 6.2 If a function f that satisfies (B2) and all its partial derivatives from (B2) are bounded on E × K, then (B3) holds if a is bounded too. Similarly, if a ∈ C²(E) is bounded then it satisfies (B1). If in addition µ0 , b, b′ , and b′′ are bounded then (B4) holds for the constant function c ≡ exp(γh0 ) where γ > 0 and h0 > 0 are constants. In this case the statements of Theorem 6.1 hold for T0 = 0, and hence for all T > 0, as is obvious from the proof of Theorem 6.1.
For a moment let us assume that K = ∏_{i=1}^d ⟨ai , bi⟩ is an open and bounded d-dimensional rectangle in Θ. Then there exists ε > 0 such that Kε := ∏_{i=1}^d ⟨ai − ε, bi + ε⟩ is an open and bounded d-dimensional rectangle in Θ too. Let φ : R^d → R be a C^∞-function such that φ ≡ 1 on K and φ ≡ 0 on Kε^c. Such a function exists (see e.g. [6], Lemma IV.4.4, p. 176). Then the function (x, ϑ) ↦ f̃(x, ϑ) := f(x, ϑ) · φ(ϑ) satisfies (B2-3) if f satisfies the same assumptions (with rescaled function g). Namely, f̃ ≡ f on E × K and f̃ ≡ 0 on ∂Kε . The same holds for all partial derivatives of f̃ that exist, and f̃ satisfies (B1) obviously. Since φ and all of its derivatives are bounded, f̃ satisfies (B3) too with Cg instead of g, with a constant C depending on φ. Obviously, statements (17-20) hold for a function f that satisfies (B2-3) and a rectangle K if (17-20) hold for f̃ and the rectangle Kε . Moreover, notice that if (17-20) hold for an arbitrary open and bounded d-dimensional rectangle K, then the same statements hold for every relatively compact set in Θ. Hence it is sufficient to prove (17-20) for an open and bounded d-dimensional rectangle K ⊂ Θ, and a function f satisfying (B2-3) and the following additional assumption.
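(B K): For all x ∈ E and all 0 ≤ m ≤ d + 1, $D_\vartheta^m f(x,\cdot) \equiv 0$, $\frac{\partial}{\partial x} D_\vartheta^m f(x,\cdot) \equiv 0$ and $\frac{\partial^2}{\partial x^2} D_\vartheta^m f(x,\cdot) \equiv 0$ on ∂K.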
Moreover, let A be an invertible affine mapping of R^d, and let f be a function on E × Θ that satisfies (B2-3) and (B K). Then the function f̄ defined on E × A(Θ) by the rule f̄(x, η) := f(x, A^{−1}η) satisfies (B2-3) and (B AK) too. Since the left-hand sides of (17-20) do not change by the change of variable ϑ ↦ η = Aϑ, it is sufficient to prove (17-20) for K0 := ⟨−π, π⟩^d and a function f that satisfies (B2-3) and (B K0 ).
Now, let f be a function satisfying (B2-3) and (B K0 ). For x ∈ E, k = (k1 , . . . , kd ) ∈ Z^d, ϑ = (ϑ1 , . . . , ϑd ) ∈ Θ, and j = (j1 , . . . , jd ) where j1 ,..., jd are nonnegative integers such that m := j1 + · · · + jd ≤ d + 1, let us define Fourier coefficients of f by
\[
C_k(x) := \frac{1}{(2\pi)^d}\int_{K_0} f(x,\vartheta)\,e^{-i\langle k|\vartheta\rangle}\,d\vartheta,
\qquad
C_k^{(j)}(x) := \frac{1}{(2\pi)^d}\int_{K_0} \frac{\partial^m f}{\partial\vartheta_1^{j_1}\cdots\partial\vartheta_d^{j_d}}(x,\vartheta)\,e^{-i\langle k|\vartheta\rangle}\,d\vartheta .
\]
Let $k^j := k_1^{j_1}\cdots k_d^{j_d}$. Since (B K0 ) holds, it is well known that $C_k^{(j)}(x) = i^m k^j C_k(x)$ for each fixed x ∈ E (see e.g. [27], pp. 177-178). This relation is used in the proofs of the next few lemmas (see Appendix).
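The relation between the Fourier coefficients of a function and those of its derivative can be checked numerically in the one-dimensional case (d = 1, m = 1). The sketch below assumes a hypothetical test function f(ϑ) = (1 + cos ϑ)², chosen only because it vanishes on the boundary of ⟨−π, π⟩ together with its first two derivatives; the coefficients are approximated with the trapezoidal rule on a uniform grid, which is exact up to rounding for low-degree trigonometric polynomials.

```python
import cmath
import math


def fourier_coefficient(func, k, n_nodes=256):
    """Approximate C_k = (1 / (2*pi)) * integral over (-pi, pi) of
    func(theta) * exp(-i*k*theta) by the trapezoidal rule on a periodic grid."""
    h = 2.0 * math.pi / n_nodes
    total = 0.0 + 0.0j
    for j in range(n_nodes):
        theta = -math.pi + j * h
        total += func(theta) * cmath.exp(-1j * k * theta)
    return total * h / (2.0 * math.pi)


def f(theta):
    """Hypothetical test function; f, f' and f'' vanish at theta = -pi and theta = pi."""
    return (1.0 + math.cos(theta)) ** 2


def df(theta):
    """Derivative of f with respect to theta."""
    return -2.0 * (1.0 + math.cos(theta)) * math.sin(theta)


errors = {}
for k in range(-3, 4):
    lhs = fourier_coefficient(df, k)            # coefficient of the derivative
    rhs = 1j * k * fourier_coefficient(f, k)    # i^m * k^j * C_k with m = j = 1
    errors[k] = abs(lhs - rhs)
    print(k, errors[k])
```

The differences are at the level of floating-point rounding, in agreement with the identity obtained by integration by parts when the boundary terms vanish.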
Lemma 6.3 Let x, y ∈ E. Then for all k ∈ Z^d,
\[
|C_k(x)| \le g(x)\Big(\frac{d+1}{1+|k_1|+\cdots+|k_d|}\Big)^{d+1}.
\]
Lemma 6.4 Let f ∈ C(E). Then for all 0 ≤ t0 < t,
\[
\mathbb{E}\Big(\int_{t_0}^{t} f(X_s)\,dW_s\Big)^4 \le 3e^{3(t-t_0)}\,\mathbb{E}\int_{t_0}^{t} f^4(X_s)\,ds
\le 24\Big(e^{3(t-t_0)}\,\mathbb{E}\int_{t_0}^{t} (f(X_s) - f(X_{t_0}))^4\,ds + \mathbb{E}[f^4(X_{t_0})]\,(t-t_0)^2\Big).
\]
Lemma 6.5 Let (B4) hold. If c0 := (1 + c)/2 then
\[
(\forall\, t \ge 0)\quad \sup_{0 < h \le h_0} \mathbb{E}\Big(\frac{b(X_{t+h})}{b(X_t)}\Big)^8 \le \mathbb{E}\,c_0(X_t). \tag{21}
\]
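Lemma 6.6 There exist constants K1 > 0, K2 > 0, T0 ≥ 0, and n0 ∈ N, depending on K0 , g and a, and such that for all k ∈ Z^d, T > T0 , n ≥ n0 and subdivisions 0 = t0 < t1 < · · · < tn = T (with δn,T ↓ 0) the following hold:
\[
\Big\|\frac{1}{\sqrt{T\delta_{n,T}}}\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (C_k(X_t) - C_k(X_{t_i}))\,a(X_t)\,dt\Big\|_{L^2} \le K_1\cdot K_k \tag{22}
\]
\[
\Big\|\frac{1}{\sqrt{T\delta_{n,T}}}\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (C_k(X_t) - C_k(X_{t_i}))\,a(X_t)\,dW_t\Big\|_{L^2} \le K_1\cdot K_k \tag{23}
\]
\[
\Big\|\frac{1}{\sqrt{T\delta_{n,T}}}\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} C_k(X_{t_i})\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)a(X_t)\,dt\Big\|_{L^2} \le K_2\cdot K_k \tag{24}
\]
\[
\Big\|\frac{1}{\sqrt{T\delta_{n,T}}}\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} C_k(X_{t_i})\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)a(X_t)\,dW_t\Big\|_{L^2} \le K_2\cdot K_k , \tag{25}
\]
where $K_k := ((d+1)/(1+|k_1|+\cdots+|k_d|))^{d+1}$.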
Let $S_N(x,\vartheta) := \sum_{|k|\le N} C_k(x)\,e^{i\langle k|\vartheta\rangle}$ for x ∈ E, ϑ ∈ K0 , where N is a positive integer. Then it can be proved that limN |SN (x, ϑ) − f(x, ϑ)| = 0 uniformly in ϑ ∈ K0 by the methods of Fourier analysis (see e.g. [27], pp. 180-183).
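Lemma 6.7 $\sum_{k\in\mathbb{Z}^d} |C_k(x)| \le K g(x)$, and $\sup_{N,\,\vartheta\in K_0} |S_N(x,\vartheta) - f(x,\vartheta)| \le K g(x)$ for a positive and finite constant
\[
K = \sum_{k\in\mathbb{Z}^d}\Big(\frac{d+1}{1+|k_1|+\cdots+|k_d|}\Big)^{d+1}. \tag{26}
\]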
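That the constant K in (26) is finite can be seen by grouping lattice points by their l1-norm s: there are of order s^(d−1) of them, each contributing of order s^(−(d+1)). The following sketch evaluates partial sums numerically; the function name and the truncation radius are illustrative choices, and for d = 1 the limit has the closed form 4π²/3 − 4, which follows from the value π²/6 of the series of inverse squares.

```python
import math


def partial_sum_K(d, radius):
    """Sum of ((d+1) / (1 + |k_1| + ... + |k_d|))**(d+1) over all k in Z^d
    with |k_1| + ... + |k_d| <= radius."""
    # count[s] = number of lattice points with l1-norm exactly s,
    # built one coordinate at a time.
    count = [1] + [0] * radius
    for _ in range(d):
        new = [0] * (radius + 1)
        for s, c in enumerate(count):
            if c == 0:
                continue
            new[s] += c                      # new coordinate equals 0
            for a in range(1, radius - s + 1):
                new[s + a] += 2 * c          # new coordinate equals +a or -a
        count = new
    return sum(c * ((d + 1) / (1 + s)) ** (d + 1) for s, c in enumerate(count))


print(partial_sum_K(1, 10000), 4 * math.pi ** 2 / 3 - 4)
print(partial_sum_K(2, 200), partial_sum_K(2, 400))
```

For d = 2 the partial sums keep increasing with the radius but by ever smaller amounts, which is the numerical counterpart of the convergence used in Lemma 6.7.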
Lemma 6.8 Let a ∈ C¹(E) and let f be a function that satisfies (B2). Then for a.s. ω ∈ Ω, the function $\vartheta \mapsto \int_0^T f(X_t,\vartheta)\,a(X_t)\,dW_t\,(\omega)$ is continuous on Θ.
Proof of Theorem 6.1. Let us prove (18) and (20). The proofs of (17) and (19) go in the same way, but we have to obtain expressions of form (27) below with respect to Lebesgue's instead of Wiener's integral, and to apply Lemma 6.6 (22) and (24). Without losing generality let us assume that K = K0 = ∏_{i=1}^d ⟨−π, π⟩ and let f satisfy (B2-3) and (B K0 ). For fixed ϑ ∈ K0 , T > 0 and a subdivision 0 = t0 < t1 < · · · < tn = T we define the following processes:
\[
U_t := \sum_{i=0}^{n-1} (f(X_t,\vartheta) - f(X_{t_i},\vartheta))\,a(X_t)\,1_{\langle t_i,t_{i+1}]}(t),\quad t\in[0,T],
\]
\[
U_t^{(N)} := \sum_{i=0}^{n-1} (S_N(X_t,\vartheta) - S_N(X_{t_i},\vartheta))\,a(X_t)\,1_{\langle t_i,t_{i+1}]}(t),\quad t\in[0,T],\ N\in\mathbb{N},
\]
and
\[
V_t := \sum_{i=0}^{n-1} f(X_{t_i},\vartheta)\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)a(X_t)\,1_{\langle t_i,t_{i+1}]}(t),\quad t\in[0,T],
\]
\[
V_t^{(N)} := \sum_{i=0}^{n-1} S_N(X_{t_i},\vartheta)\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)a(X_t)\,1_{\langle t_i,t_{i+1}]}(t),\quad t\in[0,T],\ N\in\mathbb{N}.
\]
Then $\lim_N |U_t^{(N)} - U_t| = 0$, $\lim_N |V_t^{(N)} - V_t| = 0$, and
\[
\sup_N |U_t^{(N)} - U_t| \le K^2\,(g^2(X_t) + g^2(X_{t_i})) + a^2(X_t)/2,
\qquad
\sup_N |V_t^{(N)} - V_t| \le (K/2)\,g^2(X_{t_i})\,a^2(X_t) + 2\,(b(X_t)/b(X_{t_i}))^2 + 2,
\]
for t ∈ ⟨ti , ti+1 ] by Lemma 6.7. Since (B1-4) hold and hence Lemma 6.5 holds, there exist T1 ≥ 0 and n1 ∈ N such that for all T > T1 , n ≥ n1 the integrals
\[
\int_0^T g^2(X_t)\,dW_t,\quad
\sum_{i=0}^{n-1} g^2(X_{t_i})\,\Delta_i W,\quad
\sum_{i=0}^{n-1} g^2(X_{t_i})\int_{t_i}^{t_{i+1}} a^2(X_t)\,dW_t,\quad
\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (b(X_t)/b(X_{t_i}))^2\,dW_t,\quad
\int_0^T a^2(X_t)\,dW_t
\]
are well defined, and so
\[
I_N(\vartheta) := \int_0^T U_t^{(N)}\,dW_t \xrightarrow{\ P\ } \int_0^T U_t\,dW_t =: I(\vartheta),\quad N\to+\infty,
\]
\[
J_N(\vartheta) := \int_0^T V_t^{(N)}\,dW_t \xrightarrow{\ P\ } \int_0^T V_t\,dW_t =: J(\vartheta),\quad N\to+\infty,
\]
by the dominated convergence theorem for stochastic integrals (see e.g. [25], Theorem (2.12), pp. 134-135).
First, let us consider the sequence (IN (ϑ)). For every ϑ ∈ K0 ∩ Q^d there exist a subsequence (Np ) ≡ (Np (ϑ)) and an event A(ϑ) of probability 1 such that for all ω ∈ A(ϑ), limp I_{Np}(ϑ)(ω) = I(ϑ)(ω). Let us recall that
\[
I_N(\vartheta) = \sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (S_N(X_t,\vartheta) - S_N(X_{t_i},\vartheta))\,a(X_t)\,dW_t,\ N\in\mathbb{N},
\qquad
I(\vartheta) = \sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (f(X_t,\vartheta) - f(X_{t_i},\vartheta))\,a(X_t)\,dW_t .
\]
Let $\Omega_0 := \bigcap_{\vartheta\in K_0\cap\mathbb{Q}^d} A(\vartheta)$. Then on this event of probability 1, for all ϑ ∈ K0 ∩ Q^d, the following holds:
\[
|I(\vartheta)| \le |I(\vartheta) - I_{N_p(\vartheta)}(\vartheta)| + |I_{N_p(\vartheta)}(\vartheta)|
\le |I(\vartheta) - I_{N_p(\vartheta)}(\vartheta)| + \sum_{k\in\mathbb{Z}^d}\Big|\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (C_k(X_t) - C_k(X_{t_i}))\,a(X_t)\,dW_t\Big| .
\]
By taking the limit when p → +∞, we get the following inequality:
\[
|I(\vartheta)| \le \sum_{k\in\mathbb{Z}^d}\Big|\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (C_k(X_t) - C_k(X_{t_i}))\,a(X_t)\,dW_t\Big| .
\]
Since ϑ ↦ I(ϑ) is a continuous function by Lemma 6.8, it turns out that sup_{ϑ∈K0} |I(ϑ)| = sup_{ϑ∈K0∩Q^d} |I(ϑ)|, and so sup_{ϑ∈K0} |I(ϑ)| is a random variable. Hence
\[
\sup_{\vartheta\in K_0} |I(\vartheta)| \le \sum_{k\in\mathbb{Z}^d}\Big|\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (C_k(X_t) - C_k(X_{t_i}))\,a(X_t)\,dW_t\Big| \quad\text{a.s.} \tag{27}
\]
Since there exist T0 ≥ T1 and n0 ≥ n1 such that for all T > T0 , n ≥ n0 and subdivisions of [0, T ] with δn,T ↓ 0,
\[
\sum_{k\in\mathbb{Z}^d}\Big\|\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (C_k(X_t) - C_k(X_{t_i}))\,a(X_t)\,dW_t\Big\|_{L^2} \le K_1 K\sqrt{T\delta_{n,T}},
\]
by Lemma 6.6 and (26), the series on the right-hand side of (27) converges a.s. and in L2-norm to a.s. equal limits (see Proposition 2.10.1 in [7], p. 68). Hence ‖sup_{ϑ∈K0} |I(ϑ)|‖_{L2} ≤ C1 √(T δn,T ) for C1 := K1 K. That proves (18). The proof of (20) goes in a similar way, considering the sequence (JN (ϑ)).
We need the following lemma for proving consistency and asymptotic normality of the diffusion coefficient parameter estimator.
Lemma 6.9 Let (B4) hold, and let b ∈ C³(E). Moreover, let there exist constants Cb > 0 and Tb ≥ 0 such that
\[
(\forall\, T > T_b)\quad \frac{1}{T}\,\mathbb{E}\Big(\int_0^T ((b^2 b''')^2 + r^{16})(X_t)\,dt + \sum_{i=0}^{n-1} r^4(X_{t_i})\,\Delta_i t\Big) \le C_b .
\]
Then there exist constants C > 0, T0 ≥ 0, and n0 ∈ N, such that for all T > T0 , and
n ≥ n0 ,
2
Pn−1 1 R ti+1 b(Xt )
1
2
− (∆i W )
≤ C.
i=0 ∆i t
TE
b(Xt ) dWt
ti
i
Remark 6.10 If b and its derivatives up to the third order are bounded then the statement of Lemma 6.9 holds for all T > T0 = 0 by the same arguments as in Remark 6.2.
6.1 Fixed maximal observational time case
Let T > 0 be fixed, and let 0 = t0 < · · · < tn = T , n ∈ N, be subdivisions of [0, T ] such that δn,T = max_{0≤i≤n−1} ∆i t ↓ 0 when n → +∞. We need the next corollary to Theorem 6.1.
Corollary 6.11 Let X be a diffusion such that (H1a-4a) hold and let K ⊂ Θ be a relatively compact set. Then for all θ0 = (ϑ0 , σ) ∈ Ψ, T > 0, and r = 0, 1, 2,
\[
\sup_{\vartheta\in K} |D^r\ell_{n,T}(\vartheta) - D^r\ell_T(\vartheta)| = O_{P_{\theta_0}}\big(\sqrt{\delta_{n,T}}\big),\quad n\to+\infty. \tag{28}
\]
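Proof of Corollary 6.11. We prove (28) for r = 0. Statement (28) for cases r = 1 and r = 2 can be proved similarly. Let θ0 = (ϑ0 , σ) ∈ Ψ be arbitrary, and let µ0 := µ(·, ϑ0 ). Moreover, let f(·, ϑ) := µ(·, ϑ)/b, ϑ ∈ K, and f0 := µ0 /b. Then for any n,
\[
\begin{aligned}
\ell_{n,T}(\vartheta) - \ell_T(\vartheta)
&= \sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} \Big(\frac{\mu(X_{t_i},\vartheta)}{b^2(X_{t_i})} - \frac{\mu(X_t,\vartheta)}{b^2(X_t)}\Big)\,dX_t
 - \frac12\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} \Big(\frac{\mu^2(X_{t_i},\vartheta)}{b^2(X_{t_i})} - \frac{\mu^2(X_t,\vartheta)}{b^2(X_t)}\Big)\,dt \\
&= \sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} \Big(-(f(X_t,\vartheta) - f(X_{t_i},\vartheta))\,f_0(X_t) + f(X_{t_i},\vartheta)\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big) f_0(X_t)\Big)\,dt \\
&\quad + \sqrt{\sigma}\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} \Big(-(f(X_t,\vartheta) - f(X_{t_i},\vartheta)) + f(X_{t_i},\vartheta)\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)\Big)\,dW_t \\
&\quad + \frac12\sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (f^2(X_t,\vartheta) - f^2(X_{t_i},\vartheta))\,dt
\end{aligned}
\tag{29}
\]
by the definitions of ℓT and ℓn,T , and (1).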
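To make the objects compared in (29) concrete, the sketch below evaluates an Euler-type approximate log-likelihood on a discretely observed path. It rests on several assumptions that are not taken from this section: the approximate log-likelihood is taken in the form suggested by the first line of (29), namely the sum of µ(X_{t_i}, ϑ)/b²(X_{t_i}) ∆_i X minus half the sum of µ²(X_{t_i}, ϑ)/b²(X_{t_i}) ∆_i t; the model is a hypothetical Ornstein-Uhlenbeck case µ(x, ϑ) = −ϑx, b ≡ 1; and the data are generated by an Euler-Maruyama scheme purely for illustration. All function names are made up for this example.

```python
import math
import random


def simulate_ou(theta, sigma, x0, T, n, rng):
    """Euler-Maruyama path of dX = -theta*X dt + sqrt(sigma) dW on an equidistant grid."""
    dt = T / n
    xs = [x0]
    for _ in range(n):
        x = xs[-1]
        xs.append(x - theta * x * dt + math.sqrt(sigma * dt) * rng.gauss(0.0, 1.0))
    return xs, dt


def euler_loglik(theta, xs, dt):
    """Assumed Euler-type approximate log-likelihood for mu(x, theta) = -theta*x, b = 1."""
    total = 0.0
    for x_i, x_next in zip(xs[:-1], xs[1:]):
        mu = -theta * x_i
        total += mu * (x_next - x_i) - 0.5 * mu * mu * dt
    return total


def amle_drift(xs, dt):
    """Maximizer of euler_loglik in theta; available in closed form because the
    approximate log-likelihood is a concave quadratic in theta for this model."""
    num = sum(x_i * (x_next - x_i) for x_i, x_next in zip(xs[:-1], xs[1:]))
    den = sum(x_i * x_i for x_i in xs[:-1]) * dt
    return -num / den


rng = random.Random(7)
xs, dt = simulate_ou(theta=1.0, sigma=0.5, x0=1.0, T=7.0, n=2 ** 9, rng=rng)
theta_hat = amle_drift(xs, dt)
print(theta_hat, euler_loglik(theta_hat, xs, dt))
```

For drifts that are nonlinear in ϑ no closed form is available and the maximization would have to be done numerically; the linear case is used here only because the maximizer can be verified by hand.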
Let us assume for a moment that functions f0 , b, b′ , b′′ are bounded on E, and f and its partial derivatives $D_\vartheta^m f$, $\frac{\partial}{\partial x} D_\vartheta^m f$, and $\frac{\partial^2}{\partial x^2} D_\vartheta^m f$ are bounded on E × K for 0 ≤ m ≤ d + 1. Then f and f² satisfy condition (B2) from Theorem 6.1, and f0 and the constant function 1 satisfy (B1), since (H2a-3a) hold. Hence, by Remark 6.2 the statements of Theorem 6.1 hold for these functions, and any T > 0. By applying this conclusion to (29), the following holds:
\[
\Big\|\sup_{\vartheta\in K} |\ell_{n,T}(\vartheta) - \ell_T(\vartheta)|\Big\|_{L^2(P_{\theta_0})} \le C\sqrt{\delta_{n,T}}, \tag{30}
\]
for any T > 0 and subdivisions of [0, T ] with δn,T ≤ h0 , and a constant C > 0 which depends on T , X and K.
Now, let X, µ and b satisfy assumptions (H1a-3a), and let x0 be the initial state of X. Moreover, let (Em , m ≥ 1) be a sequence of open and bounded subintervals of E such that for all m, $\overline{E}_m \subset E_{m+1}$, x0 ∈ E1 , and $\bigcup_{m=1}^{+\infty} E_m = E$, and let (φm , m ≥ 1) be a sequence of C^∞-functions on E such that for all m, 0 ≤ φm ≤ 1, φm (x) = 1 for $x \in \overline{E}_m$ and φm ≡ 0 on $E_{m+1}^c$. Let us define the following bounded functions for each m:
\[
\mu_m(x,\vartheta) := \varphi_m(x)\,\mu(x,\vartheta),\ (x,\vartheta)\in E\times\Theta,
\qquad
b_m(x) := \varphi_m(x)\,b(x) + c_m\,(1 - \varphi_m(x)),\ x\in E,
\]
where $c_m := \operatorname{sign} b\cdot\max_{x\in E_{m+1}} |b(x)|$. Since µ and b satisfy (H2a-3a), bm ∈ C²(E), and bm , b′m , b′′m are bounded on E, and (x, ϑ) ↦ µm (x, ϑ)/b(x), µ²m (x, ϑ)/b²(x) satisfy (B2) and are bounded on E × K, and hence satisfy (B3) too, for each m. Moreover, let $\tau_m := \inf\{t \ge 0 : X_t \in E_m^c\}$, m ≥ 1. Since X is a continuous process, (τm , m ≥ 1) is an increasing sequence of stopping times (see [25]) such that τm ↑ +∞ a.s., when m → +∞.
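The localization step can be visualized with an explicit cutoff. The sketch below is only an illustration under assumptions that are not part of the proof: the state space is taken to be the whole real line, E_m = (−m, m), the diffusion coefficient is the hypothetical unbounded function b(x) = 1 + x², and φ_m is built from the standard smooth transition based on exp(−1/t).

```python
import math


def smooth_step(t):
    """C-infinity function equal to 0 for t <= 0 and to 1 for t >= 1."""
    def h(s):
        return math.exp(-1.0 / s) if s > 0.0 else 0.0
    return h(t) / (h(t) + h(1.0 - t))


def phi(m, x):
    """Cutoff equal to 1 on the closure of E_m = (-m, m) and to 0 outside E_{m+1}."""
    return 1.0 - smooth_step(abs(x) - m)


def b(x):
    """Hypothetical unbounded, strictly positive diffusion coefficient function."""
    return 1.0 + x * x


def b_m(m, x):
    """Bounded modification b_m = phi_m * b + c_m * (1 - phi_m)."""
    c_m = 1.0 + (m + 1) ** 2     # sign(b) * max of |b| over E_{m+1} for this choice of b
    p = phi(m, x)
    return p * b(x) + c_m * (1.0 - p)


grid = [i / 10.0 for i in range(-100, 101)]
print(min(b_m(2, x) for x in grid), max(b_m(2, x) for x in grid))
```

The modified coefficient agrees with b on the closure of E_m, stays between the minimum of b and c_m everywhere, and is constant outside E_{m+1}, which is what makes the bounded-coefficient argument applicable up to the exit time τ_m.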
Let m be fixed and let the diffusion X^m = (X^m_t ; t ≥ 0) be defined as the solution to the SDE:
\[
X_t^m = x_0 + \int_0^t \mu_m(X_s^m,\vartheta_0)\,ds + \sqrt{\sigma}\int_0^t b_m(X_s^m)\,dW_s,\quad t > 0.
\]
By Theorem V.11.2 in [26] (Vol. 2, p. 128) such a diffusion exists and is a.s. unique. Moreover, for almost all ω ∈ Ω and t ∈ [0, τm (ω)], Xt (ω) = X^m_t (ω) by Corollary V.11.10 in [26] (Vol. 2, p. 131). This implies (see [29]) that for an arbitrary number A > 0,
\[
P_{\theta_0}\Big\{\sup_{\vartheta\in K} |\ell_{n,T}(\vartheta) - \ell_T(\vartheta)| > A\sqrt{\delta_{n,T}}\Big\}
\le P_{\theta_0}\{\tau_m \le T\} + \frac{1}{A\sqrt{\delta_{n,T}}}\Big\|\sup_{\vartheta\in K} |\ell^m_{n,T}(\vartheta) - \ell^m_T(\vartheta)|\Big\|_{L^2(P_{\theta_0})}, \tag{31}
\]
where $\ell^m_T$ and $\ell^m_{n,T}$ are LLF (9) and its Euler approximation (5) respectively, both based on diffusion X^m with drift µm (·, ϑ0 ) and diffusion coefficient function √σ bm . Now, (30) holds for functions $\ell^m_T$ and $\ell^m_{n,T}$ with constant C = Cm . Hence the right-hand side of (31) is dominated by the expression Pθ0 {τm ≤ T } + Cm /A. First, let us take the limit when n → +∞, and then when A → +∞. Next, we take the limit when m → +∞, and hence we prove (28).
Proof of Theorem 4.1. We need to show that the model and random functions ℓT and ℓn,T , n ≥ 1, for fixed T > 0, satisfy conditions (A1-5) of Theorem 3.1 of [17]. Let F_{n,T} be the σ-subalgebras of F_T^0 that are introduced in Section 4. We recall from the same section that ℓT is an F_T^0 ⊗ B(Θ)-measurable function. In the same way, ℓn,T is F_{n,T} ⊗ B(Θ)-measurable, for each n. Hence (A1) is satisfied. Corollary 6.11 implies that functions ℓT and ℓn,T , n ≥ 1, satisfy (A3). The same corollary and (H5a) imply (A4) and (A5). Condition (A2) is the same as assumption (H4a). Hence by Theorem 3.1 of [17] there exists a sequence of F_T^0-measurable random vectors (ϑ̂n,T , n ≥ 1) such that the statements of Theorem 4.1 hold.
For proving Corollary 4.2 we need the following lemma.
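Lemma 6.12 Let (H1a-2a) hold, and T > 0 be fixed. Then for θ = (ϑ, σ) ∈ Ψ,
\[
\sum_{i=0}^{n-1}\frac{(\Delta_i X - \mu(X_{t_i},\vartheta)\,\Delta_i t)^2}{b^2(X_{t_i})\,\Delta_i t} - \sigma\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2}{\Delta_i t} = O_{P_\theta}(1),\quad n\to+\infty. \tag{32}
\]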
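Proof of Corollary 4.2. Notice that (ii) implies the consistency (i.e. (i)) of σ̂n,T . Let us prove (ii). Since
\[
\sqrt{n}\,(\hat\sigma_{n,T} - \sigma) = \sqrt{n}\Big(\hat\sigma_{n,T} - \frac{\sigma}{n}\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2}{\Delta_i t}\Big) + \sigma\sqrt{2}\cdot\frac{1}{\sqrt{2n}}\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2 - \Delta_i t}{\Delta_i t} \tag{33}
\]
and $(\mathcal{L})\lim_n \frac{1}{\sqrt{2n}}\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2 - \Delta_i t}{\Delta_i t} = N(0,1)$, for (ii) to hold it is sufficient to prove that for all ε > 0,
\[
\lim_n P_\theta\Big\{\Big|\sqrt{n}\Big(\hat\sigma_{n,T} - \frac{\sigma}{n}\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2}{\Delta_i t}\Big)\Big| \ge \varepsilon\Big\} = 0. \tag{34}
\]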
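Let ε > 0 and η > 0 be any numbers and let K be a relatively compact set in Θ. If K + η := {ϑ ∈ Θ : (∃ϑ′ ∈ K) |ϑ − ϑ′| < η} then on the event
\[
\begin{aligned}
A = {}&\Big\{\Big|\frac{1}{\sqrt{n}}\Big(\sum_{i=0}^{n-1}\frac{(\Delta_i X - \mu(X_{t_i},\vartheta)\Delta_i t)^2}{b^2(X_{t_i})\Delta_i t} - \sigma\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2}{\Delta_i t}\Big)\Big| < \frac{\varepsilon}{5},\ \hat\vartheta_T\in K\Big\} \\
&\cap\Big\{|\hat\vartheta_{n,T} - \hat\vartheta_T| < \eta,\ |\ell_T(\vartheta) - \ell_T(\hat\vartheta_T)| < \sqrt{n}\,\frac{\varepsilon}{10},\ |\ell_{n,T}(\vartheta) - \ell_T(\vartheta)| < \sqrt{n}\,\frac{\varepsilon}{10}\Big\} \\
&\cap\Big\{\sup_{\vartheta'\in K+\eta} |D\ell_T(\vartheta')| < \frac{\sqrt{n}}{\eta}\,\frac{\varepsilon}{10},\ \sup_{\vartheta'\in K+\eta} |\ell_{n,T}(\vartheta') - \ell_T(\vartheta')| < \sqrt{n}\,\frac{\varepsilon}{10}\Big\},
\end{aligned}
\]
the following holds: $\big|\sqrt{n}\big(\hat\sigma_{n,T} - \frac{\sigma}{n}\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2}{\Delta_i t}\big)\big| < \varepsilon$. This implies that $A \subseteq \big\{\big|\sqrt{n}\big(\hat\sigma_{n,T} - \frac{\sigma}{n}\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2}{\Delta_i t}\big)\big| < \varepsilon\big\}$. Hence
\[
\begin{aligned}
P_\theta\Big\{\Big|\sqrt{n}\Big(\hat\sigma_{n,T} - \frac{\sigma}{n}\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2}{\Delta_i t}\Big)\Big| \ge \varepsilon\Big\}
\le{}& P_\theta\Big\{\Big|\frac{1}{\sqrt{n}}\Big(\sum_{i=0}^{n-1}\frac{(\Delta_i X - \mu(X_{t_i},\vartheta)\Delta_i t)^2}{b^2(X_{t_i})\Delta_i t} - \sigma\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2}{\Delta_i t}\Big)\Big| \ge \frac{\varepsilon}{5}\Big\} \\
&+ P_\theta\{\hat\vartheta_T\in K^c\} + P_\theta\Big\{|\ell_T(\vartheta) - \ell_T(\hat\vartheta_T)| \ge \sqrt{n}\,\frac{\varepsilon}{10}\Big\} + P_\theta\{|\hat\vartheta_{n,T} - \hat\vartheta_T| \ge \eta\} \\
&+ P_\theta\Big\{\sup_{\vartheta'\in K+\eta} |D\ell_T(\vartheta')| \ge \frac{\sqrt{n}}{\eta}\,\frac{\varepsilon}{10}\Big\} + P_\theta\Big\{|\ell_{n,T}(\vartheta) - \ell_T(\vartheta)| \ge \sqrt{n}\,\frac{\varepsilon}{10}\Big\} \\
&+ P_\theta\Big\{\sup_{\vartheta'\in K+\eta} |\ell_{n,T}(\vartheta') - \ell_T(\vartheta')| \ge \sqrt{n}\,\frac{\varepsilon}{10}\Big\}.
\end{aligned}
\]
By Lemma 6.12, Corollary 6.11, property (ii) of ϑ̂n,T from Theorem 4.1, and arbitrariness of K, (34) follows.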
6.2 Ergodic case
For all T > 0 let 0 = t0 < · · · < tn = T , n ∈ N, be equidistant subdivisions of [0, T ] such that δn,T = T /n → 0 when T → +∞ and n → +∞. We need the following corollary to Theorem 6.1.
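Corollary 6.13 Let X be a diffusion such that (H1b-3b) hold. Then for all θ0 = (ϑ0 , σ) ∈ Ψ, πϑ0 -a.s. nonrandom initial conditions, and r = 0, 1, 2,
\[
\sup_{\vartheta\in\Theta}\Big|\frac1T D^r\ell_{n,T}(\vartheta) - \frac1T D^r\ell_T(\vartheta)\Big| = O_{P_{\theta_0}}\big(\sqrt{\delta_{n,T}}\big),\quad T\to+\infty,\ n\to+\infty. \tag{35}
\]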
Proof of Corollary 6.13. Similarly to the proof of Corollary 6.11 it is sufficient to prove (35) for r = 0, since the statement of the corollary for cases r = 1 and r = 2 can be proved in the same way. Let θ0 = (ϑ0 , σ) ∈ Ψ be arbitrary, and let µ0 := µ(·, ϑ0 ), ν := √σ b, and P ≡ Pθ0 , E ≡ Eθ0 . Let us recall expression (29) from the proof of Corollary 6.11 where f(·, ϑ) = µ(·, ϑ)/b, ϑ ∈ Θ, and f0 = µ0 /b. Notice that f and f² satisfy (B2) since (H2a-3a) hold by (H2b-3b). Let us show that f0 satisfies (B1) and f satisfies (B3) with respect to a ≡ f0 and compact Θ, and that f² satisfies (B3) with respect to the constant function a ≡ 1 and the same compact (notice that a constant function trivially satisfies (B1)). If we fix ϑ ∈ Θ, m such that 0 ≤ m ≤ d + 1, and nonnegative integers j1 ,..., jd such that j1 + · · · + jd = m then let
\[
\tilde f := \frac{\partial^m}{\partial\vartheta_1^{j_1}\cdots\partial\vartheta_d^{j_d}} f(\cdot,\vartheta)
\quad\text{and}\quad
\tilde\mu := \frac{\partial^m}{\partial\vartheta_1^{j_1}\cdots\partial\vartheta_d^{j_d}} \mu(\cdot,\vartheta).
\]
By (H3a), f̃, µ̃ ∈ C²(E). Since (H3b) holds it follows that
\[
\begin{aligned}
|\tilde f| &\le g_0 \in L^{32}(\pi_{\vartheta_0}) \subset L^8(\pi_{\vartheta_0}), \\
|\tilde f' b| &= |\tilde\mu' - \tilde f b'| \le g_1 + g_0|b'| =: g_{01} \in L^{16}(\pi_{\vartheta_0}) \subset L^8(\pi_{\vartheta_0}), \\
|\tilde f' \mu_0| &= |(\tilde f' b) f_0| \le g_{01}\,g_1 =: g_{02} \in L^8(\pi_{\vartheta_0}), \\
|\tilde f'' b^2| &= |\tilde\mu'' b - 2(\tilde f' b) b' - \tilde f\,(b'' b)| \le g_2 + 2 g_{01}|b'| + g_0 |b'' b| =: g_{03} \in L^8(\pi_{\vartheta_0})
\end{aligned}
\]
by (H2b-3b). Then the function g00 := g0 + √σ g01 + g02 + σ g03 is such that g00 ∈ L⁸(πϑ0 ) ⊂ L⁴(πϑ0 ) and
\[
\sup_{\vartheta\in\Theta} |D_\vartheta^m f(\cdot,\vartheta)|_\infty + \Big|\frac{\partial}{\partial x} D_\vartheta^m f(\cdot,\vartheta)\Big|_\infty (|\mu_0| + |\nu|) + \Big|\frac{\partial^2}{\partial x^2} D_\vartheta^m f(\cdot,\vartheta)\,\nu^2\Big|_\infty \le g_{00}
\]
for all 0 ≤ m ≤ d + 1. This implies that f satisfies the first part of (B3) with g ≡ g00 . This also implies that |f0 | + |f̄0 | ≤ g00 and hence f0 , f̄0 ∈ L⁸(πϑ0 ). By the Chacon-Ornstein theorem, the ergodic theorem for additive functionals and its corollary (e.g. Theorem (A.5.2) on p. 504, Theorem (X.3.12) on p. 397, and Exercise (X.3.18) on p. 399 in [25]), for πϑ0 -a.s. initial values x0 ∈ E,
\[
\begin{aligned}
\lim_{T\to+\infty} \mathbb{E}\Big(\frac1T\int_0^T f_0^8(X_t)\,dt\Big)
&= \lim_{n,T} \mathbb{E}\Big(\frac1T\sum_{i=0}^{n-1} f_0^8(X_{t_i})\,\Delta_i t\Big)
= \lim_n \mathbb{E}\Big(\frac1n\sum_{i=0}^{n-1} f_0^8(X_{t_i})\Big) \\
&= \int_E f_0^8(x)\,\pi_{\vartheta_0}(dx) \le \int_E g_{00}^8(x)\,\pi_{\vartheta_0}(dx) < +\infty
\end{aligned}
\tag{36}
\]
since (H1b) holds, and subdivisions are equidistant (∆i t = T /n for each i). Moreover, since f0 ∈ L⁴(πϑ0 ) too, the same holds for 4th powers of f0 , i.e. if we substitute f0⁴ instead of f0⁸ in (36). Finally, both conclusions hold for f̄0 too. Hence f0 satisfies (B1). It remains to show that g00 satisfies the limiting properties from (B3). Using the same arguments as in proving (36) it follows that (36) holds for the 8th and hence for the 4th power of g00 . Moreover, since f0 , g00 ∈ L⁸(πϑ0 ) implies f0 g00 ∈ L⁴(πϑ0 ), and (36) (with respect to f̄0 and g00 too) holds, it follows that
\[
\lim_{n,T} \mathbb{E}\Big(\frac1T\sum_{i=0}^{n-1} (f_0 g_{00})^4(X_{t_i})\,\Delta_i t\Big) = \int_E (f_0 g_{00})^4(x)\,\pi_{\vartheta_0}(dx) < +\infty,
\]
\[
\lim_{n,T} \mathbb{E}\Big(\frac1T\sum_{i=0}^{n-1} g_{00}^4(X_{t_i})\int_{t_i}^{t_{i+1}} (f_0^4 + \bar f_0^4)(X_t)\,dt\Big)
\le \frac12\lim_{n,T} \mathbb{E}\Big(\frac1T\sum_{i=0}^{n-1} g_{00}^8(X_{t_i})\,\Delta_i t\Big) + \lim_{T\to+\infty} \mathbb{E}\Big(\frac1T\int_0^T (f_0^8 + \bar f_0^8)(X_t)\,dt\Big) < +\infty.
\]
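Hence f satisfies (B3) for πϑ0 -a.s. nonrandom initial conditions. It remains to show that f² satisfies (B3) with respect to the function a ≡ 1. Let $g := 7\cdot 2^{d+1} g_{00}^2 \in L^4(\pi_{\vartheta_0})$. Notice that uniformly with respect to ϑ ∈ Θ,
\[
|f^2| + \Big|\frac{\partial}{\partial x}(f^2)\Big| + \Big|\frac{\partial^2}{\partial x^2}(f^2)\Big|
\le |f|^2 + 2\Big|f\,\frac{\partial}{\partial x} f\Big| + 2\Big|\Big(\frac{\partial}{\partial x} f\Big)^2 + f\,\frac{\partial^2}{\partial x^2} f\Big| \le 7 g_{00}^2 \le g.
\]
Let us put $\hat f := \frac{\partial^m}{\partial\vartheta_1^{j_1}\cdots\partial\vartheta_d^{j_d}} (f^2)(\cdot,\vartheta)$ for fixed ϑ ∈ Θ, m such that 0 ≤ m ≤ d + 1, and nonnegative integers j1 ,..., jd such that j1 + · · · + jd = m. Then by induction
\[
|\hat f| + \Big|\frac{\partial}{\partial x}\hat f\Big| + \Big|\frac{\partial^2}{\partial x^2}\hat f\Big| \le 7\cdot 2^m g_{00}^2 \le g.
\]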
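Then (36) (for 4th powers of g00 ) implies that f² satisfies (B3) with respect to a ≡ 1, for πϑ0 -a.s. nonrandom initial conditions. Finally, (B4) holds for πϑ0 -a.s. nonrandom initial conditions since (H1b-H2b) hold. Hence we can apply Theorem 6.1 to (29) to conclude that there exist constants C > 0, T0 ≥ 0, and n0 ∈ N, such that for all T > T0 and n ≥ n0 , and arbitrary A > 0,
\[
P_{\theta_0}\Big\{\frac{1}{\sqrt{\delta_{n,T}}}\sup_{\vartheta\in\Theta}\Big|\frac1T\ell_{n,T}(\vartheta) - \frac1T\ell_T(\vartheta)\Big| \ge A\Big\}
\le \frac{1}{A^2}\,\mathbb{E}\Big(\frac{1}{\sqrt{\delta_{n,T}}}\sup_{\vartheta\in\Theta}\Big|\frac1T\ell_{n,T}(\vartheta) - \frac1T\ell_T(\vartheta)\Big|\Big)^2 \le \frac{C}{A^2}.
\]
Hence
\[
\lim_{A\to+\infty}\lim_{n,T} P_{\theta_0}\Big\{\frac{1}{\sqrt{\delta_{n,T}}}\sup_{\vartheta\in\Theta}\Big|\frac1T\ell_{n,T}(\vartheta) - \frac1T\ell_T(\vartheta)\Big| \ge A\Big\} = 0,
\]
which proves the corollary.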
In order to prove Theorems 4.5-4.6 we need the following lemmas.
Lemma 6.14 Let (H1b-3b) hold. Then for all θ0 = (ϑ0 , σ) ∈ Ψ there exist constants Cr > 0 (r = 0, 1, 2) such that Pθ0 -a.s. there exists T0 > 0 such that for all ϑ1 , ϑ2 ∈ Θ, and all T ≥ T0 ,
\[
\Big|\frac1T D^r\ell_T(\vartheta_1) - \frac1T D^r\ell_T(\vartheta_2)\Big| \le C_r|\vartheta_1 - \vartheta_2|,\quad r = 0, 1, 2,
\]
\[
\Big|\frac1T D\ell_T(\vartheta_1) - \frac1T D\ell_T(\vartheta_2) - \frac1T D^2\ell_T(\vartheta_2)(\vartheta_1 - \vartheta_2)\Big| \le \frac12 C_2|\vartheta_1 - \vartheta_2|^2,\ \text{and}
\]
\[
\sup_{\vartheta\in\Theta}\frac1T |D^3\ell_T(\vartheta)| \le C_2 .
\]
Lemma 6.15 Let (H1b-3b) hold. Then for all θ0 = (ϑ0 , σ) ∈ Ψ, Pθ0 -a.s.
\[
\lim_{T\to+\infty}\sup_{\vartheta\in\Theta}\Big|\frac1T\ell_T(\vartheta) - \ell_{\vartheta_0}(\vartheta)\Big| = 0.
\]
Proof of Theorem 4.5. Let θ0 = (ϑ0 , σ) ∈ Ψ be arbitrary. Since Θ is an open set there exists ε0 > 0 such that K(ϑ0 , ε0 ) ⊂ Θ. Let ℓϑ0 be function (12) and let $\lambda_0 := \min_{|y|=1} y^\tau I(\vartheta_0) y = -\max_{|y|=1} y^\tau D^2\ell_{\vartheta_0}(\vartheta_0) y > 0$ be the minimal eigenvalue of the Fisher information matrix I(ϑ0 ), which is positive definite by (H5b). Moreover, let Cr > 0 (r = 0, 1, 2) be constants from Lemma 6.14, and let Ω0 be the intersection of the events from Lemmas 6.14-6.15, and the events such that (12) and (13) hold for ϑ0 . Hence Pθ0 (Ω0 ) = 1, and for ω ∈ Ω0 , let T0 ≡ T0 (ω) > 0 be such that the statements of Lemma 6.14 hold for T ≥ T0 . Let ε > 0 be such that ε ≤ ε0 ∧ λ0 /(4C2 ). Then K(ϑ0 , ε) ⊂ Θ. Let ω ∈ Ω0 be fixed. Since (13) holds, there exists T1 ≥ T0 such that for all T ≥ T1 , $|\frac1T D^2\ell_T(\vartheta_0) - D^2\ell_{\vartheta_0}(\vartheta_0)| < \frac{\lambda_0}{4}$ and $|\frac1T D\ell_T(\vartheta_0) - D\ell_{\vartheta_0}(\vartheta_0)| < \frac{\lambda_0}{4}\varepsilon$. Then for all y ∈ R^d, |y| = 1, T ≥ T1 , and ϑ ∈ K(ϑ0 , ε),
\[
\begin{aligned}
y^\tau\Big(\frac1T D^2\ell_T(\vartheta)\Big) y
&\le \Big|\frac1T D^2\ell_T(\vartheta) - \frac1T D^2\ell_T(\vartheta_0)\Big| + \Big|\frac1T D^2\ell_T(\vartheta_0) - D^2\ell_{\vartheta_0}(\vartheta_0)\Big| + y^\tau D^2\ell_{\vartheta_0}(\vartheta_0) y \\
&< C_2|\vartheta - \vartheta_0| + \frac{\lambda_0}{4} - \lambda_0
\le C_2\frac{\lambda_0}{4C_2} + \frac{\lambda_0}{4} - \lambda_0 = -\frac{\lambda_0}{2}.
\end{aligned}
\]
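Hence ϑ ↦ (1/T )ℓT (ϑ) is a strictly concave function on K(ϑ0 , ε). Moreover, if z ∈ R^d is such that |z| = ε, then for y := z/|z| and T ≥ T1 ,
\[
\begin{aligned}
\frac1T D\ell_T(\vartheta_0 + z)\,z
&= \frac1T D\ell_T(\vartheta_0)\,z + z^\tau\Big(\frac1T\int_0^1 D^2\ell_T(\vartheta_0 + tz)\,dt\Big) z \\
&\le \Big|\frac1T D\ell_T(\vartheta_0) - D\ell_{\vartheta_0}(\vartheta_0)\Big|\,\varepsilon + y^\tau\Big(\frac1T\int_0^1 D^2\ell_T(\vartheta_0 + tz)\,dt\Big) y\,\varepsilon^2 \\
&\le \frac{\lambda_0}{4}\varepsilon^2 - \frac{\lambda_0}{2}\varepsilon^2 = -\frac{\lambda_0}{4}\varepsilon^2 < 0.
\end{aligned}
\]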
Then there exists ϑ̂T ∈ K(ϑ0 , ε) such that DℓT (ϑ̂T ) = 0 (see e.g. Lemma 4.3 in [17]), and D²ℓT (ϑ̂T ) < 0 since $\min_{|y|=1} y^\tau(-\frac1T D^2\ell_T(\vartheta)) y \ge \frac{\lambda_0}{2} = \frac12\min_{|y|=1} y^\tau I(\vartheta_0) y$ for all ϑ ∈ K(ϑ0 , ε). Since ε > 0 is an arbitrarily small number, these imply statement (ii) of the theorem. Notice that ϑ̂T is the unique point of maximum of function ℓT on K(ϑ0 , ε) since ℓT is strictly concave on this set. To finish the proof of statement (i) we have to prove that there exists T2 ≥ T1 such that ϑ̂T is the unique point of global maximum of ℓT on Θ. Since for all ϑ ∈ Θ \ {ϑ0 }, ℓϑ0 (ϑ0 ) > ℓϑ0 (ϑ), ℓϑ0 ∈ C(Θ), and Θ \ K(ϑ0 , ε) is a compact set, it follows that ℓϑ0 (ϑ0 ) > sup_{|y|≥ε} ℓϑ0 (ϑ0 + y). By Lemma 4.4 in [17] there exists a number 0 < s(ε) < ε such that
\[
\Delta(\vartheta_0,\varepsilon) := \inf_{|x|\le s(\varepsilon)} \ell_{\vartheta_0}(\vartheta_0 + x) - \sup_{|y|\ge\varepsilon} \ell_{\vartheta_0}(\vartheta_0 + y) > 0.
\]
Since Lemma 6.15 holds there exists T2 ≥ T1 such that for T ≥ T2 ,
\[
\sup_{\vartheta\in\Theta}\Big|\frac1T\ell_T(\vartheta) - \ell_{\vartheta_0}(\vartheta)\Big| < \frac{\Delta(\vartheta_0,\varepsilon)}{4}.
\]
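If x, y ∈ R^d are such that |x| ≤ s(ε) and |y| ≥ ε then
\[
\begin{aligned}
\frac1T\ell_T(\vartheta_0 + x)
&= \frac1T\ell_T(\vartheta_0 + x) - \ell_{\vartheta_0}(\vartheta_0 + x) + \ell_{\vartheta_0}(\vartheta_0 + x) - \ell_{\vartheta_0}(\vartheta_0 + y) \\
&\quad + \ell_{\vartheta_0}(\vartheta_0 + y) - \frac1T\ell_T(\vartheta_0 + y) + \frac1T\ell_T(\vartheta_0 + y) \\
&\ge -\frac{\Delta(\vartheta_0,\varepsilon)}{4} + \inf_{|x|\le s(\varepsilon)} \ell_{\vartheta_0}(\vartheta_0 + x) - \sup_{|y|\ge\varepsilon} \ell_{\vartheta_0}(\vartheta_0 + y) - \frac{\Delta(\vartheta_0,\varepsilon)}{4} + \frac1T\ell_T(\vartheta_0 + y) \\
&\ge \frac{\Delta(\vartheta_0,\varepsilon)}{2} + \frac1T\ell_T(\vartheta_0 + y),
\end{aligned}
\]
implying that
\[
\inf_{|x|\le s(\varepsilon)} \frac1T\ell_T(\vartheta_0 + x) - \sup_{|y|\ge\varepsilon} \frac1T\ell_T(\vartheta_0 + y) \ge \frac{\Delta(\vartheta_0,\varepsilon)}{2} > 0 \tag{37}
\]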
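and hence ℓT (ϑ0 ) > sup_{|y|≥ε} ℓT (ϑ0 + y). Finally, (i) follows. To prove statement (iii), first notice that
\[
\frac{1}{\sqrt{T}} D\ell_T(\vartheta_0) = \frac{\sqrt{\sigma}}{\sqrt{T}}\int_0^T \frac{1}{b(X_t)}\,D\mu_0(X_t)\,dW_t \xrightarrow{\ \mathcal{L}-P_{\theta_0}\ } N(0, \sigma I(\vartheta_0)),\quad T\to+\infty \tag{38}
\]
by Theorem 1 in [8] since (H1b-5b) hold, and second notice that for ϑ̄(s) := sϑ̂T + (1 − s)ϑ0 ,
\[
D\ell_T(\hat\vartheta_T) = D\ell_T(\vartheta_0) + D^2\ell_T(\vartheta_0)(\hat\vartheta_T - \vartheta_0) + \int_0^1\!\!\int_0^1 D^3\ell_T(\bar\vartheta(st))\,ds\,t\,dt\,(\hat\vartheta_T - \vartheta_0)^2 . \tag{39}
\]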
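Let $H_T(\vartheta_0) := \frac1T D^2\ell_T(\vartheta_0) + \frac1T\int_0^1\!\int_0^1 D^3\ell_T(\bar\vartheta(st))\,ds\,t\,dt\,(\hat\vartheta_T - \vartheta_0)$, and let us recall ω ∈ Ω0 and T1 = T1 (ω) from the first part of the proof. Notice that HT (ϑ0 ) is a symmetric matrix. Then from Lemma 6.14, for T ≥ T1 ,
\[
\Big|H_T(\vartheta_0) - \frac1T D^2\ell_T(\vartheta_0)\Big| \le \sup_{\vartheta\in\Theta}\Big|\frac{1}{2T} D^3\ell_T(\vartheta)\Big|\,|\hat\vartheta_T - \vartheta_0| \le \frac{C_2}{2}|\hat\vartheta_T - \vartheta_0|
\]
and hence, for y ∈ R^d such that |y| = 1,
\[
y^\tau H_T(\vartheta_0) y \le \Big|H_T(\vartheta_0) - \frac1T D^2\ell_T(\vartheta_0)\Big| + y^\tau\Big(\frac1T D^2\ell_T(\vartheta_0)\Big) y \le -\frac{3\lambda_0}{8},
\]
implying that HT (ϑ0 ) is a negative definite matrix, and $|H_T(\vartheta_0)^{-1}| \le \frac{8}{3\lambda_0}$. Since $|I(\vartheta_0)^{-1}| = 1/\lambda_0$,
\[
|H_T(\vartheta_0)^{-1} + I(\vartheta_0)^{-1}| \le |H_T(\vartheta_0)^{-1}|\cdot|H_T(\vartheta_0) + I(\vartheta_0)|\cdot|I(\vartheta_0)^{-1}|
\le \frac{8}{3\lambda_0^2}\Big(\frac{C_2}{2}|\hat\vartheta_T - \vartheta_0| + \Big|\frac1T D^2\ell_T(\vartheta_0) - D^2\ell_{\vartheta_0}(\vartheta_0)\Big|\Big),
\]
and (ii) and (13) hold, it follows that Pθ0 -a.s.
\[
\lim_{T\to+\infty} H_T(\vartheta_0)^{-1} = -I(\vartheta_0)^{-1}. \tag{40}
\]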
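Finally, since DℓT (ϑ̂T ) = 0 and I(ϑ0 ) is nonrandom, (38-40) imply that
\[
\sqrt{T}\,(\hat\vartheta_T - \vartheta_0) = -H_T(\vartheta_0)^{-1}\frac{1}{\sqrt{T}} D\ell_T(\vartheta_0) \xrightarrow{\ \mathcal{L}-P_{\theta_0}\ } N(0, \sigma I(\vartheta_0)^{-1}),\quad T\to+\infty.
\]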
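The step from (38-40) to the limit law can be spelled out as follows. This is only a sketch of the omitted algebra, using nothing beyond the definition of H_T(ϑ0), the stationarity Dℓ_T(ϑ̂_T) = 0 and Slutsky's theorem:

```latex
\begin{aligned}
0 = D\ell_T(\hat\vartheta_T)
  &= D\ell_T(\vartheta_0) + T\,H_T(\vartheta_0)\,(\hat\vartheta_T - \vartheta_0)
  && \text{by (39) and the definition of } H_T(\vartheta_0), \\
\sqrt{T}\,(\hat\vartheta_T - \vartheta_0)
  &= -H_T(\vartheta_0)^{-1}\,\frac{1}{\sqrt{T}}\,D\ell_T(\vartheta_0)
  && \text{since } H_T(\vartheta_0) \text{ is invertible for } T \ge T_1, \\
-H_T(\vartheta_0)^{-1} &\longrightarrow I(\vartheta_0)^{-1}
  && \text{a.s., by (40)}, \\
I(\vartheta_0)^{-1}\,N(0, \sigma I(\vartheta_0))
  &= N\big(0, \sigma\,I(\vartheta_0)^{-1} I(\vartheta_0)\, I(\vartheta_0)^{-1}\big)
   = N\big(0, \sigma I(\vartheta_0)^{-1}\big)
  && \text{by (38) and Slutsky's theorem.}
\end{aligned}
```

The last line uses the symmetry of I(ϑ0), so that the covariance of the linearly transformed normal vector collapses to σI(ϑ0)^{-1}.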
Proof of Theorem 4.6. Let θ0 = (ϑ0 , σ) ∈ Ψ be arbitrary, and let Cr > 0 (r = 0, 1, 2) be constants from Lemma 6.14. Moreover, let Ω0 be a Pθ0 -probability one event from Lemmas 6.14-6.15 and Theorem 4.5 (i-ii). Let ω ∈ Ω0 be fixed. Let ε0 > 0 be a number such that K(ϑ0 , ε0 ) ⊂ Θ, and let λ0 > 0 be the minimal eigenvalue of the Fisher matrix I(ϑ0 ). Then there exists T0 = T0 (ω) ≥ 0 such that for all T > T0 , ϑ̂T ∈ K(ϑ0 , ε0 /2) and $\lambda_T := \min_{|y|=1} y^\tau(-\frac1T D^2\ell_T(\hat\vartheta_T)) y \ge \lambda_0/2 > 0$, and the statements of Lemma 6.14 hold. Let ε > 0 be an arbitrarily small number such that ε < (ε0 /2) ∧ λ0 /(8C2 ). Then K(ϑ̂T , ε) ⊂ K(ϑ̂T , ε0 /2) ⊂ K(ϑ0 , ε0 ) ⊂ Θ. Moreover, on the event
\[
\Omega_{n,T} := \Big\{\sup_{\vartheta\in\Theta}\Big|\frac1T D^r\ell_{n,T}(\vartheta) - \frac1T D^r\ell_T(\vartheta)\Big| \le \frac{\lambda_0}{8}\Big(1\wedge\frac{\lambda_0}{8C_2}\Big),\ r = 1, 2\Big\},
\]
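for ϑ ∈ K(ϑ̂T , ε) and z ∈ R^d such that |z| = ε, and y := z/|z|, the following holds:
\[
\begin{aligned}
y^\tau D^2\ell_{n,T}(\vartheta) y
&\le |D^2\ell_{n,T}(\vartheta) - D^2\ell_T(\vartheta)| + |D^2\ell_T(\vartheta) - D^2\ell_T(\hat\vartheta_T)| + y^\tau D^2\ell_T(\hat\vartheta_T) y \\
&< \Big(\frac{\lambda_0}{8} + C_2\frac{\lambda_0}{8C_2} - \frac{\lambda_0}{2}\Big) T = -\frac{\lambda_0}{4} T < 0,
\end{aligned}
\]
\[
\begin{aligned}
D\ell_{n,T}(\hat\vartheta_T + z)\,z
&= D\ell_{n,T}(\hat\vartheta_T)\,z + z^\tau\Big(\int_0^1 D^2\ell_{n,T}(\hat\vartheta_T + tz)\,dt\Big) z \\
&\le |D\ell_{n,T}(\hat\vartheta_T) - D\ell_T(\hat\vartheta_T)|\,\varepsilon + y^\tau\Big(\int_0^1 D^2\ell_{n,T}(\hat\vartheta_T + tz)\,dt\Big) y\,\varepsilon^2 \\
&\le \varepsilon\,\frac{\lambda_0}{8C_2}\Big(\frac{\lambda_0}{8} - \frac{\lambda_0}{4}\Big) T = -\varepsilon\,\frac{\lambda_0}{8C_2}\,\frac{\lambda_0}{8}\,T < 0.
\end{aligned}
\]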
Hence ϑ ↦ ℓn,T (ϑ) is a strictly concave function on K(ϑ̂T , ε), and there exists ϑ̂n,T ∈ K(ϑ̂T , ε) such that Dℓn,T (ϑ̂n,T ) = 0, and ϑ̂n,T is the unique stationary point and a point of maximum of ℓn,T on K(ϑ̂T , ε). These imply that ϑ̂n,T is a random vector. Since limn,T Pθ0 (Ω^c_{n,T} ) = 0 by Corollary 6.13, and Ωn,T ⊂ {Dℓn,T (ϑ̂n,T ) = 0} ∩ {|ϑ̂n,T − ϑ̂T | < ε}, statements (i) and (ii) of the theorem follow. Moreover, if a process (ϑ̃n,T ) satisfies (i) and (ii) then statement (iv) follows since
Ωn,T ∩ {Dℓn,T (ϑ̃n,T ) = 0} ∩ {|ϑ̃n,T − ϑ̂T | < ε} ⊆ {ϑ̂n,T = ϑ̃n,T }
by uniqueness of a stationary point of ℓn,T on K(ϑ̂T , ε). To prove (iii), let A > 0 be an arbitrary number, and let
\[
\Omega_{n,T}(A) := \Big\{\sup_{\vartheta\in\Theta}\Big|\frac1T D\ell_{n,T}(\vartheta) - \frac1T D\ell_T(\vartheta)\Big| \le \frac{\lambda_0}{4} A\sqrt{\delta_{n,T}}\Big\}.
\]
Then on the event Ωn,T (A) ∩ Ωn,T ,
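\[
\begin{aligned}
|\hat\vartheta_{n,T} - \hat\vartheta_T|
&\le |(D^2\ell_T(\hat\vartheta_T))^{-1}|\cdot|D^2\ell_T(\hat\vartheta_T)(\hat\vartheta_{n,T} - \hat\vartheta_T)| \\
&\le |(D^2\ell_T(\hat\vartheta_T))^{-1}|\cdot|D\ell_T(\hat\vartheta_{n,T}) - D\ell_T(\hat\vartheta_T) - D^2\ell_T(\hat\vartheta_T)(\hat\vartheta_{n,T} - \hat\vartheta_T)| \\
&\quad + |(D^2\ell_T(\hat\vartheta_T))^{-1}|\cdot|D\ell_{n,T}(\hat\vartheta_{n,T}) - D\ell_T(\hat\vartheta_{n,T})| \\
&\le \frac{2}{\lambda_0 T}\,\frac{C_2}{2}\,\frac{\lambda_0}{2C_2}\,T\,|\hat\vartheta_{n,T} - \hat\vartheta_T| + \frac{2}{\lambda_0 T}\,\frac{\lambda_0 T}{4}\,A\sqrt{\delta_{n,T}} \\
&\le \frac12|\hat\vartheta_{n,T} - \hat\vartheta_T| + \frac12 A\sqrt{\delta_{n,T}} \\
\Rightarrow\quad |\hat\vartheta_{n,T} - \hat\vartheta_T| &\le A\sqrt{\delta_{n,T}}
\end{aligned}
\]
by Lemma 6.14 and since ε ≤ λ0 /(2C2 ). Hence Ωn,T (A) ∩ Ωn,T ⊆ {|ϑ̂n,T − ϑ̂T | ≤ A√δn,T }, and
\[
0 \le \lim_{A\to+\infty}\lim_{n,T} P_{\theta_0}\big\{|\hat\vartheta_{n,T} - \hat\vartheta_T| > A\sqrt{\delta_{n,T}}\big\}
\le \lim_{A\to+\infty}\lim_{n,T} P_{\theta_0}(\Omega_{n,T}(A)^c) + \lim_{n,T} P_{\theta_0}(\Omega_{n,T}^c) = 0
\]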
by Corollary 6.13, and (iii) follows. Consistency of $\hat\vartheta_{n,T}$ (the first part of statement (v)) follows directly from (ii) and Theorem 4.5 (ii). To prove its asymptotic normality (the second part of (v)) notice that
\[
|\sqrt{T}(\hat\vartheta_{n,T} - \vartheta_0) - \sqrt{T}(\hat\vartheta_T - \vartheta_0)| = \sqrt{T\delta_{n,T}}\,\frac{1}{\sqrt{\delta_{n,T}}}\,|\hat\vartheta_{n,T} - \hat\vartheta_T| \xrightarrow{P_{\theta_0}} 0
\]
when $\lim_{n,T} T\delta_{n,T} = 0$ since (iii) holds. Then the second part of (v) follows by Slutsky's theorem since Theorem 4.5 (iii) holds. To prove statement (vi), first we need to prove that
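\[
\frac{1}{T}\Big(\sum_{i=0}^{n-1}\frac{(\Delta_i X - \mu(X_{t_i},\vartheta)\Delta_i t)^2}{b^2(X_{t_i})\Delta_i t} - \sigma\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2}{\Delta_i t}\Big) = O_{P_\theta}(1), \quad T \to +\infty,\ n \to +\infty \tag{41}
\]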
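for $\pi_{\vartheta_0}$-a.s. initial conditions. This follows from Lemma 6.9, the proof of Lemma 6.12, and the fact that the functions $f := \mu(\cdot,\vartheta)/b$ and $b$ satisfy (B1-4), which is proved in Corollary 6.13. The proof of asymptotic normality of $\hat\sigma_{n,T}$ is the same as in the proof of Corollary 4.2 since
\begin{align*}
&\frac{1}{\sqrt{n}}\Big(\sum_{i=0}^{n-1}\frac{(\Delta_i X - \mu(X_{t_i},\vartheta)\Delta_i t)^2}{b^2(X_{t_i})\Delta_i t} - \sigma\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2}{\Delta_i t}\Big) = \\
&\quad = \sqrt{T\delta_{n,T}}\,\frac{1}{T}\Big(\sum_{i=0}^{n-1}\frac{(\Delta_i X - \mu(X_{t_i},\vartheta)\Delta_i t)^2}{b^2(X_{t_i})\Delta_i t} - \sigma\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2}{\Delta_i t}\Big) \to 0
\end{align*}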
when $T \to +\infty$ such that $T\delta_{n,T} \to 0$, and since (i)-(v), Corollary 6.13, and Lemma 6.15 hold. Similarly, consistency of $\hat\sigma_{n,T}$ follows from decomposition (33) in the proof of Corollary 4.2 (but without the factor "$\sqrt{n}$") by using (41), which appears with the factor "$\delta_{n,T}$" (notice that $\delta_{n,T}/T = 1/n$), and by the strong law of large numbers instead of the CLT. In this case it is sufficient to assume that $\delta_{n,T} \to 0$ when $T \to +\infty$. Finally, for proving $\mathcal{F}^0_{n,T}$-measurability of $\hat\vartheta_{n,T}$ (and hence of $\hat\sigma_{n,T}$ too) it is sufficient to prove that $\hat\vartheta_{n,T}$ is a unique point of maximum of $\ell_{n,T}$ on $\Theta$. The proof is similar to the proof of uniqueness of $\hat\vartheta_T$ as the global point of maximum of $\ell_T$ on $\Theta$, with $\ell_T$ replaced by $\ell_{n,T}$ and $\ell_{\vartheta_0}$ replaced by $\ell_T$.
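As a rough numerical illustration of the consistency statements (v) and (vi), the following Python sketch simulates an Ornstein-Uhlenbeck diffusion $dX_t = -\vartheta X_t\,dt + \sqrt{\sigma}\,dW_t$ (that is, $\mu(x,\vartheta) = -\vartheta x$ and $b \equiv 1$, a model chosen here only as an example) on an equidistant grid and evaluates Euler-type estimators. The closed form used for the drift estimator assumes that the approximate log-likelihood is the Euler discretization $\sum_i \mu(X_{t_i},\vartheta)b^{-2}(X_{t_i})\Delta_i X - \frac{1}{2}\sum_i \mu^2(X_{t_i},\vartheta)b^{-2}(X_{t_i})\Delta_i t$, and the form of the diffusion estimator is read off from the normalized sum appearing in (41). Both forms, and all function names, are assumptions of this sketch; the definitions given earlier in the paper take precedence.

```python
import math
import random


def simulate_ou(theta, sigma, x0, T, n, seed=0):
    """Sample an OU path on an equidistant grid using its exact Gaussian transition.

    Hypothetical helper: dX = -theta * X dt + sqrt(sigma) dW, X_0 = x0.
    """
    rng = random.Random(seed)
    dt = T / n
    decay = math.exp(-theta * dt)
    # Standard deviation of X_{t+dt} given X_t for the OU process.
    sd = math.sqrt(sigma * (1.0 - decay * decay) / (2.0 * theta))
    path = [x0]
    for _ in range(n):
        path.append(path[-1] * decay + sd * rng.gauss(0.0, 1.0))
    return path, dt


def approximate_mle(path, dt):
    """Euler-type estimators for mu(x, theta) = -theta * x and b = 1 (assumed forms)."""
    n = len(path) - 1
    # Maximizer of the assumed Euler log-likelihood: -sum X dX / sum X^2 dt.
    num = sum(path[i] * (path[i + 1] - path[i]) for i in range(n))
    den = sum(path[i] ** 2 * dt for i in range(n))
    theta_hat = -num / den
    # Normalized sum of squared Euler residuals, as suggested by (41).
    resid_sq = sum(
        (path[i + 1] - path[i] + theta_hat * path[i] * dt) ** 2 for i in range(n)
    )
    sigma_hat = resid_sq / (n * dt)
    return theta_hat, sigma_hat


if __name__ == "__main__":
    path, dt = simulate_ou(theta=1.0, sigma=0.5, x0=0.0, T=200.0, n=200_000, seed=1)
    print(approximate_mle(path, dt))
```

With a long horizon and a fine grid (large $T$, small $T\delta_{n,T}$), both estimates should land near the true values, which is the regime covered by statements (v) and (vi).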
References
[1] Aït-Sahalia, Y. (2002). Maximum likelihood estimation of discretely sampled diffusions: A closed-form approximation approach. Econometrica, 70(1), 223-262
[2] Aït-Sahalia, Y., & Mykland, P. A. (2004). Estimators of diffusions with randomly spaced discrete
observations: a general theory. The Annals of Statistics, 32(5), 2186-2222
[3] Aït-Sahalia, Y. (2008). Closed-form likelihood expansions for multivariate diffusions. The Annals
of Statistics, 36(2), 906-937
[4] Bibby, B. M., & Sørensen, M. (1995). Martingale estimation functions for discretely observed
diffusion processes. Bernoulli, 1(1/2), 17-39
[5] Bishwal, J. P. N. (2008). Parameter Estimation in Stochastic Differential Equations, Lecture
Notes in Mathematics 1923, Berlin: Springer-Verlag.
[6] Borisovich, Yu., Bliznyakov, N., Izrailevich, Ya., & Fomenko, T. (1985). Introduction to Topology,
Moscow: Mir Publishers.
[7] Brockwell, P. J. & Davis, R. A. (1991). Time Series: Theory and Methods, 2nd ed., New York:
Springer-Verlag.
[8] Brown, B. M., & Hewitt, J. I. (1975). Asymptotic likelihood theory for diffusion processes, J.
Appl. Prob., 12 228-238.
[9] Dacunha-Castelle, D., & Florens-Zmirou, D. (1986). Estimation of the coefficients of a diffusion
from discrete observations, Stochastics, 19 263-284.
[10] Dohnal, G. (1987). On estimating the diffusion coefficient. J. Appl. Prob., 24 105-114.
[11] Feigin, P. D. (1976). Maximum likelihood estimation for continuous-time stochastic processes. Adv.
Appl. Prob., 8 712-736
[12] Florens-Zmirou, D. (1989). Approximate discrete time schemes for statistics of diffusion processes.
Statistics: A Journal of Theoretical and Applied Statistics, 20 547-557.
[13] Friedman, A. (1975). Stochastic Differential Equations and Applications, Vol. 1-2, New York:
Academic Press.
[14] Genon-Catalot, V. & Jacod, J. (1993). On the estimation of the diffusion coefficient for multidimensional diffusion processes. Ann. Inst. H. Poincaré Probab. Statist., 29 119-151.
[15] Huzak, M. (1997). Selection of diffusion growth process and parameter estimation from discrete
observation, Ph.D. thesis, University of Zagreb (in Croatian).
[16] Huzak, M. (1998). Parameter estimation of diffusion models. Mathematical Communications, 3
129-134.
[17] Huzak, M. (2001). A general theorem on approximate maximum likelihood estimation. Glasnik
matematički, 36(56) 139-153. (hrcak.srce.hr/file/7900).
[18] Jin, P., Mandrekar, V., Rüdiger, B. & Trabelsi, C. (2013). Positive Harris recurrence of the CIR
process and its applications. Communications on Stochastic Analysis, 7(3), 409-424
[19] Kessler, M. (1997). Estimation of an Ergodic Diffusion from Discrete Observations, Scand. J.
Statist., 24 211-229.
[20] Kloeden, P. E., Platen, E., Schurz, H., & Sørensen, M. (1996). On Effects of Discretization on
Estimators of Drift Parameters for Diffusion Processes. J. Applied Prob. 33(4) 1061-1076.
[21] Lanska, V. (1979). Minimum contrast estimation in diffusion processes. J. Appl. Prob., 16 65-75.
[22] LeBreton, A. (1976). On continuous and discrete sampling for parameter estimation in diffusion
type processes, Math. Prog. Study, 5 124-144.
[23] Li, C. (2013). Maximum-likelihood estimation for diffusion processes via closed-form density expansions. The Annals of Statistics, 41(3), 1350-1380
[24] Liptser, R. S., & Shiryayev, A. N. (1977). Statistics of random processes I, General Theory, New
York: Springer-Verlag.
[25] Revuz, D., & Yor, M. (1991). Continuous martingales and Brownian motion, Berlin: Springer-Verlag.
[26] Rogers, L. C. G., & Williams, D. (1987). Diffusion, Markov Processes, and Martingales, Vol. 1-2,
Chichester: Wiley.
[27] Taylor, M. E. (1996). Partial Differential Equations. Basic Theory., New York: Springer.
[28] Yoshida, N. (1992). Estimation for diffusion processes from discrete observations. J. Multivar.
Anal., 41 220-242.
[29] Yor, M. (1975/76). Sur quelques approximations d’integrales stochastiques. Lecture notes in probability, 528 518-528.
APPENDIX
Proof of Lemma 6.3. Let $k^j = k_1^{j_1}\cdots k_d^{j_d}$ for nonnegative integers $j_1,\dots,j_d$ such that $m := j_1 + \cdots + j_d \le d+1$. Then for $x \in E$,
\[
|k|^j\,|C_k(x)| = |C_k^{(j)}(x)| \le \frac{1}{(2\pi)^d}\int_{K_0}\Big|\frac{\partial^m}{\partial\vartheta_1^{j_1}\cdots\partial\vartheta_d^{j_d}} f(x,\vartheta)\Big|\,d\vartheta \le g(x)
\]
by the definition of Fourier coefficients, the monotonicity of the integral, and (B3). Hence
\[
(1 + |k_1| + \cdots + |k_d|)^{d+1}\,|C_k(x)| = \sum_{j_0+j_1+\cdots+j_d=d+1}\frac{(d+1)!}{j_0!\,j_1!\cdots j_d!}\,|k|^j\,|C_k(x)| \le (d+1)^{d+1} g(x)
\]
by the multinomial theorem, which implies the statement of the lemma.
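The multinomial step in the proof of Lemma 6.3 can be checked numerically. The following Python sketch (the function name `multinomial_expansion` is introduced here only for illustration) expands $(1 + |k_1| + \cdots + |k_d|)^{d+1}$ as the sum over $j_0 + \cdots + j_d = d+1$ used in the proof, and confirms that the multinomial coefficients alone add up to $(d+1)^{d+1}$, which is where the constant in the bound comes from.

```python
from itertools import product
from math import factorial, prod


def multinomial_expansion(values, power):
    """Sum over j_0+...+j_d = power of power!/(j_0!...j_d!) * prod(values[i] ** j_i).

    By the multinomial theorem this equals sum(values) ** power.
    """
    total = 0
    for js in product(range(power + 1), repeat=len(values)):
        if sum(js) != power:
            continue
        coeff = factorial(power)
        for j in js:
            coeff //= factorial(j)  # exact: partial quotients stay integers
        total += coeff * prod(v ** j for v, j in zip(values, js))
    return total


if __name__ == "__main__":
    d = 2
    k = (2, -3)
    values = [1] + [abs(ki) for ki in k]  # the terms 1, |k_1|, ..., |k_d|
    print(multinomial_expansion(values, d + 1), sum(values) ** (d + 1))
    # Coefficients alone (all terms equal to 1) sum to (d + 1) ** (d + 1).
    print(multinomial_expansion([1] * (d + 1), d + 1), (d + 1) ** (d + 1))
```

Since each term $|k|^j |C_k(x)|$ is bounded by $g(x)$, bounding the sum only requires the total of the coefficients, which the second call computes.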
Proof of Lemma 6.4. At first, let us suppose that $f$ is bounded on $E$. If $M_t := \big(\int_{t_0}^t f(X_s)\,dW_s\big)^2 - \int_{t_0}^t f^2(X_s)\,ds$, then the Itô formula and the isometry imply
\[
E(M_t)^2 = 4E\Big(\int_{t_0}^t\Big(\int_{t_0}^s f(X_u)\,dW_u\Big)f(X_s)\,dW_s\Big)^2 \le 2\|f\|_\infty^4 (t - t_0)^2.
\]
Hence, if $N_t := \int_{t_0}^t f(X_s)\,dW_s$ then
\[
E(N_t)^4 \le 2E(M_t)^2 + 2E\Big(\int_{t_0}^t f^2(X_s)\,ds\Big)^2 \le 6\|f\|_\infty^4 (t - t_0)^2 < +\infty.
\]
Similarly, if $t_0 \le s < s + h \le t$ then
\[
E(N_{s+h}^2 - N_s^2)^2 \le 2\|f\|_\infty^4 (4(s - t_0) + 3h)h \to 0, \quad h \to 0.
\]
In addition $E(f^2(X_{s+h}) - f^2(X_s))^2 \to 0$, and $Ef^4(X_{s+h}) \to Ef^4(X_s)$ when $h \to 0$ by the dominated convergence theorem. Hence
\begin{align*}
|E(N_{s+h}^2 f^2(X_{s+h})) - E(N_s^2 f^2(X_s))| &\le \|f\|_\infty^2\,E|N_{s+h}^2 - N_s^2| + E(N_s^2\,|f^2(X_{s+h}) - f^2(X_s)|) \\
&\le \|f\|_\infty^2\sqrt{E(N_{s+h}^2 - N_s^2)^2} + \sqrt{E(N_s^4)\,E(f^2(X_{s+h}) - f^2(X_s))^2} \to 0, \quad h \to 0
\end{align*}
implying that $s \mapsto E(N_s^2 f^2(X_s))$ and $s \mapsto Ef^4(X_s)$ are continuous functions on $[t_0, t]$. Let $x(t) := EN_t^4$. Since for $t_0 \le s < s + h \le t$,
\[
x(s+h) - x(s) = E(N_{s+h}^4 - N_s^4) = 6\int_s^{s+h} E(N_u^2 f^2(X_u))\,du,
\]
by the Itô formula, $s \mapsto x(s)$ is a differentiable function for $s > t_0$, and
\[
x(s+h) - x(s) \le 3\int_s^{s+h} x(u)\,du + 3\int_s^{s+h} Ef^4(X_u)\,du.
\]
Hence $\dot{x}(s) \le 3x(s) + 3Ef^4(X_s)$ for $s > t_0$, and $x(t_0) = 0$, implying that
\[
E\Big(\int_{t_0}^t f(X_s)\,dW_s\Big)^4 = x(t) \le 3e^{3t}\int_{t_0}^t e^{-3s}\,Ef^4(X_s)\,ds \le 3e^{3(t-t_0)}\,E\Big(\int_{t_0}^t f^4(X_s)\,ds\Big).
\]
Now, let $f \in C(E)$ be unbounded generally. Then there exists a sequence $(f_m)$ of bounded functions such that for all $m$, $f_m \in C(E)$, $|f_m| \uparrow |f|$, and $f_m \to f$ (see the proof of Corollary 6.11). Since $\lim_m f_m(X_s) = f(X_s)$ and $|f_m(X_s) - f(X_s)| \le 2|f(X_s)|$ for all $m$ and $s$, it follows that
\[
\int_{t_0}^t f_m(X_s)\,dW_s \xrightarrow{P} \int_{t_0}^t f(X_s)\,dW_s
\]
by the dominated convergence theorem for stochastic integrals. Then there exists a subsequence such that
\[
\int_{t_0}^t f_{m_k}(X_s)\,dW_s \xrightarrow{a.s.} \int_{t_0}^t f(X_s)\,dW_s \ \Rightarrow\ \Big(\int_{t_0}^t f_{m_k}(X_s)\,dW_s\Big)^4 \xrightarrow{a.s.} \Big(\int_{t_0}^t f(X_s)\,dW_s\Big)^4,
\]
and hence by Fatou's lemma and the monotone convergence theorem
\begin{align*}
E\Big(\int_{t_0}^t f(X_s)\,dW_s\Big)^4 &\le \liminf_k E\Big(\int_{t_0}^t f_{m_k}(X_s)\,dW_s\Big)^4 \\
&\le 3e^{3(t-t_0)}\lim_k E\Big(\int_{t_0}^t f_{m_k}^4(X_s)\,ds\Big) = 3e^{3(t-t_0)}\,E\Big(\int_{t_0}^t f^4(X_s)\,ds\Big).
\end{align*}
The last inequality follows trivially from the first one.
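The fourth-moment bound of Lemma 6.4 can be illustrated by a small Monte Carlo experiment. The sketch below is not part of the proof: it assumes the simplest possible setting ($X = W$ a standard Brownian motion, $t_0 = 0$, and a bounded integrand such as $f = \cos$), approximates the stochastic integral by a left-point Euler sum, and returns estimates of both sides of $E(\int_0^t f(X_s)\,dW_s)^4 \le 3e^{3t}E\int_0^t f^4(X_s)\,ds$. The function name and the default sample sizes are hypothetical choices made for this example.

```python
import math
import random


def fourth_moment_check(f, t, n_steps=100, n_paths=4000, seed=0):
    """Monte Carlo estimates of both sides of the Lemma 6.4 bound for X = W, t0 = 0."""
    rng = random.Random(seed)
    dt = t / n_steps
    sqrt_dt = math.sqrt(dt)
    lhs_sum = 0.0
    rhs_sum = 0.0
    for _path in range(n_paths):
        x = 0.0          # current value of the Brownian path
        stoch_int = 0.0  # Euler sum approximating int f(X_s) dW_s
        time_int = 0.0   # Riemann sum approximating int f(X_s)^4 ds
        for _step in range(n_steps):
            dw = sqrt_dt * rng.gauss(0.0, 1.0)
            fx = f(x)
            stoch_int += fx * dw
            time_int += fx ** 4 * dt
            x += dw
        lhs_sum += stoch_int ** 4
        rhs_sum += time_int
    lhs = lhs_sum / n_paths
    rhs = 3.0 * math.exp(3.0 * t) * rhs_sum / n_paths
    return lhs, rhs


if __name__ == "__main__":
    print(fourth_moment_check(math.cos, 1.0))
```

In this setting the right hand side is much larger than the left, which is consistent with the bound coming from a Gronwall-type argument rather than a sharp computation.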
Proof of Lemma 6.5. By applying the Itô formula to the logarithm of $b^{16}$ over the time interval $[t, t+h]$ it follows that $(b(X_{t+h})/b(X_t))^{16} = M_h Z_h$ where
\[
M_h = \exp\Big(16\sqrt{\sigma}\int_t^{t+h} b'(X_s)\,dW_s - \frac{\sigma}{2}\,16^2\int_t^{t+h} b'^2(X_s)\,ds\Big)
\]
is a positive supermartingale (see [24], Lemma 6.1, p.207) since $E\int_t^{t+h} b'^2(X_s)\,ds \le E\int_t^{t+h} r^2(X_s)\,ds \le (3T + E\int_0^T r^8(X_s)\,ds)/4 < +\infty$ by assumption (B4) (and so $EM_h \le EM_0 = 1$), and
\[
Z_h = \exp\Big(8\int_t^{t+h}\Big(2\frac{\mu_0 b'}{b} + \sigma(b''b + 15b'^2)\Big)(X_s)\,ds\Big).
\]
By the Markov property and assumption (B4), for $0 < h \le h_0$,
\[
EZ_h = E[\,E[Z_h \mid \mathcal{F}_t^0]\,] = E\Big[E_{X_t}\Big[\exp\Big(8\int_0^h\Big(2\frac{\mu_0 b'}{b} + \sigma(b''b + 15b'^2)\Big)(X_s)\,ds\Big)\Big]\Big] \le E[c(X_t)].
\]
Hence, for $0 < h \le h_0$,
\[
E\Big(\frac{b(X_{t+h})}{b(X_t)}\Big)^8 = E\sqrt{M_h Z_h} \le \frac{1}{2}(EM_h + EZ_h) \le \frac{1}{2}(1 + E\,c(X_t)) = E\,c_0(X_t).
\]
Proof of Lemma 6.6. First, let us show that (23) implies (22). In the same way it can be shown that (25) implies (24). Let $\delta := \delta_{n,T}$, and $I_i := \langle t_i, t_{i+1}]$. Then the Cauchy-Schwarz inequality and the isometry imply
\begin{align*}
E\Big(\frac{1}{T\sqrt{\delta}}\sum_{i=0}^{n-1}\int_{I_i}(C_k(X_t) - C_k(X_{t_i}))\,a(X_t)\,dt\Big)^2
&= \frac{1}{T^2\delta}\,E\Big(\int_0^T\Big(\sum_{i=0}^{n-1}(C_k(X_t) - C_k(X_{t_i}))1_{I_i}(t)\Big)a(X_t)\,dt\Big)^2 \\
&\le \frac{1}{T\delta}\,E\int_0^T\Big|\sum_{i=0}^{n-1}(C_k(X_t) - C_k(X_{t_i}))1_{I_i}(t)\Big|^2 a^2(X_t)\,dt \tag{42} \\
&= \frac{1}{T\delta}\,E\Big(\int_0^T\Big(\sum_{i=0}^{n-1}(C_k(X_t) - C_k(X_{t_i}))1_{I_i}(t)\Big)a(X_t)\,dW_t\Big)^2 \\
&= E\Big(\frac{1}{\sqrt{T\delta}}\sum_{i=0}^{n-1}\int_{I_i}(C_k(X_t) - C_k(X_{t_i}))\,a(X_t)\,dW_t\Big)^2.
\end{align*}
Hence it is sufficient to prove that there exist constants $K_1 > 0$, $T_1 \ge 0$, and $n_1$ such that
\[
\frac{1}{T\delta}\sum_{i=0}^{n-1}\int_{I_i} E(|C_k(X_t) - C_k(X_{t_i})|^2 a^2(X_t))\,dt \le K_1 \cdot K_k \tag{43}
\]
for $T > T_1$ and $n \ge n_1$, since the left hand side of (43) is equal to (42). Similarly, to prove (24) and (25) it is sufficient to prove that there exist constants $K_2 > 0$, $T_0 \ge T_1$, and $n_0 \ge n_1$ such that for $T > T_0$ and $n \ge n_0$,
\[
\frac{1}{T\delta}\sum_{i=0}^{n-1}\int_{I_i} E\Big(|C_k(X_{t_i})|^2\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)^2 a^2(X_t)\Big)\,dt \le K_2 \cdot K_k. \tag{44}
\]
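Let $j_1,\dots,j_d$ be nonnegative integers such that $m := j_1 + \cdots + j_d \le d+1$, and let $\vartheta \in K_0$ be fixed. Then the function $\tilde f := \frac{\partial^m}{\partial\vartheta_1^{j_1}\cdots\partial\vartheta_d^{j_d}} f(\cdot,\vartheta) \in C^2(E)$ by (B2). If $A\tilde f := \tilde f'\mu_0 + \frac{1}{2}\tilde f''\nu^2$, then $|A\tilde f| \le g$, and $|\tilde f'\nu| \le g$ by (B3). Hence by applying the Itô formula, Jensen's inequality, and Lemma 6.4 it follows that
\begin{align*}
E(\tilde f(X_t) - \tilde f(X_{t_i}))^4 &= E\Big(\int_{t_i}^t A\tilde f(X_s)\,ds + \int_{t_i}^t (\tilde f'\nu)(X_s)\,dW_s\Big)^4 \\
&\le 8\Big(E\Big(\int_{t_i}^t |A\tilde f|(X_s)\,ds\Big)^4 + E\Big(\int_{t_i}^t (\tilde f'\nu)(X_s)\,dW_s\Big)^4\Big) \\
&\le 8\Big(\delta^3 E\int_{t_i}^t (A\tilde f)^4(X_s)\,ds + 3e^{3\delta} E\int_{t_i}^t (\tilde f'\nu)^4(X_s)\,ds\Big) \le 24e^3 E\int_{t_i}^t g^4(X_s)\,ds,
\end{align*}
and $E(a(X_t) - a(X_{t_i}))^4 \le 24e^3 E\int_{t_i}^t \bar a^4(X_s)\,ds$ by analogy, since we can assume that $\delta \le 1$. Similarly,
\begin{align*}
E(a^2(X_{t_i})(\tilde f(X_t) - \tilde f(X_{t_i}))^2) &= E[a^2(X_{t_i})\,E[(\tilde f(X_t) - \tilde f(X_{t_i}))^2 \mid \mathcal{F}^0_{t_i}]] \\
&\le 2E\Big[a^2(X_{t_i})\,E\Big[\delta\int_{t_i}^t (A\tilde f)^2(X_s)\,ds + \Big(\int_{t_i}^t (\tilde f'\nu)(X_s)\,dW_s\Big)^2 \,\Big|\, \mathcal{F}^0_{t_i}\Big]\Big] \\
&\le 4E\Big[a^2(X_{t_i})\int_{t_i}^t g^2(X_s)\,ds\Big] \le 2(t - t_i)\,Ea^4(X_{t_i}) + 2E\int_{t_i}^t g^4(X_s)\,ds. \tag{45}
\end{align*}
Hence
\begin{align*}
E((\tilde f(X_t) - \tilde f(X_{t_i}))^2 a^2(X_t)) &\le E(\tilde f(X_t) - \tilde f(X_{t_i}))^4 + E(a(X_t) - a(X_{t_i}))^4 + 2E(a^2(X_{t_i})(\tilde f(X_t) - \tilde f(X_{t_i}))^2) \\
&\le 25e^3 E\int_{t_i}^t g^4(X_s)\,ds + 24e^3 E\int_{t_i}^t \bar a^4(X_s)\,ds + 4(t - t_i)\,Ea^4(X_{t_i}). \tag{46}
\end{align*}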
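Now, let $k^j = k_1^{j_1}\cdots k_d^{j_d}$. Then
\begin{align*}
|k|^{2j}\,E(|C_k(X_t) - C_k(X_{t_i})|^2 a^2(X_t)) &= E(|C_k^{(j)}(X_t) - C_k^{(j)}(X_{t_i})|^2 a^2(X_t)) \\
&\le \frac{1}{(2\pi)^d}\int_{K_0} E\Big(\Big(\frac{\partial^m f}{\partial\vartheta_1^{j_1}\cdots\partial\vartheta_d^{j_d}}(X_t,\vartheta) - \frac{\partial^m f}{\partial\vartheta_1^{j_1}\cdots\partial\vartheta_d^{j_d}}(X_{t_i},\vartheta)\Big)^2 a^2(X_t)\Big)\,d\vartheta \\
&\le 25e^3 E\int_{t_i}^t g^4(X_s)\,ds + 24e^3 E\int_{t_i}^t \bar a^4(X_s)\,ds + 4(t - t_i)\,Ea^4(X_{t_i}) \tag{47}
\end{align*}
by the definition of Fourier coefficients, Jensen's inequality, Fubini's theorem, and (46). Hence
\begin{align*}
&(1 + |k_1| + \cdots + |k_d|)^{2(d+1)}\,E(|C_k(X_t) - C_k(X_{t_i})|^2 a^2(X_t)) \\
&\quad \le (d+1)^{d+1}(1 + |k_1|^2 + \cdots + |k_d|^2)^{d+1}\,E(|C_k(X_t) - C_k(X_{t_i})|^2 a^2(X_t)) \\
&\quad = (d+1)^{d+1}\sum_{j_0+\cdots+j_d=d+1}\frac{(d+1)!}{j_0!\,j_1!\cdots j_d!}\,|k|^{2j}\,E(|C_k(X_t) - C_k(X_{t_i})|^2 a^2(X_t)) \\
&\quad \le (d+1)^{2(d+1)}\Big(25e^3 E\int_{t_i}^t g^4(X_s)\,ds + 24e^3 E\int_{t_i}^t \bar a^4(X_s)\,ds + 4(t - t_i)\,Ea^4(X_{t_i})\Big)
\end{align*}
implying that
\[
E(|C_k(X_t) - C_k(X_{t_i})|^2 a^2(X_t)) \le K_k^2\Big(25e^3 E\int_{t_i}^t g^4(X_s)\,ds + 24e^3 E\int_{t_i}^t \bar a^4(X_s)\,ds + 4(t - t_i)\,Ea^4(X_{t_i})\Big).
\]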
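Finally, if $K' = 5e^{3/2}$ then
\begin{align*}
\frac{1}{T\delta}\sum_{i=0}^{n-1}\int_{I_i} E(|C_k(X_t) - C_k(X_{t_i})|^2 a^2(X_t))\,dt
&\le K_k^2 K'^2\,E\Big(\frac{1}{T\delta}\sum_{i=0}^{n-1}\int_{I_i}\int_{t_i}^t (g^4 + \bar a^4)(X_s)\,ds\,dt + \frac{1}{T}\sum_{i=0}^{n-1} a^4(X_{t_i})\Delta_i t\Big) \\
&\le K_k^2 K'^2\Big(\frac{1}{T}E\int_0^T g^4(X_t)\,dt + \frac{1}{T}E\Big(\int_0^T \bar a^4(X_t)\,dt + \sum_{i=0}^{n-1} a^4(X_{t_i})\Delta_i t\Big)\Big).
\end{align*}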
Assumptions (B1-3) imply that there exist $T_1 \ge 0$ and $n_1 \in \mathbb{N}$ such that the expression in the parentheses on the right hand side of the above inequality is bounded by a constant $K''^2 = C_g + C_a > 0$ for $T > T_1$ and $n \ge n_1$. Hence $K_1 := K'K'' > 0$ in (43) and the statements (22-23) are proved. To prove (44) and hence statements (24-25) notice that
\[
\frac{b(X_t)}{b(X_{t_0})} = \exp\Big(\sqrt{\sigma}\int_{t_0}^t b'(X_s)\,dW_s + \int_{t_0}^t\Big(\frac{\mu_0 b'}{b} + \frac{\sigma}{2}(b''b - b'^2)\Big)(X_s)\,ds\Big)
\]
from the proof of Lemma 6.5. It follows that
\[
\frac{b(X_t)}{b(X_{t_0})} - 1 = \int_{t_0}^t \frac{b(X_s)}{b(X_{t_0})}\Big(\sqrt{\sigma}\,b'(X_s)\,dW_s + \Big(\frac{\mu_0 b'}{b} + \frac{\sigma}{2}b''b\Big)(X_s)\,ds\Big) \tag{48}
\]
by the Itô formula applied to the exponential function. Now, in the same way as equations (45) and (46) have been derived we obtain the following: for subdivisions such that $\delta_{n,T} \le h_0$, and $\tilde f := \frac{\partial^m}{\partial\vartheta_1^{j_1}\cdots\partial\vartheta_d^{j_d}} f(\cdot,\vartheta)$, $\vartheta \in K_0$, and some constants $C', C'' > 0$,
\begin{align*}
E\Big(a^2(X_{t_i})\tilde f^2(X_{t_i})\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)^2\Big) &\le E\Big(a^2(X_{t_i})g^2(X_{t_i})\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)^2\Big) \\
&\le C'\Big(2(t - t_i)E(a^4(X_{t_i})g^4(X_{t_i})) + 2E\int_{t_i}^t\Big(\frac{b(X_s)}{b(X_{t_i})}\Big)^4 r^4(X_s)\,ds\Big) \\
&\le C'\Big(2(t - t_i)E(a^4(X_{t_i})g^4(X_{t_i})) + (t - t_i)E\,c_0(X_{t_i}) + E\int_{t_i}^t r^8(X_s)\,ds\Big),
\end{align*}
and,
\begin{align*}
E\Big(\tilde f^2(X_{t_i})\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)^2 a^2(X_t)\Big) &\le E\Big(g^2(X_{t_i})\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)^2 a^2(X_t)\Big) \\
&\le E(g^4(X_{t_i})(a(X_t) - a(X_{t_i}))^4) + E\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)^4 \\
&\quad + 2E\Big(a^2(X_{t_i})g^2(X_{t_i})\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)^2\Big) \\
&\le C''\Big(E\Big(g^4(X_{t_i})\int_{t_i}^t \bar a^4(X_s)\,ds\Big) + (t - t_i)E(a^4(X_{t_i})g^4(X_{t_i})) \\
&\quad + (t - t_i)E\,c_0(X_{t_i}) + E\int_{t_i}^t r^8(X_s)\,ds\Big)
\end{align*}
by Lemma 6.5, (B1) and (B4). Hence there exist $K_2 > 0$, $T_0 \ge T_1$, and $n_0 \ge n_1$ such that for all $T > T_0$ and $n \ge n_0$, (44) follows in the same way as (43) followed from (45) and (46) by using (B1-4).
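Proof of Lemma 6.7. By Lemma 6.3,
\[
\sum_{k\in\mathbb{Z}^d} |C_k(x)| \le g(x)(d+1)^{d+1}\sum_{k\in\mathbb{Z}^d}\frac{1}{(1 + |k_1| + \cdots + |k_d|)^{d+1}},
\]
and
\begin{align*}
\sum_{k\in\mathbb{Z}^d}\frac{1}{(1 + |k_1| + \cdots + |k_d|)^{d+1}}
&\le 1 + \sum_{r=1}^d\binom{d}{r}\frac{2^r}{r^{d+1}}\sum_{k_1=1}^\infty\cdots\sum_{k_r=1}^\infty\frac{1}{\big(\frac{k_1 + \cdots + k_r}{r}\big)^{d+1}} \\
&\le 1 + \sum_{r=1}^d\binom{d}{r}\frac{2^r}{r^{d+1}}\sum_{k_1=1}^\infty\cdots\sum_{k_r=1}^\infty\frac{1}{k_1^{\frac{d+1}{r}}\cdots k_r^{\frac{d+1}{r}}} \\
&= 1 + \sum_{r=1}^d\binom{d}{r}\frac{2^r}{r^{d+1}}\Big(\sum_{k=1}^\infty\frac{1}{k^{\frac{d+1}{r}}}\Big)^r.
\end{align*}
Since $\sum_k (1/k^{\frac{d+1}{r}}) < +\infty$ for all $r \le d$, it follows that $K < +\infty$. Moreover, for any $N$, $\vartheta \in K_0$, and $x \in E$, $|S_N(x,\vartheta) - f(x,\vartheta)| \le \sum_{|k|>N} |C_k(x)| \le Kg(x)$, implying the statements of the lemma.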
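The summability argument in the proof of Lemma 6.7 can be illustrated numerically. The Python sketch below truncates the lattice sum $\sum_{k\in\mathbb{Z}^d}(1 + |k_1| + \cdots + |k_d|)^{-(d+1)}$ to the box $|k_i| \le N$ and compares it with the correspondingly truncated arithmetic-geometric mean bound $1 + \sum_{r=1}^d\binom{d}{r}\frac{2^r}{r^{d+1}}\big(\sum_{k=1}^N k^{-(d+1)/r}\big)^r$. The function names and the truncation level are assumptions of this example; the truncated comparison holds term by term for the same reason as the untruncated one.

```python
from itertools import product
from math import comb, pi


def lattice_sum(d, N):
    """Partial sum of 1 / (1 + |k_1| + ... + |k_d|)^(d+1) over the box |k_i| <= N."""
    return sum(
        1.0 / (1 + sum(abs(k) for k in ks)) ** (d + 1)
        for ks in product(range(-N, N + 1), repeat=d)
    )


def amgm_bound(d, N):
    """Truncated version of the upper bound obtained from the AM-GM inequality."""
    total = 1.0
    for r in range(1, d + 1):
        zeta_part = sum(k ** (-(d + 1) / r) for k in range(1, N + 1))
        total += comb(d, r) * 2 ** r / r ** (d + 1) * zeta_part ** r
    return total


if __name__ == "__main__":
    for d, N in [(1, 2000), (2, 60)]:
        print(d, N, lattice_sum(d, N), amgm_bound(d, N))
    # For d = 1 the full series has the closed form pi^2 / 3 - 1.
    print(pi ** 2 / 3 - 1)
```

For $d = 1$ the partial sums approach $\pi^2/3 - 1$, so the series is finite with room to spare below the bound.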
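Proof of Lemma 6.8. Let $x_0 \in E$ be fixed and $F(x,\vartheta) := \int_{x_0}^x \frac{f(y,\vartheta)}{\nu(y)}\,a(y)\,dy$. Then $F$ is a continuous function on $E \times \Theta$. By the Itô formula applied to $F$,
\[
\int_0^T f(X_t,\vartheta)a(X_t)\,dW_t = F(X_T,\vartheta) - F(X_0,\vartheta) - \int_0^T\Big(f(\cdot,\vartheta)\Big(\frac{\mu_0}{\nu} - \frac{1}{2}\nu'\Big)a + \frac{1}{2}(f(\cdot,\vartheta)a)'\nu\Big)(X_t)\,dt,
\]
which is a continuous function on $\Theta$.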
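Proof of Lemma 6.9. Let $I_i := \langle t_i, t_{i+1}]$. Notice that
\begin{align*}
&\frac{1}{T}\sum_{i=0}^{n-1}\frac{1}{\Delta_i t}\Big(\Big(\int_{t_i}^{t_{i+1}}\frac{b(X_t)}{b(X_{t_i})}\,dW_t\Big)^2 - (\Delta_i W)^2\Big) \le \\
&\quad \le \frac{1}{T}\sum_{i=0}^{n-1}\frac{1}{\Delta_i t}\Big(\int_{I_i}\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)dW_t\Big)^2 + \frac{2}{T}\Big|\sum_{i=0}^{n-1}\frac{1}{\Delta_i t}\,\Delta_i W\int_{I_i}\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)dW_t\Big|.
\end{align*}
Since
\[
\frac{1}{T}E\sum_{i=0}^{n-1}\frac{1}{\Delta_i t}\Big(\int_{I_i}\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)dW_t\Big)^2 = \frac{1}{T}\sum_{i=0}^{n-1}\frac{1}{\Delta_i t}\int_{I_i} E\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)^2 dt
\]
by the isometry, it follows that this expression is bounded by a constant for all $T > T_1$ and $n \ge n_1$ and some $T_1 \ge 0$ and $n_1$ in the same way as in the proof of Lemma 6.6 since (B4) holds. It remains to prove the same for the second expression from the right hand side of the above inequality. By applying the Itô formula and (48) the following holds: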
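\begin{align*}
\Delta_i W\int_{I_i}\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)dW_t
&= \int_{I_i}\Big(\int_{t_i}^t\Big(\frac{b(X_s)}{b(X_{t_i})} - 1\Big)dW_s + (W_t - W_{t_i})\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)\Big)dW_t + \int_{I_i}\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)dt \\
&= \int_{I_i}\Big(\int_{t_i}^t\Big(\frac{b(X_s)}{b(X_{t_i})} - 1\Big)dW_s + (W_t - W_{t_i})\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big) + \sqrt{\sigma}\,\Delta_i t\,\frac{b(X_t)}{b(X_{t_i})}\,b'(X_t) \\
&\qquad + \sqrt{\sigma}\,(t - t_i)\int_{t_i}^t\frac{b(X_s)}{b(X_{t_i})}\,b'(X_s)\,dW_s\Big)dW_t + \int_{I_i}\int_{t_i}^t\frac{b(X_s)}{b(X_{t_i})}\,v(X_s)\,ds\,dt
\end{align*}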
where v := (µ(·, ϑ)/b)b′ + (σ/2)bb′′ . Then by applying the isometry and Cauchy inequality, and by
assuming that T ≥ 1,
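\begin{align*}
&E\Big(\frac{1}{T}\Big|\sum_{i=0}^{n-1}\frac{1}{\Delta_i t}\,\Delta_i W\int_{I_i}\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)dW_t\Big|\Big)^2 \le \\
&\quad \le \frac{1}{T}\sum_{i=0}^{n-1}\Big(2\,\frac{1}{(\Delta_i t)^2}\int_{I_i}\int_{t_i}^t E\Big(\frac{b(X_s)}{b(X_{t_i})} - 1\Big)^2 ds\,dt + \frac{1}{(\Delta_i t)^2}\int_{I_i} E\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)^4 dt + 2\Delta_i t \\
&\qquad + \sigma\int_{I_i} E\Big(\Big(\frac{b(X_t)}{b(X_{t_i})}\Big)^2 r^2(X_t)\Big)dt + \sigma\int_{I_i}\int_{t_i}^t E\Big(\Big(\frac{b(X_s)}{b(X_{t_i})}\Big)^2 r^2(X_s)\Big)ds\,dt \\
&\qquad + (1 + \sigma/2)\,\frac{1}{\Delta_i t}\int_{I_i}\int_{t_i}^t E\Big(\Big(\frac{b(X_s)}{b(X_{t_i})}\Big)^2 r^2(X_s)\Big)ds\,dt\Big)
\end{align*}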
since |v| ≤ (1 + σ/2)r and |b′ | ≤ r for function r from (B4). For all terms on the right hand side of
the above inequality we can prove boundedness in the same way as in the proof of Lemma 6.6 by using
(B4), except for the following one for which we have to use the additional assumptions of the lemma to
obtain the boundedness. First by using (48), then Lemma 6.4, and the Itô formula we obtain the following:
\begin{align*}
&\frac{1}{T}\sum_{i=0}^{n-1}\frac{1}{(\Delta_i t)^2}\int_{I_i} E\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)^4 dt \le \\
&\quad \le K'\Big(1 + \frac{1}{T}E\Big(\int_0^T (r^8 + r^{16} + (b^2 b''')^8)(X_t)\,dt + \sum_{i=0}^{n-1}(c_0 + r^4)(X_{t_i})\Delta_i t\Big)\Big)
\end{align*}
for some constant $K' > 0$. Now, the statement of the lemma follows.
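Proof of Lemma 6.12. By applying (1) it follows that:
\begin{align*}
&\sum_{i=0}^{n-1}\frac{(\Delta_i X - \mu(X_{t_i},\vartheta)\Delta_i t)^2}{b^2(X_{t_i})\Delta_i t} - \sigma\sum_{i=0}^{n-1}\frac{(\Delta_i W)^2}{\Delta_i t} = \\
&\quad = \sum_{i=0}^{n-1}\frac{1}{\Delta_i t}\Big(\int_{t_i}^{t_{i+1}}\frac{\mu(X_t,\vartheta) - \mu(X_{t_i},\vartheta)}{b(X_{t_i})}\,dt\Big)^2 - \tag{E1} \\
&\qquad - 2\sum_{i=0}^{n-1}\sqrt{\sigma}\int_{t_i}^{t_{i+1}}\frac{b(X_t)}{b(X_{t_i})}\,dW_t \cdot \frac{1}{\Delta_i t}\int_{t_i}^{t_{i+1}}\frac{\mu(X_t,\vartheta) - \mu(X_{t_i},\vartheta)}{b(X_{t_i})}\,dt + \tag{E2} \\
&\qquad + \sigma\sum_{i=0}^{n-1}\frac{1}{\Delta_i t}\Big(\Big(\int_{t_i}^{t_{i+1}}\frac{b(X_t)}{b(X_{t_i})}\,dW_t\Big)^2 - (\Delta_i W)^2\Big). \tag{E3}
\end{align*}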
First we will prove that the expression from the left hand side of the above equation is bounded in
L1 -norm by a constant for all n ≥ n0 (for some n0 ) in case when all functions µ(·, ϑ), b and their
appropriate partial derivatives are bounded on E, and then the statement of the lemma will follow by
using local compactness of E and Markov’s inequality just in the same way as in the proof of Corollary
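6.11. Let $f := \mu(\cdot,\vartheta)/b$ and $I_i := \langle t_i, t_{i+1}]$, and let $n$ be such that $\delta_{n,T} \le 1$. Then the expectation of (E1) is dominated by
\begin{align*}
E\sum_{i=0}^{n-1}\frac{1}{\Delta_i t}\Big(\int_{I_i}\frac{\mu(X_t,\vartheta) - \mu(X_{t_i},\vartheta)}{b(X_{t_i})}\,dt\Big)^2
&\le E\sum_{i=0}^{n-1}\frac{1}{(\Delta_i t)^2}\Big(\int_{I_i}\frac{\mu(X_t,\vartheta) - \mu(X_{t_i},\vartheta)}{b(X_{t_i})}\,dt\Big)^2 \\
&\le 2T\Big(\frac{1}{T}\sum_{i=0}^{n-1}\frac{1}{\Delta_i t}\int_{I_i} E(f(X_t) - f(X_{t_i}))^2\,dt \\
&\qquad + \frac{1}{T}\sum_{i=0}^{n-1}\frac{1}{\Delta_i t}\int_{I_i} E\Big(f^2(X_t)\Big(\frac{b(X_t)}{b(X_{t_i})} - 1\Big)^2\Big)dt\Big) \le TC'.
\end{align*}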
The existence of a constant $C' > 0$ follows in the same way as in the proof of Lemma 6.6 since (B1-4) hold for bounded functions by Remark 6.2. The $L^1$-norm of (E2) is dominated by
\begin{align*}
&E\sum_{i=0}^{n-1}\Big(\int_{t_i}^{t_{i+1}}\frac{b(X_t)}{b(X_{t_i})}\,dW_t\Big)^2 + E\sum_{i=0}^{n-1}\frac{1}{(\Delta_i t)^2}\Big(\int_{I_i}\frac{\mu(X_t,\vartheta) - \mu(X_{t_i},\vartheta)}{b(X_{t_i})}\,dt\Big)^2 \le \\
&\quad \le TK\Big(1 + \frac{1}{n}\sum_{i=0}^{n-1} E\,c_0(X_{t_i})\Big) + TC'
\end{align*}
for some constant $K > 0$ by the isometry and Lemma 6.5. Now, boundedness of the $L^1$-norm of (E2) follows from (B4). The $L^1$-norm of (E3) is bounded by Lemma 6.9 and Remark 6.10.
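Proof of Lemma 6.14. Let $f := \mu/b : E \times \Theta \to \mathbb{R}$, and let $f_0 := \mu(\cdot,\vartheta_0)/b$. For nonnegative integers $j_1,\dots,j_d$ such that $j_1 + \cdots + j_d = 3$, let $\tilde f := \frac{\partial^3}{\partial\vartheta_1^{j_1}\cdots\partial\vartheta_d^{j_d}} f$, and $\hat f := \frac{\partial^3}{\partial\vartheta_1^{j_1}\cdots\partial\vartheta_d^{j_d}} (f^2)$. Then for $T > 0$, and $\vartheta \in \Theta$,
\[
\frac{\partial^3}{\partial\vartheta_1^{j_1}\cdots\partial\vartheta_d^{j_d}}\,\ell_T(\vartheta) = \int_0^T\Big(\tilde f(\cdot,\vartheta)f_0 - \frac{1}{2}\hat f(\cdot,\vartheta)\Big)(X_t)\,dt + \sqrt{\sigma}\int_0^T \tilde f(X_t,\vartheta)\,dW_t
\]
by (10) and (1). Since (H2b-3b) hold, from the proof of Corollary 6.13 it follows that
\[
\sup_{\vartheta\in\Theta}\frac{1}{T}\Big|\int_0^T\Big(\tilde f(\cdot,\vartheta)f_0 - \frac{1}{2}\hat f(\cdot,\vartheta)\Big)(X_t)\,dt\Big| \le C\,\frac{1}{T}\int_0^T g_{00}^2(X_t)\,dt,
\]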
where C := 1+7·23 . The right hand side of the above inequality $P_{\theta_0}$-a.s. converges to a finite nonrandom limit $L_0 = L_0(\vartheta_0)$ by the ergodic property of $X$. Hence on a $P_{\theta_0}$-a.s. event there exists $T_0' \ge 0$ such that for all $T > T_0'$,
\[
\frac{1}{T}\int_0^T g_{00}^2(X_t)\,dt \le \Big|\frac{1}{T}\int_0^T g_{00}^2(X_t)\,dt - L_0\Big| + L_0 \le 1 + L_0.
\]
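Let us suppose that $b > 0$. The case when $b < 0$ can be analyzed in the same way. By applying the Itô formula twice, first to the function $x \mapsto \int_{x_0}^x \frac{\tilde f(y,\vartheta)}{b(y)}\,dy$, and then to $x \mapsto \int_{x_0}^x \frac{g_0(y)}{b(y)}\,dy$, we get the following
\begin{align*}
\frac{\sqrt{\sigma}}{T}\Big|\int_0^T \tilde f(X_t,\vartheta)\,dW_t\Big|
&= \Big|\frac{\sqrt{\sigma}}{T}\int_{x_0}^{X_T}\frac{\tilde f(y,\vartheta)}{b(y)}\,dy - \frac{1}{T}\int_0^T\Big(\tilde f f_0 - \frac{\sigma}{2}(\tilde f b' - \tilde f' b)\Big)(X_t)\,dt\Big| \\
&\le \frac{\sqrt{\sigma}}{T}\Big|\int_{x_0}^{X_T}\frac{g_0(y)}{b(y)}\,dy\Big| + \frac{1}{T}\int_0^T\Big(g_0^2 + \frac{\sigma}{2}(g_1 + 2g_0|b'|)\Big)(X_t)\,dt \\
&\le \frac{\sqrt{\sigma}}{T}\Big|\int_0^T g_0(X_t)\,dW_t\Big| + \frac{1}{T}\int_0^T\Big(2g_0^2 + \frac{\sigma}{2}(g_0 + |g_0' b| + g_1 + 2g_0|b'|)\Big)(X_t)\,dt.
\end{align*}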
Since the right hand side of the above inequality $P_{\theta_0}$-a.s. converges to a finite nonrandom limit (by the ergodic property and the law of large numbers for continuous martingales since (H2b-3b) hold), and since it is also an upper bound for the left hand side uniformly for all $\vartheta \in \Theta$ and all partial derivatives of the third order, there exists a constant $C'' > 0$ such that $P_{\theta_0}$-a.s. there exists $T_0'' \ge T_0'$ such that for all $T > T_0''$, $\sup_{\vartheta\in\Theta}\frac{1}{T}|D^3\ell_T(\vartheta)|_\infty \le C''$. From the definition of the operator norm, for the same $T$,
\[
\sup_{\vartheta\in\Theta}\frac{1}{T}|D^3\ell_T(\vartheta)| \le \sup_{\vartheta\in\Theta}\frac{1}{T}\,d^{3/2}\,|D^3\ell_T(\vartheta)|_\infty \le d^{3/2}C'' =: C_2.
\]
By the same arguments we can prove that there exist constants $C_0 > 0$, $C_1 > 0$ such that $P_{\theta_0}$-a.s. there exists $T_0 \ge 0$ such that $T_0 \ge T_0''$, and for all $T > T_0$, $\sup_{\vartheta\in\Theta}\frac{1}{T}|D^{r+1}\ell_T(\vartheta)| \le C_r$ for $r = 0, 1$. Finally, the statements of the lemma follow from the mean value theorem and Taylor expansion (39) from the proof of Theorem 4.5, where $\hat\vartheta_T$ and $\vartheta_0$ are replaced with $\vartheta_1$ and $\vartheta_2$ respectively.
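Proof of Lemma 6.15. Let $C_0 > 0$ and $\Omega_0$ be an event, both from Lemma 6.14, such that $P_{\theta_0}(\Omega_0) = 1$ and on $\Omega_0$ for all $T \ge T_0$ and all $\vartheta_1, \vartheta_2 \in \Theta$, $|\ell_T(\vartheta_1) - \ell_T(\vartheta_2)| \le TC_0|\vartheta_1 - \vartheta_2|$. Let $K_0 > 0$ be a Lipschitz constant of the function $\ell_{\vartheta_0}$, and let $\varepsilon > 0$ be an arbitrary number. Let $\delta := \varepsilon/(2(C_0 + K_0))$. Since $\{K(\vartheta,\delta) : \vartheta \in \Theta\}$ is an open cover of the compact set $\Theta$, there exists a finite subcover $\{K(\vartheta_i,\delta) : i = 1,\dots,K_\varepsilon\}$. Let $\Omega_1$ be a $P_{\theta_0}$-a.s. event such that on this event there exists $T_\varepsilon \ge T_0$ such that for all $T \ge T_\varepsilon$ and $1 \le j \le K_\varepsilon$, $|\frac{1}{T}\ell_T(\vartheta_j) - \ell_{\vartheta_0}(\vartheta_j)| < \varepsilon/(2K_\varepsilon)$. Then on $\Omega_0 \cap \Omega_1$ for all $\vartheta \in \Theta$ there exists $i = i(\vartheta) \le K_\varepsilon$ such that $\vartheta \in K(\vartheta_i,\delta)$, and
\begin{align*}
\Big|\frac{1}{T}\ell_T(\vartheta) - \ell_{\vartheta_0}(\vartheta)\Big| &= \Big|\frac{1}{T}\ell_T(\vartheta) - \frac{1}{T}\ell_T(\vartheta_i) + \frac{1}{T}\ell_T(\vartheta_i) - \ell_{\vartheta_0}(\vartheta_i) + \ell_{\vartheta_0}(\vartheta_i) - \ell_{\vartheta_0}(\vartheta)\Big| \\
&\le C_0|\vartheta - \vartheta_i| + \sum_{j=1}^{K_\varepsilon}\Big|\frac{1}{T}\ell_T(\vartheta_j) - \ell_{\vartheta_0}(\vartheta_j)\Big| + K_0|\vartheta - \vartheta_i| \\
&< (C_0 + K_0)\delta + K_\varepsilon\,\frac{\varepsilon}{2K_\varepsilon} = \varepsilon.
\end{align*}
Hence $\sup_{\vartheta\in\Theta}|\frac{1}{T}\ell_T(\vartheta) - \ell_{\vartheta_0}(\vartheta)| < \varepsilon$, which proves the lemma.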