Let $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ be measurable spaces. A Markov kernel with source $(X, \mathcal{A})$ and target $(Y, \mathcal{B})$, sometimes written as $\kappa : (X, \mathcal{A}) \to (Y, \mathcal{B})$, is a function $\kappa : \mathcal{B} \times X \to [0, 1]$ with the following properties:

1. For every (fixed) $B_0 \in \mathcal{B}$, the map $x \mapsto \kappa(B_0, x)$ is $\mathcal{A}$-measurable;
2. For every (fixed) $x_0 \in X$, the map $B \mapsto \kappa(B, x_0)$ is a probability measure on $(Y, \mathcal{B})$.
In other words, it associates to each point $x \in X$ a probability measure $\kappa(dy|x) : B \mapsto \kappa(B, x)$ on $(Y, \mathcal{B})$ such that, for every measurable set $B \in \mathcal{B}$, the map $x \mapsto \kappa(B, x)$ is measurable with respect to the $\sigma$-algebra $\mathcal{A}$.[2]
Take $X = Y = \mathbb{Z}$ and $\mathcal{A} = \mathcal{B} = \mathcal{P}(\mathbb{Z})$ (the power set of $\mathbb{Z}$). Then a Markov kernel is fully determined by the probability it assigns to singletons $\{m\}$, $m \in Y = \mathbb{Z}$, for each $n \in X = \mathbb{Z}$:
$$\kappa(B|n) = \sum_{m \in B} \kappa(\{m\}|n), \qquad \forall n \in \mathbb{Z},\ \forall B \in \mathcal{B}.$$
Now the random walk $\kappa$ that goes to the right with probability $p$ and to the left with probability $1 - p$ is defined by
$$\kappa(\{m\}|n) = p\,\delta_{m,n+1} + (1-p)\,\delta_{m,n-1}, \qquad \forall n, m \in \mathbb{Z},$$
where $\delta$ is the Kronecker delta. The transition probabilities $P(m|n) = \kappa(\{m\}|n)$ for the random walk are equivalent to the Markov kernel.
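For a finite set $B$ this kernel can be evaluated directly as a sum over singletons. The following is a minimal Python sketch (the function name and signature are illustrative, not from the text):

```python
def random_walk_kernel(B, n, p=0.75):
    """kappa(B | n): probability that one step of the walk from state n lands in B.

    Built from the singleton masses kappa({m} | n) = p*delta(m, n+1) + (1-p)*delta(m, n-1).
    """
    return sum(p * (m == n + 1) + (1 - p) * (m == n - 1) for m in B)

# From state 0 the walk moves to +1 with probability p, to -1 otherwise.
assert random_walk_kernel({1}, 0, p=0.75) == 0.75
assert random_walk_kernel({-1}, 0, p=0.75) == 0.25
# kappa(. | n) is a probability measure: total mass over reachable states is 1.
assert random_walk_kernel({-1, 1}, 0, p=0.75) == 1.0
```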
More generally, take $X$ and $Y$ both countable and $\mathcal{A} = \mathcal{P}(X)$, $\mathcal{B} = \mathcal{P}(Y)$. Again a Markov kernel is defined by the probability it assigns to singleton sets for each $i \in X$:
$$\kappa(B|i) = \sum_{j \in B} \kappa(\{j\}|i), \qquad \forall i \in X,\ \forall B \in \mathcal{B}.$$
We define a Markov process by defining a transition probability $P(j|i) = K_{ji}$, where the numbers $K_{ji}$ define a (countable) stochastic matrix $(K_{ji})$, i.e.
$$\begin{aligned} K_{ji} &\geq 0, \qquad &&\forall (j,i) \in Y \times X, \\ \sum_{j \in Y} K_{ji} &= 1, \qquad &&\forall i \in X. \end{aligned}$$
We then define
$$\kappa(\{j\}|i) = K_{ji} = P(j|i), \qquad \forall j \in Y,\ \forall i \in X.$$
Again the transition probability, the stochastic matrix and the Markov kernel are equivalent reformulations.
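For a finite state space the equivalence is easy to make concrete: a column-stochastic matrix, stored below as a dictionary, yields the kernel by summation over singletons. A minimal sketch with illustrative names, not from the text:

```python
# K[(j, i)] = P(j | i); each "column" (fixed i) sums to 1.
K = {
    ("a", "a"): 0.5,  ("b", "a"): 0.5,
    ("a", "b"): 0.25, ("b", "b"): 0.75,
}

def kappa(B, i):
    """kappa(B | i) = sum over j in B of K_{ji}."""
    return sum(K[(j, i)] for j in B)

# Stochasticity of K makes kappa(. | i) a probability measure for every i.
assert kappa({"a", "b"}, "a") == 1.0
assert kappa({"a", "b"}, "b") == 1.0
assert kappa({"a"}, "b") == 0.25
```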
Markov kernel defined by a kernel function and a measure
Let $\nu$ be a measure on $(Y, \mathcal{B})$, and $k : Y \times X \to [0, \infty]$ a measurable function with respect to the product $\sigma$-algebra $\mathcal{A} \otimes \mathcal{B}$ such that
$$\int_Y k(y, x)\,\nu(\mathrm{d}y) = 1, \qquad \forall x \in X;$$
then $\kappa(dy|x) = k(y, x)\,\nu(dy)$, i.e. the mapping
$$\begin{cases} \kappa : \mathcal{B} \times X \to [0, 1] \\ \kappa(B|x) = \int_B k(y, x)\,\nu(\mathrm{d}y) \end{cases}$$
defines a Markov kernel.[3] This example generalises the countable Markov process example, where $\nu$ was the counting measure. Moreover, it encompasses other important examples such as the convolution kernels, in particular the Markov kernels defined by the heat equation. The latter example includes the Gaussian kernel on $X = Y = \mathbb{R}$ with $\nu(dx) = dx$ the standard Lebesgue measure and
$$k_t(y, x) = \frac{1}{\sqrt{2\pi}\,t}\, e^{-(y-x)^2/(2t^2)}.$$
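One can check numerically that the Gaussian kernel gives $\kappa(\cdot|x)$ total mass 1. A sketch using a simple midpoint-rule quadrature; the function names and the truncation of the integral to $[-10, 10]$ are illustrative choices:

```python
import math

def k_t(y, x, t=1.0):
    """Gaussian (heat) kernel k_t(y, x) = exp(-(y-x)^2 / (2 t^2)) / (sqrt(2 pi) t)."""
    return math.exp(-(y - x) ** 2 / (2 * t ** 2)) / (math.sqrt(2 * math.pi) * t)

def kappa(a, b, x, t=1.0, steps=20000):
    """kappa([a, b] | x): midpoint-rule approximation of the integral of k_t(y, x) dy."""
    h = (b - a) / steps
    return sum(k_t(a + (i + 0.5) * h, x, t) for i in range(steps)) * h

# kappa(. | x) is (numerically) a probability measure on R; tails beyond +-10 are negligible.
assert abs(kappa(-10, 10, 0.0) - 1.0) < 1e-6
# The Gaussian is symmetric about its centre x.
assert abs(kappa(-10, 0, 0.0) - 0.5) < 1e-6
```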
Measurable functions
Take $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ arbitrary measurable spaces, and let $f : X \to Y$ be a measurable function. Now define $\kappa(dy|x) = \delta_{f(x)}(dy)$, i.e.
$$\kappa(B|x) = \mathbf{1}_B(f(x)) = \mathbf{1}_{f^{-1}(B)}(x) = \begin{cases} 1 & \text{if } f(x) \in B \\ 0 & \text{otherwise} \end{cases}$$
for all $B \in \mathcal{B}$.
Note that the indicator function $\mathbf{1}_{f^{-1}(B)}$ is $\mathcal{A}$-measurable for all $B \in \mathcal{B}$ if and only if $f$ is measurable.
This example allows us to think of a Markov kernel as a generalised function whose value is, in general, random rather than certain. That is, it is a multivalued function where the values are not equally weighted.
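A deterministic kernel of this kind is straightforward to sketch in code; `deterministic_kernel` below is an illustrative name, not from the text:

```python
def deterministic_kernel(f):
    """Return the kernel kappa(B | x) = 1 if f(x) in B else 0,
    i.e. the Dirac measure delta_{f(x)} for each x."""
    def kappa(B, x):
        return 1 if f(x) in B else 0
    return kappa

kappa = deterministic_kernel(lambda x: x * x)

assert kappa({4}, 2) == 1        # f(2) = 4 lies in B
assert kappa({1, 2, 3}, 2) == 0  # f(2) = 4 does not
```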
As a less obvious example, take $X = \mathbb{N}$, $\mathcal{A} = \mathcal{P}(\mathbb{N})$, and $(Y, \mathcal{B})$ the real numbers $\mathbb{R}$ with the standard $\sigma$-algebra of Borel sets. Then
$$\kappa(B|n) = \begin{cases} \mathbf{1}_B(0) & n = 0 \\ \Pr(\xi_1 + \cdots + \xi_n \in B) & n \neq 0 \end{cases}$$
where the $\xi_i$ are i.i.d. random variables (usually with mean 0) and $\mathbf{1}_B$ is the indicator function. For the simple case of coin flips this models the different levels of a Galton board.
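For the coin-flip case, taking $\xi_i \in \{-1, +1\}$, the kernel is a shifted binomial law. A hedged sketch with illustrative names:

```python
from math import comb

def galton_kernel(B, n, p=0.5):
    """kappa(B | n): law of xi_1 + ... + xi_n for i.i.d. steps xi_i in {-1, +1}
    with P(xi_i = +1) = p. After n splits a bead sits at position n - 2k with
    probability C(n, k) p^(n-k) (1-p)^k; this sketches the Galton-board example."""
    if n == 0:
        return 1 if 0 in B else 0
    return sum(comb(n, k) * p ** (n - k) * (1 - p) ** k
               for k in range(n + 1) if (n - 2 * k) in B)

# Total mass 1: after 4 fair splits the bead is somewhere in {-4, -2, 0, 2, 4}.
assert galton_kernel({-4, -2, 0, 2, 4}, 4) == 1.0
assert galton_kernel({0}, 4) == comb(4, 2) * 0.5 ** 4  # central bin, 6/16
```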
Composition of Markov kernels
Given measurable spaces $(X, \mathcal{A})$, $(Y, \mathcal{B})$, we consider a Markov kernel $\kappa : \mathcal{B} \times X \to [0, 1]$ as a morphism $\kappa : X \to Y$. Intuitively, rather than assigning to each $x \in X$ a sharply defined point $y \in Y$, the kernel assigns a "fuzzy" point in $Y$ which is only known with some level of uncertainty, much like actual physical measurements. If we have a third measurable space $(Z, \mathcal{C})$, and probability kernels $\kappa : X \to Y$ and $\lambda : Y \to Z$, we can define a composition $\lambda \circ \kappa : X \to Z$ by the Chapman-Kolmogorov equation
$$(\lambda \circ \kappa)(dz|x) = \int_Y \lambda(dz|y)\,\kappa(dy|x).$$
The composition is associative by the monotone convergence theorem, and the identity function considered as a Markov kernel (i.e. the delta measure $\kappa_1(dx'|x) = \delta_x(dx')$) is the unit for this composition.
This composition defines the structure of a category on the measurable spaces with Markov kernels as morphisms, first defined by Lawvere,[4] the category of Markov kernels.
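On finite spaces the Chapman-Kolmogorov integral becomes a finite sum, so kernel composition reduces to matrix multiplication. A minimal sketch, with kernels stored as dictionaries and all names illustrative:

```python
def compose(lam, kap, Y):
    """(lambda ∘ kappa)({z} | x) = sum over y in Y of lambda({z} | y) * kappa({y} | x).

    Kernels are dicts mapping (target_state, source_state) -> probability.
    """
    Z = {z for (z, _) in lam}
    X = {x for (_, x) in kap}
    return {(z, x): sum(lam[(z, y)] * kap[(y, x)] for y in Y)
            for z in Z for x in X}

kap = {(0, "a"): 0.5, (1, "a"): 0.5, (0, "b"): 1.0, (1, "b"): 0.0}    # X -> Y
lam = {("u", 0): 0.25, ("v", 0): 0.75, ("u", 1): 1.0, ("v", 1): 0.0}  # Y -> Z

comp = compose(lam, kap, Y={0, 1})
# Each column of the composite is again a probability measure.
assert comp[("u", "a")] + comp[("v", "a")] == 1.0
assert comp[("u", "b")] == 0.25 and comp[("v", "b")] == 0.75
```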
Probability space defined by a probability distribution and a Markov kernel
A composition of a probability space $(X, \mathcal{A}, P_X)$ and a probability kernel $\kappa : (X, \mathcal{A}) \to (Y, \mathcal{B})$ defines a probability space $(Y, \mathcal{B}, P_Y = \kappa \circ P_X)$, where the probability measure is given by
$$P_Y(B) = \int_X \int_B \kappa(dy|x)\,P_X(dx) = \int_X \kappa(B|x)\,P_X(dx) = \mathbb{E}_{P_X}\,\kappa(B|\cdot).$$
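For finite $X$ and $Y$ the integrals reduce to sums, so the pushforward measure $P_Y$ can be sketched directly (names are illustrative):

```python
# A distribution P_X on X = {"a", "b"} and a kernel kappa[(y, x)] = kappa({y} | x).
P_X = {"a": 0.5, "b": 0.5}
kappa = {(0, "a"): 1.0, (1, "a"): 0.0, (0, "b"): 0.5, (1, "b"): 0.5}

def P_Y(B):
    """P_Y(B) = sum over x of kappa(B | x) * P_X({x})."""
    return sum(sum(kappa[(y, x)] for y in B) * p for x, p in P_X.items())

assert P_Y({0}) == 0.75     # 1.0 * 0.5 + 0.5 * 0.5
assert P_Y({0, 1}) == 1.0   # P_Y is again a probability measure
```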
Let $(X, \mathcal{A}, P)$ be a probability space and $\kappa$ a Markov kernel from $(X, \mathcal{A})$ to some $(Y, \mathcal{B})$. Then there exists a unique measure $Q$ on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ such that
$$Q(A \times B) = \int_A \kappa(B|x)\,P(dx), \qquad \forall A \in \mathcal{A},\ \forall B \in \mathcal{B}.$$
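In the finite case the measure $Q$ can be written out explicitly; the following sketch (illustrative names) checks that it is a probability measure whose marginal on $X$ recovers $P$:

```python
# P on X = {"a", "b"}; kappa[(y, x)] = kappa({y} | x) with Y = {0, 1}.
P = {"a": 0.25, "b": 0.75}
kappa = {(0, "a"): 0.5, (1, "a"): 0.5, (0, "b"): 1.0, (1, "b"): 0.0}

def Q(A, B):
    """Q(A x B) = sum over x in A of kappa(B | x) * P({x})."""
    return sum(sum(kappa[(y, x)] for y in B) * P[x] for x in A)

assert Q({"a", "b"}, {0, 1}) == 1.0   # Q is a probability measure on X x Y
assert Q({"a"}, {0, 1}) == P["a"]     # marginal on X recovers P
```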
Regular conditional distribution
Let $(S, Y)$ be a Borel space, $X$ an $(S, Y)$-valued random variable on the probability space $(\Omega, \mathcal{F}, P)$, and $\mathcal{G} \subseteq \mathcal{F}$ a sub-$\sigma$-algebra. Then there exists a Markov kernel $\kappa$ from $(\Omega, \mathcal{G})$ to $(S, Y)$ such that $\kappa(\cdot, B)$ is a version of the conditional expectation $\mathbb{E}[\mathbf{1}_{\{X \in B\}} \mid \mathcal{G}]$ for every $B \in Y$, i.e.
$$P(X \in B \mid \mathcal{G}) = \mathbb{E}\left[\mathbf{1}_{\{X \in B\}} \mid \mathcal{G}\right] = \kappa(\cdot, B), \qquad P\text{-a.s.}\ \forall B \in Y.$$
It is called the regular conditional distribution of $X$ given $\mathcal{G}$ and is not uniquely defined: any two versions agree only $P$-almost surely.
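When $\Omega$ is finite and $\mathcal{G}$ is generated by a partition of $\Omega$, a regular conditional distribution can be computed by elementary conditioning on the partition block containing $\omega$. A hedged sketch with illustrative names:

```python
# Uniform P on Omega = {1, 2, 3, 4}; X takes values in S = {"u", "v"};
# G is generated by the partition {{1, 2}, {3, 4}}.
P = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
X = {1: "u", 2: "v", 3: "u", 4: "u"}
partition = [{1, 2}, {3, 4}]

def kappa(omega, B):
    """kappa(omega, B) = P(X in B | cell containing omega), constant on each cell,
    hence G-measurable in omega and a probability measure in B."""
    cell = next(c for c in partition if omega in c)
    return sum(P[w] for w in cell if X[w] in B) / sum(P[w] for w in cell)

assert kappa(1, {"u"}) == 0.5       # within {1, 2}, X is "u" half the time
assert kappa(3, {"u"}) == 1.0       # within {3, 4}, X is always "u"
assert kappa(1, {"u", "v"}) == 1.0  # kappa(omega, .) is a probability measure
```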
Transition kernels generalize Markov kernels in the sense that for all $x \in X$, the map $B \mapsto \kappa(B|x)$ can be any type of (non-negative) measure, not necessarily a probability measure.