Theoretical Statistics. Lecture 4.: 1. Concentration Inequalities
Peter Bartlett
1. Concentration inequalities.
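Tail bound for a sub-exponential $X$ with mean $\mu$ and parameters $(\sigma^2, b)$:
\[
P(X \ge \mu + t) \le
\begin{cases}
\exp\left(-\dfrac{t^2}{2\sigma^2}\right) & \text{if } 0 \le t \le \sigma^2/b, \\[1ex]
\exp\left(-\dfrac{t}{2b}\right) & \text{if } t > \sigma^2/b.
\end{cases}
\]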
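The two regimes of the tail bound can be evaluated numerically. The following is a minimal sketch; the function name `subexp_tail_bound` and the example parameters are illustrative assumptions, not part of the lecture.

```python
import math

def subexp_tail_bound(t, sigma2, b):
    """Upper bound on P(X >= mu + t) for a (sigma2, b) sub-exponential X."""
    if t < 0:
        raise ValueError("t must be non-negative")
    if t <= sigma2 / b:
        # Sub-Gaussian regime: small deviations.
        return math.exp(-t * t / (2.0 * sigma2))
    # Exponential regime: large deviations.
    return math.exp(-t / (2.0 * b))

# Illustrative parameters (an assumption): sigma2 = 4, b = 4, crossover at t = 1.
small = subexp_tail_bound(0.5, 4.0, 4.0)
large = subexp_tail_bound(3.0, 4.0, 4.0)
at_crossover = subexp_tail_bound(1.0, 4.0, 4.0)
```

At the crossover $t = \sigma^2/b$ the two expressions agree, so the bound is continuous in $t$.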
Johnson-Lindenstrauss
Applications: dimension reduction to simplify computation (nearest
neighbor, clustering, image processing, text processing).
Analysis of machine learning methods: separability by a large margin in high
dimensions implies that it's really a low-dimensional problem after all.
\[ F(x) = \frac{1}{\sqrt{n}} Y x, \]
where $Y \in \mathbb{R}^{n \times d}$ has independent $N(0, 1)$ entries.
Let $Y_i$ denote the $i$th row of $Y$, for $1 \le i \le n$. It has a $N(0, I)$ distribution, so
$Y_i^T x/\|x\|_2 \sim N(0, 1)$. Thus,
\[
Z = \frac{\|Yx\|_2^2}{\|x\|_2^2} = \sum_{i=1}^n \left( Y_i^T x/\|x\|_2 \right)^2 \sim \chi^2_n.
\]
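\[
P\left( \frac{\|F(x)\|_2^2}{\|x\|_2^2} \notin [1 - \epsilon, 1 + \epsilon] \right)
= P\left( \left| \frac{\|Yx\|_2^2}{n\|x\|_2^2} - 1 \right| > \epsilon \right)
\le 2 \exp(-n\epsilon^2/8).
\]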
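The single-vector bound can be checked by simulation. The sketch below draws a fresh Gaussian matrix $Y$ on every trial and counts how often $\|F(x)\|_2^2/\|x\|_2^2$ leaves $[1-\epsilon, 1+\epsilon]$. The test vector, the values of $n$ and $\epsilon$, the seed, and all function names are illustrative assumptions.

```python
import math
import random

def random_projection(x, n, rng):
    """Return F(x) = Y x / sqrt(n), with Y a fresh n-by-d matrix of N(0,1) draws."""
    scale = 1.0 / math.sqrt(n)
    return [scale * sum(rng.gauss(0.0, 1.0) * xj for xj in x) for _ in range(n)]

def squared_norm(v):
    return sum(vi * vi for vi in v)

def failure_rate(x, n, eps, trials, rng):
    """Fraction of trials with ||F(x)||^2 / ||x||^2 outside [1 - eps, 1 + eps]."""
    base = squared_norm(x)
    bad = 0
    for _ in range(trials):
        ratio = squared_norm(random_projection(x, n, rng)) / base
        if abs(ratio - 1.0) > eps:
            bad += 1
    return bad / trials

rng = random.Random(0)             # fixed seed so the run is reproducible
x = [1.0, -2.0, 0.5, 3.0]          # arbitrary test vector, d = 4
n, eps = 200, 0.5
empirical = failure_rate(x, n, eps, trials=200, rng=rng)
bound = 2.0 * math.exp(-n * eps * eps / 8.0)   # 2 exp(-n eps^2 / 8)
```

The bound does not depend on $d$ or on the direction of $x$, so any nonzero test vector should behave the same way.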
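For $m$ points $x_1, \ldots, x_m$, a union bound over the $\binom{m}{2}$ pairs gives
\[
P\left( \exists\, i \ne j \text{ s.t. } \frac{\|F(x_i) - F(x_j)\|_2^2}{\|x_i - x_j\|_2^2} \notin [1 - \epsilon, 1 + \epsilon] \right)
\le \binom{m}{2}\, 2 \exp(-n\epsilon^2/8).
\]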
Thus, for $n > (16/\epsilon^2) \log m$, this probability is strictly less than 1, so there
exists a suitable mapping.
In fact, we can choose a random projection in this way and ensure that the
probability that it does not satisfy the approximate isometry property is no
more than $\delta$ for $n > (16/\epsilon^2) \log(m/\sqrt{\delta})$.
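The dimension requirement can be turned into a small calculator. This sketch reads the threshold as $(16/\epsilon^2)\log(m/\sqrt{\delta})$, which is what solving $m^2 \exp(-n\epsilon^2/8) \le \delta$ for $n$ gives; that reading, the function names, and the example values are assumptions.

```python
import math

def jl_dimension(m, eps, delta=1.0):
    """Smallest integer n with n > (16 / eps^2) * log(m / sqrt(delta)).

    delta = 1 corresponds to the existence argument (failure probability < 1).
    """
    if m < 2 or not (0.0 < eps <= 1.0) or not (0.0 < delta <= 1.0):
        raise ValueError("need m >= 2, 0 < eps <= 1, 0 < delta <= 1")
    threshold = 16.0 / (eps * eps) * math.log(m / math.sqrt(delta))
    return math.floor(threshold) + 1

def union_bound(m, n, eps):
    """Union bound 2 * C(m, 2) * exp(-n eps^2 / 8) on the failure probability."""
    return m * (m - 1) * math.exp(-n * eps * eps / 8.0)

# Illustrative values (assumptions): m = 1000 points, distortion eps = 0.5.
n_exist = jl_dimension(1000, 0.5)            # enough for existence
n_whp = jl_dimension(1000, 0.5, delta=0.01)  # failure probability at most 0.01
```

Note that the required dimension grows only logarithmically in $m$ and does not depend on the original dimension $d$.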
Martingales
Let $X = (X_1, \ldots, X_n)$.
Define
\[
X_1^i = (X_1, \ldots, X_i), \qquad Y_0 = E f(X), \qquad Y_i = E[f(X) \mid X_1^i].
\]
Then
\[
f(X) - E f(X) = Y_n - Y_0 = \sum_{i=1}^n D_i,
\]
where $D_i = Y_i - Y_{i-1}$.
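The construction can be made concrete for a small example. The sketch below assumes the $X_i$ are independent fair bits, so that each conditional expectation $E[f(X) \mid X_1^i]$ can be computed exactly by averaging $f$ over all completions of the observed prefix. The choice of $f$ and the function name `doob_martingale` are illustrative assumptions.

```python
import itertools

def doob_martingale(f, xs):
    """Return [Y_0, ..., Y_n] with Y_i = E[f(X) | X_1..X_i = xs[:i]].

    Assumes X_1..X_n are i.i.d. uniform on {0, 1}, so the conditional
    expectation is an exact average over the 2^(n-i) completions.
    """
    n = len(xs)
    prefix_values = []
    for i in range(n + 1):
        tails = list(itertools.product((0, 1), repeat=n - i))
        total = sum(f(tuple(xs[:i]) + tail) for tail in tails)
        prefix_values.append(total / len(tails))
    return prefix_values

def f(x):
    # Hypothetical test function: squared number of ones.
    return sum(x) ** 2

xs = (1, 0, 1, 1)                      # one observed outcome
ys = doob_martingale(f, xs)
ds = [ys[i] - ys[i - 1] for i in range(1, len(ys))]   # differences D_i
```

Here `ys[0]` is $E f(X)$, `ys[-1]` is $f(X)$ itself, and the differences `ds` telescope to $f(X) - E f(X)$.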
\[
\cdots = \prod_{i=1}^n \frac{g(X_i)}{f(X_i)} = Y_n,
\]
$\sum_{i=1}^n D_i$ is sub-exponential, with $(\sigma^2, b) = \left( \sum_{i=1}^n \sigma_i^2, \max_i b_i \right)$.
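\[
P\left( \left| \sum_i D_i \right| \ge t \right) \le
\begin{cases}
2 \exp(-t^2/(2\sigma^2)) & \text{if } 0 \le t \le \sigma^2/b, \\
2 \exp(-t/(2b)) & \text{if } t > \sigma^2/b.
\end{cases}
\]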
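Proof:
\[
E \exp\left( \lambda \sum_{i=1}^n D_i \right)
= E\left[ \exp\left( \lambda \sum_{i=1}^{n-1} D_i \right) E\left[ \exp(\lambda D_n) \mid \mathcal{F}_{n-1} \right] \right]
\le E\left[ \exp\left( \lambda \sum_{i=1}^{n-1} D_i \right) \right] \exp(\lambda^2 \sigma_n^2/2),
\]
provided $|\lambda| < 1/b_n$. Iterating shows that $\sum_{i=1}^n D_i$ is sub-exponential.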
Proof:
It suffices to show that
\[
P(|f(X) - E f(X)| \ge t) \le 2 \exp\left( -\frac{2t^2}{\sum_i B_i^2} \right).
\]
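\[
f(X) - E f(X) = \sum_{i=1}^n D_i.
\]
Then
\[
|D_i| = |Y_i - Y_{i-1}|
= \left| E[f(X) \mid X_1^{i-1}] - E[f(X) \mid X_1^i] \right|
= \left| E\left[ E[f(X) \mid X_1^{i-1}] - f(X) \,\middle|\, X_1^i \right] \right|
\le B_i.
\]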
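Given the bounded-difference constants $B_i$, the resulting tail bound $2\exp(-2t^2/\sum_i B_i^2)$ is easy to evaluate. The following is a minimal sketch; the function name and the example constants are illustrative assumptions.

```python
import math

def bounded_differences_bound(t, bs):
    """Evaluate 2 * exp(-2 t^2 / sum_i B_i^2), capped at 1 since it bounds a probability."""
    if t < 0:
        raise ValueError("t must be non-negative")
    total = sum(b * b for b in bs)
    return min(1.0, 2.0 * math.exp(-2.0 * t * t / total))

# Hypothetical constants: one coordinate can move f by 1, two others by 0.5 each.
bs = [1.0, 0.5, 0.5]
value = bounded_differences_bound(1.5, bs)
trivial = bounded_differences_bound(0.0, bs)   # bound is vacuous at t = 0
```

Coordinates with large $B_i$ dominate $\sum_i B_i^2$, so the bound is only as strong as the most influential coordinates allow.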
\[
\sum_i \sup_{a \in A} a_i^2.
\]
Proof:
Write $Z = f(\epsilon_1, \ldots, \epsilon_n)$, and notice that a change of $\epsilon_i$ can lead to a
change in $Z$ of no more than $B_i = \sup_{a \in A} 2|a_i|$. The result follows.
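The bounded-difference constant in this proof can be verified by brute force on a small example. The sketch assumes $Z$ has the form $\sup_{a \in A} \sum_i \epsilon_i a_i$ with random signs $\epsilon_i \in \{-1, +1\}$ and a finite set $A$; that form, the set `A`, and the function names are assumptions made for illustration.

```python
import itertools

def rademacher_sup(signs, A):
    """Z = max over a in A of sum_i signs[i] * a[i], for a finite set A."""
    return max(sum(s * ai for s, ai in zip(signs, a)) for a in A)

def max_change_when_flipping(i, A, n):
    """Largest |Z(signs) - Z(signs with coordinate i flipped)| over all sign vectors."""
    worst = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        flipped = signs[:i] + (-signs[i],) + signs[i + 1:]
        diff = abs(rademacher_sup(signs, A) - rademacher_sup(flipped, A))
        worst = max(worst, diff)
    return worst

# Hypothetical finite set A of vectors in R^3.
A = [(1.0, -2.0, 0.5), (0.0, 1.0, 3.0), (-1.5, 0.5, 0.0)]
n = 3
bounds = [2.0 * max(abs(a[i]) for a in A) for i in range(n)]     # sup_a 2|a_i|
changes = [max_change_when_flipping(i, A, n) for i in range(n)]  # observed worst case
```

For every coordinate the observed worst-case change should not exceed $\sup_{a \in A} 2|a_i|$, although it can be strictly smaller.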
Proof:
Write $Z = g(X_1, \ldots, X_n)$, and notice that a change of $X_i$ can lead to a
change in $Z$ of no more than $B_i = 1/n$. The result follows.
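With $B_i = 1/n$ for every $i$, the bounded differences inequality gives $2\exp(-2nt^2)$. The sketch below compares this with a Monte Carlo estimate, assuming for illustration that $g$ is the sample mean of independent Uniform$[0, 1]$ variables (one function with this $1/n$ property; the lecture's $g$ may differ). The seed, sample sizes, and function name are also assumptions.

```python
import math
import random

def deviation_frequency(n, t, trials, rng):
    """Fraction of trials in which |mean(X) - 1/2| >= t for X_i ~ Uniform[0, 1]."""
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() for _ in range(n)) / n
        if abs(mean - 0.5) >= t:
            hits += 1
    return hits / trials

rng = random.Random(1)                        # fixed seed for reproducibility
n, t = 50, 0.1
empirical = deviation_frequency(n, t, trials=2000, rng=rng)
# sum_i B_i^2 = n * (1/n)^2 = 1/n, so the bound is 2 exp(-2 n t^2).
bound = 2.0 * math.exp(-2.0 * n * t * t)
```

The bound uses only the range of each coordinate, not its variance, so the empirical frequency is typically well below it.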