

Theoretical Statistics. Lecture 4.

Peter Bartlett
1. Concentration inequalities.

Outline of today's lecture


We have been looking at deviation inequalities, i.e., bounds on tail
probabilities like $P(X_n \ge t)$ for some statistic $X_n$.
1. Using moment generating function bounds, for sums of independent r.v.s:
Chernoff; Hoeffding; sub-Gaussian, sub-exponential random variables; Bernstein.
Today: Johnson-Lindenstrauss.
2. Martingale methods:
Hoeffding-Azuma, bounded differences.

Review. Chernoff technique

Theorem: For $t > 0$,
$$P(X - \mathbb{E}X \ge t) \le \inf_{\lambda > 0} e^{-\lambda t} M_{X - \mathbb{E}X}(\lambda).$$

Theorem: [Hoeffding's Inequality] For a random variable $X \in [a, b]$ with
$\mathbb{E}X = \mu$ and $\lambda \in \mathbb{R}$,
$$\ln M_{X - \mu}(\lambda) \le \frac{\lambda^2 (b - a)^2}{8}.$$
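As a quick numerical sanity check (my addition, not part of the slides), the following sketch compares the exact log-MGF of a centered Bernoulli($p$) variable on $\{0, 1\}$ with the Hoeffding bound $\lambda^2 (b - a)^2 / 8$; the choice $p = 0.3$ and the grid of $\lambda$ values are arbitrary.

```python
import numpy as np

# Centered Bernoulli(p) on {0, 1}: a = 0, b = 1, mu = p.
p = 0.3
lambdas = np.linspace(-5, 5, 101)

# Exact: ln M_{X - mu}(lambda) = ln((1-p) e^{-lambda p} + p e^{lambda (1-p)}).
log_mgf = np.log((1 - p) * np.exp(-lambdas * p) + p * np.exp(lambdas * (1 - p)))

# Hoeffding's bound: lambda^2 (b - a)^2 / 8, with (b - a) = 1 here.
bound = lambdas**2 / 8

assert np.all(log_mgf <= bound + 1e-12)
print("largest slack in the bound:", np.max(bound - log_mgf))
```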

Review. Sub-Gaussian, Sub-Exponential Random Variables

Definition: $X$ is sub-Gaussian with parameter $\sigma^2$ if, for all $\lambda \in \mathbb{R}$,
$$\ln M_{X - \mu}(\lambda) \le \frac{\lambda^2 \sigma^2}{2}.$$

Definition: $X$ is sub-exponential with parameters $(\sigma^2, b)$ if, for all $|\lambda| < 1/b$,
$$\ln M_{X - \mu}(\lambda) \le \frac{\lambda^2 \sigma^2}{2}.$$

Review. Sub-Exponential Random Variables

Theorem: For $X$ sub-exponential with parameters $(\sigma^2, b)$,
$$P(X \ge \mu + t) \le \begin{cases} \exp\left(-\dfrac{t^2}{2\sigma^2}\right) & \text{if } 0 \le t \le \sigma^2/b, \\[1ex] \exp\left(-\dfrac{t}{2b}\right) & \text{if } t > \sigma^2/b. \end{cases}$$

For independent $X_i$, sub-exponential with parameters $(\sigma_i^2, b_i)$, the sum
$X = X_1 + \cdots + X_n$ is sub-exponential with parameters
$\left( \sum_i \sigma_i^2, \max_i b_i \right)$.

Example: $X \sim \chi^2_1$ is sub-exponential with parameters $(4, 4)$.
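A quick Monte Carlo sketch (my addition): since $Z \sim \chi^2_n$ is a sum of $n$ independent sub-exponential $(4, 4)$ variables, it is sub-exponential $(4n, 4)$, giving the two-sided tail bound $P(|Z - n| \ge t) \le 2\exp(-t^2/(8n))$ for $0 \le t \le n$. The sample sizes below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 200_000

# Z ~ chi^2_n, simulated as a sum of n squared standard normals.
Z = (rng.standard_normal((trials, n)) ** 2).sum(axis=1)

for t in [5, 10, 20, 40]:  # all satisfy 0 <= t <= n
    empirical = np.mean(np.abs(Z - n) >= t)
    bound = 2 * np.exp(-t**2 / (8 * n))
    print(f"t={t:3d}  empirical={empirical:.4f}  bound={bound:.4f}")
```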

Sub-Exponential Random Variables: Example

Theorem: [Johnson-Lindenstrauss] For $m$ points $x_1, \ldots, x_m$ from $\mathbb{R}^d$,
there is a projection $F : \mathbb{R}^d \to \mathbb{R}^n$ that preserves distances in the sense
that, for all $x_i, x_j$,
$$(1 - \epsilon)\|x_i - x_j\|_2^2 \le \|F(x_i) - F(x_j)\|_2^2 \le (1 + \epsilon)\|x_i - x_j\|_2^2,$$
provided that $n > (16/\epsilon^2) \log m$.
That is, we can embed these points in $\mathbb{R}^n$ and approximately maintain their
distance relationships, provided that $n$ is not too small. Notice that $n$ is
independent of the ambient dimension $d$, and depends only logarithmically
on the number of points $m$.

Johnson-Lindenstrauss
Applications: dimension reduction to simplify computation (nearest
neighbor, clustering, image processing, text processing).
Analysis of machine learning methods: separable by a large margin in high
dimensions implies it's really a low-dimensional problem after all.

Johnson-Lindenstrauss Embedding: Proof


We use a random projection:
$$F(x) = \frac{1}{\sqrt{n}} Y x,$$
where $Y \in \mathbb{R}^{n \times d}$ has independent $N(0, 1)$ entries.
Let $Y_i$ denote the $i$th row, for $1 \le i \le n$. It has a $N(0, I)$ distribution, so
$Y_i^T x / \|x\|_2 \sim N(0, 1)$. Thus,
$$Z = \frac{\|Y x\|_2^2}{\|x\|_2^2} = \sum_{i=1}^{n} \left( \frac{Y_i^T x}{\|x\|_2} \right)^2 \sim \chi^2_n.$$
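A minimal numpy sketch of this construction (my illustration; the point set and the dimensions $m$, $d$, $n$ are arbitrary choices): project random points with $F(x) = Yx/\sqrt{n}$ and inspect the worst-case distortion of pairwise squared distances.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
m, d, n = 50, 1_000, 500  # number of points, ambient dim, projected dim

X = rng.standard_normal((m, d))  # m points in R^d
Y = rng.standard_normal((n, d))  # projection matrix with N(0, 1) entries
FX = X @ Y.T / np.sqrt(n)        # F(x) = Yx / sqrt(n), applied to each point

orig = pdist(X, "sqeuclidean")   # squared distances over all pairs
proj = pdist(FX, "sqeuclidean")
ratios = proj / orig             # should lie in [1 - eps, 1 + eps]
print("distortion range:", ratios.min(), ratios.max())
```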

Johnson-Lindenstrauss Embedding: Proof


Since $Z \sim \chi^2_n$ is the sum of $n$ independent sub-exponential $(4, 4)$ random
variables, it is sub-exponential $(4n, 4)$. And we have that, for $0 \le t \le n$,
$$P(|Z - n| \ge t) \le 2 \exp\left(-\frac{t^2}{8n}\right).$$
Hence, for $0 < \epsilon < 1$ (taking $t = n\epsilon$),
$$P\left( \left| \frac{\|Y x\|_2^2}{n \|x\|_2^2} - 1 \right| \ge \epsilon \right) \le 2 \exp(-n\epsilon^2/8),$$
that is,
$$P\left( \frac{\|F(x)\|_2^2}{\|x\|_2^2} \notin [1 - \epsilon, 1 + \epsilon] \right) \le 2 \exp(-n\epsilon^2/8).$$

Johnson-Lindenstrauss Embedding: Proof




Applying this to the $\binom{m}{2}$ distinct pairs $x = x_i - x_j$, and using the union
bound, gives
$$P\left( \exists\, i \ne j \text{ s.t. } \frac{\|F(x_i - x_j)\|_2^2}{\|x_i - x_j\|_2^2} \notin [1 - \epsilon, 1 + \epsilon] \right) \le 2\binom{m}{2} \exp(-n\epsilon^2/8) \le m^2 \exp(-n\epsilon^2/8).$$
Thus, for $n > (16/\epsilon^2) \log m$, this probability is strictly less than 1, so there
exists a suitable mapping.
In fact, we can choose a random projection in this way and ensure that the
probability that it does not satisfy the approximate isometry property is no
more than $\delta$, for $n > (16/\epsilon^2) \log(m/\delta)$.
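To get a feel for the numbers (a worked example of my own, not from the slides): the dimension required for $m = 10{,}000$ points at distortion $\epsilon = 0.1$ and failure probability $\delta = 0.01$.

```python
import math

def jl_dim(m: int, eps: float, delta: float) -> int:
    """Smallest integer n with n > (16 / eps^2) * log(m / delta)."""
    return math.floor(16 / eps**2 * math.log(m / delta)) + 1

# Roughly 22,000 dimensions -- independent of the ambient dimension d.
print(jl_dim(m=10_000, eps=0.1, delta=0.01))
```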


Concentration Bounds for Martingale Difference Sequences

Next, we're going to consider concentration of martingale difference
sequences. The application is to understand how tails of
$f(X_1, \ldots, X_n) - \mathbb{E}f(X_1, \ldots, X_n)$ behave, for some function $f$.
[e.g., in the homework, we have that $f$ is some measure of the performance
of a kernel density estimator.] If we write
$$f(X_1, \ldots, X_n) - \mathbb{E}f(X_1, \ldots, X_n) = \sum_{i=1}^{n} \Big( \mathbb{E}[f(X_1, \ldots, X_n) \mid X_1, \ldots, X_i] - \mathbb{E}[f(X_1, \ldots, X_n) \mid X_1, \ldots, X_{i-1}] \Big),$$
then we have represented this deviation as a sum of martingale differences.



Martingales

Definition: A sequence $Y_n$ of random variables adapted to a filtration $\mathcal{F}_n$ is
a martingale if, for all $n$,
$$\mathbb{E}|Y_n| < \infty, \qquad \mathbb{E}[Y_{n+1} \mid \mathcal{F}_n] = Y_n.$$
"$\mathcal{F}_n$ is a filtration" means these $\sigma$-fields are nested: $\mathcal{F}_n \subseteq \mathcal{F}_{n+1}$.
"$Y_n$ is adapted to $\mathcal{F}_n$" means that each $Y_n$ is measurable with respect to $\mathcal{F}_n$.
e.g., $\mathcal{F}_n = \sigma(Y_1, \ldots, Y_n)$, the $\sigma$-field generated by the first $n$ variables.
Then we say $Y_n$ is a martingale sequence.
e.g., $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$. Then $Y_n$ is a martingale sequence w.r.t. $X_n$.

Martingale Difference Sequences


Definition: A sequence $D_n$ of random variables adapted to a filtration $\mathcal{F}_n$
is a martingale difference sequence if, for all $n$,
$$\mathbb{E}|D_n| < \infty, \qquad \mathbb{E}[D_{n+1} \mid \mathcal{F}_n] = 0.$$
e.g., for a martingale $Y_n$, take $D_n = Y_n - Y_{n-1}$:
$$\mathbb{E}[D_{n+1} \mid \mathcal{F}_n] = \mathbb{E}[Y_{n+1} \mid \mathcal{F}_n] - \mathbb{E}[Y_n \mid \mathcal{F}_n] = \mathbb{E}[Y_{n+1} \mid \mathcal{F}_n] - Y_n = 0$$
(because $Y_n$ is measurable w.r.t. $\mathcal{F}_n$, and because of the martingale property).
Hence, $Y_n - Y_0 = \sum_{i=1}^{n} D_i$.

Martingale Difference Sequences: the Doob construction

Define
$$X = (X_1, \ldots, X_n), \qquad X_1^i = (X_1, \ldots, X_i), \qquad Y_0 = \mathbb{E}f(X), \qquad Y_i = \mathbb{E}[f(X) \mid X_1^i].$$
Then
$$f(X) - \mathbb{E}f(X) = Y_n - Y_0 = \sum_{i=1}^{n} D_i,$$
where $D_i = Y_i - Y_{i-1}$. Also, $Y_i$ is a martingale w.r.t. $X_i$, and hence $D_i$ is a
martingale difference sequence. Indeed (because $\mathbb{E}X = \mathbb{E}\,\mathbb{E}[X \mid Y]$),
$$\mathbb{E}[Y_{i+1} \mid X_1^i] = \mathbb{E}\big[ \mathbb{E}[f(X) \mid X_1^{i+1}] \,\big|\, X_1^i \big] = \mathbb{E}[f(X) \mid X_1^i] = Y_i.$$
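A small simulation sketch (my addition) of the Doob construction for the simple choice $f(x) = \sum_i x_i$ with i.i.d. $X_i$, where $Y_i = \mathbb{E}[f(X) \mid X_1^i]$ has the closed form $\sum_{j \le i} X_j + (n - i)\mu$; it checks that each increment $D_i$ has mean zero and that the increments sum to $f(X) - \mathbb{E}f(X)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, mu = 20, 100_000, 0.5

X = rng.random((trials, n))  # i.i.d. Uniform(0, 1), so mu = 0.5

# Doob martingale for f(x) = sum(x): Y_i = sum_{j<=i} X_j + (n - i) * mu.
i = np.arange(1, n + 1)
Y = np.cumsum(X, axis=1) + (n - i) * mu
Y0 = n * mu

# Increments D_i = Y_i - Y_{i-1}; here each D_i works out to X_i - mu.
D = np.diff(np.column_stack([np.full(trials, Y0), Y]), axis=1)
print("max |E[D_i]|:", np.abs(D.mean(axis=0)).max())  # ~ 0
print("sum_i D_i == f(X) - Ef(X):",
      np.allclose(D.sum(axis=1), X.sum(axis=1) - Y0))
```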

Martingale Difference Sequences: another example


[An aside:] Consider two densities $f$ and $g$, with $g$ absolutely continuous
w.r.t. $f$. Suppose the $X_n$ are drawn i.i.d. from $f$, and $Y_n$ is the likelihood ratio,
$$Y_n = \prod_{i=1}^{n} \frac{g(X_i)}{f(X_i)}.$$
Then $Y_n$ is a martingale w.r.t. $X_n$. Indeed,
$$\mathbb{E}[Y_{n+1} \mid X_1^n] = \mathbb{E}\left[ \prod_{i=1}^{n+1} \frac{g(X_i)}{f(X_i)} \,\middle|\, X_1^n \right] = \mathbb{E}\left[ \frac{g(X_{n+1})}{f(X_{n+1})} \right] \prod_{i=1}^{n} \frac{g(X_i)}{f(X_i)} = Y_n,$$
because $\mathbb{E}[g(X_{n+1})/f(X_{n+1})] = 1$.
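A short simulation sketch (my addition), taking $f = N(0, 1)$ and $g = N(0.2, 1)$ as an arbitrary pair of densities: the mean of $Y_n$ stays at 1 for every $n$ (the martingale property), even though typical realizations drift toward 0.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, trials, shift = 50, 50_000, 0.2

X = rng.standard_normal((trials, n))  # i.i.d. draws from f = N(0, 1)

# Likelihood ratio Y_n = prod_{i<=n} g(X_i) / f(X_i), with g = N(shift, 1).
log_ratio = norm.logpdf(X, loc=shift) - norm.logpdf(X)
Y = np.exp(np.cumsum(log_ratio, axis=1))

for k in [1, 10, 50]:
    print(f"n={k:2d}  mean ~ {Y[:, k - 1].mean():.3f}"
          f"  median ~ {np.median(Y[:, k - 1]):.3f}")
# The mean stays near 1, while the median tends to 0: Y_n -> 0 a.s. under f.
```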



Concentration Bounds for Martingale Difference Sequences

Theorem: Consider a martingale difference sequence $D_n$ (adapted to a
filtration $\mathcal{F}_n$) that satisfies, for $|\lambda| \le 1/b_n$ a.s.,
$$\mathbb{E}\left[ \exp(\lambda D_n) \mid \mathcal{F}_{n-1} \right] \le \exp(\lambda^2 \sigma_n^2 / 2).$$
Then $\sum_{i=1}^{n} D_i$ is sub-exponential, with $(\sigma^2, b) = \left( \sum_{i=1}^{n} \sigma_i^2, \max_i b_i \right)$, and
$$P\left( \left| \sum_i D_i \right| \ge t \right) \le \begin{cases} 2\exp\left(-\dfrac{t^2}{2\sigma^2}\right) & \text{if } 0 \le t \le \sigma^2/b, \\[1ex] 2\exp\left(-\dfrac{t}{2b}\right) & \text{if } t > \sigma^2/b. \end{cases}$$


Concentration Bounds for Martingale Difference Sequences

Proof:
$$\mathbb{E}\exp\left( \lambda \sum_{i=1}^{n} D_i \right) = \mathbb{E}\left[ \exp\left( \lambda \sum_{i=1}^{n-1} D_i \right) \mathbb{E}\left[ \exp(\lambda D_n) \mid \mathcal{F}_{n-1} \right] \right] \le \exp(\lambda^2 \sigma_n^2 / 2)\, \mathbb{E}\exp\left( \lambda \sum_{i=1}^{n-1} D_i \right),$$
provided $|\lambda| < 1/b$. Iterating shows that $\sum_i D_i$ is sub-exponential.

Concentration Bounds for Martingale Difference Sequences

Theorem: Consider a martingale difference sequence $D_i$ with $|D_i| \le B_i$
a.s. Then
$$P\left( \left| \sum_i D_i \right| \ge t \right) \le 2\exp\left( -\frac{t^2}{2\sum_i B_i^2} \right).$$

Proof:
It suffices to show that
$$\mathbb{E}\left[ \exp(\lambda D_i) \mid \mathcal{F}_{i-1} \right] \le \exp(\lambda^2 B_i^2 / 2).$$
But $|D_i| \le B_i$ a.s., so conditioned on $\mathcal{F}_{i-1}$, $D_i$ is a mean-zero random
variable taking values in $[-B_i, B_i]$, and hence sub-Gaussian with parameter
$\sigma_i^2 = B_i^2$.
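A quick Monte Carlo sketch (my addition) of this bound for the simplest bounded martingale difference sequence, independent uniform random signs $D_i \in \{-1, +1\}$, so that $B_i = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 200_000

D = rng.choice([-1.0, 1.0], size=(trials, n))  # bounded MDS with B_i = 1
S = D.sum(axis=1)

for t in [10, 20, 30]:
    empirical = np.mean(np.abs(S) >= t)
    bound = 2 * np.exp(-t**2 / (2 * n))  # 2 exp(-t^2 / (2 sum_i B_i^2))
    print(f"t={t}  empirical={empirical:.4f}  bound={bound:.4f}")
```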

Bounded Differences Inequality

Theorem: Suppose $f : \mathcal{X}^n \to \mathbb{R}$ satisfies the following bounded differences
inequality: for all $x_1, \ldots, x_n, x_i' \in \mathcal{X}$,
$$|f(x_1, \ldots, x_n) - f(x_1, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n)| \le B_i.$$
Then
$$P\left( |f(X) - \mathbb{E}f(X)| \ge t \right) \le 2\exp\left( -\frac{t^2}{2\sum_i B_i^2} \right).$$


Bounded Differences Inequality


Proof: Use the Doob construction:
$$Y_i = \mathbb{E}[f(X) \mid X_1^i], \qquad D_i = Y_i - Y_{i-1}, \qquad f(X) - \mathbb{E}f(X) = \sum_{i=1}^{n} D_i.$$
Then, writing $X_i'$ for an independent copy of $X_i$,
$$|D_i| = \left| \mathbb{E}[f(X) \mid X_1^i] - \mathbb{E}[f(X) \mid X_1^{i-1}] \right| = \left| \mathbb{E}\left[ f(X) - f(X_1, \ldots, X_{i-1}, X_i', X_{i+1}, \ldots, X_n) \,\middle|\, X_1^i \right] \right| \le B_i.$$
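A Monte Carlo sketch (my addition) of the bounded differences inequality for the empirical mean of Uniform(0, 1) variables: $f(x_1, \ldots, x_n) = \frac{1}{n}\sum_i x_i$ changes by at most $B_i = 1/n$ when one coordinate changes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 200_000

X = rng.random((trials, n))  # X_i ~ Uniform(0, 1)
f = X.mean(axis=1)           # bounded differences with B_i = 1/n
dev = np.abs(f - 0.5)        # Ef(X) = 1/2

for t in [0.05, 0.10, 0.15]:
    empirical = np.mean(dev >= t)
    bound = 2 * np.exp(-t**2 / (2 * n * (1 / n) ** 2))  # sum_i B_i^2 = 1/n
    print(f"t={t:.2f}  empirical={empirical:.4f}  bound={bound:.4f}")
```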

Examples: Rademacher Averages


For a set $A \subseteq \mathbb{R}^n$, consider
$$Z = \sup_{a \in A} \langle \epsilon, a \rangle,$$
where $\epsilon = (\epsilon_1, \ldots, \epsilon_n)$ is a sequence of i.i.d. uniform $\{\pm 1\}$ random
variables. Define the Rademacher complexity of $A$ as $R(A) = \mathbb{E}Z$. [This
is a measure of the size of $A$.] The bounded differences approach implies
that $Z$ is concentrated around $R(A)$:

Theorem: $Z$ is sub-Gaussian with parameter $4 \sup_{a \in A} \sum_i a_i^2$.

Proof:
Write $Z = f(\epsilon_1, \ldots, \epsilon_n)$, and notice that a change of $\epsilon_i$ can lead to a
change in $Z$ of no more than $B_i = 2 \sup_{a \in A} |a_i|$. The result follows.
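A simulation sketch (my addition) with a concrete choice of $A$: a finite set of $k$ arbitrary vectors in $\mathbb{R}^n$. It compares the empirical tail of $Z - R(A)$ with the sub-Gaussian bound $2\exp(-t^2/(2\sigma^2))$, where $\sigma^2 = 4\sup_{a \in A}\sum_i a_i^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, trials = 50, 20, 100_000

A = rng.standard_normal((k, n))                  # a finite set of k vectors
eps = rng.choice([-1.0, 1.0], size=(trials, n))  # i.i.d. Rademacher signs

Z = (eps @ A.T).max(axis=1)  # Z = sup_{a in A} <eps, a>
R = Z.mean()                 # Monte Carlo estimate of R(A) = EZ

sigma2 = 4 * (A**2).sum(axis=1).max()  # 4 sup_{a in A} sum_i a_i^2
for t in [10, 20, 30]:
    empirical = np.mean(np.abs(Z - R) >= t)
    bound = 2 * np.exp(-t**2 / (2 * sigma2))
    print(f"t={t}  empirical={empirical:.4f}  bound={bound:.4f}")
```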

Examples: Empirical Processes


For a class $\mathcal{F}$ of functions $f : \mathcal{X} \to [0, 1]$, suppose that $X_1, \ldots, X_n, X$ are
i.i.d. on $\mathcal{X}$, and consider
$$Z = \sup_{f \in \mathcal{F}} \left| \mathbb{E}f(X) - \frac{1}{n}\sum_{i=1}^{n} f(X_i) \right| = \|P f - P_n f\|_{\mathcal{F}},$$
an empirical process. If $Z$ converges to 0, this is called a uniform law of large
numbers. Here, we show that $Z$ is concentrated about $\mathbb{E}Z$:

Theorem: $Z$ is sub-Gaussian with parameter $1/n$.

Proof:
Write $Z = g(X_1, \ldots, X_n)$, and notice that a change of $X_i$ can lead to a
change in $Z$ of no more than $B_i = 1/n$. The result follows.
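A closing sketch (my addition): for the class of half-line indicators $\mathcal{F} = \{x \mapsto 1[x \le \theta]\}$ on Uniform(0, 1) data, $Z$ is the Kolmogorov-Smirnov statistic, and we can check the concentration of $Z$ about $\mathbb{E}Z$ against the sub-Gaussian bound with parameter $1/n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 50_000

# For F = {x -> 1[x <= theta]}: Z = sup_theta |theta - F_n(theta)|, the KS statistic.
X = np.sort(rng.random((trials, n)), axis=1)
i = np.arange(1, n + 1)
Z = np.maximum(np.abs(i / n - X), np.abs((i - 1) / n - X)).max(axis=1)

EZ = Z.mean()
for t in [0.05, 0.10, 0.15]:
    empirical = np.mean(np.abs(Z - EZ) >= t)
    bound = 2 * np.exp(-(t**2) * n / 2)  # sub-Gaussian with parameter 1/n
    print(f"t={t:.2f}  empirical={empirical:.4f}  bound={bound:.4f}")
```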
