Week 5-8 Short Notes

The document provides comprehensive notes on continuous random variables, covering their functions, expected values, variances, and various inequalities. It also discusses joint densities, independence, empirical distributions, sample statistics, and limit theorems including the Central Limit Theorem. Additionally, it introduces specific distributions such as Gamma, Beta, and Cauchy, along with important results related to normal distributions and their properties.


Statistics for Data Science - 2

Week 5 Notes

1. Functions of continuous random variable:


Suppose X is a continuous random variable with CDF FX and PDF fX and suppose
g : R → R is a (reasonable) function. Then, Y = g(X) is a random variable with CDF
FY determined as follows:

• FY (y) = P (Y ≤ y) = P (g(X) ≤ y) = P (X ∈ {x : g(x) ≤ y})


• To evaluate the above probability
– Convert the subset Ay = {x : g(x) ≤ y} into intervals on the real line.
– Find the probability that X falls in those intervals.
– FY (y) = P (X ∈ Ay) = ∫_{Ay} fX (x) dx
• If FY has no jumps, you may be able to differentiate and find a PDF.

2. Theorem: Monotonic differentiable function


Suppose X is a continuous random variable with PDF fX . Let g(x) be monotonic for
x ∈ supp(X) with derivative g′(x) = dg(x)/dx. Then, the PDF of Y = g(X) is

fY (y) = fX (g⁻¹(y)) / |g′(g⁻¹(y))|
• Translation: Y = X + a
fY (y) = fX (y − a)
• Scaling: Y = aX
fY (y) = (1/|a|) fX (y/a)
• Affine: Y = aX + b
fY (y) = (1/|a|) fX ((y − b)/a)
• Affine transformation of a normal random variable is normal.
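As a quick numerical check of the change-of-variables formula above, here is a minimal Python sketch (assuming NumPy is available; the example Y = X² with X ∼ Exp(1) and the helper name f_Y are illustrative choices, not part of the notes). It compares the formula fY (y) = fX (g⁻¹(y))/|g′(g⁻¹(y))| with a Monte Carlo density estimate.

import numpy as np

rng = np.random.default_rng(0)

# X ~ Exp(1), Y = g(X) = X^2; g is monotonic on supp(X) = (0, inf)
x = rng.exponential(scale=1.0, size=1_000_000)
y = x ** 2

def f_Y(y):
    # Change-of-variables formula with g^{-1}(y) = sqrt(y), g'(x) = 2x,
    # and f_X(x) = exp(-x) for x > 0.
    x_inv = np.sqrt(y)
    return np.exp(-x_inv) / (2 * x_inv)

# Compare the formula with a Monte Carlo estimate of the density near y0
for y0 in [0.5, 1.0, 2.0]:
    h = 0.05
    mc_density = np.mean((y > y0 - h) & (y < y0 + h)) / (2 * h)
    print(f"y={y0}: formula {f_Y(y0):.4f}, Monte Carlo {mc_density:.4f}")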

3. Expected value of function of continuous random variable:


Let X be a continuous random variable with density fX (x). Let g : R → R be a function.
The expected value of g(X), denoted E[g(X)], is given by
E[g(X)] = ∫_{−∞}^{∞} g(x) fX (x) dx

whenever the above integral exists.

• The integral may diverge to ±∞ or may not exist in some cases.

4. Expected value (mean) of a continuous random variable:


Mean, denoted E[X] or µX or simply µ, is given by

E[X] = ∫_{−∞}^{∞} x fX (x) dx

5. Variance of a continuous random variable:


Variance, denoted Var[X] or σX² or simply σ², is given by

Var(X) = E[(X − E[X])²] = ∫_{−∞}^{∞} (x − µ)² fX (x) dx

• Variance is a measure of spread of X about its mean.


• Var(X) = E[X²] − (E[X])²

X                  E[X]          Var(X)
Uniform[a, b]      (a + b)/2     (b − a)²/12
Exp(λ)             1/λ           1/λ²
Normal(µ, σ²)      µ             σ²
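The rows of this table can be checked by direct integration. A minimal sketch, assuming SciPy and NumPy are available, verifies the Exp(λ) row numerically (the choice λ = 2 is arbitrary):

import numpy as np
from scipy.integrate import quad

lam = 2.0
f = lambda x: lam * np.exp(-lam * x)                 # Exp(lambda) density on x > 0

mean, _ = quad(lambda x: x * f(x), 0, np.inf)        # E[X]
second, _ = quad(lambda x: x ** 2 * f(x), 0, np.inf)
var = second - mean ** 2                             # Var(X) = E[X^2] - E[X]^2

print(mean, 1 / lam)       # both ~ 0.5  : E[X] = 1/lambda
print(var, 1 / lam ** 2)   # both ~ 0.25 : Var(X) = 1/lambda^2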

6. Markov’s inequality:
If X is a continuous random variable with mean µ and non-negative supp(X) (i.e. P (X <
0) = 0), then
P (X > c) ≤ µ/c
7. Chebyshev’s inequality:
If X is a continuous random variable with mean µ and variance σ², then

P (|X − µ| ≥ kσ) ≤ 1/k²
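Both inequalities are upper bounds and are often loose. A small simulation sketch, assuming NumPy, compares them with the actual probabilities for X ∼ Exp(1) (so µ = 1 and σ = 1); the values of c and k below are arbitrary:

import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=1_000_000)   # Exp(1): mu = 1, sigma = 1
mu, sigma = 1.0, 1.0

# Markov: P(X > c) <= mu / c
for c in [2, 4, 8]:
    print(f"c={c}: P(X > c) ~ {np.mean(x > c):.4f}  <=  Markov bound {mu / c:.4f}")

# Chebyshev: P(|X - mu| >= k*sigma) <= 1 / k^2
for k in [2, 3, 4]:
    actual = np.mean(np.abs(x - mu) >= k * sigma)
    print(f"k={k}: P(|X-mu| >= k sigma) ~ {actual:.4f}  <=  Chebyshev bound {1 / k ** 2:.4f}")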

8. Marginal density: Let (X, Y ) be jointly distributed where X is discrete with range
TX and PMF pX (x).
For each x ∈ TX , we have a continuous random variable Yx with density fYx (y).
fYx (y) : conditional density of Y given X = x, denoted fY |X=x (y).

• Marginal density of Y
  – fY (y) = Σ_{x∈TX} pX (x) fY |X=x (y)

9. Conditional probability of discrete given continuous: Suppose X and Y are
jointly distributed with X ∈ TX being discrete with PMF pX (x) and conditional densi-
ties fY |X=x (y) for x ∈ TX . The conditional probability of X given Y = y0 ∈ supp(Y ) is
defined as

• P (X = x | Y = y0 ) = pX (x) fY |X=x (y0 ) / fY (y0 )
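A minimal worked sketch of this formula, assuming SciPy is available, with a hypothetical setup in which X takes the values 0 and 1 and Y given X = x is Normal(x, 1):

from scipy.stats import norm

# Hypothetical model: P(X=0) = 0.7, P(X=1) = 0.3, and Y | X = x ~ Normal(x, 1),
# so f_{Y|X=x}(y) = norm.pdf(y, loc=x, scale=1).
p_X = {0: 0.7, 1: 0.3}
y0 = 0.8                                            # observed value of Y

# f_Y(y0) = sum over x of p_X(x) * f_{Y|X=x}(y0)    (marginal density of Y)
f_Y = sum(p * norm.pdf(y0, loc=x, scale=1.0) for x, p in p_X.items())

# P(X = x | Y = y0) = p_X(x) * f_{Y|X=x}(y0) / f_Y(y0)
posterior = {x: p * norm.pdf(y0, loc=x, scale=1.0) / f_Y for x, p in p_X.items()}
print(posterior)                                    # the two probabilities sum to 1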

Statistics for Data Science - 2
Week 6 Notes
Continuous Random Variables

1. Joint density: A function f (x, y) is said to be a joint density function if


• f (x, y) ≥ 0, i.e. f is non-negative.
• ∫_{−∞}^{∞} ∫_{−∞}^{∞} f (x, y) dx dy = 1

2. 2D uniform distribution: Fix some (reasonable) region D in R² with total area |D|.
We say that (X, Y ) ∼ Uniform(D) if they have the joint density

fXY (x, y) = 1/|D| for (x, y) ∈ D, and fXY (x, y) = 0 otherwise.
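One common way to draw from Uniform(D) is rejection sampling from a bounding box. The sketch below, assuming NumPy, uses the unit disk as an illustrative choice of D; the helper name sample_uniform_disk is hypothetical:

import numpy as np

rng = np.random.default_rng(2)

def sample_uniform_disk(n):
    # Draw uniformly from the bounding square [-1, 1]^2 and keep only the
    # points that fall inside the unit disk D = {(x, y) : x^2 + y^2 <= 1}.
    pts = np.empty((0, 2))
    while len(pts) < n:
        cand = rng.uniform(-1, 1, size=(n, 2))
        keep = cand[cand[:, 0] ** 2 + cand[:, 1] ** 2 <= 1]
        pts = np.vstack([pts, keep])
    return pts[:n]

pts = sample_uniform_disk(100_000)
# Under Uniform(D), P((X, Y) in A) = area(A) / |D|; for the quarter disk
# A = {x > 0, y > 0} this is (pi/4) / pi = 1/4.
print(np.mean((pts[:, 0] > 0) & (pts[:, 1] > 0)))   # ~ 0.25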

3. Marginal density: Suppose (X, Y ) have joint density fXY (x, y). Then,
• X has the marginal density fX (x) = ∫_{y=−∞}^{∞} fXY (x, y) dy.
• Y has the marginal density fY (y) = ∫_{x=−∞}^{∞} fXY (x, y) dx.
  – In general, the marginals do not determine the joint density.
4. Independence: (X, Y ) with joint density fXY (x, y) are independent if
• fXY (x, y) = fX (x)fY (y)
– If independent, the marginals determine the joint density.
5. Conditional density: Let (X, Y ) be random variables with joint density fXY (x, y).
Let fX (x) and fY (y) be the marginal densities.
• For a such that fX (a) > 0, the conditional density of Y given X = a, denoted as
fY |X=a (y), is defined as

fY |X=a (y) = fXY (a, y) / fX (a)

• For b such that fY (b) > 0, the conditional density of X given Y = b, denoted as
fX|Y =b (x), is defined as

fX|Y =b (x) = fXY (x, b) / fY (b)
6. Properties of conditional density: Joint = Marginal × Conditional, for x = a and
y = b such that fX (a) > 0 and fY (b) > 0.

• fXY (a, b) = fX (a)fY |X=a (b) = fY (b)fX|Y =b (a)

Statistics for Data Science - 2
Week 7 Notes
Statistics from samples and Limit theorems

1. Empirical distribution:
Let X1 , X2 , . . . , Xn ∼ X be i.i.d. samples. Let #(Xi = t) denote the number of times
t occurs in the samples. The empirical distribution is the discrete distribution with
PMF
p(t) = #(Xi = t) / n
• The empirical distribution is random because it depends on the actual sample
instances.
• Descriptive statistics: Properties of empirical distribution. Examples:
– Mean of the distribution
– Variance of the distribution
– Probability of an event
• As the number of samples increases, the properties of the empirical distribution
should become close to those of the original distribution.
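For concreteness, a short sketch (assuming NumPy) builds the empirical PMF and its descriptive statistics from simulated rolls of a fair die; the die example and sample size are illustrative only:

import numpy as np
from collections import Counter

rng = np.random.default_rng(3)
samples = rng.integers(1, 7, size=1000)          # i.i.d. rolls of a fair six-sided die

# Empirical PMF: p(t) = #(X_i = t) / n
counts = Counter(samples.tolist())
n = len(samples)
empirical_pmf = {t: counts[t] / n for t in sorted(counts)}
print(empirical_pmf)                              # each value is close to 1/6

# Descriptive statistics of the empirical distribution
print("empirical mean:", samples.mean())          # close to 3.5
print("empirical variance:", samples.var())       # close to 35/12 ~ 2.92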

2. Sample mean:
Let X1 , X2 , . . . , Xn ∼ X be i.i.d. samples. The sample mean, denoted X, is defined to
be the random variable
X = (X1 + X2 + . . . + Xn ) / n
• Given a sampling x1 , . . . , xn , the value taken by the sample mean X is
  x = (x1 + x2 + . . . + xn )/n. Often, X and x are both called the sample mean.

3. Expected value and variance of sample mean:


Let X1 , X2 , . . . , Xn be i.i.d. samples whose distribution has a finite mean µ and variance
σ². The sample mean X has expected value and variance given by

E[X] = µ,    Var(X) = σ²/n
• Expected value of sample mean equals the expected value or mean of the distri-
bution.
• Variance of sample mean decreases with n.
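A quick simulation sketch, assuming NumPy, illustrates both facts for Exp(1) samples (µ = 1, σ² = 1); the replication count of 10,000 is arbitrary:

import numpy as np

rng = np.random.default_rng(4)
mu, sigma2 = 1.0, 1.0                             # Exp(1): mean 1, variance 1

for n in [10, 100, 1000]:
    # 10,000 independent sample means, each computed from n samples
    xbar = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    print(f"n={n}: E[Xbar] ~ {xbar.mean():.4f}, Var(Xbar) ~ {xbar.var():.5f}, "
          f"sigma^2/n = {sigma2 / n:.5f}")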
4. Sample variance:
Let X1 , X2 , . . . , Xn ∼ X be i.i.d. samples. The sample variance, denoted S², is defined
to be the random variable

S² = [(X1 − X)² + (X2 − X)² + . . . + (Xn − X)²] / (n − 1),

where X is the sample mean.

5. Expected value of sample variance:


Let X1 , X2 , . . . , Xn be i.i.d. samples whose distribution has a finite variance σ². The
sample variance S² = [(X1 − X)² + (X2 − X)² + . . . + (Xn − X)²]/(n − 1) has expected
value given by

E[S²] = σ²

• Values of the sample variance, on average, equal the variance of the distribution.


• Variance of sample variance will decrease with number of samples (in most cases).
• As n increases, sample variance takes values close to distribution variance.
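The n − 1 denominator is exactly what makes E[S²] = σ². A small sketch, assuming NumPy, compares the n − 1 and n denominators for Normal(0, 4) samples (an illustrative choice):

import numpy as np

rng = np.random.default_rng(5)
sigma2 = 4.0                                      # Normal(0, 4): variance 4
n = 10

data = rng.normal(0.0, np.sqrt(sigma2), size=(100_000, n))
s2_unbiased = data.var(axis=1, ddof=1)            # divide by n - 1 (the sample variance S^2)
s2_biased = data.var(axis=1, ddof=0)              # divide by n

print("average of S^2 (n-1 denominator):", s2_unbiased.mean())   # ~ 4.0 = sigma^2
print("average with n denominator:      ", s2_biased.mean())     # ~ 4 * (n-1)/n = 3.6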

6. Sample proportion:
The sample proportion of A, denoted S(A), is defined as

S(A) = (number of Xi for which A is true) / n
• As n increases, values of S(A) will be close to P (A).
• Mean of S(A) equals P (A).
• Variance of S(A) tends to 0.

7. Weak law of large numbers:


Let X1 , X2 , . . . , Xn ∼ i.i.d. X with E[X] = µ, Var(X) = σ².
Define the sample mean X = (X1 + X2 + . . . + Xn )/n. Then,

P (|X − µ| > δ) ≤ σ²/(nδ²)

Statistics for Data Science - 2
Week 8 Notes
Statistics from samples and Limit theorems

1. Moment generating function (MGF):


Let X be a zero-mean random variable (E[X] = 0). The MGF of X, denoted MX (λ),
is a function from R to R defined as
MX (λ) = E[e^{λX}]

MX (λ) = E[e^{λX}]
       = E[1 + λX + λ²X²/2! + λ³X³/3! + . . .]
       = 1 + λE[X] + (λ²/2!) E[X²] + (λ³/3!) E[X³] + . . .

That is, the coefficient of λᵏ/k! in the MGF of X gives the kth moment of X.
• If X ∼ Normal(0, σ²), then MX (λ) = e^{λ²σ²/2}
• Let X1 , X2 , . . . , Xn ∼ i.i.d. X and let S = X1 + X2 + . . . + Xn . Then
  MS (λ) = (E[e^{λX}])ⁿ = [MX (λ)]ⁿ
  That is, the MGF of a sum of independent random variables is the product of the
  individual MGFs.

2. Central limit theorem: Let X1 , X2 , . . . , Xn ∼ i.i.d. X with E[X] = µ, Var(X) = σ².

Define Y = X1 + X2 + . . . + Xn . Then,

(Y − nµ)/(σ√n) ≈ Normal(0, 1).
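A simulation sketch, assuming NumPy, illustrates the statement with Uniform[0, 1] samples (mean 1/2, variance 1/12); the sample sizes below are arbitrary:

import numpy as np

rng = np.random.default_rng(6)
n = 100
mu, sigma = 0.5, np.sqrt(1 / 12)                  # Uniform[0, 1]: mean 1/2, sd 1/sqrt(12)

# 50,000 standardized sums (Y - n*mu) / (sigma * sqrt(n))
y = rng.uniform(0, 1, size=(50_000, n)).sum(axis=1)
z = (y - n * mu) / (sigma * np.sqrt(n))

print("mean ~ 0:", z.mean())
print("variance ~ 1:", z.var())
# A tail probability close to the standard normal value P(Z > 1.96) ~ 0.025
print("P(Z > 1.96) ~", np.mean(z > 1.96))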

3. Gamma distribution:
X ∼ Gamma(α, β) if PDF fX (x) ∝ x^(α−1) e^(−βx) , x > 0
• α > 0 is a shape parameter.
• β > 0 is a rate parameter.
• θ = 1/β is a scale parameter.
• Mean, E[X] = α/β
• Variance, Var(X) = α/β²
4. Beta distribution:
X ∼ Beta(α, β) if PDF fX (x) ∝ x^(α−1) (1 − x)^(β−1) , 0 < x < 1

• α > 0, β > 0 are the shape parameters.


• Mean, E[X] = α/(α + β)
• Variance, Var(X) = αβ/[(α + β)²(α + β + 1)]

5. Cauchy distribution:
X ∼ Cauchy(θ, α²) if PDF fX (x) ∝ (1/π) · α/(α² + (x − θ)²)

• θ is a location parameter.
• α > 0 is a scale parameter.
• Mean and variance are undefined.
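The Gamma and Beta moment formulas above can be cross-checked against SciPy, keeping in mind that scipy.stats.gamma is parameterized by the scale θ = 1/β rather than the rate β; the parameter values below are arbitrary:

from scipy import stats

alpha, beta = 3.0, 2.0

# Gamma(alpha, beta) with beta a rate: pass scale = 1/beta to scipy
g = stats.gamma(a=alpha, scale=1 / beta)
print(g.mean(), alpha / beta)              # both 1.5  : E[X] = alpha/beta
print(g.var(), alpha / beta ** 2)          # both 0.75 : Var(X) = alpha/beta^2

# Beta(alpha, beta)
b = stats.beta(alpha, beta)
print(b.mean(), alpha / (alpha + beta))                                     # both 0.6
print(b.var(), alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1)))   # both 0.04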

6. Some important results:

• Let Xi ∼ Normal(µi , σi²) be independent and let Y = a1 X1 + a2 X2 + . . . + an Xn .
  Then
  Y ∼ Normal(µ, σ²)
  where µ = a1 µ1 + a2 µ2 + . . . + an µn and σ² = a1²σ1² + a2²σ2² + . . . + an²σn².
  That is, a linear combination of independent normal random variables is again
  normally distributed.

• Sum of n i.i.d. Exp(β) is Gamma(n, β).


 
• Square of Normal(0, σ²) is Gamma(1/2, 1/(2σ²)).
• Suppose X, Y ∼ i.i.d. Normal(0, σ²). Then, X/Y ∼ Cauchy(0, 1).

• Suppose X ∼ Gamma(α, k), Y ∼ Gamma(β, k) are independent random variables,
  then X/(X + Y ) ∼ Beta(α, β) (checked numerically in the sketch after this list).

• Sum of n independent Gamma(α, β) is Gamma(nα, β).


 
• If X1 , X2 , . . . , Xn ∼ i.i.d. Normal(0, σ²), then
  X1² + X2² + . . . + Xn² ∼ Gamma(n/2, 1/(2σ²)).

 
• Gamma(n/2, 1/2) is called the Chi-square distribution with n degrees of freedom,
  denoted χ²ₙ.

• Suppose X1 , X2 , . . . , Xn ∼ i.i.d. Normal(µ, σ²). Suppose that X and S² denote
  the sample mean and sample variance, respectively. Then
  (i) (n − 1)S²/σ² ∼ χ²ₙ₋₁
  (ii) X and S² are independent.
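The following simulation sketch, assuming NumPy, numerically checks two of the results above (the sum of i.i.d. exponentials and the Gamma ratio giving a Beta) by comparing simulated means and variances with the closed-form values; the parameter choices are arbitrary:

import numpy as np

rng = np.random.default_rng(7)

# Sum of n i.i.d. Exp(beta) is Gamma(n, beta): compare mean and variance
n, beta = 5, 2.0
s = rng.exponential(scale=1 / beta, size=(200_000, n)).sum(axis=1)
print(s.mean(), n / beta)            # Gamma(n, beta) mean     = n / beta   = 2.5
print(s.var(), n / beta ** 2)        # Gamma(n, beta) variance = n / beta^2 = 1.25

# X ~ Gamma(a, k), Y ~ Gamma(b, k) independent  =>  X/(X + Y) ~ Beta(a, b)
a, b, k = 2.0, 3.0, 1.0
x = rng.gamma(shape=a, scale=1 / k, size=200_000)
y = rng.gamma(shape=b, scale=1 / k, size=200_000)
r = x / (x + y)
print(r.mean(), a / (a + b))                                   # both ~ 0.4
print(r.var(), a * b / ((a + b) ** 2 * (a + b + 1)))           # both ~ 0.04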

