Probability-The Science of Uncertainty and Data
Definition (Conditional expectation) Given a continuous random variable X and an event A, with P(A) > 0,

E[X∣A] = ∫_{−∞}^{∞} x f_{X∣A}(x) dx.

Definition (Memorylessness of the exponential random variable) When we condition an exponential random variable X on the event X > t we have memorylessness, meaning that the “remaining time” X − t given that X > t is also exponential with the same parameter, i.e.,

P(X − t > x ∣ X > t) = P(X > x).
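As a quick illustration (not part of the original notes), the following Python sketch checks memorylessness by simulation; the rate λ = 1 and the particular values of t and x below are arbitrary assumptions, and numpy is assumed to be available.

    import numpy as np

    rng = np.random.default_rng(0)
    samples = rng.exponential(1.0, 1_000_000)      # X ~ Exponential(1), an assumed rate
    t, x = 1.5, 0.7                                # assumed values, for illustration only

    remaining = samples[samples > t] - t           # X - t, conditioned on X > t
    print(f"P(X - t > x | X > t) ≈ {np.mean(remaining > x):.4f}")
    print(f"P(X > x)             ≈ {np.mean(samples > x):.4f}")   # both ≈ exp(-0.7) ≈ 0.497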
Theorem (Total probability and total expectation theorems)

f_X(x) = ∫_{−∞}^{∞} f_Y(y) f_{X∣Y}(x∣y) dy,

E[X] = ∫_{−∞}^{∞} f_Y(y) E[X∣Y = y] dy.

Definition (Independence) Jointly continuous random variables X, Y are independent if f_{X,Y}(x, y) = f_X(x) f_Y(y) for all x, y.
To find the distribution of Y = g(X):
1. Find the CDF of Y: F_Y(y) = P(Y ≤ y).
2. Differentiate the CDF of Y to obtain the PDF:

f_Y(y) = dF_Y/dy (y).

Proposition (General formula for monotonic g) Let X be a continuous random variable and g a function that is monotonic wherever f_X(x) > 0. The PDF of Y = g(X) is given by

f_Y(y) = f_X(h(y)) ∣dh/dy (y)∣,

where h = g⁻¹ in the interval where g is monotonic.
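The change-of-variables formula can be sanity-checked numerically. The sketch below is an illustration added here (not from the original notes), assuming numpy is available; it takes X ∼ N(0, 1) and the monotonic map Y = g(X) = e^X, so h(y) = log y and ∣dh/dy∣ = 1/y.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.standard_normal(1_000_000)
    y = np.exp(x)                                   # Y = g(X) = e^X (monotonic)

    def f_X(v):                                     # standard normal PDF
        return np.exp(-v**2 / 2) / np.sqrt(2 * np.pi)

    def f_Y(v):                                     # PDF predicted by the formula above
        return f_X(np.log(v)) / v                   # h(y) = log y, |dh/dy| = 1/y

    for y0 in (0.5, 1.0, 2.0):
        eps = 0.01                                  # empirical density on a small window
        empirical = np.mean((y > y0 - eps) & (y < y0 + eps)) / (2 * eps)
        print(f"y0 = {y0}: empirical ≈ {empirical:.3f}, formula = {f_Y(y0):.3f}")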
Sums of independent r.v., covariance and correlation

Proposition (Discrete case) Let X, Y be discrete independent random variables and Z = X + Y, then the PMF of Z is

p_Z(z) = ∑_x p_X(x) p_Y(z − x).

Proposition (Continuous case) Let X, Y be continuous independent random variables and Z = X + Y, then the PDF of Z is

f_Z(z) = ∫_{−∞}^{∞} f_X(x) f_Y(z − x) dx.
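As a concrete instance of the discrete convolution formula, the sketch below computes the PMF of the sum of two fair six-sided dice. The dice example and the use of numpy are assumptions added for illustration, not part of the original notes.

    import numpy as np

    p_X = np.full(6, 1/6)            # PMF of a fair die on faces 1..6
    p_Y = np.full(6, 1/6)

    p_Z = np.convolve(p_X, p_Y)      # p_Z(z) = sum_x p_X(x) p_Y(z - x), for z = 2..12

    for z, prob in enumerate(p_Z, start=2):
        print(f"P(Z = {z:2d}) = {prob:.4f}")        # peaks at P(Z = 7) = 1/6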
Proposition (Sum of independent normal r.v.) Let X ∼ N(µ_x, σ_x²) and Y ∼ N(µ_y, σ_y²) be independent. Then

Z = X + Y ∼ N(µ_x + µ_y, σ_x² + σ_y²).

Definition (Covariance) We define the covariance of random variables X, Y as

Cov(X, Y) ≜ E[(X − E[X])(Y − E[Y])].

Properties (Properties of covariance)
• If X, Y are independent, then Cov(X, Y) = 0.
• Cov(X, X) = Var(X).
• Cov(aX + b, Y) = a Cov(X, Y).
• Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z).
• Cov(X, Y) = E[XY] − E[X]E[Y].

Proposition (Variance of a sum of r.v.)

Var(X_1 + ⋯ + X_n) = ∑_i Var(X_i) + ∑_{i≠j} Cov(X_i, X_j).
Definition (Correlation coefficient) We define the correlation coefficient of random variables X, Y, with σ_X, σ_Y > 0, as

ρ(X, Y) ≜ Cov(X, Y) / (σ_X σ_Y).

Properties (Properties of the correlation coefficient)
• −1 ≤ ρ ≤ 1.
• If X, Y are independent, then ρ = 0.
• ∣ρ∣ = 1 if and only if X − E[X] = c (Y − E[Y]) for some constant c.
• ρ(aX + b, Y) = sign(a) ρ(X, Y).
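The identities Cov(X, Y) = E[XY] − E[X]E[Y] and ρ(aX + b, Y) = sign(a) ρ(X, Y) can be checked by simulation. The sketch below is illustrative only; the particular joint distribution of (X, Y) and the use of numpy are assumptions.

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.standard_normal(1_000_000)
    y = 2 * x + rng.standard_normal(1_000_000)      # built to be correlated with x

    cov = np.mean(x * y) - np.mean(x) * np.mean(y)  # Cov(X,Y) = E[XY] - E[X]E[Y]
    rho = cov / (np.std(x) * np.std(y))

    print(f"Cov(X, Y) ≈ {cov:.3f}")                 # ≈ 2
    print(f"rho(X, Y) ≈ {rho:.3f}")                 # ≈ 2/sqrt(5) ≈ 0.894
    print(f"rho(-3X + 1, Y) ≈ {np.corrcoef(-3 * x + 1, y)[0, 1]:.3f}")   # sign flips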
Conditional expectation and variance, sum of random number of r.v.

Definition (Conditional expectation as a random variable) Given random variables X, Y the conditional expectation E[X∣Y] is the random variable that takes the value E[X∣Y = y] whenever Y = y.

Theorem (Law of iterated expectations)

E[E[X∣Y]] = E[X].

Additionally we have

E[g(X)∣Y = y] = ∫_{−∞}^{∞} g(x) f_{X∣Y}(x∣y) dx.
Definition (Conditional variance as a random variable) Given random variables X, Y the conditional variance Var(X∣Y) is the random variable that takes the value Var(X∣Y = y) whenever Y = y.

Theorem (Law of total variance)

Var(X) = E[Var(X∣Y)] + Var(E[X∣Y]).
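A minimal simulation of the law of iterated expectations and the law of total variance (illustrative; the two-component mixture used here is an arbitrary assumption, not from the notes): Y is 0 or 1 with equal probability, with X ∼ N(0, 1) given Y = 0 and X ∼ N(3, 4) given Y = 1.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 1_000_000
    y = rng.integers(0, 2, n)                       # Y is 0 or 1 with probability 1/2 each
    x = np.where(y == 0, rng.normal(0.0, 1.0, n), rng.normal(3.0, 2.0, n))

    e_x_given_y = np.where(y == 0, 0.0, 3.0)        # E[X|Y] takes the values 0 and 3
    var_x_given_y = np.where(y == 0, 1.0, 4.0)      # Var(X|Y) takes the values 1 and 4

    print(f"E[X] ≈ {x.mean():.3f}   E[E[X|Y]] = {e_x_given_y.mean():.3f}")        # both ≈ 1.5
    print(f"Var(X) ≈ {x.var():.3f}")                                              # ≈ 4.75
    print(f"E[Var(X|Y)] + Var(E[X|Y]) = {var_x_given_y.mean() + e_x_given_y.var():.3f}")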
Proposition (Sum of a random number of independent r.v.)
Let N be a nonnegative integer random variable.
Let X, X_1, X_2, . . . , X_N be i.i.d. random variables, independent of N.
Let Y = ∑_{i=1}^{N} X_i. Then

E[Y] = E[N] E[X],
Var(Y) = E[N] Var(X) + (E[X])² Var(N).
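The two formulas above can be verified by simulation. In the sketch below, N ∼ Poisson(4) and X_i ∼ Uniform(0, 2) are arbitrary assumptions used only for illustration, and numpy is assumed to be available.

    import numpy as np

    rng = np.random.default_rng(4)
    trials = 100_000
    n = rng.poisson(4, trials)                                   # N for each trial
    y = np.array([rng.uniform(0, 2, k).sum() for k in n])        # Y = X_1 + ... + X_N

    e_n, var_n = 4.0, 4.0                                        # mean and variance of Poisson(4)
    e_x, var_x = 1.0, 1/3                                        # mean and variance of Uniform(0, 2)

    print(f"E[Y]   ≈ {y.mean():.3f}  vs  E[N]E[X] = {e_n * e_x:.3f}")
    print(f"Var(Y) ≈ {y.var():.3f}  vs  E[N]Var(X) + (E[X])^2 Var(N) = "
          f"{e_n * var_x + e_x**2 * var_n:.3f}")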
Convergence of random variables

Inequalities, convergence, and the Weak Law of Large Numbers

Theorem (Markov inequality) Given a random variable X ≥ 0, for every a > 0 we have

P(X ≥ a) ≤ E[X]/a.

Theorem (Chebyshev inequality) Given a random variable X with E[X] = µ and Var(X) = σ², for every ε > 0 we have

P(∣X − µ∣ ≥ ε) ≤ σ²/ε².
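Both bounds can be quite loose. A short numerical comparison (illustrative; the choice X ∼ Exponential(1), so E[X] = Var(X) = 1, is an arbitrary assumption) shows how far they sit from the exact tail probabilities.

    import math

    # X ~ Exponential(1): E[X] = 1, Var(X) = 1, and P(X >= a) = exp(-a).
    for a in (2.0, 5.0, 10.0):
        print(f"a = {a:4.1f}: exact = {math.exp(-a):.4f}   Markov bound E[X]/a = {1 / a:.4f}")

    # Chebyshev with mu = sigma = 1; for eps > 1 only the right tail contributes,
    # so P(|X - 1| >= eps) = exp(-(1 + eps)).
    for eps in (2.0, 3.0):
        print(f"eps = {eps:.1f}: exact = {math.exp(-(1 + eps)):.4f}   "
              f"Chebyshev bound = {1 / eps**2:.4f}")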
Theorem (Weak Law of Large Numbers (WLLN)) Given a sequence of i.i.d. random variables {X_1, X_2, . . .} with E[X_i] = µ and Var(X_i) = σ², we define

M_n = (1/n) ∑_{i=1}^{n} X_i.

Then, for every ε > 0, we have

lim_{n→∞} P(∣M_n − µ∣ ≥ ε) = 0.
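The WLLN can be illustrated by estimating P(∣M_n − µ∣ ≥ ε) for growing n. In the sketch below, X_i ∼ Uniform(0, 1) (so µ = 1/2) and ε = 0.02 are arbitrary assumptions, not from the notes.

    import numpy as np

    rng = np.random.default_rng(5)
    eps, trials = 0.02, 1_000

    for n in (10, 100, 1_000, 10_000):
        m_n = rng.uniform(0, 1, (trials, n)).mean(axis=1)        # one sample mean per trial
        p = np.mean(np.abs(m_n - 0.5) >= eps)
        print(f"n = {n:5d}: P(|M_n - 0.5| >= {eps}) ≈ {p:.3f}")  # shrinks toward 0 as n grows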
Definition (Convergence in probability) A sequence of random variables {Y_n} converges in probability to the random variable Y if, for every ε > 0,

lim_{n→∞} P(∣Y_n − Y∣ ≥ ε) = 0.

Properties (Properties of convergence in probability) If X_n → a and Y_n → b in probability, then
• X_n + Y_n → a + b in probability.
• If g is a continuous function, then g(X_n) → g(a) in probability.
• E[X_n] does not always converge to a.
The Central Limit Theorem

Theorem (Central Limit Theorem (CLT)) Given a sequence of i.i.d. random variables {X_1, X_2, . . .} with E[X_i] = µ and Var(X_i) = σ², we define

Z_n = (1/(σ√n)) ∑_{i=1}^{n} (X_i − µ).

Then, for every z, we have

lim_{n→∞} P(Z_n ≤ z) = P(Z ≤ z),

where Z ∼ N(0, 1).
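A minimal simulation of the CLT statement above (illustrative; the choice X_i ∼ Exponential(1), so µ = σ = 1, and the use of numpy are assumptions):

    import math
    import numpy as np

    rng = np.random.default_rng(6)
    n, trials, mu, sigma = 100, 100_000, 1.0, 1.0

    x = rng.exponential(1.0, (trials, n))
    z_n = (x.sum(axis=1) - n * mu) / (sigma * math.sqrt(n))      # Z_n for each trial

    def phi(z):                                                  # standard normal CDF
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    for z in (-1.0, 0.0, 1.0, 2.0):
        print(f"z = {z:4.1f}: P(Z_n <= z) ≈ {np.mean(z_n <= z):.4f}   Phi(z) = {phi(z):.4f}")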
Corollary (Normal approximation of a binomial) Let X ∼ Bin(n, p) with n large. Then X can be approximated by Z ∼ N(np, np(1 − p)).

Remark (De Moivre-Laplace 1/2 approximation) Let X ∼ Bin(n, p), then P(X = i) = P(i − 1/2 ≤ X ≤ i + 1/2) and we can use the CLT to approximate the PMF of X.
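The 1/2 correction can be compared with the exact binomial PMF. In the sketch below, n = 100 and p = 0.5 are arbitrary assumptions used only for illustration.

    import math

    n, p = 100, 0.5
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))

    def phi(z):                              # standard normal CDF
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    for i in (45, 50, 55):
        exact = math.comb(n, i) * p**i * (1 - p)**(n - i)                  # binomial PMF
        approx = phi((i + 0.5 - mu) / sigma) - phi((i - 0.5 - mu) / sigma) # 1/2 correction
        print(f"P(X = {i}): exact = {exact:.4f}   CLT with 1/2 correction = {approx:.4f}")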