Probability – The Science of Uncertainty and Data
by Fabián Kozynski

Probability

Probability models and axioms

Definition (Sample space) A sample space Ω is the set of all possible outcomes. The set's elements must be mutually exclusive, collectively exhaustive, and at the right granularity.

Definition (Event) An event is a subset of the sample space. Probability is assigned to events.

Definition (Probability axioms) A probability law P assigns probabilities to events and satisfies the following axioms:
• Nonnegativity: P(A) ≥ 0 for all events A.
• Normalization: P(Ω) = 1.
• (Countable) additivity: for every sequence of events A1, A2, ... such that Ai ∩ Aj = ∅ for i ≠ j, P(⋃_i Ai) = ∑_i P(Ai).

Corollaries (Consequences of the axioms)
• P(∅) = 0.
• For any finite collection of disjoint events A1, ..., An, P(⋃_{i=1}^n Ai) = ∑_{i=1}^n P(Ai).
• P(A) + P(Aᶜ) = 1.
• P(A) ≤ 1.
• If A ⊂ B, then P(A) ≤ P(B).
• P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
• P(A ∪ B) ≤ P(A) + P(B).

Example (Discrete uniform law) Assume Ω is finite and consists of n equally likely elements, and that A ⊂ Ω has k elements. Then P(A) = k/n.

Conditioning and Bayes' rule

Definition (Conditional probability) Given that event B has occurred and that P(B) > 0, the probability that A occurs is
P(A∣B) = P(A ∩ B) / P(B).

Remark (Properties of conditional probabilities) They behave like ordinary probabilities. Assuming P(B) > 0:
• P(A∣B) ≥ 0.
• P(Ω∣B) = 1.
• P(B∣B) = 1.
• If A ∩ C = ∅, then P(A ∪ C∣B) = P(A∣B) + P(C∣B).

Proposition (Multiplication rule)
P(A1 ∩ A2 ∩ ⋯ ∩ An) = P(A1) ⋅ P(A2∣A1) ⋯ P(An∣A1 ∩ A2 ∩ ⋯ ∩ An−1).

Theorem (Total probability theorem) Given a partition {A1, A2, ...} of the sample space, meaning that ⋃_i Ai = Ω and the events are disjoint, for every event B we have
P(B) = ∑_i P(Ai) P(B∣Ai).

Theorem (Bayes' rule) Given a partition {A1, A2, ...} of the sample space, meaning that ⋃_i Ai = Ω and the events are disjoint, and if P(Ai) > 0 for all i, then for every event B the conditional probabilities P(Ai∣B) can be obtained from the conditional probabilities P(B∣Ai) and the initial probabilities P(Ai) as follows:
P(Ai∣B) = P(Ai) P(B∣Ai) / ∑_j P(Aj) P(B∣Aj).
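A minimal numerical sketch of the last two theorems in Python, assuming a hypothetical three-event partition; the priors and likelihoods are made-up values chosen only for illustration.

# Hypothetical partition A1, A2, A3 with assumed priors P(Ai) and likelihoods P(B | Ai).
priors = [0.5, 0.3, 0.2]          # P(A1), P(A2), P(A3)
likelihoods = [0.1, 0.4, 0.8]     # P(B | A1), P(B | A2), P(B | A3)

# Total probability theorem: P(B) = sum_i P(Ai) P(B | Ai)
p_B = sum(p * l for p, l in zip(priors, likelihoods))

# Bayes' rule: P(Ai | B) = P(Ai) P(B | Ai) / P(B)
posteriors = [p * l / p_B for p, l in zip(priors, likelihoods)]

print(p_B)          # 0.05 + 0.12 + 0.16 = 0.33
print(posteriors)   # posteriors sum to 1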
Independence

Definition (Independence of events) Two events are independent if the occurrence of one provides no information about the other. We say that A and B are independent if
P(A ∩ B) = P(A) P(B).
Equivalently, as long as P(A) > 0 and P(B) > 0,
P(B∣A) = P(B) and P(A∣B) = P(A).

Remarks
• The definition of independence is symmetric with respect to A and B.
• The product definition applies even if P(A) = 0 or P(B) = 0.

Corollary If A and B are independent, then A and Bᶜ are independent. Similarly for Aᶜ and B, and for Aᶜ and Bᶜ.

Definition (Conditional independence) We say that A and B are independent conditioned on C, where P(C) > 0, if
P(A ∩ B∣C) = P(A∣C) P(B∣C).

Definition (Independence of a collection of events) We say that events A1, A2, ..., An are independent if for every collection of distinct indices i1, i2, ..., ik we have
P(Ai1 ∩ ⋯ ∩ Aik) = P(Ai1) ⋅ P(Ai2) ⋯ P(Aik).

Counting

This section deals with finite sets with a uniform probability law. In this case, to calculate P(A) we count the number of elements in A and in Ω.

Remark (Basic counting principle) For a selection that can be done in r stages, with ni choices at each stage i, the number of possible selections is n1 ⋅ n2 ⋯ nr.

Definition (Permutations) The number of permutations (orderings) of n different elements is
n! = 1 ⋅ 2 ⋅ 3 ⋯ n.

Definition (Combinations) Given a set of n elements, the number of subsets with exactly k elements is
(n choose k) = n! / (k! (n − k)!).

Definition (Partitions) We are given an n-element set and nonnegative integers n1, n2, ..., nr whose sum is equal to n. The number of partitions of the set into r disjoint subsets, with the ith subset containing exactly ni elements, is equal to
(n choose n1, ..., nr) = n! / (n1! n2! ⋯ nr!).

Remark This is the same as counting the ways to assign n distinct elements to r people, giving person i exactly ni elements.
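The counting formulas translate directly into a few lines of Python; the card-deck and 10-element numbers below are illustrative choices, not values from the text.

from math import factorial, comb

# Combinations: number of 5-card hands from a 52-card deck (illustrative numbers).
print(comb(52, 5))                      # 2598960

# Permutations of n distinct elements.
print(factorial(4))                     # 24 orderings of 4 elements

# Partitions (multinomial coefficient): split 10 elements into groups of sizes 5, 3, 2.
def multinomial(n, parts):
    assert sum(parts) == n
    out = factorial(n)
    for k in parts:
        out //= factorial(k)
    return out

print(multinomial(10, [5, 3, 2]))       # 10! / (5! 3! 2!) = 2520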
Discrete random variables

Probability mass function and expectation

Definition (Random variable) A random variable X is a function of the sample space Ω into the real numbers (or Rⁿ). Its range can be discrete or continuous.

Definition (Probability mass function (PMF)) The probability law of a discrete random variable X is called its PMF. It is defined as
pX(x) = P(X = x) = P({ω ∈ Ω : X(ω) = x}).

Properties
• pX(x) ≥ 0 for all x.
• ∑_x pX(x) = 1.

Example (Bernoulli random variable) A Bernoulli random variable X with parameter 0 ≤ p ≤ 1 (X ∼ Ber(p)) takes the value 1 with probability p and the value 0 with probability 1 − p. An indicator random variable of an event (IA = 1 if A occurs) is an example of a Bernoulli random variable.

Example (Discrete uniform random variable) A discrete uniform random variable X between a and b with a ≤ b (X ∼ Uni[a, b]) takes any of the values in {a, a + 1, ..., b} with probability 1/(b − a + 1).

Example (Binomial random variable) A binomial random variable X with parameters n (a natural number) and 0 ≤ p ≤ 1 (X ∼ Bin(n, p)) takes values in the set {0, 1, ..., n} with probabilities pX(i) = (n choose i) p^i (1 − p)^(n−i). It represents the number of successes in n independent trials where each trial has probability of success p. Therefore, it can also be seen as the sum of n independent Bernoulli random variables, each with parameter p.

Example (Geometric random variable) A geometric random variable X with parameter 0 ≤ p ≤ 1 (X ∼ Geo(p)) takes values in the set {1, 2, ...} with probabilities pX(i) = (1 − p)^(i−1) p. It represents the number of independent trials until (and including) the first success, when the probability of success in each trial is p.
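As a quick check of these definitions, the sketch below codes the four PMFs straight from the formulas above and verifies numerically, with arbitrary parameter choices, that each one is a valid PMF.

from math import comb

def pmf_bernoulli(x, p):
    return p if x == 1 else (1 - p if x == 0 else 0.0)

def pmf_uniform(x, a, b):
    return 1 / (b - a + 1) if a <= x <= b else 0.0

def pmf_binomial(i, n, p):
    return comb(n, i) * p**i * (1 - p)**(n - i)

def pmf_geometric(i, p):
    return (1 - p)**(i - 1) * p

# Sanity checks: each PMF is nonnegative and sums to 1 (geometric sum truncated).
assert abs(sum(pmf_binomial(i, 10, 0.3) for i in range(11)) - 1) < 1e-12
assert abs(sum(pmf_geometric(i, 0.3) for i in range(1, 200)) - 1) < 1e-9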
Definition (Expectation/mean of a random variable) The expectation of a discrete random variable is defined as
E[X] = ∑_x x pX(x),
assuming ∑_x |x| pX(x) < ∞.

Properties (Properties of expectation)
• If X ≥ 0, then E[X] ≥ 0.
• If a ≤ X ≤ b, then a ≤ E[X] ≤ b.
• If X = c, then E[X] = c.

Example (Expected values of known r.v.)
• If X ∼ Ber(p), then E[X] = p.
• If X = IA, then E[X] = P(A).
• If X ∼ Uni[a, b], then E[X] = (a + b)/2.
• If X ∼ Bin(n, p), then E[X] = np.
• If X ∼ Geo(p), then E[X] = 1/p.

Theorem (Expected value rule) Given a random variable X and a function g : R → R, we construct the random variable Y = g(X). Then
E[Y] = E[g(X)] = ∑_x g(x) pX(x) = ∑_y y pY(y).

Remark (PMF of Y = g(X)) The PMF of Y = g(X) is
pY(y) = ∑_{x : g(x)=y} pX(x).

Remark In general g(E[X]) ≠ E[g(X)]. They are equal if g(x) = ax + b.

Variance, conditioning on an event, multiple r.v.

Definition (Variance of a random variable) Given a random variable X with µ = E[X], its variance is a measure of the spread of the random variable and is defined as
Var(X) = E[(X − µ)²] = ∑_x (x − µ)² pX(x).

Definition (Standard deviation)
σX = √Var(X).

Properties (Properties of the variance)
• Var(aX) = a² Var(X), for all a ∈ R.
• Var(X + b) = Var(X), for all b ∈ R.
• Var(aX + b) = a² Var(X).
• Var(X) = E[X²] − (E[X])².

Example (Variances of known r.v.)
• If X ∼ Ber(p), then Var(X) = p(1 − p).
• If X ∼ Uni[a, b], then Var(X) = (b − a)(b − a + 2)/12.
• If X ∼ Bin(n, p), then Var(X) = np(1 − p).
• If X ∼ Geo(p), then Var(X) = (1 − p)/p².
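A small sketch that computes E[X] and Var(X) directly from a PMF given as a dictionary and compares the result with the closed forms listed above; the binomial parameters n = 10, p = 0.3 are an arbitrary test case.

from math import comb

def mean_var(pmf):
    # pmf is a dict {value: probability}
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, var

n, p = 10, 0.3
binom_pmf = {i: comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)}
m, v = mean_var(binom_pmf)
print(m, n * p)              # both ≈ 3.0
print(v, n * p * (1 - p))    # both ≈ 2.1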
Proposition (Conditional PMF and expectation, given an event) Given an event A with P(A) > 0, we have the following:
• pX∣A(x) = P(X = x∣A).
• If A is a subset of the range of X, then
  pX∣A(x) = pX∣{X∈A}(x) = (1/P(A)) pX(x) if x ∈ A, and 0 otherwise.
• ∑_x pX∣A(x) = 1.
• E[X∣A] = ∑_x x pX∣A(x).
• E[g(X)∣A] = ∑_x g(x) pX∣A(x).

Proposition (Total expectation rule) Given a partition of disjoint events A1, ..., An such that ∑_i P(Ai) = 1 and P(Ai) > 0,
E[X] = P(A1) E[X∣A1] + ⋯ + P(An) E[X∣An].

Definition (Memorylessness of the geometric random variable) When we condition a geometric random variable X on the event X > n we have memorylessness, meaning that the "remaining time" X − n, given that X > n, is also geometric with the same parameter. Formally,
pX−n∣X>n(i) = pX(i).

Definition (Joint PMF) The joint PMF of random variables X1, X2, ..., Xn is
pX1,...,Xn(x1, ..., xn) = P(X1 = x1, ..., Xn = xn).

Properties (Properties of the joint PMF)
• ∑_{x1} ⋯ ∑_{xn} pX1,...,Xn(x1, ..., xn) = 1.
• pX1(x1) = ∑_{x2} ⋯ ∑_{xn} pX1,...,Xn(x1, x2, ..., xn).
• pX2,...,Xn(x2, ..., xn) = ∑_{x1} pX1,X2,...,Xn(x1, x2, ..., xn).

Definition (Functions of multiple r.v.) If Z = g(X1, ..., Xn), where g : Rⁿ → R, then pZ(z) = P(g(X1, ..., Xn) = z).

Proposition (Expected value rule for multiple r.v.) Given g : Rⁿ → R,
E[g(X1, ..., Xn)] = ∑_{x1,...,xn} g(x1, ..., xn) pX1,...,Xn(x1, ..., xn).

Properties (Linearity of expectation)
• E[aX + b] = a E[X] + b.
• E[X1 + ⋯ + Xn] = E[X1] + ⋯ + E[Xn].

Conditioning on a random variable, independence

Definition (Conditional PMF given another random variable) Given discrete random variables X, Y and y such that pY(y) > 0, we define
pX∣Y(x∣y) = pX,Y(x, y) / pY(y).

Proposition (Multiplication rule) Given jointly discrete random variables X, Y, and whenever the conditional probabilities are defined,
pX,Y(x, y) = pX(x) pY∣X(y∣x) = pY(y) pX∣Y(x∣y).

Definition (Conditional expectation) Given discrete random variables X, Y and y such that pY(y) > 0, we define
E[X∣Y = y] = ∑_x x pX∣Y(x∣y).
Additionally we have
E[g(X)∣Y = y] = ∑_x g(x) pX∣Y(x∣y).

Theorem (Total probability and expectation theorems) If pY(y) > 0, then
pX(x) = ∑_y pY(y) pX∣Y(x∣y),
E[X] = ∑_y pY(y) E[X∣Y = y].

Definition (Independence of a random variable and an event) A discrete random variable X and an event A are independent if
P(X = x and A) = pX(x) P(A), for all x.

Definition (Independence of two random variables) Two discrete random variables X and Y are independent if
pX,Y(x, y) = pX(x) pY(y) for all x, y.

Remark (Independence of a collection of random variables) A collection X1, X2, ..., Xn of random variables is independent if
pX1,...,Xn(x1, ..., xn) = pX1(x1) ⋯ pXn(xn), for all x1, ..., xn.

Remark (Independence and expectation) In general, E[g(X, Y)] ≠ g(E[X], E[Y]). An exception is linear functions:
E[aX + bY] = a E[X] + b E[Y].

Proposition (Expectation of the product of independent r.v.) If X and Y are discrete independent random variables,
E[XY] = E[X] E[Y].

Remark If X and Y are independent, E[g(X) h(Y)] = E[g(X)] E[h(Y)].

Proposition (Variance of the sum of independent random variables) If X and Y are discrete independent random variables,
Var(X + Y) = Var(X) + Var(Y).
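The following sketch works through marginalization, the conditional PMF, conditional expectation, and the total expectation theorem on a small made-up joint PMF of two binary random variables.

# A toy joint PMF p_{X,Y}(x, y) given as a dict; the numbers are illustrative only.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

# Marginal of Y: p_Y(y) = sum_x p_{X,Y}(x, y)
p_Y = {}
for (x, y), p in joint.items():
    p_Y[y] = p_Y.get(y, 0) + p

# Conditional expectation E[X | Y = y] built from p_{X|Y}(x|y) = p_{X,Y}(x,y) / p_Y(y)
def cond_exp_X_given_Y(y):
    return sum(x * joint[(x, y)] / p_Y[y] for x in (0, 1))

# Total expectation theorem: E[X] = sum_y p_Y(y) E[X | Y = y]
EX_total = sum(p_Y[y] * cond_exp_X_given_Y(y) for y in p_Y)
EX_direct = sum(x * p for (x, y), p in joint.items())
print(EX_total, EX_direct)   # both 0.7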
Continuous random variables

PDF, expectation, variance, CDF

Definition (Probability density function (PDF)) A probability density function of a r.v. X is a nonnegative real-valued function fX that satisfies the following:
• ∫_{−∞}^{∞} fX(x) dx = 1.
• P(a ≤ X ≤ b) = ∫_a^b fX(x) dx.

Definition (Continuous random variable) A random variable X is continuous if its probability law can be described by a PDF fX.

Remark Continuous random variables satisfy:
• For small δ > 0, P(a ≤ X ≤ a + δ) ≈ fX(a) δ.
• P(X = a) = 0, for all a ∈ R.

Definition (Expectation of a continuous random variable) The expectation of a continuous random variable is
E[X] = ∫_{−∞}^{∞} x fX(x) dx,
assuming ∫_{−∞}^{∞} |x| fX(x) dx < ∞.

Properties (Properties of expectation)
• If X ≥ 0, then E[X] ≥ 0.
• If a ≤ X ≤ b, then a ≤ E[X] ≤ b.
• E[g(X)] = ∫_{−∞}^{∞} g(x) fX(x) dx.
• E[aX + b] = a E[X] + b.

Definition (Variance of a continuous random variable) Given a continuous random variable X with µ = E[X], its variance is
Var(X) = E[(X − µ)²] = ∫_{−∞}^{∞} (x − µ)² fX(x) dx.
It has the same properties as the variance of a discrete random variable.

Example (Uniform continuous random variable) A uniform continuous random variable X between a and b, with a < b (X ∼ Uni(a, b)), has PDF
fX(x) = 1/(b − a) if a < x < b, and 0 otherwise.
We have E[X] = (a + b)/2 and Var(X) = (b − a)²/12.

Example (Exponential random variable) An exponential random variable X with parameter λ > 0 (X ∼ Exp(λ)) has PDF
fX(x) = λe^(−λx) if x ≥ 0, and 0 otherwise.
We have E[X] = 1/λ and Var(X) = 1/λ².
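As a numerical sanity check of the expectation and variance integrals, the sketch below approximates them with a simple Riemann sum for an exponential PDF; the rate λ = 2, the truncation point, and the step size are arbitrary choices.

import math

lam = 2.0                                       # assumed rate; any λ > 0 works
dx = 1e-3
xs = [i * dx for i in range(int(30 / dx))]      # truncate the integral at x = 30

pdf = [lam * math.exp(-lam * x) for x in xs]
mean = sum(x * f * dx for x, f in zip(xs, pdf))
second_moment = sum(x * x * f * dx for x, f in zip(xs, pdf))

print(mean, 1 / lam)                            # both ≈ 0.5
print(second_moment - mean**2, 1 / lam**2)      # both ≈ 0.25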
Definition (Cumulative distribution function (CDF)) The CDF of a random variable X is FX(x) = P(X ≤ x). In particular, for a continuous random variable we have
FX(x) = ∫_{−∞}^{x} fX(t) dt,   fX(x) = dFX(x)/dx.

Properties (Properties of the CDF)
• If y ≥ x, then FX(y) ≥ FX(x).
• lim_{x→−∞} FX(x) = 0.
• lim_{x→∞} FX(x) = 1.

Definition (Normal/Gaussian random variable) A normal random variable X with mean µ and variance σ² > 0 (X ∼ N(µ, σ²)) has PDF
fX(x) = (1/√(2πσ²)) e^(−(x−µ)²/(2σ²)).
We have E[X] = µ and Var(X) = σ².

Remark (Standard normal) The standard normal is N(0, 1).

Proposition (Linearity of Gaussians) Given X ∼ N(µ, σ²) and a ≠ 0, then aX + b ∼ N(aµ + b, a²σ²). Using this, Y = (X − µ)/σ is a standard Gaussian.
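The standardization step above is what makes normal probabilities computable from the standard normal CDF Φ. A short sketch, with illustrative values of µ, σ, and the threshold c:

import math

def std_normal_cdf(z):
    # Φ(z) for the standard normal, written via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, c = 10.0, 2.0, 13.0          # assumed values for illustration
z = (c - mu) / sigma                    # standardize: Y = (X - µ)/σ
print(std_normal_cdf(z))                # P(X <= 13) = Φ(1.5) ≈ 0.9332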
Conditioning on an event, and multiple continuous r.v.

Definition (Conditional PDF given an event) Given a continuous random variable X and an event A with P(A) > 0, we define the conditional PDF as the function that satisfies
P(X ∈ B∣A) = ∫_B fX∣A(x) dx.

Definition (Conditional PDF given X ∈ A) Given a continuous random variable X and a set A ⊂ R with P(A) > 0:
fX∣X∈A(x) = (1/P(A)) fX(x) if x ∈ A, and 0 otherwise.

Definition (Conditional expectation) Given a continuous random variable X and an event A with P(A) > 0:
E[X∣A] = ∫_{−∞}^{∞} x fX∣A(x) dx.

Definition (Memorylessness of the exponential random variable) When we condition an exponential random variable X on the event X > t we have memorylessness, meaning that the "remaining time" X − t, given that X > t, is also exponential with the same parameter, i.e.,
P(X − t > x∣X > t) = P(X > x).

Theorem (Total probability and expectation theorems) Given a partition of the sample space into disjoint events A1, A2, ..., An such that ∑_i P(Ai) = 1, we have the following:
FX(x) = P(A1) FX∣A1(x) + ⋯ + P(An) FX∣An(x),
fX(x) = P(A1) fX∣A1(x) + ⋯ + P(An) fX∣An(x),
E[X] = P(A1) E[X∣A1] + ⋯ + P(An) E[X∣An].

Definition (Jointly continuous random variables) A pair (collection) of random variables is jointly continuous if there exists a joint PDF fX,Y that describes them, that is, for every set B ⊂ Rⁿ,
P((X, Y) ∈ B) = ∬_B fX,Y(x, y) dx dy.

Properties (Properties of joint PDFs)
• fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy.
• FX,Y(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^{x} [∫_{−∞}^{y} fX,Y(u, v) dv] du.
• fX,Y(x, y) = ∂²FX,Y(x, y) / (∂x ∂y).

Example (Uniform joint PDF on a set S) Let S ⊂ R² with area s > 0. The random variable (X, Y) is uniform over S if it has PDF
fX,Y(x, y) = 1/s if (x, y) ∈ S, and 0 otherwise.

Conditioning on a random variable, independence, Bayes' rule

Definition (Conditional PDF given another random variable) Given jointly continuous random variables X, Y and a value y such that fY(y) > 0, we define the conditional PDF as
fX∣Y(x∣y) = fX,Y(x, y) / fY(y).
Additionally we define P(X ∈ A∣Y = y) = ∫_A fX∣Y(x∣y) dx.

Proposition (Multiplication rule) Given jointly continuous random variables X, Y, whenever the conditional densities are defined,
fX,Y(x, y) = fX(x) fY∣X(y∣x) = fY(y) fX∣Y(x∣y).

Definition (Conditional expectation) Given jointly continuous random variables X, Y and y such that fY(y) > 0, we define the conditional expected value as
E[X∣Y = y] = ∫_{−∞}^{∞} x fX∣Y(x∣y) dx.
Additionally we have
E[g(X)∣Y = y] = ∫_{−∞}^{∞} g(x) fX∣Y(x∣y) dx.

Theorem (Total probability and total expectation theorems)
fX(x) = ∫_{−∞}^{∞} fY(y) fX∣Y(x∣y) dy,
E[X] = ∫_{−∞}^{∞} fY(y) E[X∣Y = y] dy.

Definition (Independence) Jointly continuous random variables X, Y are independent if fX,Y(x, y) = fX(x) fY(y) for all x, y.

Proposition (Expectation of the product of independent r.v.) If X and Y are independent continuous random variables,
E[XY] = E[X] E[Y].

Remark If X and Y are independent, E[g(X) h(Y)] = E[g(X)] E[h(Y)].

Proposition (Variance of the sum of independent random variables) If X and Y are independent continuous random variables,
Var(X + Y) = Var(X) + Var(Y).

Proposition (Bayes' rule summary)
• For X, Y discrete: pX∣Y(x∣y) = pX(x) pY∣X(y∣x) / pY(y).
• For X, Y continuous: fX∣Y(x∣y) = fX(x) fY∣X(y∣x) / fY(y).
• For X discrete, Y continuous: pX∣Y(x∣y) = pX(x) fY∣X(y∣x) / fY(y).
• For X continuous, Y discrete: fX∣Y(x∣y) = fX(x) pY∣X(y∣x) / pY(y).
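A sketch of the mixed case (X discrete, Y continuous), under an assumed toy model X ∼ Ber(0.5) with Y ∣ X = x ∼ N(x, 1); the observed value y = 0.8 is arbitrary.

import math

def normal_pdf(y, mu, sigma=1.0):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

p_X = {0: 0.5, 1: 0.5}      # assumed prior p_X(x)
y_obs = 0.8                 # assumed observation

# Total probability: f_Y(y) = sum_x p_X(x) f_{Y|X}(y|x)
f_Y = sum(p_X[x] * normal_pdf(y_obs, x) for x in p_X)

# Bayes' rule, mixed case: p_{X|Y}(x|y) = p_X(x) f_{Y|X}(y|x) / f_Y(y)
posterior = {x: p_X[x] * normal_pdf(y_obs, x) / f_Y for x in p_X}
print(posterior)            # P(X=1 | Y=0.8) ≈ 0.57 > P(X=0 | Y=0.8)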
Derived distributions

Proposition (Discrete case) Given a discrete random variable X and a function g, the r.v. Y = g(X) has PMF
pY(y) = ∑_{x : g(x)=y} pX(x).

Remark (Linear function of a discrete random variable) If g(x) = ax + b, then pY(y) = pX((y − b)/a).

Proposition (Linear function of a continuous r.v.) Given a continuous random variable X and Y = aX + b, with a ≠ 0, we have
fY(y) = (1/|a|) fX((y − b)/a).

Corollary (Linear function of a normal r.v.) If X ∼ N(µ, σ²) and Y = aX + b, with a ≠ 0, then Y ∼ N(aµ + b, a²σ²).

Example (General function of a continuous r.v.) If X is a continuous random variable and g is any function, to obtain the PDF of Y = g(X) we follow the two-step procedure:
1. Find the CDF of Y: FY(y) = P(Y ≤ y) = P(g(X) ≤ y).
2. Differentiate the CDF of Y to obtain the PDF: fY(y) = dFY(y)/dy.

Proposition (General formula for monotonic g) Let X be a continuous random variable and g a function that is monotonic wherever fX(x) > 0. The PDF of Y = g(X) is given by
fY(y) = fX(h(y)) ⋅ |dh(y)/dy|,
where h = g⁻¹ on the interval where g is monotonic.
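To see the monotonic-g formula in action, the sketch below uses the assumed example X ∼ Exp(λ) and Y = g(X) = e^X (so h = ln), and cross-checks the resulting PDF against the two-step CDF method by numerical integration.

import math

lam = 1.5                                    # assumed rate

def f_Y(y):
    # f_Y(y) = f_X(h(y)) * |dh/dy| with h(y) = ln(y), valid for y > 1
    x = math.log(y)
    return lam * math.exp(-lam * x) / y

# Two-step check: F_Y(y) = P(X <= ln y) = 1 - y**(-lam)
y0, dy = 3.0, 1e-4
riemann = sum(f_Y(1 + i * dy) * dy for i in range(int((y0 - 1) / dy)))
print(riemann, 1 - y0 ** (-lam))             # both ≈ 0.808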
Sums of independent r.v., covariance and correlation

Proposition (Discrete case) Let X, Y be discrete independent random variables and Z = X + Y. Then the PMF of Z is
pZ(z) = ∑_x pX(x) pY(z − x).

Proposition (Continuous case) Let X, Y be continuous independent random variables and Z = X + Y. Then the PDF of Z is
fZ(z) = ∫_{−∞}^{∞} fX(x) fY(z − x) dx.

Proposition (Sum of independent normal r.v.) Let X ∼ N(µx, σx²) and Y ∼ N(µy, σy²) be independent. Then
Z = X + Y ∼ N(µx + µy, σx² + σy²).
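The discrete convolution formula is easy to implement directly; the sketch below computes the PMF of the sum of two assumed fair dice.

# PMF of Z = X + Y for independent discrete X, Y given as dicts {value: probability}.
def convolve(p_X, p_Y):
    p_Z = {}
    for x, px in p_X.items():
        for y, py in p_Y.items():
            p_Z[x + y] = p_Z.get(x + y, 0) + px * py
    return p_Z

die = {k: 1 / 6 for k in range(1, 7)}
p_sum = convolve(die, die)
print(p_sum[7])                               # 6/36 ≈ 0.1667, the most likely total
print(abs(sum(p_sum.values()) - 1) < 1e-12)   # True: the result is a valid PMF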
Definition (Covariance) We define the covariance of random variables X, Y as
Cov(X, Y) = E[(X − E[X])(Y − E[Y])].

Properties (Properties of covariance)
• If X, Y are independent, then Cov(X, Y) = 0.
• Cov(X, X) = Var(X).
• Cov(aX + b, Y) = a Cov(X, Y).
• Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z).
• Cov(X, Y) = E[XY] − E[X] E[Y].

Proposition (Variance of a sum of r.v.)
Var(X1 + ⋯ + Xn) = ∑_i Var(Xi) + ∑_{i≠j} Cov(Xi, Xj).

Definition (Correlation coefficient) We define the correlation coefficient of random variables X, Y, with σX, σY > 0, as
ρ(X, Y) = Cov(X, Y) / (σX σY).

Properties (Properties of the correlation coefficient)
• −1 ≤ ρ ≤ 1.
• If X, Y are independent, then ρ = 0.
• |ρ| = 1 if and only if X − E[X] = c (Y − E[Y]) for some constant c.
• ρ(aX + b, Y) = sign(a) ρ(X, Y).
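A short sketch computing Cov(X, Y) and ρ(X, Y) exactly from a small made-up joint PMF, using the identity Cov(X, Y) = E[XY] − E[X]E[Y].

import math

# Toy joint PMF of two binary random variables; the numbers are illustrative only.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

EX  = sum(x * p for (x, y), p in joint.items())
EY  = sum(y * p for (x, y), p in joint.items())
EXY = sum(x * y * p for (x, y), p in joint.items())
VarX = sum((x - EX) ** 2 * p for (x, y), p in joint.items())
VarY = sum((y - EY) ** 2 * p for (x, y), p in joint.items())

cov = EXY - EX * EY                    # Cov(X, Y) = E[XY] - E[X]E[Y]
rho = cov / math.sqrt(VarX * VarY)     # correlation coefficient
print(cov, rho)                        # 0.1, ≈ 0.408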
Conditional expectation and variance, sum of a random number of r.v.

Definition (Conditional expectation as a random variable) Given random variables X, Y, the conditional expectation E[X∣Y] is the random variable that takes the value E[X∣Y = y] whenever Y = y.

Theorem (Law of iterated expectations)
E[E[X∣Y]] = E[X].

Definition (Conditional variance as a random variable) Given random variables X, Y, the conditional variance Var(X∣Y) is the random variable that takes the value Var(X∣Y = y) whenever Y = y.

Theorem (Law of total variance)
Var(X) = E[Var(X∣Y)] + Var(E[X∣Y]).

Proposition (Sum of a random number of independent r.v.) Let N be a nonnegative integer random variable, and let X, X1, X2, ..., XN be i.i.d. random variables independent of N. Let Y = ∑_{i=1}^{N} Xi. Then
E[Y] = E[N] E[X],
Var(Y) = E[N] Var(X) + (E[X])² Var(N).
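A simulation sketch of Y = ∑_{i=1}^{N} Xi that checks both formulas; the choices N ∼ Uni{1, ..., 5} and Xi ∼ Ber(0.5) are illustrative.

import random, statistics

random.seed(0)

def sample_Y():
    # N uniform on {1,...,5}, X_i ~ Ber(0.5) drawn independently of N
    N = random.randint(1, 5)
    return sum(1 if random.random() < 0.5 else 0 for _ in range(N))

samples = [sample_Y() for _ in range(200_000)]
EN, VarN, EX, VarX = 3.0, 2.0, 0.5, 0.25     # closed-form moments of N and X

print(statistics.mean(samples), EN * EX)                         # both ≈ 1.5
print(statistics.variance(samples), EN * VarX + EX**2 * VarN)    # both ≈ 1.25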
Convergence of random variables

Inequalities, convergence, and the Weak Law of Large Numbers

Theorem (Markov inequality) Given a random variable X ≥ 0 and a > 0, we have
P(X ≥ a) ≤ E[X]/a.

Theorem (Chebyshev inequality) Given a random variable X with E[X] = µ and Var(X) = σ², for every ε > 0 we have
P(|X − µ| ≥ ε) ≤ σ²/ε².

Theorem (Weak Law of Large Numbers (WLLN)) Given a sequence of i.i.d. random variables {X1, X2, ...} with E[Xi] = µ and Var(Xi) = σ², define
Mn = (1/n) ∑_{i=1}^{n} Xi.
Then, for every ε > 0,
lim_{n→∞} P(|Mn − µ| ≥ ε) = 0.

Definition (Convergence in probability) A sequence of random variables {Yn} converges in probability to the random variable Y if, for every ε > 0,
lim_{n→∞} P(|Yn − Y| ≥ ε) = 0.

Properties (Properties of convergence in probability) If Xn → a and Yn → b in probability, then
• Xn + Yn → a + b.
• If g is a continuous function, then g(Xn) → g(a).
• E[Xn] does not always converge to a.
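A simulation sketch comparing the empirical probability P(|Mn − µ| ≥ ε) with the Chebyshev bound σ²/(nε²) for i.i.d. Bernoulli(0.5) samples; n, ε, and the number of trials are arbitrary choices.

import random

random.seed(1)
p, eps, n, trials = 0.5, 0.05, 1000, 2000
sigma2 = p * (1 - p)                      # Var of a single Ber(p)

def sample_mean(n):
    return sum(random.random() < p for _ in range(n)) / n

# Empirical frequency of the deviation event {|M_n - µ| >= ε}
freq = sum(abs(sample_mean(n) - p) >= eps for _ in range(trials)) / trials
print(freq)                               # small, consistent with the WLLN
print(sigma2 / (n * eps ** 2))            # Chebyshev bound for M_n: 0.1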
The Central Limit Theorem

Theorem (Central Limit Theorem (CLT)) Given a sequence of i.i.d. random variables {X1, X2, ...} with E[Xi] = µ and Var(Xi) = σ², define
Zn = (1/(σ√n)) ∑_{i=1}^{n} (Xi − µ).
Then, for every z, we have
lim_{n→∞} P(Zn ≤ z) = P(Z ≤ z),
where Z ∼ N(0, 1).

Corollary (Normal approximation of a binomial) Let X ∼ Bin(n, p) with n large. Then X can be approximated by Z ∼ N(np, np(1 − p)).

Remark (De Moivre–Laplace 1/2 approximation) Let X ∼ Bin(n, p). Then P(X = i) = P(i − 1/2 ≤ X ≤ i + 1/2), and we can use the CLT to approximate the PMF of X.
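A sketch of the 1/2 correction: it compares the exact binomial PMF at one point with the normal probability of the corresponding half-integer interval; n = 100, p = 0.3, i = 30 are illustrative values.

import math

def std_normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p, i = 100, 0.3, 30
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

exact = math.comb(n, i) * p**i * (1 - p)**(n - i)
approx = std_normal_cdf((i + 0.5 - mu) / sigma) - std_normal_cdf((i - 0.5 - mu) / sigma)
print(exact, approx)    # both ≈ 0.087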
