Random Variables + Distribution Models + Linear Regression

PROBABILITY AND DATA ANALYSIS DATA SCIENCE AND ENGINEERING

RANDOM VARIABLES
DISCRETE RANDOM VARIABLE
PROBABILITY FUNCTION
$0 \le P[X = x] \le 1$        $P[X > x] = 1 - P[X \le x]$
$\sum_x P[X = x] = 1$        $P[X \le x] = \sum_{x_i \le x} P[X = x_i]$

DISTRIBUTION FUNCTION (c.d.f.)

$0 \le F(x) \le 1$        If $x_1 \le x_2$, then $F(x_1) \le F(x_2)$
$F(y) = 0 \ \forall y < \min S$, hence $F(-\infty) = 0$
$F(y) = 1 \ \forall y > \max S$, hence $F(\infty) = 1$
$\forall a, b \in \mathbb{R}: \ P(a < X \le b) = P(X \le b) - P(X \le a) = F(b) - F(a)$

EXPECTATION of a D.R.V.: $E[X] = \sum_i x_i p_i$

$E[a + bX] = a + b\,E[X]$        $E[g(X)] = \sum_x g(x)\,P(X = x)$

VARIANCE of a D.R.V.: $V[X] = E[X^2] - E[X]^2$        $V[a + bX] = b^2\,V[X]$

STANDARD DEVIATION of a D.R.V.: $S[X] = \sqrt{V[X]}$
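As a numerical companion (not part of the original notes), a minimal Python sketch of these formulas for a hypothetical pmf table:

```python
# Minimal sketch: E[X], V[X], S[X] for a discrete r.v. given as a
# table of values and probabilities (hypothetical example values).
import numpy as np

x = np.array([0, 1, 2, 3])          # support of X
p = np.array([0.1, 0.3, 0.4, 0.2])  # P[X = x], must sum to 1

assert np.isclose(p.sum(), 1.0)     # probability function must total 1

E = np.sum(x * p)                   # E[X] = sum x_i p_i
V = np.sum(x**2 * p) - E**2         # V[X] = E[X^2] - E[X]^2
S = np.sqrt(V)                      # S[X] = sqrt(V[X])

print(E, V, S)                      # 1.7 0.81 0.9
```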

CONTINUOUS RANDOM VARIABLE


DISTRIBUTION FUNCTION

$0 \le F(x) \le 1$        If $x_1 \le x_2$, then $F(x_1) \le F(x_2)$
$F(-\infty) = 0$        $F(\infty) = 1$
$\forall a, b \in \mathbb{R}: \ P(a \le X \le b) = F(b) - F(a)$        $F(x)$ is continuous

The probability mass function has no meaning for continuous r.v. because
𝑃𝑃(𝑋𝑋 = 𝑥𝑥) = 0. In its place we use the density function:
DENSITY FUNCTION

$f(x) \ge 0 \ \forall x \in \mathbb{R}$        $P(a \le X \le b) = \int_a^b f(x)\,dx \ \forall a, b \in \mathbb{R}$

$F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du$        $\int_{-\infty}^{\infty} f(x)\,dx = 1$

EXPECTATION of a C.R.V.: $E[X] = \int_S x\,f(x)\,dx$

$E[a + bX] = a + b\,E[X]$        $E[g(X)] = \int_S g(x)\,f(x)\,dx$

VARIANCE of a C.R.V.: $V[X] = E[X^2] - E[X]^2$        $V[a + bX] = b^2\,V[X]$

STANDARD DEVIATION of a C.R.V.: $S[X] = \sqrt{V[X]}$
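A minimal numerical check of these integrals, assuming the example density $f(x) = 2x$ on $(0, 1)$ (chosen for illustration, not from the notes):

```python
# Minimal sketch: verifying normalization, E[X], and V[X] numerically
# for the assumed density f(x) = 2x on (0, 1).
from scipy.integrate import quad

f = lambda x: 2 * x                            # non-negative on (0, 1)

total, _ = quad(f, 0, 1)                       # integral of f = 1
E, _ = quad(lambda x: x * f(x), 0, 1)          # E[X] = 2/3
EX2, _ = quad(lambda x: x**2 * f(x), 0, 1)     # E[X^2] = 1/2
V = EX2 - E**2                                 # V[X] = 1/2 - 4/9 = 1/18

print(total, E, V)
```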

CHEBYSHEV’S INEQUALITY
The inequality bounds the probability that a random variable deviates from its
mean when only the expectation ($E[X]$) and the variance ($V[X]$) are available:

$P(|X - E[X]| \ge k) \le \dfrac{V[X]}{k^2}$    or equivalently    $P(|X - E[X]| < k) \ge 1 - \dfrac{V[X]}{k^2}$
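A quick simulation sketch of the bound, assuming an exponential sample as the test distribution (any distribution with finite variance would do):

```python
# Minimal sketch: empirical tail probability vs. Chebyshev's bound
# for an assumed Exp sample with E[X] = 2, V[X] = 4.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)

for k in (2.0, 4.0, 6.0):
    empirical = np.mean(np.abs(x - 2.0) >= k)  # P(|X - E[X]| >= k)
    bound = 4.0 / k**2                         # V[X] / k^2
    print(k, empirical, "<=", bound)
```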

BIVARIATE RANDOM VARIABLE


JOINT DISTRIBUTION FUNCTION $F(x, y)$

$F(x, y) = P(X \le x, Y \le y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v)\,dv\,du$

JOINT DENSITY FUNCTION $f(x, y)$

$f(x, y) \ge 0 \ \forall x, y \in \mathbb{R}$        $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$

$P(a \le X \le b, \ c \le Y \le d) = \int_a^b \int_c^d f(x, y)\,dy\,dx$

MARGINAL DENSITY FUNCTION

$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$        $f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$

$E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx$        $E[Y] = \int_{-\infty}^{\infty} y\,f_Y(y)\,dy$

CONDITIONAL DENSITY FUNCTION

$f_{Y|X}(y \mid X = x_0) = \dfrac{f(x_0, y)}{f_X(x_0)}$

COVARIANCE

In general (dependent r.v.):  $Cov[X, Y] = E[XY] - E[X]\,E[Y]$

Independent r.v.:  $Cov[X, Y] = 0$

VARIANCE

Independent r.v.:  $V[aX + bY] = a^2\,V[X] + b^2\,V[Y]$

Dependent r.v.:  $V[aX + bY] = a^2\,V[X] + b^2\,V[Y] + 2ab\,Cov[X, Y]$

CORRELATION

$Corr[X, Y] = \dfrac{Cov[X, Y]}{\sqrt{V[X]\,V[Y]}}$
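A minimal simulation sketch of these identities, using a hypothetical dependent pair $Y = X + \text{noise}$ (construction chosen for illustration):

```python
# Minimal sketch: Cov, Corr, and the variance of a linear combination,
# checked on a simulated dependent pair (X, Y = X + noise).
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200_000)
y = x + rng.normal(size=200_000)           # dependent on x by construction

cov = np.mean(x * y) - x.mean() * y.mean() # E[XY] - E[X]E[Y]
corr = cov / np.sqrt(x.var() * y.var())    # Cov / sqrt(V[X] V[Y])

a, b = 2.0, 3.0
lhs = np.var(a * x + b * y)
rhs = a**2 * x.var() + b**2 * y.var() + 2 * a * b * cov
print(cov, corr, lhs, rhs)                 # cov ≈ 1, corr ≈ 0.707, lhs ≈ rhs
```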

DISTRIBUTION MODELS
DISCRETE R.V.: Bernoulli, binomial, geometric, Poisson
CONTINUOUS R.V.: Uniform, exponential, normal

BERNOULLI MODEL (D)


This probability model describes an experiment with two possible outcomes,
“success” or “fail”:

$X \sim Ber(p)$        $X = \begin{cases} 1 & \text{if success} \\ 0 & \text{if fail} \end{cases}$

Let $p \in [0, 1]$ denote the success probability: $P(X = 1) = p$, $P(X = 0) = 1 - p$.

• PROB. MASS FUNCTION: $p(x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases}$

• DISTRIBUTION FUNCTION: $F(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 - p & \text{if } 0 \le x < 1 \\ 1 & \text{if } x \ge 1 \end{cases}$

EXPECTATION: $E[X] = p$
VARIANCE: $V[X] = p(1 - p)$
ST. DEVIATION: $S[X] = \sqrt{p(1 - p)}$
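A minimal sketch of the model via scipy.stats, assuming an example value $p = 0.3$:

```python
# Minimal sketch: Bernoulli pmf, cdf, and moments vs. the formulas above.
from scipy.stats import bernoulli

p = 0.3
X = bernoulli(p)

print(X.pmf(1), X.pmf(0))   # p, 1 - p
print(X.cdf(0.5))           # F(x) = 1 - p for 0 <= x < 1
print(X.mean(), X.var())    # E[X] = p, V[X] = p(1 - p)
```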

BINOMIAL MODEL (D)


This model describes the total number of successes in $n$ identical Bernoulli
experiments repeated independently.

$X \sim Bin(n, p)$        $x \in \{0, 1, 2, \dots, n\}$

The random variable represents the number of successes and follows a binomial
distribution ($p \in [0, 1]$).

• PROB. MASS FUNCTION: $P(X = k) = \dbinom{n}{k} p^k (1 - p)^{n - k} \ \forall k \in \{0, 1, \dots, n\}$

  where $\dbinom{n}{k} = \dfrac{n!}{k!\,(n - k)!}$

EXPECTATION: $E[X] = np$
VARIANCE: $V[X] = np(1 - p)$
ST. DEVIATION: $S[X] = \sqrt{np(1 - p)}$
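A minimal sketch checking the pmf formula against scipy.stats, with assumed values $n = 10$, $p = 0.4$:

```python
# Minimal sketch: the binomial pmf computed by hand vs. scipy.stats.
from math import comb
from scipy.stats import binom

n, p, k = 10, 0.4, 3
manual = comb(n, k) * p**k * (1 - p)**(n - k)  # C(n,k) p^k (1-p)^(n-k)
print(manual, binom.pmf(k, n, p))              # both ≈ 0.2150
print(binom.mean(n, p), binom.var(n, p))       # np = 4.0, np(1-p) = 2.4
```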



GEOMETRIC MODEL (D)


The random variable denotes the number of trials until the first success.

$X \sim G(p)$        $x \in \{1, 2, 3, \dots\}$

• PROB. MASS FUNCTION: $P(X = k) = (1 - p)^{k - 1}\,p \ \forall k \in \mathbb{N}$

EXPECTATION: $E[X] = \dfrac{1}{p}$
VARIANCE: $V[X] = \dfrac{1 - p}{p^2}$
ST. DEVIATION: $S[X] = \dfrac{\sqrt{1 - p}}{p}$
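A minimal simulation sketch of the moments, assuming $p = 0.25$ (numpy's geometric generator counts trials until the first success, matching the support above):

```python
# Minimal sketch: geometric mean and variance checked by simulation.
import numpy as np

rng = np.random.default_rng(2)
p = 0.25
samples = rng.geometric(p, size=100_000)   # support {1, 2, 3, ...}

print(samples.mean(), 1 / p)               # E[X] = 1/p = 4
print(samples.var(), (1 - p) / p**2)       # V[X] = (1-p)/p^2 = 12
```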

POISSON DISTRIBUTION (D)


Expresses the probability of a given number of events occurring in a fixed interval
of time or space (area, volume, ...) given their average rate ($\lambda$).

$X \sim Pois(\lambda)$        $x \in \{0, 1, 2, \dots\}$

If $X \sim Pois(\lambda_1)$ and $Y \sim Pois(\lambda_2)$ are independent, then $X + Y \sim Pois(\lambda_1 + \lambda_2)$.

• PROB. MASS FUNCTION: $P(X = k) = e^{-\lambda}\,\dfrac{\lambda^k}{k!} \ \forall k \in \mathbb{N}$

EXPECTATION: $E[X] = \lambda$
VARIANCE: $V[X] = \lambda$
ST. DEVIATION: $S[X] = \sqrt{\lambda}$
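A minimal simulation sketch of the additivity property, with assumed rates $\lambda_1 = 2$ and $\lambda_2 = 3$:

```python
# Minimal sketch: X + Y ~ Pois(λ1 + λ2) checked against the Pois(5) pmf.
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(3)
s = rng.poisson(2.0, 100_000) + rng.poisson(3.0, 100_000)

# Empirical frequency of each k vs. the exact Pois(5) pmf
for k in range(3, 8):
    print(k, np.mean(s == k), poisson.pmf(k, 5.0))
```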

UNIFORM DISTRIBUTION (C)


For the uniform distribution, every interval of the same length has the same
probability. A continuous r.v. $X$ follows a uniform distribution over the
interval $(a, b)$ if:

$X \sim \mathcal{U}(a, b)$        $f(x) = \begin{cases} (b - a)^{-1} & \text{if } a < x \le b \\ 0 & \text{otherwise} \end{cases}$

EXPECTATION: $E[X] = \dfrac{a + b}{2}$
VARIANCE: $V[X] = \dfrac{(b - a)^2}{12}$
ST. DEVIATION: $S[X] = \dfrac{b - a}{\sqrt{12}}$
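A minimal sketch of the moment formulas via scipy.stats, with assumed endpoints $a = 2$, $b = 8$ (scipy parametrizes the interval as loc, scale = $a$, $b - a$):

```python
# Minimal sketch: uniform mean and variance vs. the formulas above.
from scipy.stats import uniform

a, b = 2.0, 8.0
U = uniform(loc=a, scale=b - a)

print(U.mean(), (a + b) / 2)      # 5.0
print(U.var(), (b - a)**2 / 12)   # 3.0
```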

EXPONENTIAL DISTRIBUTION (C)


The random variable that measures the distance (e.g. waiting time) between
successive events in a Poisson process follows an exponential distribution with
parameter $\lambda$.

$X \sim Exp(\lambda)$        $x \in [0, \infty)$

• DENSITY FUNCTION: $f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$

• CUMULATIVE DISTRIB. FUNCTION: $F(x) = \begin{cases} 1 - e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$

EXPECTATION: $E[X] = \lambda^{-1}$
VARIANCE: $V[X] = \lambda^{-2}$
ST. DEVIATION: $S[X] = \lambda^{-1}$

LACK OF MEMORY PROPERTY: Given $x_1, x_2 > 0$:

$P(X > x_1 + x_2 \mid X > x_1) = P(X > x_2)$
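A minimal simulation sketch of the lack-of-memory property, assuming $\lambda = 0.5$, $x_1 = 1$, $x_2 = 2$:

```python
# Minimal sketch: both sides of the memoryless identity, plus the
# closed form P(X > x2) = exp(-λ x2).
import numpy as np

rng = np.random.default_rng(4)
lam, x1, x2 = 0.5, 1.0, 2.0
x = rng.exponential(scale=1 / lam, size=1_000_000)

cond = np.mean(x[x > x1] > x1 + x2)    # P(X > x1 + x2 | X > x1)
plain = np.mean(x > x2)                # P(X > x2)
print(cond, plain, np.exp(-lam * x2))  # all ≈ 0.3679
```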

NORMAL OR GAUSSIAN DISTRIBUTION (C)


Models, for example, the measurement errors of a continuous quantity. The r.v.
follows a normal or Gaussian distribution with parameters $\mu$ and $\sigma$.

$X \sim \mathcal{N}(\mu, \sigma)$        $\mu \in \mathbb{R}$ and $\sigma \in \mathbb{R}^+$

• DENSITY FUNCTION: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x - \mu)^2}{2\sigma^2}}$

EXPECTATION: $E[X] = \mu$    VARIANCE: $V[X] = \sigma^2$    ST. DEVIATION: $S[X] = \sigma$

CHEBYSHEV’S INEQ.: $P(|X - \mu| < k) = P(\mu - k < X < \mu + k) \ge 1 - \dfrac{\sigma^2}{k^2}$

Therefore, if $k = c\sigma \implies P(\mu - c\sigma < X < \mu + c\sigma) \ge 1 - \dfrac{1}{c^2}$

LINEAR TRANS.: If $X \sim \mathcal{N}(\mu, \sigma)$ and $Y = a + bX$, then $Y \sim \mathcal{N}(a + b\mu, |b|\sigma)$:

EXPECTATION: $E[Y] = a + b\,E[X]$    VARIANCE: $V[Y] = b^2\,V[X]$    ST. DEVIATION: $S[Y] = |b|\sigma$

STANDARDIZATION: If $X \sim \mathcal{N}(\mu, \sigma)$ it is possible to consider the standardized r.v.:

$Z = \dfrac{X - \mu}{\sigma} = -\dfrac{\mu}{\sigma} + \dfrac{1}{\sigma}X \sim \mathcal{N}(0, 1)$
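A minimal sketch of standardization in practice, assuming $X \sim \mathcal{N}(\mu = 10, \sigma = 2)$ and asking for $P(X \le 13)$:

```python
# Minimal sketch: computing a normal probability by standardizing,
# then checking directly with loc/scale parameters.
from scipy.stats import norm

mu, sigma = 10.0, 2.0
z = (13.0 - mu) / sigma                     # Z = (x - μ) / σ

print(norm.cdf(z))                          # Φ(1.5) ≈ 0.9332
print(norm.cdf(13.0, loc=mu, scale=sigma))  # same result, no standardizing
```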

CENTRAL LIMIT THEOREM (CLT)


Let $X_1, X_2, \dots, X_n$ be a set of independent random variables with $E[X_i] = \mu_i$ and
$V[X_i] = \sigma_i^2$. Then, for $n$ large enough ($n \to \infty$):

$X_1 + X_2 + \dots + X_n \sim \mathcal{N}\left(\sum_{i=1}^n \mu_i, \ \sqrt{\sum_{i=1}^n \sigma_i^2}\right)$        The approximation is usually good for $n > 30$.

As a particular case, let $X_1, X_2, \dots, X_n$ be a set of identically distributed and
independent random variables with mean $\mu$ and standard deviation $\sigma$. For $n$ large, the
distribution of the sample mean $\bar{X}$ is Gaussian regardless of the distribution of $X$:

$\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} X_i \quad \Longrightarrow \quad \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim \mathcal{N}(0, 1)$

APPROXIMATIONS WITH THE CLT
o BINOMIAL: Let $X \sim Bin(n, p)$ with $n$ large enough, then:

  $X \sim \mathcal{N}\left(np, \sqrt{np(1 - p)}\right) \iff \dfrac{X - np}{\sqrt{np(1 - p)}} \sim \mathcal{N}(0, 1)$

o POISSON: Let $X \sim Pois(\lambda)$ with $\lambda > 5$, then it can be approximated by:

  $X \sim \mathcal{N}\left(\lambda, \sqrt{\lambda}\right) \iff \dfrac{X - \lambda}{\sqrt{\lambda}} \sim \mathcal{N}(0, 1)$
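A minimal sketch comparing exact and approximate probabilities, with assumed parameters $Bin(100, 0.4)$ and $Pois(25)$ (a continuity correction would tighten the match, but is omitted here for brevity):

```python
# Minimal sketch: exact cdf vs. CLT normal approximation.
import numpy as np
from scipy.stats import binom, norm, poisson

n, p = 100, 0.4
print(binom.cdf(45, n, p),
      norm.cdf(45, loc=n * p, scale=np.sqrt(n * p * (1 - p))))

lam = 25.0
print(poisson.cdf(30, lam),
      norm.cdf(30, loc=lam, scale=np.sqrt(lam)))
```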

LINEAR REGRESSION
REGRESSION MODEL: A model that allows us to describe the effect of a variable
X on a variable Y; in other words, we want to describe or forecast the behavior
of Y as a function of X.
X ≡ Independent, explanatory or exogenous variable
Y ≡ Dependent, response or endogenous variable

TYPES OF RELATIONSHIPS
• Deterministic: Given a value of X, the value of Y can be perfectly identified:
  $y = f(x)$
• Nondeterministic: Given X, the value of Y cannot be perfectly known:
  $y = f(x) + u$
• Linear: When the function $f(x)$ is linear:
  $f(x) = \beta_0 + \beta_1 x$
  If $\beta_1 > 0 \Rightarrow$ Positive linear rel.    If $\beta_1 < 0 \Rightarrow$ Negative linear rel.
• Nonlinear: When $f(x)$ is nonlinear. Examples: $f(x) = \log x$, $f(x) = x^2$, ...
• Lack of relationship: When $f(x) = 0$.

MEASURES OF LINEAR DEPENDENCE


$cov(x, y) = \dfrac{\sum_{i=1}^n x_i y_i - n\,\bar{x}\,\bar{y}}{n - 1}$

$cov(x, y) > 0 \to$ Positive linear relationship
$cov(x, y) < 0 \to$ Negative linear relationship
$cov(x, y) \approx 0 \to$ No linear relationship

$r_{(x,y)} = cor(x, y) = \dfrac{cov(x, y)}{S_X S_Y}$,  where  $S_X^2 = \dfrac{\sum_{i=1}^n (x_i - \bar{x})^2}{n - 1}$  and  $S_Y^2 = \dfrac{\sum_{i=1}^n (y_i - \bar{y})^2}{n - 1}$

$-1 \le cor(x, y) \le 1$        $cor(x, y) = cor(y, x)$
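A minimal sketch of the sample formulas on hypothetical data, cross-checked against numpy (np.cov uses the same $n - 1$ denominator):

```python
# Minimal sketch: sample covariance and correlation by hand vs. numpy.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

n = len(x)
cov_xy = (np.sum(x * y) - n * x.mean() * y.mean()) / (n - 1)
r_xy = cov_xy / (x.std(ddof=1) * y.std(ddof=1))

print(cov_xy, np.cov(x, y)[0, 1])     # same value
print(r_xy, np.corrcoef(x, y)[0, 1])  # same value, in [-1, 1]
```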

LINEAR REGRESSION MODEL

The simple linear regression model assumes that: $Y_i = \beta_0 + \beta_1 x_i + u_i$
Where $\beta_0$ (intercept) and $\beta_1$ (slope) are the population coefficients and $u_i$ is an
error. The parameters that we need to estimate are $\beta_0$, $\beta_1$, $\sigma^2$ in order to obtain the
regression line: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$.
The residual is: $e_i = y_i - \hat{y}_i$

MODEL ASSUMPTIONS
• Linearity: The relationship between X and Y is linear: $f(x) = \beta_0 + \beta_1 x$
• Homogeneity: The errors have mean zero: $E[u_i] = 0$
• Homoscedasticity: The variance of the errors is constant: $Var(u_i) = \sigma^2$
• Independence: The errors are independent: $E[u_i u_j] = 0$ (not time series)
• Normality: The errors follow a normal distribution: $u_i \sim \mathcal{N}(0, \sigma^2)$

LEAST SQUARES ESTIMATORS (LSE)

Proposed by Gauss, this method minimizes the sum of squares of the residuals:

$\min \sum_{i=1}^n u_i^2 = \sum_{i=1}^n (y_i - \hat{y}_i)^2 = \sum_{i=1}^n \left(y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)\right)^2$

The resulting estimators are:

$\hat{\beta}_1 = \dfrac{cov(x, y)}{S_X^2} = \dfrac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$        $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

COEFFICIENT OF DETERMINATION, R-SQUARED


It is used to assess the goodness-of-fit of the model. It is defined as:

$R^2 = r_{(x,y)}^2 = cor(x, y)^2 \implies 0 \le R^2 \le 1$

The closer $R^2$ is to 1, the better the fit.
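A minimal sketch computing the least-squares estimates and $R^2$ directly from the formulas above, on hypothetical data:

```python
# Minimal sketch: simple linear regression from scratch.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.3, 4.1, 5.8, 8.2, 9.9])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()             # intercept estimate
y_hat = b0 + b1 * x                       # fitted values
e = y - y_hat                             # residuals

R2 = 1 - np.sum(e**2) / np.sum((y - y.mean())**2)
print(b0, b1, R2)                         # R² close to 1: good fit
```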

MULTIPLE LINEAR REGRESSION MODEL


It is used to predict the value of a response Y from the values of several
explanatory variables $X_1, \dots, X_k$. The least-squares fit:

1. We have $n$ observations, for $i = 1, \dots, n$: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + u_i$

2. We wish to fit the data in the form: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} + \dots + \hat{\beta}_k x_{ik}$

MODEL IN MATRIX FORM


We can write the model as a matrix relationship: $y = X\beta + u$, where
$y$ ≡ response vector;  $X$ ≡ explanatory variable matrix;
$\beta$ ≡ vector of parameters;  $u$ ≡ error vector

LEAST-SQUARES ESTIMATION
The least-squares vector parameter estimate $\hat{\beta}$ solves:

$(X^T X)\hat{\beta} = X^T y \implies \hat{\beta} = (X^T X)^{-1} X^T y$

The vector of fitted values is given by: $\hat{y} = X\hat{\beta}$

VARIANCE ESTIMATION
An estimator for the error variance is the residual (quasi-)variance: $S_R^2 = \dfrac{\sum_{i=1}^n e_i^2}{n - k - 1}$

ANOVA DECOMPOSITION

$SST = SSE + SSR$

$SST = \sum_{i=1}^n (y_i - \bar{y})^2$ (total)    $SSR = \sum_{i=1}^n (y_i - \hat{y}_i)^2$ (residual)    $SSE = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2$ (explained)

Coeff. of determination: $R^2 = \dfrac{SSE}{SST} = 1 - \dfrac{SSR}{SST}$
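A minimal sketch of the matrix-form fit, residual variance, and ANOVA identity, on simulated data with an assumed $k = 2$ regressors and true coefficients chosen for illustration:

```python
# Minimal sketch: multiple regression via least squares in matrix form.
import numpy as np

rng = np.random.default_rng(5)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # intercept column
beta_true = np.array([1.0, 2.0, -1.0])                      # assumed values
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# beta_hat = (X^T X)^{-1} X^T y, solved stably via lstsq
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
e = y - y_hat

s2_R = np.sum(e**2) / (n - k - 1)           # residual (quasi-)variance
SST = np.sum((y - y.mean())**2)             # total
SSE = np.sum((y_hat - y.mean())**2)         # explained
SSR = np.sum(e**2)                          # residual
print(beta_hat, s2_R, np.isclose(SST, SSE + SSR))   # identity holds
```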
