
Data Analytics - Notes

Uploaded by

17clopezalvarez
Copyright
© © All Rights Reserved
Lecture 4 : Estimation

Sample : part of the population

Let θ be the population parameter


Let X1,...,Xn be random variables
An estimator of θ is θ̂ = g(X1,...,Xn), where g is a function of X1,...,Xn
θ̂ is hence also random
Example : the sample average is an estimator of the population mean

Be careful of the differences :


θ is a parameter; in fact, it is the parameter we want to estimate
θ̂(X) = g(X1,...,Xn) is the estimator, meaning the function we use to estimate θ
θ̂(x) = g(x1,...,xn) is an estimate, meaning a value obtained by applying the estimator
to observed values
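
The estimator/estimate distinction can be sketched in a few lines of Python. This is only an illustration: the function name `sample_mean` and the five observed values are made up, not taken from the lecture.

```python
from statistics import mean

def sample_mean(observations):
    """The estimator: a rule g(X1, ..., Xn) that maps any sample to a number."""
    return mean(observations)

# Hypothetical observed hours watched (made-up values for illustration only).
observed_hours = [4.0, 5.5, 3.5, 5.0, 4.5]

# The estimate: the number obtained by applying the estimator to observed data.
estimate = sample_mean(observed_hours)
print(estimate)
```

A different sample would give a different estimate, while the estimator (the function) stays the same.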

Example with Netflix : which thumbnail should be used?


Population parameters : µ𝐴 and µ𝐵 are the average hours watched with
thumbnails A and B
Estimators : µ̂𝐴(X) = (X1A+...+X100A)/100 and µ̂𝐵(X) = (X1B+...+X100B)/100

Estimates : µ̂𝐴(x) = 4.5 and µ̂𝐵(x) = 3

Sampling error :
Difference between a point estimate and the true population parameter due to having
a random sample

What is the distribution of µ̂𝐴 ?

It is the sampling distribution of the average
⇒ from the CLT, we know it is approximately normal when n is large
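
A small simulation can make the sampling distribution concrete. The sketch below assumes an exponential population with mean 4 hours (an arbitrary, clearly non-normal choice that is not from the lecture) and draws many samples of size n = 100. The sample means should centre on the population mean with spread close to σ/√n.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is reproducible

# Assumed population (illustration only): exponential with mean 4 and sd 4.
population_mean = 4.0
population_sd = 4.0
n = 100              # sample size
replications = 2000  # number of simulated samples

# One sample mean per simulated sample.
sample_means = [
    mean([random.expovariate(1 / population_mean) for _ in range(n)])
    for _ in range(replications)
]

print("mean of sample means:", mean(sample_means))
print("sd of sample means:  ", stdev(sample_means))
print("sigma / sqrt(n):     ", population_sd / sqrt(n))
```

A histogram of `sample_means` would look roughly bell-shaped even though the individual observations are strongly skewed.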

Confidence interval :
We want to construct a 95% interval for µ
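From the CLT :
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \Leftrightarrow \bar{X} - \mu \sim N\left(0, \frac{\sigma^2}{n}\right) \Leftrightarrow \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
Then, we find z such that :
$$P\left(-z < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z\right) = 0.95$$
Since the normal distribution is symmetric, this is the same as
$$P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > z\right) = 0.025$$
We find that z = 1.96
So :
$$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95 \Leftrightarrow P\left(-1.96\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
$$\Leftrightarrow P\left(-\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
$$\Leftrightarrow P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
So the confidence interval is
$$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$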

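For α ∈ (0, 1), 1-α is the confidence level and the confidence interval is :
$$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
where $z_{\alpha/2}$ is the value such that $P(Z > z_{\alpha/2}) = \alpha/2$ for Z ∼ N(0, 1)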
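
As a rough sketch, the interval with known σ can be computed with Python's standard library. The function name `z_confidence_interval` and the value σ = 2 are assumptions made for illustration. Only the estimate 4.5 and the sample size 100 echo the Netflix example.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, alpha=0.05):
    """(1 - alpha) confidence interval for the mean when sigma is known."""
    # z_{alpha/2}: the point with upper-tail probability alpha/2 under N(0, 1).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# x_bar = 4.5 and n = 100 follow the example; sigma = 2 is an assumed value.
low, high = z_confidence_interval(x_bar=4.5, sigma=2.0, n=100)
print(low, high)
```

With alpha = 0.05 the quantile is about 1.96, matching the derivation above. A smaller alpha gives a wider interval.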

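What if we don't know the population variance $\sigma^2$ ?
We replace σ with the sample standard deviation s and use the t-distribution :
$$I = \left(\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right)$$
Here, we have n-1 degrees of freedom, and $s^2$ is the estimate of the population variance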
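
A minimal sketch of the t-interval follows. Python's standard library has no t-distribution quantile function, so the critical value is passed in by hand (2.262 is the usual table value for α = 0.05 and 9 degrees of freedom). The data and the function name `t_confidence_interval` are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def t_confidence_interval(data, t_crit):
    """Confidence interval for the mean when the variance is unknown.

    t_crit is t_{alpha/2, n-1}, looked up in a t-table by the caller.
    """
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical sample of n = 10 observations (made-up values).
hours = [4.0, 5.5, 3.5, 5.0, 4.5, 3.0, 6.0, 4.5, 5.0, 4.0]

# Table value for alpha = 0.05 and n - 1 = 9 degrees of freedom.
low, high = t_confidence_interval(hours, t_crit=2.262)
print(low, high)
```

Because 2.262 is larger than 1.96, the interval is wider than the z-interval would be for the same data, which reflects the extra uncertainty from estimating σ.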

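Confidence interval for a proportion p :

$$I = \left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$

p is the population proportion and p̂ is the sample proportion

The variance of the sample proportion is $\frac{p(1-p)}{n}$, so we estimate it with $\frac{\hat{p}(1-\hat{p})}{n}$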
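
The proportion interval can be sketched the same way. The counts (40 successes out of 100) and the function name `proportion_confidence_interval` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, alpha=0.05):
    """Normal-approximation (1 - alpha) confidence interval for a proportion."""
    p_hat = successes / n  # sample proportion
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Estimated standard deviation of the sample proportion.
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * standard_error, p_hat + z * standard_error

# Hypothetical data: 40 successes in 100 trials.
low, high = proportion_confidence_interval(successes=40, n=100)
print(low, high)
```

This relies on the normal approximation, so it is best suited to reasonably large n with p̂ not too close to 0 or 1.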

Lecture 5 : Testing

→ Translate our questions into a set of testable hypotheses
Ex with Netflix :

Null Hypothesis 𝐻0 : thumbnails A and B lead to the same level of engagement


Alternative Hypothesis 𝐻1 : they lead to different levels of engagement

In terms of math :
𝐻0 : µ𝐴 = µ𝐵
𝐻1 : µ𝐴 ≠ µ𝐵

The Null Hypothesis always represents the default position, or the assumption of
“no effect”, “no difference”

We always assume 𝐻0 is true until data says otherwise


We ask : if 𝐻0 is actually true, what’s the chance of getting sample evidence as or
more extreme than what we observed?

small chance ⇒ evidence against the null

Possible conclusions :
- We reject 𝐻0
- We fail to reject 𝐻0

Netflix example :
- Under 𝐻0, we have µ𝐴 = µ𝐵
- If 𝐻0 is true, we're likely to observe µ̂𝐴 − µ̂𝐵 close to 0
- If 𝐻0 is true, we're unlikely to observe µ̂𝐴 − µ̂𝐵 far from 0

Definitions :

           | Fail to reject 𝐻0 | Reject 𝐻0
𝐻0 true    | No problem        | Type 1 error
𝐻0 false   | Type 2 error      | No problem


Significance level :
α = P(Type 1 error) = P(Reject 𝐻0|𝐻0 is true)

Power :
1-β with β = P(Type 2 error) = P(Fail to reject 𝐻0|𝐻0 is false)
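
To see how α and power relate, here is a sketch for one specific case that the notes do not spell out: a two-sided one-sample z-test with known σ. The function name `z_test_power` and all numbers are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(true_mu, mu_0, sigma, n, alpha=0.05):
    """P(reject H0) for a two-sided z-test of H0: mu = mu_0, known sigma."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2)
    # The test statistic is N(shift, 1) when the true mean is true_mu.
    shift = (true_mu - mu_0) / (sigma / sqrt(n))
    # Reject when the statistic falls below -z or above z.
    return std_normal.cdf(-z - shift) + (1 - std_normal.cdf(z - shift))

# When H0 is true, the rejection probability is the significance level alpha.
power_at_null = z_test_power(true_mu=4.0, mu_0=4.0, sigma=2.0, n=100)

# When H0 is false, the rejection probability is the power 1 - beta.
power_at_alt = z_test_power(true_mu=4.5, mu_0=4.0, sigma=2.0, n=100)

print(power_at_null, power_at_alt)
```

The further the true mean is from the null value, or the larger n is, the higher the power.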

Types of test :

Test statistics :

Test statistic = $\frac{\hat{\theta}(X) - \theta}{SD(\hat{\theta}(X))}$, where :

- θ̂(X) is the estimator for θ
- The test statistic is a random variable; its distribution tells us how likely
different outcomes are when 𝐻0 is true

Observed test statistic = $\frac{\hat{\theta}(x) - \theta_0}{SD(\hat{\theta}(X))}$, where :

- θ̂(x) is the estimate we get from the data
- θ0 is the value of θ if 𝐻0 is true

Test statistic depends on whether we have one or two parameters and on whether
we know the population variance
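
For the simplest case (one mean, known population variance), the observed test statistic can be sketched as below. In this case SD(θ̂(X)) = σ/√n. The null value µ0 = 4 and σ = 2 are assumed values, and `observed_z_statistic` is a hypothetical name.

```python
from math import sqrt

def observed_z_statistic(x_bar, mu_0, sigma, n):
    """Observed test statistic for H0: mu = mu_0 when sigma is known."""
    standard_deviation_of_mean = sigma / sqrt(n)  # SD of the sample average
    return (x_bar - mu_0) / standard_deviation_of_mean

# x_bar = 4.5 and n = 100 echo the example; mu_0 and sigma are assumptions.
z_obs = observed_z_statistic(x_bar=4.5, mu_0=4.0, sigma=2.0, n=100)
print(z_obs)
```

Other settings (two parameters, unknown variance) use a different standard deviation in the denominator and possibly a t-distribution instead of the normal.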
5 steps of hypothesis testing :
- State the Null and alternative hypothesis
- Choose a test and a significance level
- Compute the observed test-statistic
- Calculate the p value
- Make a statistical decision - interpret the results

Computing the p-value :

Smaller p values ⇒ more evidence against the Null
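
As a sketch, assuming a two-sided z-test, the p-value is the probability under 𝐻0 of a test statistic at least as extreme as the observed one, in either tail. The observed value 2.5 and the name `two_sided_p_value` are hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z_obs):
    """P(|Z| >= |z_obs|) for Z ~ N(0, 1), i.e. both tails combined."""
    upper_tail = 1 - NormalDist().cdf(abs(z_obs))
    return 2 * upper_tail

alpha = 0.05                      # chosen significance level
p_value = two_sided_p_value(2.5)  # hypothetical observed statistic
reject_null = p_value < alpha     # statistical decision

print(p_value, reject_null)
```

The decision rule is to reject 𝐻0 when the p-value is below α, and to fail to reject it otherwise.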
