Data Analytics - Notes

Uploaded by

17clopezalvarez

Lecture 4 : Estimation

Sample : part of the population

Let θ be the population parameter

Let $X_1, \dots, X_n$ be random variables (the sample)
An estimator of θ is $\hat{\theta} = g(X_1, \dots, X_n)$, where g is a function of the sample
$\hat{\theta}$ is hence also random
An example of an estimator is the sample average

Be careful of the differences :

θ is a parameter, in fact, it’s the parameter we want to estimate
$\hat{\theta}(X) = g(X_1, \dots, X_n)$ is the estimator, meaning the function we use to estimate θ
$\hat{\theta}(x) = g(x_1, \dots, x_n)$ is an estimate, meaning a value obtained by applying the estimator to observed values

Example with Netflix : which thumbnail should be used?

Population parameters : $\mu_A$ and $\mu_B$ are the average hours watched for thumbnails A and B
Estimators : $\hat{\mu}_A(X) = (X_1^A + \dots + X_{100}^A)/100$ and $\hat{\mu}_B(X) = (X_1^B + \dots + X_{100}^B)/100$

Estimates : $\hat{\mu}_A(x) = 4.5$ and $\hat{\mu}_B(x) = 3$
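
To make the difference between an estimator and an estimate concrete, here is a minimal Python sketch. The watch-time data are simulated (they are not Netflix data), and the names `sample_mean`, `hours_a` and `hours_b` are hypothetical.

```python
import random

def sample_mean(values):
    """The estimator: a function g applied to the sample."""
    return sum(values) / len(values)

random.seed(0)  # fixed seed so the sketch is reproducible

# Simulated hours watched by 100 users per thumbnail (assumed values,
# drawn around means 4.5 and 3 purely for illustration).
hours_a = [random.gauss(4.5, 1.0) for _ in range(100)]
hours_b = [random.gauss(3.0, 1.0) for _ in range(100)]

# The estimates: numbers obtained by applying the estimator to observed data.
mu_hat_a = sample_mean(hours_a)
mu_hat_b = sample_mean(hours_b)
print(f"estimate for A: {mu_hat_a:.2f}, estimate for B: {mu_hat_b:.2f}")
```

Rerunning with another seed gives different estimates from the same estimator, which is exactly the sampling error described next.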

Sampling error :
Difference between a point estimate and the true population parameter due to having
a random sample

What is the distribution of $\hat{\mu}_A$ ?

Sampling distribution of the average
⇒ from the CLT, we know it’s approximately normal when n is large
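
A small simulation can illustrate the CLT statement above. This is only a sketch under assumed settings: the population is taken to be exponential with mean 1 (clearly not normal), and the sample size and number of repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(1)
n = 100      # size of each sample
reps = 2000  # number of repeated samples

# Each entry is the average of one sample of size n drawn from an
# exponential population with mean 1 and standard deviation 1.
means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

center = statistics.fmean(means)  # should be close to the population mean, 1
spread = statistics.stdev(means)  # should be close to sigma/sqrt(n) = 0.1

# Share of sample means within 1.96 standard errors of the true mean;
# under a normal approximation this should be roughly 0.95.
within = sum(abs(m - 1.0) < 1.96 * 0.1 for m in means) / reps
print(center, spread, within)
```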

Confidence interval :
We want to construct a 95% interval for µ
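From the CLT : $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \iff \bar{X} - \mu \sim N\left(0, \frac{\sigma^2}{n}\right) \iff \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
Then, we find z such that : $P\left(-z < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z\right) = 0.95$
Since the normal distribution is symmetric, this is the same as $P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > z\right) = 0.025$
We find that z = 1.96
So :
$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95 \iff P\left(-1.96\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$
$\iff P\left(-\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$
$\iff P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$
So the confidence interval is $\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$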

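For α ∈ (0, 1), 1-α is the confidence level and the confidence interval will be :
$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$
where $z_{\alpha/2}$ satisfies $P(Z > z_{\alpha/2}) = \alpha/2$ for $Z \sim N(0, 1)$ (for α = 0.05, $z_{\alpha/2} = 1.96$)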
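
The interval with known σ can be sketched in a few lines of Python using the standard library. The function name `z_interval` and the value σ = 2 are assumptions made for illustration; only the sample mean 4.5 and n = 100 echo the Netflix example.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, alpha=0.05):
    """Confidence interval for the mean when sigma is known."""
    # z_{alpha/2}: the point with probability alpha/2 above it under N(0, 1).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# sigma = 2.0 is an assumed population standard deviation.
low, high = z_interval(x_bar=4.5, sigma=2.0, n=100)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

A smaller α gives a larger critical value and therefore a wider interval.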

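What if we don’t know the population variance $\sigma^2$ ?
We use the t-statistic :
$I = \left(\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right)$
Here, we have n-1 degrees of freedom, and $s^2$ is the sample variance, our estimate of the population variance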
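
As a rough sketch of the t-based interval, the Python code below takes the critical value as an argument, since the standard library has no t quantile function. The sample is made up, and 2.262 is the usual table value of t for α/2 = 0.025 with 9 degrees of freedom; the name `t_interval` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(sample, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is t_{alpha/2, n-1}, looked up in a t table by the caller.
    """
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (n-1 in the denominator)
    half_width = t_crit * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Made-up sample of n = 10 observations, so n-1 = 9 degrees of freedom.
sample = [4.1, 5.0, 3.8, 4.6, 5.2, 4.4, 3.9, 4.8, 4.7, 4.5]
low, high = t_interval(sample, t_crit=2.262)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

With the same data, using 1.96 instead of 2.262 would give a narrower interval; the t critical value accounts for the extra uncertainty from estimating σ.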

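Confidence interval for a proportion p :

$I = \left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$

p is the population proportion and $\hat{p}$ is the sample proportion

The variance of the sample proportion is $\frac{p(1-p)}{n}$, so we estimate it with $\frac{\hat{p}(1-\hat{p})}{n}$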
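
The proportion interval can be sketched the same way. The counts used here (60 successes out of 100) are invented for illustration, and `proportion_interval` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, alpha=0.05):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n  # sample proportion
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Estimated standard deviation of the sample proportion.
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

low, high = proportion_interval(successes=60, n=100)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```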

Lecture 5 : Testing
→ Translate our questions into a set of testable hypotheses
Example with Netflix :

Null Hypothesis 𝐻0 : thumbnails A and B lead to the same level of engagement

Alternative Hypothesis 𝐻1 : they lead to different levels of engagement

In terms of math :
𝐻0 : µ𝐴 = µ𝐵
𝐻1 : µ𝐴 ≠ µ𝐵

The Null Hypothesis always represents the default position, or the assumption of
“no effect”, “no difference”

We always assume 𝐻0 is true until data says otherwise

We ask : if 𝐻0 is actually true, what’s the chance of getting sample evidence as or
more extreme than what we observed?

small chance ⇒ evidence against the null

Possible conclusions :
- We reject 𝐻0
- We fail to reject 𝐻0

Netflix example :
- Under 𝐻0, we have µ𝐴 = µ𝐵
- If 𝐻0 is true, we’re likely to have $\hat{\mu}_A - \hat{\mu}_B$ close to 0
- If 𝐻0 is true, we’re unlikely to have $\hat{\mu}_A - \hat{\mu}_B$ far from 0

Definitions :

              Fail to reject 𝐻0    Reject 𝐻0
𝐻0 true       No problem           Type 1 error
𝐻0 false      Type 2 error         No problem

Significance level :
α = P(Type 1 error) = P(Reject 𝐻0|𝐻0 is true)

Power :
1-β with β = P(Type 2 error) = P(Fail to reject 𝐻0|𝐻0 is false)
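
Both quantities can be approximated by simulation. The sketch below assumes a two-sided z-test of 𝐻0 : µ = 0 with known σ = 1 and n = 25, and an alternative mean of 0.5; all of these settings, and the name `rejects`, are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def rejects(true_mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05):
    """Draw one sample and report whether the two-sided z-test rejects H0."""
    sample = [random.gauss(true_mu, sigma) for _ in range(n)]
    z_obs = (fmean(sample) - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z_obs) > z_crit

random.seed(2)
reps = 5000

# H0 true: the rejection rate approximates alpha = P(Type 1 error).
type1_rate = sum(rejects(true_mu=0.0) for _ in range(reps)) / reps

# H0 false (true mean 0.5): the rejection rate approximates the power 1 - beta.
power = sum(rejects(true_mu=0.5) for _ in range(reps)) / reps
print(type1_rate, power)
```

In this setup the Type 1 error rate should come out near 0.05, while the power depends on how far the true mean is from 0 and on n.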

Types of test :

Test statistics :

Test statistic = $\frac{\hat{\theta}(X) - \theta}{SD(\hat{\theta}(X))}$, where :

- $\hat{\theta}(X)$ is the estimator for θ

- The test statistic is a random variable; its distribution tells us how likely different outcomes are when 𝐻0 is true

Observed test statistic = $\frac{\hat{\theta}(x) - \theta_0}{SD(\hat{\theta}(X))}$, where :

- $\hat{\theta}(x)$ is the estimate we get from the data

- $\theta_0$ is the value of θ if 𝐻0 is true

The test statistic depends on whether we have one or two parameters and on whether we know the population variance
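
For the two-parameter Netflix case, one possible observed test statistic is the two-sample z statistic sketched below. It assumes independent samples and known population standard deviations; σ = 2 for both groups is an invented value, while the estimates 4.5 and 3 and the sample sizes of 100 come from the example above. The name `two_sample_z` is hypothetical.

```python
from math import sqrt

def two_sample_z(x_bar_a, x_bar_b, sigma_a, sigma_b, n_a, n_b, diff0=0.0):
    """Observed z statistic for H0: mu_A - mu_B = diff0 (known variances)."""
    # Standard deviation of the difference of two independent sample means.
    sd_diff = sqrt(sigma_a**2 / n_a + sigma_b**2 / n_b)
    return (x_bar_a - x_bar_b - diff0) / sd_diff

# sigma_a = sigma_b = 2.0 are assumed; they are not given in the notes.
z_obs = two_sample_z(4.5, 3.0, sigma_a=2.0, sigma_b=2.0, n_a=100, n_b=100)
print(round(z_obs, 2))
```

Here θ is the difference µ𝐴 − µ𝐵 and θ0 = 0, so the numerator is the estimated difference minus its value under 𝐻0.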

5 steps of hypothesis testing :
- State the null and alternative hypotheses
- Choose a test and a significance level
- Compute the observed test statistic
- Calculate the p-value
- Make a statistical decision and interpret the results

Computing the p-value :

Smaller p values ⇒ more evidence against the Null
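
For a two-sided alternative and a test statistic that is standard normal under 𝐻0, the p-value is the probability of a value at least as extreme as the observed one in either tail. A minimal sketch, assuming an observed statistic of 2.1 (an invented number) and α = 0.05:

```python
from statistics import NormalDist

def two_sided_p_value(z_obs):
    """P(|Z| >= |z_obs|) for Z ~ N(0, 1)."""
    return 2 * (1 - NormalDist().cdf(abs(z_obs)))

alpha = 0.05                      # significance level chosen in advance
p_value = two_sided_p_value(2.1)  # 2.1 is an assumed observed statistic
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p-value = {p_value:.4f}: {decision}")
```

A one-sided test would use only one tail, and a t-based test would replace the normal distribution with a t distribution.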
