
Week3 Stats


Week-3: Statistics

Normal distribution
Skewness & Kurtosis
Both these metrics can be used to check the deviation of a given distribution from normality.

Skewness indicates how the overall shape of a distribution deviates from a normal/symmetric distribution
• >0, right skewed
• <0, left skewed

Kurtosis indicates how the tails of a distribution are shaped (a normal distribution has kurtosis = 3)
• >3, thicker tails than normal
• =3, same tails as normal
• <3, thinner tails than normal
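As a rough illustration of the two metrics, here is a minimal sketch that computes them as standardized central moments using only the Python standard library. The function names and the synthetic samples are assumptions made for this example; the only data taken from the slides is the list of 10 ages.

```python
import random
from statistics import fmean


def central_moment(data, k):
    """k-th central moment: average of (x - mean)^k."""
    m = fmean(data)
    return fmean([(x - m) ** k for x in data])


def skewness(data):
    """Third standardized moment; 0 for a symmetric distribution."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def kurtosis(data):
    """Fourth standardized moment; 3 for a normal distribution."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


rng = random.Random(42)
normal_sample = [rng.gauss(0, 1) for _ in range(50_000)]        # symmetric, kurtosis near 3
exponential_sample = [rng.expovariate(1.0) for _ in range(50_000)]  # right skewed, thick tail
uniform_sample = [rng.uniform(0, 1) for _ in range(50_000)]     # symmetric, thin tails

ages = [18, 20, 21, 23, 26, 27, 30, 35, 41, 45]  # population used in the slides

for name, data in [("normal", normal_sample),
                   ("exponential", exponential_sample),
                   ("uniform", uniform_sample),
                   ("ages", ages)]:
    print(f"{name:12s} skewness={skewness(data):6.2f} kurtosis={kurtosis(data):6.2f}")
```

Some libraries report "excess kurtosis" (kurtosis minus 3), so that a normal distribution scores 0; the sketch above uses the plain definition to match the >3 / =3 / <3 convention.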
Central Limit Theorem
• Population: {18,20,21,23,26,27,30,35,41,45}
• Age of 10 people

What is the average 28.6 yrs.


Central limit Theorem
• Population: Age of 10 people
• {18,20,21,23,26,27,30,35,41,45}

• Sample size 3
• Random samples and their averages
• {18,23,30} Average: 23.67
• {21,27,41} Average: 29.67
• {20,35,45} Average: 33.33
• {35,41,45} Average: 40.33

[Figure: actual distribution vs. sampling distribution. Axis labels: frequency, number of trials, average of the samples. Curves labelled N=3, N=7 and N>30; the population mean 28.6 is marked.]
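The sampling experiment can be sketched in a few lines of Python. This is only an illustration: the number of trials, the seed, and the choice to draw each sample without replacement (so the same person is not picked twice) are assumptions, not something the slides specify.

```python
import random
from statistics import fmean, pstdev

population = [18, 20, 21, 23, 26, 27, 30, 35, 41, 45]


def sample_means(sample_size, trials, rng):
    """Draw `trials` random samples and return the average of each one."""
    return [fmean(rng.sample(population, sample_size)) for _ in range(trials)]


rng = random.Random(0)
means_n3 = sample_means(3, 5000, rng)
means_n7 = sample_means(7, 5000, rng)

print("population mean:       ", fmean(population))
print("mean of averages (N=3):", round(fmean(means_n3), 2))
print("mean of averages (N=7):", round(fmean(means_n7), 2))
print("spread of averages (N=3):", round(pstdev(means_n3), 2))
print("spread of averages (N=7):", round(pstdev(means_n7), 2))
```

The averages cluster around the population mean, and the cluster gets tighter as the sample size grows.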
Central Limit Theorem
• The mean of the sampling distribution of means is equal to the mean
of the population from which the samples were drawn.
• The variance of the sampling distribution of means is equal to the
variance of the population from which the samples were drawn
divided by the size of the samples.
• If the original population is normally distributed (i.e. it is bell shaped),
the sampling distribution of means will also be normal. If the original
population is not normally distributed, the sampling distribution of
means will increasingly approximate a normal distribution as sample
size increases (i.e. when increasingly large samples are drawn).
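The first two statements can be checked exactly on the 10-person population by listing every possible sample instead of drawing at random. A minimal sketch, assuming samples of size 3 drawn with replacement (the case in which the variance relation holds exactly):

```python
from itertools import product
from statistics import fmean, pvariance

population = [18, 20, 21, 23, 26, 27, 30, 35, 41, 45]
n = 3  # sample size

# Every ordered sample of size n drawn with replacement: 10**3 = 1000 samples
all_means = [fmean(sample) for sample in product(population, repeat=n)]

mu = fmean(population)
sigma_sq = pvariance(population)

print("population mean:             ", mu)
print("mean of sample means:        ", fmean(all_means))
print("population variance / n:     ", sigma_sq / n)
print("variance of the sample means:", pvariance(all_means))
```

When samples are drawn without replacement from a small population, the mean relation still holds but the variance of the sample means is smaller than the population variance divided by n.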
[Figure: actual distribution vs. sampling distribution]
How is this useful
• In reality, estimating population statistics is hard because of the large population size

• What is the average life expectancy in the world today?

• What is the average weight of a 6-month-old toddler?

• Get the sampling distribution (N > 30)

• Estimate the population mean and variance based on the sampling distribution
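A minimal sketch of the idea: estimate the mean of a large population from one sample of more than 30 observations. The population here is synthetic (an exponential distribution with an arbitrary mean of 70, chosen only because it is clearly non-normal); it stands in for a real population that would be too large to measure.

```python
import random
from statistics import fmean, stdev

rng = random.Random(7)

# Hypothetical, right-skewed population (synthetic values, for illustration only)
population = [rng.expovariate(1 / 70) for _ in range(100_000)]

sample = rng.sample(population, 36)          # one sample with N > 30
estimate = fmean(sample)                     # estimate of the population mean
sample_sd = stdev(sample)                    # estimate of the population std deviation
std_error = sample_sd / len(sample) ** 0.5   # spread of the sampling distribution of the mean

print(f"estimated mean: {estimate:.1f} (standard error {std_error:.1f})")
print(f"true mean:      {fmean(population):.1f}")
```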
Hypothesis Testing
• How do we test claims such as

• Exercising for 2 hrs a day helps reduce weight by 5 kg

• A new drug in the market helps reduce the side effects of diabetes by 50%

• Leaving for the office 15 min earlier shortens the commute by 10 minutes

• Start at 9:00, reach at 10:00 (1 hr)
• Start at 8:45, reach by 9:35 (50 min)
Hypothesis Testing
• The effect could be due to random chance or due to the assumed
(hypothesized) cause.

• The effect may not be the same for all the members of the sample

• Through hypothesis testing we are trying to figure out whether the
observed trend in the data is due to random chance or not
• The outcome of the test tells us whether the data is statistically
significant enough to reject / not reject the theory
Hypothesis Testing
• Hypothesis testing: a statistical way of testing a theory/speculation
• Structured
  • Null hypothesis (not guilty) [not guilty till proved guilty]
  • Alternate hypothesis (guilty)
• Based on evidence
  • Use observed data (collect statements from witnesses, evidence)
• Predefined procedure
  • Test of means
  • Test of variance
  • Test of proportion
• Not just any evidence counts: the evidence must be statistically significant
  • Significance level
  • P value
  • Critical values
• Predefined outcomes
  • Reject null hypothesis (prove guilty)
  • Fail to reject null hypothesis (failed to prove guilty due to lack of evidence)
Hypothesis testing
• Hypothesis A claim that we want tested
• Two components in SHT Null Hyp (Ho) and Alternate/Research Hyp(Ha)
• Ho is the current belief/value/ status quo/established fact
• Ha is the new proposal/claim opposite to Ho/ challenge Ho
• Ho is true till test rejects it (innocent till proven guilty)
• Only reject or fail to reject Ho (we do not accept the null hypothesis)
• Ho and Ha are mathematical opposites

Ho | Ha
a = b | a ≠ b
a ≤ b | a > b
a ≥ b | a < b
Hypothesis testing: example-1
• Premise: the average IQ of students admitted to a city college ‘X’ is at best
120. The new administration in the college thinks that its current students
are much smarter

• Ho (current belief/value/status quo/established fact): IQ ≤ 120

• Ha (challenge Ho): IQ > 120
Type I and Type II Errors
Type I Error:
• Rejection of null hypothesis when it should not have been rejected.
• Incorrectly rejecting the null hypothesis.

Type II Error:
• Failure to reject the null hypothesis, when it should have been rejected.
• Incorrectly not rejecting the null hypothesis.
Decision / Reality | Ho True (should not reject) | Ho False (should reject)
Reject Ho | Type I Error (α) (FP): reject when it should not be rejected | Correct rejection (no error) (1-β) (TP)
Fail to reject Ho | Correct decision (no error) (1-α) (TN) | Type II Error (β) (FN): fail to reject when it should be rejected

Causes of Type I and Type II Errors:
• By random chance, we may select a sample which is not representative of the population.
• Sampling techniques may be flawed.
• Assumptions in our null hypothesis may be flawed.
Relation between α, β and power
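One way to see the relation is to compute β and power (1-β) for a right-tailed Z test. The sketch below is an illustration under stated assumptions: the function name is made up for this example, the population standard deviation is taken as known, and the "true" mean of 147 is a hypothetical value chosen to show the effect.

```python
from statistics import NormalDist


def power_right_tailed_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power (1 - beta) of a right-tailed Z test of Ho: mu <= mu0
    when the real population mean is mu_true."""
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(1 - alpha)
    # How far the true mean sits from mu0, in standard errors
    shift = (mu_true - mu0) / (sigma / n ** 0.5)
    beta = std_normal.cdf(z_critical - shift)  # P(fail to reject | Ho false)
    return 1 - beta


# Hypothetical scenario: Ho says mu <= 145, the real mean is 147, sigma = 20
for n in (50, 200, 1000):
    for alpha in (0.01, 0.05):
        power = power_right_tailed_z(145, 147, 20, n, alpha)
        print(f"n={n:5d} alpha={alpha:.2f} beta={1 - power:.3f} power={power:.3f}")
```

For a fixed sample size, lowering α raises β (and lowers power); increasing the sample size lowers β without touching α. When the true mean equals the hypothesized mean, the "power" is simply α, the Type I error rate.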
Hypothesis test
• Define the problem
• Formulate the Ho and Ha
• Decide on confidence level/level of significance (0.95/0.05 typical)
• Collect data
• Choose the type of test: mean/variance/proportion
• Choose the test metric: Z, t, chi-squared, F
• Identify one-tailed/two-tailed
• For the given condition, determine the critical value/p value
• Conclude based on the calculated statistic vs. the critical statistic / p value
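The steps above can be sketched for the simplest case, a test of a mean with the Z metric. This is a minimal illustration, not a general-purpose routine: the function name and its arguments are assumptions, it requires a known population standard deviation, and the numbers in the usage lines are hypothetical.

```python
from statistics import NormalDist


def z_test_mean(sample_mean, mu0, sigma, n, alpha=0.05, tail="right"):
    """One-sample Z test of a mean. Returns (z, p_value, reject_ho)."""
    std_normal = NormalDist()
    # Test metric: distance of the sample mean from mu0 in standard errors
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    # p value depends on whether the test is one-tailed or two-tailed
    if tail == "right":      # Ha: mu > mu0
        p_value = 1 - std_normal.cdf(z)
    elif tail == "left":     # Ha: mu < mu0
        p_value = std_normal.cdf(z)
    elif tail == "two":      # Ha: mu != mu0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")
    # Conclude: reject Ho only if the p value is below the significance level
    return z, p_value, p_value < alpha


# Hypothetical data: sample mean 52 from n = 100, Ho: mu <= 50, sigma = 10
z, p_value, reject_ho = z_test_mean(52, 50, 10, 100, alpha=0.05, tail="right")
print(f"z={z:.3f} p={p_value:.4f} reject Ho: {reject_ho}")
```

Comparing the p value with α gives the same conclusion as comparing the calculated statistic with the critical value.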
Confidence interval
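• Repeated sampling with replacement (same sample size each time)
• For a 95% confidence level,
• repeat the sampling 100 times and perform the HT each time.

• When Ho is true, the HT is expected to give the same outcome (fail to reject Ho) in about 95 of the 100 repetitions

• C + α = 1 (confidence level + significance level)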
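The repeated-sampling reading of the confidence level can be simulated. In this sketch Ho is true by construction, and a two-tailed Z test is repeated on fresh samples; the population mean, standard deviation, sample size, number of repetitions and seed are all arbitrary values chosen for illustration.

```python
import random
from statistics import NormalDist, fmean


def fraction_not_rejected(mu=50.0, sigma=10.0, n=30, confidence=0.95,
                          repeats=4000, seed=1):
    """Repeat a two-tailed Z test of Ho: mean = mu on samples for which
    Ho really is true, and return the fraction of tests that do not reject."""
    rng = random.Random(seed)
    alpha = 1 - confidence
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    not_rejected = 0
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu) / (sigma / n ** 0.5)
        if abs(z) <= z_critical:
            not_rejected += 1
    return not_rejected / repeats


print(f"fraction of tests that fail to reject Ho: {fraction_not_rejected():.3f}")
```

The fraction lands close to the confidence level C, and the remaining share of repetitions (close to α) are Type I errors.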
Two tail and One tail test
• Ho:a=b
• Ha: ab
• =0.05

• Ho:ab
• Ho:ab
• Ha: a>b
• Ha: a<b
• =0.05
• =0.05
21
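For a Z test, the choice of tail changes where the rejection region sits and therefore the critical value. A small sketch using the standard normal distribution from Python's standard library (the variable names are chosen for this example):

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

# Two-tailed: alpha is split equally between both tails
two_tailed_critical = std_normal.inv_cdf(1 - alpha / 2)

# One-tailed: all of alpha sits in a single tail
right_tailed_critical = std_normal.inv_cdf(1 - alpha)
left_tailed_critical = std_normal.inv_cdf(alpha)

print(f"two-tailed:   reject Ho if |z| > {two_tailed_critical:.3f}")
print(f"right-tailed: reject Ho if z > {right_tailed_critical:.3f}")
print(f"left-tailed:  reject Ho if z < {left_tailed_critical:.3f}")
```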
Problem-1
• The school nurse thinks the average height of 7th graders has increased. The
average height of a 7th grader five years ago was 145 cm with a standard
deviation of 20 cm. She takes a random sample of 200 students and finds that the
average height of her sample is 147 cm. Are 7th graders now taller than they
were before? Conduct an X-tailed hypothesis test, using a .05 significance level, to
evaluate the null and alternative hypotheses.

• Ho:
• Ha:
• Confidence level:
• Test metric:
• Tail: one tail/two tail?
• Sample size:

• Ho: h_now ≤ h_5yearsbefore
• Ha: h_now > h_5yearsbefore
• Confidence level: 95% (α = 0.05 given)
• Test metric: Z (population std deviation given)
• Tail: one tail/two tail? One-tailed
• Sample size: 200

z = (x̅ - µ) / (σ/√n) = (147 - 145) / (20/√200) = 1.4142

For α = 0.05, Zcritical = 1.645. Since 1.4142 < 1.645, we fail to reject Ho: the sample does not show a statistically significant increase in height.
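The arithmetic for Problem-1 can be reproduced with a few lines of Python. The numbers come from the problem statement; the variable names are chosen for this sketch.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 145      # average height five years ago (cm)
sigma = 20     # population standard deviation (cm)
n = 200        # sample size
x_bar = 147    # sample average (cm)
alpha = 0.05   # significance level

std_normal = NormalDist()
z = (x_bar - mu0) / (sigma / sqrt(n))        # test statistic
z_critical = std_normal.inv_cdf(1 - alpha)   # right-tailed critical value
p_value = 1 - std_normal.cdf(z)              # right-tailed p value
reject_ho = z > z_critical

print(f"z = {z:.4f}, z_critical = {z_critical:.3f}, p = {p_value:.4f}")
print("reject Ho" if reject_ho else "fail to reject Ho")
```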
Problem -2
• A farmer is trying out a planting technique that he hopes will increase
the yield on his pea plants. The average number of pods on one of his
pea plants is 145 pods with a standard deviation of 100 pods. This
year, after trying his new planting technique, he takes a random
sample of his plants and finds the average number of pods to be 147.
He wonders whether or not this is a statistically significant increase.
What are his hypotheses and the test statistic?
• Ho:
• Ha:
• Confidence level:
• Test metric:
• Tail: one tail/two tail?
• Sample size:
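The problem does not state the sample size, so the test statistic can only be written as a function of n. One reading of the problem, since the farmer hopes for an increase and the standard deviation is given, is a right-tailed Z test with Ho: mean ≤ 145 and Ha: mean > 145. The sketch below follows that reading; the sample sizes it tries and the 0.05 significance level are assumptions, not part of the problem.

```python
from math import ceil, sqrt
from statistics import NormalDist

mu0, sigma, x_bar = 145, 100, 147   # values from the problem statement
alpha = 0.05                         # assumed; not given in the problem


def z_statistic(n):
    """Test statistic z = (x_bar - mu0) / (sigma / sqrt(n)) for sample size n."""
    return (x_bar - mu0) / (sigma / sqrt(n))


z_critical = NormalDist().inv_cdf(1 - alpha)

# Smallest n for which an increase of 2 pods would be statistically significant
min_n = ceil((z_critical * sigma / (x_bar - mu0)) ** 2)

for n in (100, 1000, 10000):         # hypothetical sample sizes
    print(f"n={n:6d} z={z_statistic(n):.3f} significant: {z_statistic(n) > z_critical}")
print("smallest significant sample size:", min_n)
```

With such a large standard deviation relative to the observed increase, the conclusion depends entirely on how many plants were sampled.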
Status | Contestants | Other labels (winners/runner-up….) | Total
Still operating | 100 | 40 | 140
Closed | 200 | 60 | 260
Total | 300 | 100 | 400
ANOVA
