
Lecture 8 Hypothesis Testing

Hypothesis testing is a statistical method used to evaluate two mutually exclusive population statements through experimental data, involving a null hypothesis (H0) and an alternative hypothesis (H1). The document outlines the processes for conducting Z-tests and T-tests, including calculating test statistics and determining significance levels. It also discusses the Chi-square test for assessing discrepancies between observed and expected frequencies, and its application in feature selection for machine learning.

Uploaded by

saibole2003

Hypothesis Testing

Hypothesis testing is a statistical method used to make a statistical decision from experimental data.

Hypothesis testing evaluates two mutually exclusive statements about a population to determine which statement is better supported by the sample data.
Parameters of hypothesis testing

•Null hypothesis (H0): The default assumption about the population, based on knowledge of the problem. It is presumed true until the data provide evidence against it.

•Alternative hypothesis (H1): The hypothesis that contradicts the null hypothesis.

Null Hypothesis: A company's production is equal to 50 units per day.

Alternate Hypothesis: A company's production is not equal to 50 units per day.
H0: The amount of lead in Maggi noodles does not exceed the maximum limit, i.e., 2.5 ppm.
H1: The amount of lead in Maggi noodles exceeds the maximum limit, i.e., 2.5 ppm.
Outcome 1: We reject the null hypothesis when in reality it is false. (Correct decision)
Outcome 2: We reject the null hypothesis when in reality it is true. (Type 1 Error)
Outcome 3: We fail to reject the null hypothesis when in reality it is false. (Type 2 Error)
Outcome 4: We fail to reject the null hypothesis when in reality it is true. (Correct decision)

We say “We failed to reject the null hypothesis” instead of “we accept the null hypothesis”, because a lack of evidence against the null hypothesis does not prove that it is true.
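• P-value
The p-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true.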

• Level of significance
The level of significance (α) is the probability of rejecting the null hypothesis when it is true, i.e., the probability of a Type 1 error.

If the p-value is less than α, then the null hypothesis is rejected, and the
alternative hypothesis is accepted. If the p-value is greater than α, then the null
hypothesis is not rejected.
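
The decision rule above can be sketched in a few lines of Python. The function name `decide` and the example p-values are hypothetical, chosen only for illustration.

```python
def decide(p_value: float, alpha: float) -> str:
    """Apply the decision rule: reject H0 only when the p-value is below alpha."""
    if not (0.0 <= p_value <= 1.0):
        raise ValueError("p_value must lie in [0, 1]")
    if not (0.0 < alpha < 1.0):
        raise ValueError("alpha must lie in (0, 1)")
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Hypothetical p-values, tested at the 5% level.
print(decide(0.03, 0.05))  # p < alpha
print(decide(0.20, 0.05))  # p > alpha
```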
Z - Test

When to Use Z-test:


•Samples should be drawn at random from the population.
•The sample size should be greater than 30.
•The standard deviation of the population should be known.
Steps to perform Z-test:
• First, identify the null and alternate hypotheses.
• Determine the level of significance (α).
• Calculate the z-test statistic using the formula below.

$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

where,
$\bar{x}$: mean of the sample.
$\mu$: mean of the population.
$\sigma$: standard deviation of the population.
n: sample size.
• Find the p-value using the z statistic.
• Compare the p-value with α and decide whether or not to reject the null hypothesis.
Suppose the arousal of hot cats has a population that is normally distributed with a
standard deviation of 6. Tomorrow you sample 49 hot cats from this population and
obtain a mean arousal of 46.44 and a standard deviation of 5.6968. Using an alpha
value of α = 0.01, is this observed mean significantly less than an expected arousal of
47?
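
One possible way to work this left-tailed problem is sketched below using only the Python standard library. The helper name `z_statistic` is an assumption made for illustration. Because the population standard deviation (6) is known, the sample standard deviation (5.6968) is not used.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, pop_mean, pop_sd, n):
    """z = (sample mean - population mean) / (sigma / sqrt(n))."""
    return (sample_mean - pop_mean) / (pop_sd / sqrt(n))


alpha = 0.01
z = z_statistic(sample_mean=46.44, pop_mean=47, pop_sd=6, n=49)

# Left-tailed test (H1: mean < 47): the p-value is the area to the left of z.
p_left = NormalDist().cdf(z)

print(f"z = {z:.4f}, p = {p_left:.4f}")
print("reject H0" if p_left < alpha else "fail to reject H0")
```

Here z is about -0.65 and the p-value is well above 0.01, so the observed mean is not significantly less than 47.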
Problem: A school claims that its students are more intelligent than those of the average school. The IQ scores of a sample of 50 students have an average of 110. The mean of the population IQ is 100 and the standard deviation is 15. State whether the principal's claim is right or not at a 5% significance level.
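
A worked sketch of this problem, assuming a right-tailed test (H0: μ = 100, H1: μ > 100) and the standard one-tailed critical value of 1.645 at the 5% level:

```latex
\[
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
  = \frac{110 - 100}{15 / \sqrt{50}}
  \approx \frac{10}{2.1213}
  \approx 4.71
\]
```

Since 4.71 > 1.645, the null hypothesis is rejected and the data support the claim.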
A teacher claims that the mean score of students in his class is
greater than 82 with a standard deviation of 20. If a sample of 81
students was selected with a mean score of 90 then check if there is
enough evidence to support this claim at a 0.05 significance level.
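
The same kind of test can also be decided by comparing z with a critical value instead of computing a p-value. The sketch below assumes that the stated standard deviation of 20 is the known population value and that the test is right-tailed (H1: μ > 82); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sample_mean, claimed_mean, pop_sd, n, alpha = 90, 82, 20, 81, 0.05

# Test statistic: z = (x_bar - mu) / (sigma / sqrt(n)).
z = (sample_mean - claimed_mean) / (pop_sd / sqrt(n))

# Right-tailed critical value: the point with area alpha to its right.
z_crit = NormalDist().inv_cdf(1 - alpha)

reject = z > z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.3f}")
print("reject H0" if reject else "fail to reject H0")
```

With z = 3.6 against a critical value of about 1.645, there is enough evidence to support the teacher's claim.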
Suppose the width of makeshift personalities has a population that is normally
distributed with a standard deviation of 7. You want to sample 22 makeshift
personalities from this population and obtain a mean width of 87.19 and a standard
deviation of 7.257. Using an alpha value of α = 0.01, is this observed mean significantly
less than an expected width of 89?
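
A worked sketch for this left-tailed problem. The sample is small (n = 22), but a z-test still applies because the population is stated to be normal with known σ = 7. The one-tailed critical value at α = 0.01 is taken as -2.326.

```latex
\[
z = \frac{87.19 - 89}{7 / \sqrt{22}}
  \approx \frac{-1.81}{1.4924}
  \approx -1.21
\]
```

Since -1.21 is not below -2.326, we fail to reject the null hypothesis.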
Z – Test (two – tailed)
Suppose the jewelry of exams has a population that is normally distributed with a
standard deviation of 5. You are walking down the street and sample 9 exams from
this population and obtain a mean jewelry of 28.95 and a standard deviation of
6.3802. Using an alpha value of α = 0.01, is this observed mean significantly different
than an expected jewelry of 27?
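
For a two-tailed test the p-value counts both tails. A minimal sketch for the problem above follows; the function name `two_tailed_z_test` is hypothetical, and the sample standard deviation is ignored because σ = 5 is known.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return (z, p) where p is the two-tailed p-value of a one-sample z-test."""
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    # Area in both tails beyond |z| under the standard normal curve.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


alpha = 0.01
z, p = two_tailed_z_test(sample_mean=28.95, pop_mean=27, pop_sd=5, n=9)
print(f"z = {z:.2f}, p = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```

Here z = 1.17 and the two-tailed p-value is roughly 0.24, so the observed mean is not significantly different from 27 at α = 0.01.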
Suppose the life expectancy of Seattleites has a population that is normally distributed
with a standard deviation of 1. You go out and sample 45 Seattleites from this
population and obtain a mean life expectancy of 88.51 and a standard deviation of
1.0815. Using an alpha value of α = 0.05, is this observed mean significantly different
than an expected life expectancy of 89?
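
A worked sketch for this two-tailed problem, using the standard critical values of ±1.96 at α = 0.05:

```latex
\[
z = \frac{88.51 - 89}{1 / \sqrt{45}}
  \approx -0.49 \times 6.7082
  \approx -3.29
\]
```

Since |z| ≈ 3.29 > 1.96, the null hypothesis is rejected: the observed mean differs significantly from 89.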
Suppose the width of bus riders has a population that is normally distributed with a
standard deviation of 10. Suppose that before graduation, your first job was to sample
98 bus riders from this population and obtain a mean width of 49.98 and a standard
deviation of 10.3386. Using an alpha value of α = 0.01, is this observed mean
significantly different than an expected width of 52?
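
A worked sketch for this two-tailed problem, using the standard critical values of ±2.576 at α = 0.01:

```latex
\[
z = \frac{49.98 - 52}{10 / \sqrt{98}}
  \approx \frac{-2.02}{1.0102}
  \approx -2.00
\]
```

Since |z| ≈ 2.00 < 2.576, we fail to reject the null hypothesis at the 1% level. The same result would have been borderline significant at the 5% level, which shows how the choice of α affects the decision.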
T - Test

A t-test is a statistical test that compares means: either the mean of one sample against a known value, or the means of two samples. In the two-sample case, it is used in hypothesis testing with a null hypothesis that the difference in group means is zero and an alternate hypothesis that the difference in group means is different from zero.
There are three main types of t-test:

• A One sample t-test tests the mean of a single group against a known mean.
• An Independent Samples t-test compares the means for two groups.
• A Paired sample t-test compares means from the same group at different times
(say, one year apart).
Steps to perform T-test:
• First, identify the null and alternate hypotheses.
• Determine the level of significance (α).
• Calculate the degrees of freedom: df = n - 1.
• Find the critical value of t using the t-table.
• Calculate the t-test statistic using the formula below.

$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$

where,
$\bar{x}$: mean of the sample.
$\mu$: mean of the population.
$s$: standard deviation of the sample.
n: sample size.
• Compare the t statistic with the critical value and decide whether or not to reject the null hypothesis.
Problem: A school claims that its students are more intelligent than those of the average school. The IQ scores of a sample of 30 students have an average of 140 and a standard deviation of 20. The mean of the population IQ is 100. State whether the principal's claim is right or not at a 5% significance level.
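
A minimal sketch of the one-sample t-test steps applied to this problem. The helper name `one_sample_t` is an assumption, the test is taken to be right-tailed (H1: μ > 100), and the critical value 1.699 is read from a standard t-table for df = 29 at the one-tailed 5% level rather than computed.

```python
from math import sqrt


def one_sample_t(sample_mean, pop_mean, sample_sd, n):
    """Return (t, df) with t = (x_bar - mu) / (s / sqrt(n)) and df = n - 1."""
    t = (sample_mean - pop_mean) / (sample_sd / sqrt(n))
    return t, n - 1


t, df = one_sample_t(sample_mean=140, pop_mean=100, sample_sd=20, n=30)

# Table lookup (assumed): one-tailed critical t for alpha = 0.05, df = 29.
t_crit = 1.699

reject = t > t_crit
print(f"t = {t:.3f}, df = {df}, critical value = {t_crit}")
print("reject H0" if reject else "fail to reject H0")
```

The statistic is close to 10.95, far above the critical value, so the null hypothesis is rejected.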
Suppose we are interested in determining whether the average weight of a certain breed of dog is significantly different from a target weight of 25 pounds. We randomly select a sample of 20 dogs from this breed, weigh them, and obtain a mean of 24 pounds and a standard deviation of 0.7 pounds. State whether the average weight differs significantly from the target at a 5% significance level.
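
A worked sketch for this two-tailed problem, with df = 20 - 1 = 19 and table critical values of ±2.093 at the 5% level:

```latex
\[
t = \frac{\bar{x} - \mu}{s / \sqrt{n}}
  = \frac{24 - 25}{0.7 / \sqrt{20}}
  \approx \frac{-1}{0.1565}
  \approx -6.39
\]
```

Since |t| ≈ 6.39 > 2.093, the null hypothesis is rejected: the average weight differs significantly from 25 pounds.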
Practice Questions
Q1

Q2
Chi- Square Test

It is a powerful test for testing the significance of the discrepancy between theory and
experiment.
(OR)
The Chi-square (χ2 ) test represents a useful method of comparing experimentally obtained
results with those to be expected theoretically on some hypothesis.
If the value of chi-square is very large, it indicates that the divergence between the observed and expected frequencies is large.
If the value of chi-square is very small, it indicates that the divergence between the observed and expected frequencies is very small.
The following steps are followed for this purpose:
i. A null and an alternative hypothesis related to the enquiry are stated.
ii. The expected or theoretical frequencies are derived through probability.
iii. A level of significance is chosen for rejection of the null hypothesis.
iv. The chi-square value is calculated from the observed frequencies $f_o$ and the expected frequencies $f_e$:

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

v. The observed frequencies are compared with the expected or theoretical frequencies.

If the calculated value of $\chi^2$ is less than the table value, we fail to reject the null hypothesis. On the other hand, if the calculated value of $\chi^2$ is greater than the table value, we reject the null hypothesis.
Problem: Ninety-six subjects are asked to express their attitude towards the proposition “Should AIDS education be integrated in the curriculum of Higher secondary stage” by marking F (favorable), I (indifferent) or U (unfavorable).

                 F     I     U
Observed (fo)    48    24    24
Expected (fe)    32    32    32

Test the hypothesis that “there is no difference between preferences in the group”.
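
The chi-square statistic for this problem can be sketched as follows. The function name `chi_square_statistic` is illustrative, and the critical value 5.991 is taken from a standard chi-square table for df = 2 at an assumed 5% level of significance (the problem does not state one).

```python
def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


observed = [48, 24, 24]   # fo
expected = [32, 32, 32]   # fe: 96 subjects spread evenly over 3 categories

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1

# Table lookup (assumed): critical chi-square for alpha = 0.05, df = 2.
chi2_crit = 5.991

reject = chi2 > chi2_crit
print(f"chi-square = {chi2}, df = {df}")
print("reject H0" if reject else "fail to reject H0")
```

The statistic is 8 + 2 + 2 = 12, which exceeds 5.991, so the hypothesis of no difference between preferences is rejected.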
Two hundred bolts were selected at random from the output of each of the five machines.
The number of defective bolts found were 5, 9, 13, 7 and 6 . Is there a significant
difference among the machines? Use 5% level of significance.
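
A worked sketch, treating this as a goodness-of-fit test on the defective counts only (an assumption). If the machines do not differ, each is expected to produce 40 / 5 = 8 of the 40 defective bolts. With df = 5 - 1 = 4, the table critical value at the 5% level is 9.488.

```latex
\[
\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}
       = \frac{(5-8)^2 + (9-8)^2 + (13-8)^2 + (7-8)^2 + (6-8)^2}{8}
       = \frac{9 + 1 + 25 + 1 + 4}{8}
       = 5
\]
```

Since 5 < 9.488, we fail to reject the null hypothesis: there is no significant difference among the machines.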
Chi-Square Test for Feature Selection

Feature selection is the process of selecting the best and most relevant features for a machine learning model.

In this process, we remove irrelevant or partially relevant features from the data. Doing so:

(i) Minimizes the cost of computation.
(ii) Reduces the curse of dimensionality.
(iii) Helps in achieving good accuracy.
Chi-square Test for Feature Selection:

We calculate the chi-square statistic between each feature and the target and select the desired number of features with the best chi-square scores.
The higher the value of $\chi^2$, the more dependent the output label is on the feature and the higher the importance of the feature in determining the output.

It determines if the association between two categorical variables of the sample would
reflect their real association in the population.
Consider the following table:-
The contingency table for the feature “Outlook” is constructed as below:-
The contingency table for the feature “Wind” is constructed as below:-
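
The tables themselves are not reproduced here, so the sketch below uses hypothetical counts to show how a chi-square score could be computed from a contingency table for each feature and used to rank the features. The function name `chi_square_contingency`, the counts, and the two-class target are all assumptions made for illustration.

```python
def chi_square_contingency(table):
    """Chi-square statistic for a contingency table given as a list of rows.

    Expected count for a cell = row total * column total / grand total.
    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Hypothetical counts: rows are feature values, columns are target classes.
features = {
    "Outlook": [[2, 3], [4, 0], [3, 2]],
    "Wind": [[6, 2], [3, 3]],
}

scores = {name: chi_square_contingency(table) for name, table in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    print(f"{name}: chi-square = {scores[name]:.4f}")
```

With these made-up counts, "Outlook" scores higher than "Wind" and would be selected first. Features with different numbers of categories have different degrees of freedom, so comparing p-values rather than raw scores may be preferable in practice.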
Thank
you
