Hypothesis Testing Brief
Decision Sciences
https://learn.upgrad.com/course/1260
Agenda
Topic Time (mins.)
Quiz 1. Framing of hypothesis https://forms.gle/JaUNBYKtgd7QgNor6 5
What is Hypothesis? Need for hypothesis in business 5
Converting of business problem into a hypothesis statement: Null and Alternate 10
Types of tail in test 10
Quiz 2. Testing of Hypothesis https://forms.gle/jLbszZKDucYstzyy5 5
Step-by-step process of hypothesis testing 5
Testing of Hypothesis: Critical Value method 15
Testing of Hypothesis: p value method 15
Types of Errors 10
Quiz 3. Practice https://forms.gle/NmAufBEXStaRagRr8 HW
Q&A. Link to data: https://drive.google.com/file/d/1Y37nO1ZA8OifzA38IEpCYTOMyXWJyYtp/view?usp=sharing 10
Total 90
Module 2: Hypothesis Testing
Hypothesis testing Z distribution
Two tailed test
Left tailed test
Right tailed test
Hypothesis testing t distribution
One sample
Two sample
Paired
Unpaired
A/B testing
• Hypothesis testing is designed to detect significant differences: differences that did not occur
by random chance.
Types of Hypothesis
The null hypothesis refers to a specified value of the population parameter, not of the sample statistic.
A null hypothesis may be rejected, but it can never be accepted based on a single test.
Tails of test
Two-tailed test: we want to test that the population mean is different from 10
Left-tailed test: we want to test that the population mean is less than 10
Right-tailed test: we want to test that the population mean is greater than 10
Significance Level (⍺) Two-Tailed Test One-Tailed Test (Left) One-Tailed Test (Right)
0.01 ±2.58 -2.326 +2.326
0.05 ±1.96 -1.645 +1.645
0.10 ±1.645 -1.282 +1.282
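The critical values in the table can be reproduced from the standard normal distribution. The sketch below uses Python's standard library; the function name `critical_z` is illustrative and not part of the course material.

```python
from statistics import NormalDist

def critical_z(alpha, tail):
    """Return the critical z value(s) for a significance level and tail type."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        # Alpha is split equally between the two tails.
        z = std_normal.inv_cdf(1 - alpha / 2)
        return (-z, z)
    if tail == "left":
        return std_normal.inv_cdf(alpha)
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)
    raise ValueError("tail must be 'two', 'left' or 'right'")

for alpha in (0.01, 0.05, 0.10):
    _, upper = critical_z(alpha, "two")
    print(alpha,
          round(upper, 3),
          round(critical_z(alpha, "left"), 3),
          round(critical_z(alpha, "right"), 3))
```

The two-tailed value at 0.01 comes out as 2.576, which the table rounds to 2.58.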
Steps Involved in Hypothesis Testing
Formulate H0 and Ha
Hypothesis Tests: Tests of Association and Tests of Differences
• Hypothesis Statement
Null hypothesis: The mileage is greater than or equal to 17
(as this is the default claim made by the brand )
Alternative hypothesis: The mileage is less than 17
(as this challenges the null hypothesis)
• Mathematically
H0: Mileage (mean) ≥ 17
Ha: Mileage (mean) < 17
Formulating the Hypotheses
• Let’s say you are the COO of a shoe-manufacturing company. An employee has
developed a new sole and claims that incorporating it will decrease the wear after
three years of use by more than 9%. Now, suppose you want to test this claim.
• What will be the null and alternative hypotheses for the sole developed by the
employee in this scenario?
H0: Decrease in wear after 3 years ≤ 9%; Ha: Decrease in wear after 3 years > 9%
Formulating the Hypotheses
• Mr. Mohan of the Civil Engineering Department wants to test whether the load-bearing
capacity of an old bridge is more than 10 tons, as required. He can state his
hypotheses as follows:
• Null hypothesis H0 : µ ≤ 10 tons
• Alternative hypothesis Ha : µ > 10 tons
Formulating the Hypotheses
• The average score in an aptitude test administered at the national level
is 80. To evaluate a state’s education system, 100 of the state’s students
were selected at random; their average score was 75. The state
wants to know if there is a significant difference between the local
scores and the national scores.
• Null hypothesis H0 : µ = 80
• Alternative Hypothesis Ha : µ ≠ 80
Formulating the Hypotheses
• The null hypothesis for this experiment is that the average weight of the flour
packages is 10 kg (no problem). The alternative hypothesis is that the average is not
10 kg (process is out of control).
Testing of Hypothesis
Type of Test
One-sample test (parameter of measurement: mean; normality assumption)
Population standard deviation is known, N > 30: Z test
Population standard deviation is not known, N < 30: One-sample t test
P value method
If p<=alpha , Reject Ho
If p> alpha, Fail to Reject Ho
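A minimal sketch of the p value rule for a z test, using Python's standard library. The function names are illustrative assumptions, not names from the course.

```python
from statistics import NormalDist

def p_value_from_z(z, tail):
    """p value of a z statistic for a left-, right- or two-tailed test."""
    cdf = NormalDist().cdf
    if tail == "left":
        return cdf(z)
    if tail == "right":
        return 1 - cdf(z)
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'two', 'left' or 'right'")

def decide(p, alpha=0.05):
    """Apply the rule: reject H0 when p <= alpha."""
    return "Reject H0" if p <= alpha else "Fail to reject H0"

print(decide(p_value_from_z(-2.0, "left")))  # small p value
print(decide(p_value_from_z(1.0, "two")))    # large p value
```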
Confidence level
The confidence level or reliability is the expected percentage of times that the actual value
will fall within the stated precision limits.
The confidence level is defined for the hypothesis test according to the accuracy needed.
A higher confidence level indicates that more evidence is needed to reject the null hypothesis.
Therefore, increasing the confidence level makes it harder to reject the null hypothesis.
Inversely, a low confidence level indicates that the null hypothesis can be rejected easily.
Thus, if we take a confidence level of 95%, then we mean that there are 95 chances in 100
(or .95 in 1) that the sample results represent the true condition of the population within a
specified precision range against 5 chances in 100 (or .05 in 1) that it does not.
We can always remember that if the confidence level is 95%, then the significance level will be
(100 – 95) i.e., 5%; if the confidence level is 99%, the significance level is (100 – 99) i.e., 1%.
We should also remember that the area of normal curve within precision limits for the
specified confidence level constitutes the acceptance region and the area of the curve outside
these limits in either direction constitutes the rejection regions.
Level of Significance
• The significance level is the probability of
rejecting the null hypothesis when it is true.
• For example, a significance level of 0.05 indicates
a 5% risk of concluding that a difference exists
when there is no actual difference.
• Lower significance levels indicate that you
require stronger evidence before you will reject
the null hypothesis.
• Level of significance (⍺) = 1 − confidence level
Z Score
If the z-score of the sample lies further away from the center than the critical z-
values, the null hypothesis is rejected.
Otherwise, the test fails to reject the hypothesis.
The only two possible outcomes of a hypothesis test are ‘reject the null hypothesis’
or ‘fail to reject the null hypothesis’. The null hypothesis can never be ‘accepted’.
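The critical value rule can be sketched with the aptitude-test example (sample mean 75, hypothesised mean 80, n = 100). The population standard deviation is not given in that example, so the value of 20 below is a hypothetical assumption, as are the function names.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean, hypothesised_mean, population_sd, n):
    """z-score of the sample mean under the null hypothesis."""
    return (sample_mean - hypothesised_mean) / (population_sd / sqrt(n))

def rejects_null(z, alpha, tail):
    """True when z lies further from the center than the critical z value."""
    std_normal = NormalDist()
    if tail == "left":
        return z < std_normal.inv_cdf(alpha)
    if tail == "right":
        return z > std_normal.inv_cdf(1 - alpha)
    if tail == "two":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'two', 'left' or 'right'")

# Population standard deviation of 20 is assumed for illustration only.
z = z_statistic(sample_mean=75, hypothesised_mean=80, population_sd=20, n=100)
print(z, "Reject H0" if rejects_null(z, 0.05, "two") else "Fail to reject H0")
```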
Commonly used critical z scores
Left tail test Two tail test Right tail test
Significance Level (⍺) One-Tailed Test (Left) Two-Tailed Test One-Tailed Test (Right)
0.01 -2.326 ±2.58 +2.326
0.05 -1.645 ±1.96 +1.645
0.10 -1.282 ±1.645 +1.282
Two tail test
Example 2: MS EXCEL
• Imagine you’re the owner of a pizza company, and you claim that your pizzas are
more than 9 inches in diameter. But you’ve been receiving complaints from some
of your customers, who say that the pizzas are actually smaller. Your task is to now
find out whether your chefs are producing smaller pizzas. In this case, you will
conduct a ‘left-tailed test’ by checking whether your sample mean is significantly
less than 9 inches, since you’re checking whether the complaints about smaller
pizzas are true.
• Hypothesis Statement
• Null hypothesis : Pizza size is at least 9 inches (i.e. 9 or more).
• Alternative hypothesis : Pizza size is less than 9 inches
• Mathematically
• Null hypothesis : Pizza size ≥ 9.
• Alternative hypothesis : Pizza size is < 9.
Hypothesis testing –One sample t test
• The One Sample t Test examines whether the mean of a population is
statistically different from a known or hypothesized value. The One
Sample t Test is a parametric test.
• Applicable when the population standard deviation is unknown
• Applicable when the sample size is less than 30
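A rough sketch of a one-sample t test for the mileage hypotheses (H0: mean ≥ 17, Ha: mean < 17). The ten readings are hypothetical, and the critical value of -1.833 is the standard t-table entry for 9 degrees of freedom at ⍺ = 0.05 (one-tailed); a statistics package would normally supply the exact p value.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, hypothesised_mean):
    """Return the t statistic and degrees of freedom for a one-sample t test."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n-1 method
    t = (mean(sample) - hypothesised_mean) / standard_error
    return t, n - 1

# Hypothetical mileage readings, for illustration only.
mileage = [16.2, 17.1, 15.8, 16.5, 16.9, 15.9, 16.4, 16.0, 16.7, 16.3]
t_stat, df = one_sample_t(mileage, 17)

T_CRITICAL_LEFT = -1.833  # t table: df = 9, alpha = 0.05, left-tailed
decision = "Reject H0" if t_stat < T_CRITICAL_LEFT else "Fail to reject H0"
print(round(t_stat, 2), df, decision)
```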
Two Sample t Test
• When there is a need to compare the means of two samples, a two-
sample t-test is conducted. In such a case, the formula for the t-statistic
becomes
https://learn.upgrad.com/course/1260/segment/10349/64185/187901/997831
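The formula on this slide did not survive extraction. As one common possibility, the sketch below computes the unequal-variance (Welch) form, t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2); the course may instead use the pooled-variance form, so treat this as an assumption. The function name is illustrative.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """t statistic and approximate degrees of freedom for two unpaired samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample (variance uses n-1).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical samples, for illustration only.
t_stat, df = welch_t([10, 12, 14], [20, 22, 24])
print(round(t_stat, 3), round(df, 1))
```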
Types of two sample test
Paired t test Unpaired t test
https://www.technologynetworks.com/informatics/articles/paired-vs-unpaired-t-test-differences-assumptions-and-hypotheses-330826
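For a paired test, the two measurements come from the same subjects, so the test reduces to a one-sample t test on the differences. A minimal sketch with hypothetical before/after values:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """t statistic and degrees of freedom for paired measurements."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    t = mean(differences) / (stdev(differences) / sqrt(n))
    return t, n - 1

# Hypothetical scores for the same four subjects, for illustration only.
t_stat, df = paired_t(before=[10, 12, 14, 16], after=[11, 14, 15, 18])
print(round(t_stat, 3), df)
```

An unpaired test, by contrast, compares two independent groups and does not form per-subject differences.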
Summary
1. Define the hypothesis statements: Your test will either ‘reject’ or ‘fail to reject’ the null hypothesis.
2. Collect as many data points as possible: The data points you collect will produce one sample. The size
of this single sample will depend on how many data points you take.
3. Measure the sample mean and the sample standard deviation: The standard deviation should be
calculated using the ‘n-1’ method. The STDEV function in Excel takes care of this.
4. Identify the distribution of the sample means: If the sample size is larger than 30, the distribution of
the sample means will be approximately normal (we’re only focusing on normal distributions for now).
5. Define the confidence level: This is the level of surety that you demand from a hypothesis test. The
higher the confidence level, the harder it is to reject the null hypothesis.
6. Find the critical z-scores for the confidence level and the test statistic (the z-score of the sample): The
z-score of the sample is calculated by subtracting the hypothesised mean from the sample mean
and dividing the result by the standard error, which is the population standard deviation divided by
the square root of the sample size.
7. Compare the sample test statistic with the critical z-scores: Here, you check whether the sample
statistic is more extreme than the z-scores.
8. If the sample test statistic is more extreme than the critical z-scores, you will reject the null
hypothesis. Otherwise, you will fail to reject it.
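The eight steps can be walked through in code using the pizza scenario (H0: mean diameter ≥ 9 inches, left-tailed). The 36 diameters are hypothetical, and the sample standard deviation is used as a stand-in for the population standard deviation, which is an assumption.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Step 1: H0: mean diameter >= 9; Ha: mean diameter < 9 (left-tailed test).
HYPOTHESISED_MEAN = 9.0

# Step 2: hypothetical sample of 36 pizza diameters (illustrative values only).
sample = [8.7, 9.1, 8.8, 8.9, 9.0, 8.6] * 6

# Step 3: sample mean and standard deviation (stdev uses the n-1 method).
sample_mean = mean(sample)
sample_sd = stdev(sample)
n = len(sample)

# Step 4: n > 30, so the sample means are treated as approximately normal.
# Step 5: confidence level of 95%, i.e. significance level of 0.05.
alpha = 1 - 0.95

# Step 6: critical z-score and z-score of the sample.
# Assumption: the sample standard deviation approximates the population one.
z_critical = NormalDist().inv_cdf(alpha)
z_sample = (sample_mean - HYPOTHESISED_MEAN) / (sample_sd / sqrt(n))

# Steps 7 and 8: reject H0 only if the sample z-score is more extreme.
decision = "Reject H0" if z_sample < z_critical else "Fail to reject H0"
print(round(z_sample, 3), round(z_critical, 3), decision)
```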
Summary
When the test needs to check only positive or negative deviation from the null
hypothesis, a one-tailed test is performed.
When the test needs to check for deviation on either side of the null hypothesis, a
two-tailed test is performed.
When the sample size is low, a t-test is performed.
A t-test is also preferred over a z-test when the population standard deviation is
unknown.
When two sample means need to be checked for equality, a two-sample t-test is
performed.
When there is a need to check whether an entire distribution is similar to another,
a goodness of fit test is performed.
Hypothesis testing also carries some probability of committing errors. The errors
can be of two types: Type I and Type II.
A/B testing
An A/B test tells you whether there is a statistical difference in the performance
of the two options.
Data driven decision making system
A/B tests are used whenever there is a need to compare two alternatives.
The A/B test can be considered the most basic kind of randomized controlled
experiment
A/B testing : History
• In the 1920s statistician and biologist Ronald Fisher discovered the most
important principles behind A/B testing and randomized controlled
experiments in general.
• Fisher ran agricultural experiments, asking questions such as, What happens if I
put more fertilizer on this land? The principles persisted and in the early 1950s
scientists started running clinical trials in medicine.
• In the 1960s and 1970s the concept was adapted by marketers to evaluate
direct response campaigns (e.g., would a postcard or a letter to target
customers result in more sales?).
Areas of application
• Medicine, to understand if a drug works or not
• Economics, to understand human behaviour
• Foreign aid and charitable work (the reputable ones at least), to understand which
interventions are most effective at alleviating problems (health, poverty, etc)
• Comparing two versions of a website
• Comparing two colours, tabs or page designs
Example: A/ B testing
• Let’s say John builds a website for a free e-book and is testing out two colour variations —
red and blue. On the red website, 45 out of 100 visitors downloaded the e-book. But on
the blue website, 47 out of 100 visitors downloaded the e-book. Based on this, John may
conclude that the blue website is performing better.
• However, John’s method can backfire. This is because he did not bother to check for
statistical significance. The difference in performance observed may be due to plain old
randomness. Thus, there’s a high probability that he may end up with an inferior website
colour.
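John's numbers can be checked with a pooled two-proportion z test, one common way to test an A/B difference. This is a sketch under that assumption; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z statistic for the difference in conversion rates (B minus A)."""
    rate_a = success_a / n_a
    rate_b = success_b / n_b
    # Pooled rate under H0: both variants convert at the same rate.
    pooled = (success_a + success_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (rate_b - rate_a) / standard_error

# Red website: 45 of 100 downloads; blue website: 47 of 100 downloads.
z = two_proportion_z(45, 100, 47, 100)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 3), round(p_two_sided, 3))
```

The p value is far above 0.05, so the 2-point gap between red and blue is well within what random chance alone could produce.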
Coupon A Coupon B
Impressions 50000 65000
Clicks 2400 2770
CTR 4.80% 4.26%
Variant B’s conversion rate (4.26%) was 11.22% lower than variant A’s conversion rate (4.80%). You can
be 95% confident that variant B will perform worse than variant A.
Power 0.00% p value 1.0000
Example: A/ B testing
Tanishq launched two ads on YouTube during Diwali to promote its products. Both
ads were measured in terms of how many people watched the ad and how many
clicked on it to visit the Tanishq store. Using the following data, determine whether
Advertisement 2 is more effective in directing traffic.
Advertisement 1 Advertisement 2
Impressions 343490 344200
Clicks 96720 97535
CTR 28.16% 28.34%
Variant B’s conversion rate (28.34%) was 0.63% higher than variant A’s conversion rate (28.16%).
You can be 95% confident that variant B will perform better than variant A.
Power 75.27% p value 0.0499
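The quoted p value appears consistent with a one-sided pooled two-proportion z test (Ha: Advertisement 2 has the higher CTR). The calculator's exact method is not stated in the slides, so the sketch below is an assumption rather than a description of that tool.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_ab_p_value(clicks_a, impressions_a, clicks_b, impressions_b):
    """p value for Ha: variant B has a higher click-through rate than A."""
    ctr_a = clicks_a / impressions_a
    ctr_b = clicks_b / impressions_b
    # Pooled CTR under H0: both ads have the same click-through rate.
    pooled = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    standard_error = sqrt(
        pooled * (1 - pooled) * (1 / impressions_a + 1 / impressions_b)
    )
    z = (ctr_b - ctr_a) / standard_error
    return z, 1 - NormalDist().cdf(z)

# Advertisement 1 versus Advertisement 2, using the figures from the table.
z, p = one_sided_ab_p_value(96720, 343490, 97535, 344200)
print(round(z, 3), round(p, 4))
```

With a p value this close to 0.05, the conclusion is sensitive to the chosen significance level and to whether a one-sided or two-sided test is used.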
Errors in Hypothesis test
Decision
                Fail to reject H0             Reject H0
H0 (True)       Correct decision              Type I error (Alpha error)
H0 (False)      Type II error (Beta error)    Correct decision
Type I error: Reject H0 when H0 is true
Type II error: Fail to reject H0 when H0 is false
• Type I error: the null hypothesis was true but was rejected (pizza size was ≥ 9, but H0 was rejected)
• Type II error: the null hypothesis was false but was not rejected (pizza size was not ≥ 9, but H0 was not rejected)
Handling Error
There are two ways of handling errors:
1. Increasing the confidence level of the test
a. Reduces Type I error
b. Increases Type II error
2. Increasing the sample size
a. Reduces Type II error
b. Does not affect Type I error
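The trade-off can be illustrated for a right-tailed one-sample z test, where the Type II error rate is β = Φ(z_crit − effect·√n / σ). The effect size, standard deviation and sample sizes below are hypothetical values chosen only to show the direction of change.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(alpha, effect, sigma, n):
    """Beta for a right-tailed z test when the true mean exceeds
    the hypothesised mean by `effect`."""
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(1 - alpha)
    return std_normal.cdf(z_critical - effect * sqrt(n) / sigma)

# Hypothetical setting: true shift of 0.5 with standard deviation 2.
baseline = type_ii_error(alpha=0.05, effect=0.5, sigma=2, n=30)
stricter = type_ii_error(alpha=0.01, effect=0.5, sigma=2, n=30)
larger_n = type_ii_error(alpha=0.05, effect=0.5, sigma=2, n=100)

print(round(baseline, 3), round(stricter, 3), round(larger_n, 3))
```

Raising the confidence level from 95% to 99% lowers ⍺ but raises β, while increasing n lowers β with ⍺ held fixed.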
• P value calculator
• http://courses.atlas.illinois.edu/spring2016/STAT/STAT200/pnormal.html
• A/B testing
• https://www.surveymonkey.com/mp/ab-testing-significance-calculator/
• Quiz3. Practice
https://forms.gle/NmAufBEXStaRagRr8
Doubts?
All the Best!
https://www.youtube.com/watch?v=Z9Gw9dIJGiA&t=86s&ab_channel=upGrad_Gmba