
Hypothesis Testing


Hypothesis Testing

Introduction
• Inferential statistics enables us to measure behaviour in samples in order to learn more
about behaviour in populations that are often too large or inaccessible.
• Hypothesis testing is an act in statistics whereby an analyst tests an assumption
regarding a population parameter. The methodology employed by the analyst
depends on the nature of the data used and the reason for the analysis.
• Hypothesis testing is used to assess the plausibility of a hypothesis by using
sample data which come from a larger population, or from a data-generating
process.
• The test provides evidence concerning the plausibility of the hypothesis, given
the data.
• Statistical analysts test a hypothesis by measuring and examining a random
sample of the population being analysed.
Introduction
• The method in which we select samples to learn more about characteristics in a
given population is called hypothesis testing.
• Hypothesis testing is really a systematic way to test claims or ideas about a group
or population.
Hypothesis testing or significance testing is a method for testing a
claim or hypothesis about a parameter in a population, using data
measured in a sample. In this method, we test some hypothesis by
determining the likelihood that a sample statistic could have been
selected, if the hypothesis regarding the population parameter were
true.
Basic Terms
• Population = all possible values
• Sample = a portion of the population
• Statistical inference = generalizing from a sample to a population with a calculated degree of certainty
• Two forms of statistical inference:
  • Hypothesis testing
  • Estimation
• Parameter = a characteristic of the population, e.g., population mean µ
• Statistic = a value calculated from data in the sample, e.g., sample mean (x̄)
Distinctions Between Parameters and Statistics

            Parameters         Statistics
Source      Population         Sample
Notation    Greek (e.g., μ)    Roman (e.g., x̄)
Vary        No                 Yes
Calculated  No                 Yes
Sampling Distributions of a Mean

The sampling distribution of a mean (SDM) describes the behavior of a sample mean:

x̄ ~ N(μ, SE_x̄), where SE_x̄ = σ/√n
Hypothesis Testing
• A statistical hypothesis is an assertion or conjecture concerning one or more
populations.
• To prove that a hypothesis is true, or false, with absolute certainty, we would need
absolute knowledge. That is, we would have to examine the entire population.
• Instead, hypothesis testing concerns how to use a random sample to judge whether it
provides evidence that supports or does not support the hypothesis.

Hypothesis testing is formulated in terms of two hypotheses:
• H0: the null hypothesis;
• H1: the alternative hypothesis.
Hypothesis Testing
• Tests a claim about a parameter using evidence (data in a sample).
• The goal of hypothesis testing is to determine how likely it is that a claim
about a population parameter, such as the mean, is true
• The technique is introduced by considering a one-
sample z test
• The procedure is broken into four steps
• Each element of the procedure must be understood
Hypothesis Testing Steps
• Step 1: State the hypotheses. The analyst states the two hypotheses so that
only one of them can be right.
• Step 2: Set the criteria for a decision (the significance level). Formulate an
analysis plan, which outlines how the data will be evaluated.
• Step 3: Compute the test statistic and the corresponding P-value. Carry out the
plan and physically analyze the sample data.
• Step 4: Make a conclusion/decision. Analyze the results and either reject the
null hypothesis, or state that the null hypothesis is plausible, given the data.
State the hypotheses
• Convert the research question to null and alternative hypotheses
• The null hypothesis (H0) is a statement of no effect, relationship, or difference between two or
more groups or factors.  In research studies, a researcher is usually interested in disproving the
null hypothesis
• The null hypothesis (H0) is a claim of “no difference in the population”
• The alternative hypothesis (Ha) claims “H0 is false”
• The alternative hypothesis (H1) is the statement that there is an effect or difference.  This is
usually the hypothesis the researcher is interested in proving.
• The alternative hypothesis can be one-sided (only provides one direction, e.g., lower) or two-
sided.   
• Collect data and seek evidence against H0 as a way of bolstering Ha (deduction)
• Rather than trying to prove that the study hypothesis is true, we proceed in statistical
hypothesis testing by attempting to disprove the null hypothesis, H0, which is the converse of the
study hypothesis or alternative.
State the hypotheses
Two-Sided Tests
Usually, the alternative hypothesis states that a difference exists between the parameter values
but the direction of that difference is not known. It leads to a two-sided or a two-tailed test.
• Suppose a pharmaceutical company manufactures ibuprofen pills. They need to perform
some quality assurance to ensure they have the correct dosage, which is supposed to be 500
milligrams. This is a two-sided test because if the company's pills are deviating significantly in
either direction, meaning there are more than 500 milligrams or less than 500 milligrams, this
will indicate a problem.

One-Sided Tests
Very occasionally, however, we have sound prior knowledge that any difference between the treatments,
if it exists, can be in one direction only. This must not be based on hopes or expectations about a novel
treatment, but on an absolute certainty that the difference can only be in that direction, if the difference
is not zero. This gives rise to a one-sided or a one-tailed test in which the direction of the difference is
specified in the alternative hypothesis.
State the hypotheses
One-Sided Tests
― we'll look at the proportion of students who suffer from test anxiety. We want to
test the claim that fewer than half of students suffer from test anxiety.

― we will be testing the claim that women in a certain town are taller than the
average state height, which is 63.8 inches
Step 2: Set the Significance Level (α)
• Having specified the null and alternative hypotheses, we collect our sample
data and set the significance level (denoted by the Greek letter alpha, α),
generally 0.05. This means that there is a 5% chance that you will accept your
alternative hypothesis when your null hypothesis is actually true.
• The smaller the significance level, the greater the burden of proof
needed to reject the null hypothesis, or in other words, to support the
alternative hypothesis.
The test statistic and the P-value
• From the data we calculate the value of a test statistic (an algebraic
expression particular to the hypothesis we are testing).
• Attached to each value of the test statistic is a probability, called a
P-value. It describes the chance of getting the observed effect (or one more
extreme) if the null hypothesis is true. The ‘if the null hypothesis is true’ is
crucial to the correct interpretation of the P-value.
Step 4: Making a decision using the P-value
• According to the evidence obtained from our sample, we make a
judgement about whether the data are inconsistent with the null
hypothesis; this leads to a decision whether or not to reject the null
hypothesis.
• If the observed results are not consistent with what we would expect if the
null hypothesis were true, we conclude that we have enough evidence to
reject the null hypothesis. We say that the result of the test is statistically
significant.
• If, however, the observed results are consistent with what we would expect if
the null hypothesis were true, we do not reject the null hypothesis. We say
that the result of the test is non-significant.
Step 4: Making a decision using the P-value
The hypothesis we want to test is if H1 is “likely” true. So, there are two
possible outcomes:
• Reject H0 and accept H1 because of sufficient evidence in the sample in
favor of H1;
• Do not reject H0 because of insufficient evidence to support H1.

• Failure to reject H0 does not mean the null hypothesis is true. There is no formal
outcome that says “accept H0.” It only means that we do not have sufficient
evidence to support H1.
Step 4: Making a decision using the P-value
The P-value allows us to determine whether we have enough evidence to
reject the null hypothesis in favour of the alternative hypothesis.
• If the P-value is very small, then it is unlikely that we could have
obtained the observed results if the null hypothesis were true, so we
reject H0
• If the P-value is very large, then there is a high chance that we could
have obtained the observed results if the null hypothesis were true,
and we do not reject H0
Case study

A company manufacturing RAM chips claims the defective rate of the population
is 5%. Let p denote the true defective probability. We want to test:
• H0 : p = 0.05
• H1 : p > 0.05
We are going to use a sample of 100 chips from the
production to test.
Deriving the P-value
• Naturally, a vital link in the whole hypothesis-test procedure is the
relationship between the value of the test statistic and the P-value.
• Each test statistic follows a known theoretical probability distribution. This
means that its value obtained from a particular set of sample data can be
compared with its known distribution to determine the P-value.
• The known distribution of the test statistic is the Normal, t, F, or
Chi-squared distribution.
Degrees of freedom (df) of the test statistic
• If we are using tables to relate the value of the test statistic to the P-value,
we generally have to know the degrees of freedom of the relevant
distribution of our test statistic.
• The degrees of freedom of a statistic are the number of independent
observations contributing to that statistic, i.e. the number of observations
available to evaluate that statistic minus the number of restrictions on
those observations.
• The easiest way of calculating the degrees of freedom of any statistic is to
take them as the difference between the number of observations we have
in our sample and the number of parameters we have to estimate in
order to evaluate that statistic.
Degrees of freedom (df) of the test statistic
• Suppose we are estimating the population variance of a variable, x, in a
sample of size n by its sample statistic, s², given by

s² = Σ(x − x̄)² / (n − 1)

• We have to estimate the mean, x̄, in order to evaluate the numerator,
and so the degrees of freedom of s² are (n − 1).
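The (n − 1) divisor can be seen directly in code; a minimal Python sketch using only the standard library:

```python
import statistics

data = [1, 2, 3, 4]
n = len(data)
xbar = statistics.fmean(data)  # the one parameter we must estimate first

# Sample variance: sum of squared deviations over the degrees of freedom (n - 1)
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)

# The stdlib's statistics.variance uses the same (n - 1) divisor
assert abs(s2 - statistics.variance(data)) < 1e-12
```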
Summary of the hypothesis test procedure
• Specify the null hypothesis, H0, and the alternative hypothesis (by default,
we adopt a two-sided test unless a different alternative hypothesis is
specified).
• Collect the data and look at them, and try to investigate their distribution(s)
• Calculate the appropriate test statistic using the sample data.
• Relate the calculated value of the test statistic to a P-value.
• Consider the P-value to judge whether the data are inconsistent with the
null hypothesis. Then decide whether or not to reject the null hypothesis.
• If appropriate, calculate the confidence interval for the effect of interest,
phrased in terms of the parameter specification in the null hypothesis.
Case study (cont.)

Let X denote the number of defectives in the sample of 100.
Reject H0 if X ≥ 10 (chosen “arbitrarily” in this case).
X is called the test statistic.

[Diagram: a number line from 0 to 100. Values 0–9 form the “do not reject H0”
region (p = 0.05); X ≥ 10 is the critical region (reject H0, p > 0.05), with
10 as the critical value.]
Case study (cont.)

Why did we choose a critical value of 10 for this example?


Because this is a Bernoulli process, the expected number of
defectives in a sample is np. So, if p = 0.05 we should expect
100 × 0.05 = 5 defectives in a sample of 100 chips.
Therefore,
10 defectives would be strong evidence that p > 0.05.
The problem of how to find a critical value for a desired level
of significance of the hypothesis test will be studied later.
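That reasoning can be made exact: under H0 the number of defectives is Binomial(n = 100, p = 0.05), so the chance of seeing 10 or more follows from the binomial formula. A stdlib-only Python sketch:

```python
import math

n, p = 100, 0.05  # sample size and claimed defective rate under H0

# P(X >= 10) when X ~ Binomial(n, p): sum the upper tail of the pmf
tail_prob = sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
                for k in range(10, n + 1))
# A small probability (a few percent), so 10+ defectives would indeed be
# surprising if p really were 0.05
```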
Illustrative Example: “Body Weight”
• The problem: In the 1970s, 20–29 year old men in the
U.S. had a mean μ body weight of 170 pounds.
Standard deviation σ was 40 pounds. We test whether
mean body weight in the population now differs.
• Null hypothesis H0: μ = 170 (“no difference”)
• The alternative hypothesis can be either Ha: μ > 170
(one-sided test) or
Ha: μ ≠ 170 (two-sided test)
Types of errors
When there is a favored assumption, the presumed innocence of the
person in this case, and the assumption is true, but the jury decides it is
false and declares that the person is guilty, we have a so-called Type I
error.

Conversely, if the favored assumption is false, i.e., the person is really
guilty, but the jury declares that it is true, i.e., that the person is
innocent, then we have a so-called Type II error.

Because we are making a decision based on a finite sample, there is a
possibility that we will make mistakes. The possible outcomes are:

                   H0 is true         H1 is true
Do not reject H0   Correct decision   Type II error
Reject H0          Type I error       Correct decision
Power and Sample Size
Two types of decision errors:
Type I error = erroneous rejection of true H0
Type II error = erroneous retention of false H0

Truth
Decision H0 true H0 false
Retain H0 Correct retention Type II error
Reject H0 Type I error Correct rejection
α ≡ probability of a Type I error
β ≡ Probability of a Type II error
Types of errors(cont.)
Definition
The acceptance of H1 when H0 is true is called a Type I error. The probability
of committing a type I error is called the level of significance and is denoted by
α.
The probability of making a Type I error is the probability of incorrectly
rejecting the null hypothesis; it is the P-value obtained from the test. The null hypothesis will be rejected if this probability
is less than the significance level, often denoted by α (alpha) and commonly taken as 0.05.
Thus the significance level is the maximum chance of making a Type I error. If the P-value is
equal to or greater than α, then we do not reject the null hypothesis and we are not making a
Type I error. Therefore, by choosing the significance level of the test to be α at the design
stage of the study, we are limiting the probability of a Type I error to be less than α.
Types of errors(cont.)
Definition
Failure to reject H0 when H1 is true is called a Type II error. The probability of
committing a type II error is denoted by β.

Note: It is impossible to compute β unless we have a specific alternate hypothesis.


The probability of making a Type II error is usually designated by β (beta). It is the probability
of not rejecting the null hypothesis when the null hypothesis is false. We should decide on a
value of β that we regard as acceptable at the design stage of the experiment. β is affected by a
number of factors, one of which is the sample size; the greater the sample size, the smaller β
becomes (keeping the other factors that affect it constant).
In fact, instead of thinking about β, we usually consider its complement, 1 − β (often multiplied
by 100 and expressed as a percentage). This is called the power of the test. It is the probability
of rejecting the null hypothesis when the null hypothesis is false, i.e. it is the chance of
detecting a treatment effect of a given size if it exists
Illustrative Example: “Body Weight”
• The problem: In the 1970s, 20–29 year old had a
mean μ body weight of 170 pounds. Standard
deviation σ was 40 pounds. We test whether mean
body weight in the population now differs.
• Null hypothesis H0: μ = 170 (“no difference”)
• The alternative hypothesis can be either Ha: μ > 170
(one-sided test) or
Ha: μ ≠ 170 (two-sided test)
Test Statistic
This is an example of a one-sample test of a
mean when σ is known. Use this statistic to
test the problem:

z_stat = (x̄ − μ0) / SE_x̄

where μ0 = population mean assuming H0 is true, and SE_x̄ = σ/√n
Illustrative Example: z statistic
• For the illustrative example, μ0 = 170
• We know σ = 40
• Take an SRS of n = 64. Therefore

SE_x̄ = σ/√n = 40/√64 = 5

• If we found a sample mean of 173, then

z_stat = (x̄ − μ0)/SE_x̄ = (173 − 170)/5 = 0.60
Illustrative Example: z statistic
If we found a sample mean of 185, then

z_stat = (x̄ − μ0)/SE_x̄ = (185 − 170)/5 = 3.00
Reasoning Behind zstat

x̄ ~ N(170, 5): the sampling distribution of x̄ under H0: µ = 170, for n = 64
P-value
• The P-value answer the question: What is the
probability of the observed test statistic or one more
extreme when H0 is true?
• This corresponds to the AUC in the tail of the
Standard Normal distribution beyond the zstat.
• Convert z statistics to P-values:
For Ha: μ > μ0, P = Pr(Z > zstat) = right tail beyond zstat
For Ha: μ < μ0, P = Pr(Z < zstat) = left tail beyond zstat
For Ha: μ ≠ μ0, P = 2 × one-tailed P-value
• Use Table or software to find these probabilities
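As an alternative to tables, the conversions can be done in a few lines of Python; this sketch uses `math.erfc` for the standard Normal tail (function names are illustrative):

```python
import math

def norm_sf(z: float) -> float:
    """P(Z > z) for standard Normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_value(z: float, tail: str) -> float:
    if tail == "greater":        # Ha: mu > mu0
        return norm_sf(z)
    if tail == "less":           # Ha: mu < mu0
        return norm_sf(-z)
    return 2 * norm_sf(abs(z))   # two-sided: double the one-tailed P

# z = 0.6 gives one-sided P of about 0.2743, two-sided about 0.5486
```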
One-sided P-value for zstat of 0.6
One-sided P-value for zstat of 3.0
Two-Sided P-Value
• One-sided Ha: AUC in the tail beyond zstat
• Two-sided Ha: consider potential deviations in both directions, so double the
one-sided P-value

Examples: If one-sided P = 0.0010, then two-sided P = 2 × 0.0010 = 0.0020.
If one-sided P = 0.2743, then two-sided P = 2 × 0.2743 = 0.5486.
Interpretation
• P-value answer the question: What is the probability
of the observed test statistic … when H0 is true?
• Thus, smaller and smaller P-values provide stronger
and stronger evidence against H0
• Small P-value  strong evidence
Interpretation
Conventions*
P > 0.10 → non-significant evidence against H0
0.05 < P ≤ 0.10 → marginally significant evidence against H0
0.01 < P ≤ 0.05 → significant evidence against H0
P ≤ 0.01 → highly significant evidence against H0

Examples
P = 0.27 → non-significant evidence against H0
P = 0.01 → highly significant evidence against H0

* It is unwise to draw firm borders for “significance”


α-Level (Used in some situations)

• Let α ≡ probability of erroneously rejecting H0
• Set an α threshold (e.g., let α = .10, .05, or whatever)
• Reject H0 when P ≤ α
• Retain H0 when P > α
• Example: Set α = .10. Find P = 0.27 → retain H0
• Example: Set α = .01. Find P = .001 → reject H0
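The fixed-level decision rule is mechanical enough to write down; a minimal Python sketch (the function name is illustrative):

```python
def decide(p_value: float, alpha: float) -> str:
    """Fixed-level testing: reject H0 when P <= alpha, retain it otherwise."""
    return "reject H0" if p_value <= alpha else "retain H0"

# The slide's two examples:
print(decide(0.27, alpha=0.10))   # retain H0
print(decide(0.001, alpha=0.01))  # reject H0
```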
(Summary) One-Sample z Test
A. Hypothesis statements
H0: µ = µ0 vs.
Ha: µ ≠ µ0 (two-sided) or
Ha: µ < µ0 (left-sided) or
Ha: µ > µ0 (right-sided)
B. Test statistic

z_stat = (x̄ − μ0) / SE_x̄, where SE_x̄ = σ/√n
C. P-value: convert zstat to P value
D. Significance statement (usually not necessary)
Conditions for z test
• σ known (not from data)
• Population approximately Normal or large
sample (central limit theorem)
• SRS (or facsimile)
• Data valid
The Lake Wobegon Example
“where all the children are above average”
• Let X represent Wechsler Adult Intelligence
Scale (WAIS) scores
• Typically, X ~ N(100, 15)
• Take SRS of n = 9 from Lake Wobegon population
• Data  {116, 128, 125, 119, 89, 99, 105, 116,
118}
• Calculate: x-bar = 112.8
• Does sample mean provide strong evidence that
population mean μ > 100?
Example: “Lake Wobegon”
A. Hypotheses:
H0: µ = 100 versus
Ha: µ > 100 (one-sided)
Ha: µ ≠ 100 (two-sided)
B. Test statistic:

SE_x̄ = σ/√n = 15/√9 = 5
z_stat = (x̄ − μ0)/SE_x̄ = (112.8 − 100)/5 = 2.56

C. P-value: P = Pr(Z ≥ 2.56) = 0.0052

P = 0.0052 → it is unlikely the sample came from this
null distribution → strong evidence against H0
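The Lake Wobegon arithmetic can be replayed end to end in Python (stdlib only; x̄ is rounded to 112.8 as on the slide, and the Normal tail comes from `math.erfc`):

```python
import math
import statistics

data = [116, 128, 125, 119, 89, 99, 105, 116, 118]
mu0, sigma = 100, 15
n = len(data)

xbar = round(statistics.fmean(data), 1)  # 112.8, rounded as on the slide
se = sigma / math.sqrt(n)                # 15 / 3 = 5.0
z = (xbar - mu0) / se                    # 2.56

# One-sided P-value: P(Z >= z) for standard Normal Z
p = 0.5 * math.erfc(z / math.sqrt(2))    # ~0.0052
```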
Two-Sided P-value: Lake Wobegon
• Ha: µ ≠100
• Considers random deviations “up” and “down” from μ0: the tails above and
below ±zstat
• Thus, two-sided P = 2 × 0.0052 = 0.0104
Power
• β ≡ probability of a Type II error
β = Pr(retain H0 | H0 false)
(the “|” is read as “given”)

• 1 – β = “Power” ≡ probability of avoiding a Type II error


1– β = Pr(reject H0 | H0 false)
Power of a z test

1 − β = Φ( −z_{1−α/2} + |μ0 − μa|·√n / σ )

where
• Φ(z) represents the cumulative probability of the Standard Normal Z
• μ0 represents the population mean under the null hypothesis
• μa represents the population mean under the alternative hypothesis
Calculating Power: Example
A study of n = 16 retains H0: μ = 170 at α = 0.05
(two-sided); σ is 40. What was the power of the test’s
conditions to identify a population mean of 190?

1 − β = Φ( −z_{1−α/2} + |μ0 − μa|·√n / σ )
      = Φ( −1.96 + |170 − 190|·√16 / 40 )
      = Φ( 0.04 )
      = 0.5160
Reasoning Behind Power
• Competing sampling distributions
Top curve (next page) assumes H0 is true
Bottom curve assumes Ha is true
α is set to 0.05 (two-sided)
• We will reject H0 when a sample mean exceeds 189.6 (right tail, top
curve)
• The probability of getting a value greater than 189.6 on the bottom
curve is 0.5160, corresponding to the power of the test
Sample Size Requirements
Sample size for a one-sample z test:

n = σ²·(z_{1−β} + z_{1−α/2})² / Δ²

where
• 1 – β ≡ desired power
• α ≡ desired significance level (two-sided)
• σ ≡ population standard deviation
• Δ = μ0 – μa ≡ the difference worth detecting
Example: Sample Size Requirement
How large a sample is needed for a one-sample z test
with 90% power and α = 0.05 (two-tailed) when σ = 40?
Let H0: μ = 170 and Ha: μ = 190 (thus, Δ = μ0 − μa = 170 –
190 = −20).

n = σ²·(z_{1−β} + z_{1−α/2})² / Δ²
  = 40²·(1.28 + 1.96)² / (−20)²
  = 41.99

Round up to 42 to ensure adequate power.
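The sample-size calculation as code, using the slide’s rounded table values z_{1−β} = 1.28 and z_{1−α/2} = 1.96 (the function name is illustrative):

```python
import math

def z_test_sample_size(sigma: float, delta: float,
                       z_power: float = 1.28, z_alpha: float = 1.96) -> float:
    # n = sigma^2 * (z_{1-beta} + z_{1-alpha/2})^2 / delta^2
    return sigma**2 * (z_power + z_alpha)**2 / delta**2

n = z_test_sample_size(sigma=40, delta=-20)  # 41.99 (delta is squared, so its sign is irrelevant)
n_required = math.ceil(n)                    # round up to 42 to ensure adequate power
```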
[Illustration: conditions for 90% power.]