All About Statistical Significance and Testing

Null Hypothesis (1 of 4)
The null hypothesis is a hypothesis about a population parameter. The purpose of hypothesis
testing is to test the viability of the null hypothesis in the light of experimental data. Depending
on the data, the null hypothesis either will or will not be rejected as a viable possibility.
Consider a researcher interested in whether the time to respond to a tone is affected by the
consumption of alcohol. The null hypothesis is that µ1 - µ2 = 0, where µ1 is the mean time to
respond after consuming alcohol and µ2 is the mean time to respond otherwise. Thus, the null
hypothesis concerns the parameter µ1 - µ2 and states that the parameter equals
zero.
The null hypothesis is often the reverse of what the experimenter actually believes; it is put
forward to allow the data to contradict it. In the experiment on the effect of alcohol, the
experimenter probably expects alcohol to have a harmful effect. If the experimental data show a
sufficiently large effect of alcohol, then the null hypothesis that alcohol has no effect can be
rejected.
Steps in Hypothesis Testing (1 of 5)
The basic logic of hypothesis testing has been presented somewhat informally in the sections on
"Ruling out chance as an explanation" and the "Null hypothesis." In this section the logic will be
presented in more detail and more formally.
1. The first step in hypothesis testing is to specify the null hypothesis (H0) and the
alternative hypothesis (H1). If the research concerns whether one method of presenting
pictorial stimuli leads to better recognition than another, the null hypothesis would most
likely be that there is no difference between methods (H0: µ1 - µ2 = 0). The alternative
hypothesis would be H1: µ1 ≠ µ2. If the research concerned the correlation between grades
and SAT scores, the null hypothesis would most likely be that there is no correlation (H0:
ρ = 0). The alternative hypothesis would be H1: ρ ≠ 0.
2. The next step is to select a significance level. Typically the 0.05 or the 0.01 level is used.
3. The third step is to calculate a statistic analogous to the parameter specified by the null
hypothesis. If the null hypothesis were defined by the parameter µ1 - µ2, then the statistic
M1 - M2 would be computed.
4. The fourth step is to calculate the probability value (often called the p value). The p value
is the probability of obtaining a statistic as different or more different from the parameter
specified in the null hypothesis as the statistic computed from the data. The calculations
are made assuming that the null hypothesis is true.
5. The probability value computed in Step 4 is compared with the significance level chosen
in Step 2. If the probability is less than or equal to the significance level, then the null
hypothesis is rejected; if the probability is greater than the significance level, then the null
hypothesis is not rejected. When the null hypothesis is rejected, the outcome is said to be
"statistically significant"; when the null hypothesis is not rejected, the outcome is said
to be "not statistically significant."
6. If the outcome is statistically significant, then the null hypothesis is rejected in favor of
the alternative hypothesis. If the rejected null hypothesis were that µ1 - µ2 = 0, then the
alternative hypothesis would be that µ1 ≠ µ2. If M1 were greater than M2, then the
researcher would naturally conclude that µ1 > µ2.
7. The final step is to describe the result and the statistical conclusion in an understandable
way. Be sure to present the descriptive statistics as well as whether the effect was
significant or not. For example, a significant difference between a group that received a
drug and a control group might be described as follows:
Subjects in the drug group scored significantly higher (M = 23) than did subjects in the
control group (M = 17), t(18) = 2.4, p = 0.027.
The statement that "t(18) = 2.4" has to do with how the probability value (p) was calculated. A
small minority of researchers might object to two aspects of this wording. First, some believe
that the significance level rather than the probability value should be reported. The argument for
reporting the probability value is presented in another section. Second, since the alternative
hypothesis was stated as µ1 ≠ µ2, some might argue that it can only be concluded that the
population means differ and not that the population mean for the drug group is higher than the
population mean for the control group.
This argument is misguided. Intuitively, there are strong reasons for inferring that the direction
of the difference in the population is the same as the difference in the sample. There is also a
more formal argument.
A nonsignificant effect might be described as follows:
Although subjects in the drug group scored higher (M = 23) than did subjects in the control
group (M = 20), the difference between means was not significant, t(18) = 1.4, p = 0.179.
It would not have been correct to say that there was no difference between the performance of
the two groups. There was a difference. It is just that the difference was not large enough to rule
out chance as an explanation of the difference. It would also have been incorrect to imply that
there is no difference in the population. Be sure not to accept the null hypothesis.
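The comparison in Step 5 can be sketched in a few lines of Python. This is only an illustration: the function name decide and its wording are hypothetical, and the probability values are the two quoted in the example write-ups above.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Step 5: compare the probability value with the significance level."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    # Reject H0 when p <= alpha; otherwise H0 is not rejected (it is never "accepted").
    if p_value <= alpha:
        return "reject H0 (statistically significant)"
    return "do not reject H0 (not statistically significant)"


# Probability values taken from the two example write-ups in the text.
for p in (0.027, 0.179):
    for alpha in (0.05, 0.01):
        print(f"p = {p}, alpha = {alpha}: {decide(p, alpha)}")
```

Note that p = 0.027 is significant at the 0.05 level but not at the 0.01 level, which is one reason for reporting the probability value itself rather than only the decision.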
Why the Null Hypothesis is Not Accepted (1 of 5)

A null hypothesis is not accepted just because it is not rejected. Data not sufficient to show
convincingly that a difference between means is not zero do not prove that the difference is zero.
Such data may even suggest that the null hypothesis is false but not be strong enough to make a
convincing case that the null hypothesis is false. For example, if the probability value were 0.15,
then one would not be ready to present one's case that the null hypothesis is false to the
(properly) skeptical scientific community. More convincing data would be needed to do that.
However, there would be no basis to conclude that the null hypothesis is true. It may or may not
be true; there just is not strong enough evidence to reject it. Not even in cases where there is no
evidence that the null hypothesis is false is it valid to conclude the null hypothesis is true. If the
null hypothesis is that µ1 - µ2 is zero, then the hypothesis is that the difference is exactly zero. No
experiment can distinguish between the case of no difference between means and an extremely
small difference between means. If data are consistent with the null hypothesis, they are also
consistent with other similar hypotheses.
Significance Test (1 of 2)
A significance test is performed to determine if an observed value of a statistic differs enough
from a hypothesized value of a parameter to draw the inference that the hypothesized value of
the parameter is not the true value. The hypothesized value of the parameter is called the "null
hypothesis." A significance test consists of calculating the probability of obtaining a statistic as
different or more different from the null hypothesis (given that the null hypothesis is correct)
than the statistic obtained in the sample. If this probability is sufficiently low, then the difference
between the parameter and the statistic is said to be "statistically significant."
Just how low is sufficiently low? The choice is somewhat arbitrary but by convention levels of
0.05 and 0.01 are most commonly used.
For instance, an experimenter may hypothesize that the size of a food reward does not affect the
speed a rat runs down an alley. One group of rats receives a large reward and another receives a
small reward for running the alley. Suppose the mean running time for the large reward were 1.5
seconds and the mean running time for the small reward were 2.1 seconds.
Significance Level
In hypothesis testing, the significance level is the criterion used for rejecting the null hypothesis.
The significance level is used in hypothesis testing as follows: First, the difference between the
results of the experiment and the null hypothesis is determined. Then, assuming the null
hypothesis is true, the probability of a difference that large or larger is computed. Finally, this
probability is compared to the significance level. If the probability is less than or equal to the
significance level, then the null hypothesis is rejected and the outcome is said to be statistically
significant. Traditionally, experimenters have used either the 0.05 level (sometimes called the
5% level) or the 0.01 level (1% level), although the choice of levels is largely subjective. The
lower the significance level, the more the data must diverge from the null hypothesis to be
significant. Therefore, the 0.01 level is more conservative than the 0.05 level. The Greek letter
alpha (α) is sometimes used to indicate the significance level. See also: Type I error and
significance test.
Why the Null Hypothesis is Not Accepted (4 of 5)
Assume the experiment measured "well being" on a 50 point scale (with higher scores
representing more well being) that has a standard deviation of 10. Further assume the 99%
confidence interval computed from the experimental data was:
-0.5 ≤ µ1 - µ2 ≤ 1
This says that one can be confident that the mean "true" drug treatment effect is somewhere
between -0.5 and 1. If it were -0.5 then the drug would, on average, be slightly detrimental; if it
were 1 then the drug would, on average, be slightly beneficial. But, how much benefit is an
average improvement of 1? Naturally that is a question that involves characteristics of the
measurement scale. But, since 1 is only 0.10 standard deviations, it can be presumed to be a
small effect. The overlap between two distributions whose means differ by 0.10 standard
deviations is shown below. Although the blue distribution is
slightly to the right of the red distribution, the overlap is almost complete.
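The size of that overlap can be checked numerically. The sketch below assumes both distributions are normal with the stated standard deviation of 10; the text does not say they are normal, and the means 0 and 1 are arbitrary placeholders that differ by the upper limit of the interval.

```python
from statistics import NormalDist

SD = 10.0          # standard deviation of the "well being" scale (from the text)
EFFECT = 1.0       # upper limit of the 99% confidence interval (from the text)

# Assumption: normal distributions; only the difference between the means matters.
control = NormalDist(mu=0.0, sigma=SD)
treated = NormalDist(mu=EFFECT, sigma=SD)

effect_in_sd = EFFECT / SD                 # 0.10 standard deviations
overlap = control.overlap(treated)         # shared area under the two curves

# For equal-variance normals the overlap also equals 2 * Phi(-d / 2).
closed_form = 2 * NormalDist().cdf(-effect_in_sd / 2)

print(f"effect = {effect_in_sd:.2f} SD, overlap = {overlap:.3f}")
```

Under these assumptions roughly 96% of the area is shared, which agrees with the description of the overlap as almost complete.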
The Precise Meaning of the Probability Value (1 of 3)
There is often confusion about the precise meaning of the probability computed in a significance
test. As stated in Step 4 of the steps in hypothesis testing, the null hypothesis (H0) is assumed to
be true. The difference between the statistic computed in the sample and the parameter specified
by H0 is computed and the probability of obtaining a difference this large or larger is calculated.
This probability value is the probability of obtaining data as extreme or more extreme than the
current data (assuming H0 is true). It is not the probability of the null hypothesis itself. Thus, if
the probability value is 0.005, this does not mean that the probability that the null hypothesis is
true is 0.005. It means that the probability of obtaining data as different or more different from the
null hypothesis as those obtained in the experiment is 0.005.
The inferential step to conclude that the null hypothesis is false goes as follows: The data (or
data more extreme) are very unlikely given that the null hypothesis is true. This means that: (1) a
very unlikely event occurred or (2) the null hypothesis is false. The inference usually made is
that the null hypothesis is false.
To illustrate that the probability is not the probability of the hypothesis, consider a test of a
person who claims to be able to predict whether a coin will come up heads or tails. One should
take a rather skeptical attitude toward this claim and require strong evidence to believe in its
validity. The null hypothesis is that the person can predict correctly half the time (H0: π = 0.5). In
the test, a coin is flipped 20 times and the person is correct 11 times. If the person has no special
ability (H0 is true), then the probability of being correct 11 or more times out of 20 is 0.41.
Would someone who was originally skeptical now believe that there is only a 0.41 chance that
the null hypothesis is true? They almost certainly would not since they probably originally
thought H0 had a very high probability of being true (perhaps as high as 0.9999). There is no
logical reason for them to decrease their belief in the validity of the null hypothesis since the
outcome was perfectly consistent with the null hypothesis.
The proper interpretation of the test is as follows: A person made a rather extraordinary claim
and should be able to provide strong evidence in support of the claim if the claim is to be believed.
The test provided data consistent with the null hypothesis that the person has no special ability
since a person with no special ability would be able to predict as well or better more than 40% of
the time. Therefore, there is no compelling reason to believe the extraordinary claim. However,
the test does not prove the person cannot predict better than chance; it simply fails to provide
evidence that he or she can. The probability that the null hypothesis is true is not determined by
the statistical analysis conducted as part of hypothesis testing. Rather, the probability computed
is the probability of obtaining data as different or more different from the null hypothesis (given
that the null hypothesis is true) as the data actually obtained.
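The 0.41 figure in the coin example can be reproduced from the binomial distribution. The helper name prob_at_least is hypothetical; the numbers (20 flips, 11 correct, chance of 0.5 under H0) come from the text.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more correct calls."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 11 or more correct predictions out of 20 when the person is only guessing.
p_value = prob_at_least(11, 20)
print(round(p_value, 2))  # 0.41
```

This is the probability of the data (or more extreme data) given H0, which is why it says nothing directly about the probability that H0 itself is true.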
Statistical and Practical Significance (1 of 4)
It is important not to confuse the confidence with which the null hypothesis can be rejected with
the size of the effect. To make this point concrete, consider a researcher assigned the task of
determining whether the video display used by travel agents for booking airline reservations
should be in color or in black and white. Market research had shown that travel agencies were
primarily concerned with the speed with which reservations can be made. Therefore, the question
was whether color displays allow travel agents to book reservations faster. Market research had
also shown that in order to justify the higher price of color displays, they must be faster by an
average of at least 10 seconds per transaction. Fifty subjects were tested with color displays and
50 subjects were tested with black and white displays. Subjects were slightly faster at making
reservations on a color display (M = 504.7 seconds) than on a black and white display (M =
508.2 seconds). Although the difference is small, it was statistically significant at the 0.05
significance level. Box plots of the data are shown below.
The 95% confidence interval on the difference between means is:
-5.8 ≤ µ(color) - µ(black and white) ≤ -0.9
and the 99% interval is:
-6.6 ≤ µ(color) - µ(black and white) ≤ -0.1
Therefore, despite the finding of a "more significant" difference between means, the
experimenter can be even more certain that the color displays are only slightly better than the
black and white displays. The second experiment shows conclusively that the difference is less
than 10 seconds.
This example was used to illustrate the following points: (1) an effect that is statistically
significant is not necessarily large enough to be of practical significance and (2) the smaller of
two effects can be "more significant" than the larger. Be careful how you interpret findings
reported in the media. If you read that a particular diet lowered cholesterol significantly, this
does not necessarily mean that the diet lowered cholesterol enough to be of any health value. It
means that the effect on cholesterol in the population is greater than zero.
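The distinction between the two kinds of significance can be read off a confidence interval. The sketch below is illustrative only: assess_interval is a hypothetical helper, the intervals are the 95% and 99% intervals quoted above, and the 10-second threshold is the one stated in the travel agent example.

```python
def assess_interval(ci_low: float, ci_high: float, min_useful_saving: float = 10.0) -> dict:
    """Classify a confidence interval on (color - black and white) time, in seconds.

    Negative differences mean the color display is faster.
    """
    # Statistically significant: zero is not a plausible value for the difference.
    excludes_zero = ci_high < 0 or ci_low > 0
    # Largest saving still plausible: the interval end farthest below zero.
    largest_plausible_saving = max(0.0, -ci_low)
    return {
        "statistically_significant": excludes_zero,
        "could_meet_threshold": largest_plausible_saving >= min_useful_saving,
    }


print(assess_interval(-5.8, -0.9))   # 95% interval
print(assess_interval(-6.6, -0.1))   # 99% interval
```

Both intervals exclude zero, yet neither reaches a saving of 10 seconds, so the effect is statistically significant without being practically significant.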
Type I and II errors (1 of 2)
There are two kinds of errors that can be made in significance testing: (1) a true null hypothesis
can be incorrectly rejected and (2) a false null hypothesis can fail to be rejected. The former error
is called a Type I error and the latter error is called a Type II error. These two types of errors are
defined in the table.
                        True State of the Null Hypothesis
Statistical Decision    H0 True         H0 False
Reject H0               Type I error    Correct
Do not reject H0        Correct         Type II error
The probability of a Type I error is designated by the Greek letter alpha (α) and is called the
Type I error rate; the probability of a Type II error (the Type II error rate) is designated by the
Greek letter beta (β). A Type II error is only an error in the sense that an opportunity to reject
the null hypothesis correctly was lost. It is not an error in the sense that an incorrect conclusion
was drawn since no conclusion is drawn when the null hypothesis is not rejected.
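A small simulation can make the Type I error rate concrete. Everything in this sketch is an assumption chosen for illustration: a two-tailed z test of H0: µ = 0 with known standard deviation 1, samples of 25, and a fixed random seed. Because H0 is true in every simulated experiment, every rejection is a Type I error.

```python
import random
from statistics import NormalDist


def type_i_error_rate(alpha: float = 0.05, n: int = 25,
                      trials: int = 10_000, seed: int = 0) -> float:
    """Fraction of simulated experiments that reject a TRUE null hypothesis."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # two-tailed critical value
    std_error = 1.0 / n ** 0.5                      # known sigma = 1
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]   # H0 (mu = 0) holds
        z = (sum(sample) / n) / std_error
        if abs(z) >= z_crit:
            rejections += 1                         # a Type I error
    return rejections / trials


print(type_i_error_rate(alpha=0.05))
print(type_i_error_rate(alpha=0.01))
```

In the long run the proportion of rejections should be close to the chosen significance level, which is why α is also called the Type I error rate.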
One- and Two-Tailed Tests (1 of 4)
In the section on "Steps in hypothesis testing," the fourth step involves calculating the probability
that a statistic would differ as much or more from the parameter specified in the null hypothesis as
does the statistic obtained in the experiment. This statement implies that a difference in either
direction would be counted. That is, if the null hypothesis were:
H0: µ1 - µ2 = 0
and the value of the statistic M1 - M2 were +5, then the probability of M1 - M2 differing from zero
by five or more (in either direction) would be computed. In other words, the probability value would
be the probability that either M1 - M2 ≥ 5 or M1 - M2 ≤ -5.
Assume that the figure shown below is the sampling distribution of M1 - M2.
The figure shows that the probability of a value of +5 or more is 0.036 and that the probability of
a value of -5 or less is 0.036. Therefore the probability of a value either greater than or equal to +5
or less than or equal to -5 is 0.036 + 0.036 = 0.072.
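The arithmetic can be sketched with a normal sampling distribution. The normal shape and the standard error of 2.78 are assumptions, chosen only because they give a one-tailed area of about 0.036 beyond +5; the text does not state either.

```python
from statistics import NormalDist

STD_ERROR = 2.78   # hypothetical standard error of M1 - M2 (not given in the text)
OBSERVED = 5.0     # observed value of M1 - M2 (from the text)

# Sampling distribution of M1 - M2 under H0: mu1 - mu2 = 0 (assumed normal).
sampling = NormalDist(mu=0.0, sigma=STD_ERROR)

upper_tail = 1 - sampling.cdf(OBSERVED)    # P(M1 - M2 >= +5)
lower_tail = sampling.cdf(-OBSERVED)       # P(M1 - M2 <= -5)
two_tailed = upper_tail + lower_tail       # difference in either direction

print(f"one-tailed: {upper_tail:.3f}, two-tailed: {two_tailed:.3f}")
```

Because the assumed distribution is symmetric about zero, the two-tailed probability is simply twice the one-tailed probability.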
Confidence Intervals & Hypothesis Testing (1 of 5)
There is an extremely close relationship between confidence intervals and hypothesis testing.
When a 95% confidence interval is constructed, all values in the interval are considered plausible
values for the parameter being estimated. Values outside the interval are rejected as relatively
implausible. If the value of the parameter specified by the null hypothesis is contained in the
95% interval then the null hypothesis cannot be rejected at the 0.05 level. If the value specified
by the null hypothesis is not in the interval then the null hypothesis can be rejected at the 0.05
level. If a 99% confidence interval is constructed, then values outside the interval are rejected at
the 0.01 level.
Imagine a researcher wishing to test the null hypothesis that the mean time to respond to an
auditory signal is the same as the mean time to respond to a visual signal. The null hypothesis
therefore is:
µ(visual) - µ(auditory) = 0.
Ten subjects were tested in the visual condition and their scores (in milliseconds) were: 355, 421,
299, 460, 600, 580, 474, 511, 550, and 586.
Ten subjects were tested in the auditory condition and their scores were: 275, 320, 278, 360, 430,
520, 464, 311, 529, and 326.
The 95% confidence interval on the difference between means is:

9 ≤ µ(visual) - µ(auditory) ≤ 196.
Therefore only values in the interval between 9 and 196 are retained as plausible values for the
difference between population means. Since zero, the value specified by the null hypothesis, is
not in the interval, the null hypothesis of no difference between auditory and visual presentation
can be rejected at the 0.05 level. The probability value for this example is 0.034. Any time the
parameter specified by a null hypothesis is not contained in the 95% confidence interval
estimating that parameter, the null hypothesis can be rejected at the 0.05 level or less. Similarly,
if the 99% interval does not contain the parameter then the null hypothesis can be rejected at the
0.01 level. The null hypothesis is not rejected if the parameter value specified by the null
hypothesis is in the interval since the null hypothesis would still be plausible.
However, since the null hypothesis would be only one of an infinite number of values in the
confidence interval, accepting the null hypothesis is not justified.
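The interval for the visual and auditory scores can be recomputed from the data listed above. This sketch assumes a pooled-variance t interval for two independent groups, which the text does not name but which reproduces the reported limits; the critical value 2.101 is the standard two-tailed 0.05 table value for 18 degrees of freedom.

```python
from statistics import mean, variance

# Response times in milliseconds, copied from the text.
visual = [355, 421, 299, 460, 600, 580, 474, 511, 550, 586]
auditory = [275, 320, 278, 360, 430, 520, 464, 311, 529, 326]

n1, n2 = len(visual), len(auditory)
diff = mean(visual) - mean(auditory)            # M(visual) - M(auditory)

# Assumption: equal population variances, so the sample variances are pooled.
pooled_var = ((n1 - 1) * variance(visual) + (n2 - 1) * variance(auditory)) / (n1 + n2 - 2)
std_error = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5

T_CRIT_95 = 2.101                               # t table value, df = 18, two-tailed 0.05
lower = diff - T_CRIT_95 * std_error
upper = diff + T_CRIT_95 * std_error

t_stat = diff / std_error
reject_h0 = not (lower <= 0 <= upper)           # zero outside the 95% interval

print(f"95% CI: {lower:.0f} to {upper:.0f}, t(18) = {t_stat:.2f}, reject H0: {reject_h0}")
```

The interval test and the significance test use the same standard error and critical value, which is why zero falling outside the 95% interval and rejecting at the 0.05 level always agree.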
There are many arguments against accepting the null hypothesis when it is not rejected. The null
hypothesis is usually a hypothesis of no difference. Thus null hypotheses such as:
µ1 - µ2 = 0
π1 - π2 = 0
in which the hypothesized value is zero are most common. When the hypothesized value is zero
then there is a simple relationship between hypothesis testing and confidence intervals:
If the interval contains zero then the null hypothesis cannot be rejected at the stated level of
confidence. If the interval does not contain zero then the null hypothesis can be rejected.
This is just a special case of the general rule stating that the null hypothesis can be rejected if the
interval does not contain the hypothesized value of the parameter and cannot be rejected if the
interval contains the hypothesized value.