t, F, Z, and Chi-square Tests
t-test
The t-test was described in 1908 by William Sealy Gosset, who used it to monitor the brewing at
Guinness in Dublin. Guinness considered the use of statistics a trade secret, so Gosset published his
test under the pen name 'Student'; hence the test is now often called the 'Student's t-test'.
The t-test is a basic test that is limited to two groups. For multiple groups, you would have
to compare each pair of groups; for example, with three groups there would be three tests (AB,
AC, BC).
The t-test (or Student's t-test) gives an indication of the separateness of two sets of
measurements, and is thus used to check whether two sets of measures are essentially different
(and usually that an experimental effect has been demonstrated). The typical way of doing this is
with the null hypothesis that the means of the two sets of measures are equal.
One-sample t-test
In testing the null hypothesis that the population mean is equal to a specified value \mu_0, one uses the
statistic

    t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}

where \bar{x} is the sample mean, s is the sample standard deviation of the sample, and n is the sample
size. The number of degrees of freedom used in this test is n - 1.
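As a quick illustration, here is a minimal sketch of this test in Python (assuming NumPy and SciPy are available); the sample values and the hypothesized mean are invented for the example.

    import numpy as np
    from scipy import stats

    # Invented sample; test H0: population mean equals mu0 = 5.0
    sample = np.array([5.1, 4.9, 5.3, 5.0, 4.8, 5.2, 5.1, 4.7])
    mu0 = 5.0

    # Computes t = (xbar - mu0) / (s / sqrt(n)) with n - 1 degrees of freedom
    t_stat, p_value = stats.ttest_1samp(sample, popmean=mu0)
    print(t_stat, p_value)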
Independent two-sample t-test: equal sample sizes, equal variance
This version of the t-test assumes that:
- the two sample sizes (that is, the number, n, of participants of each group) are equal;
- the two distributions have the same variance.
The t statistic to test whether the means are different is

    t = \frac{\bar{X}_1 - \bar{X}_2}{s_{X_1 X_2} \sqrt{2/n}}

where

    s_{X_1 X_2} = \sqrt{\frac{s_{X_1}^2 + s_{X_2}^2}{2}}.

Here s_{X_1 X_2} is the grand standard deviation (or pooled standard deviation), 1 = group one, 2 =
group two. The denominator of t is the standard error of the difference between two means.
For significance testing, the number of degrees of freedom for this test is 2n - 2, where n is the number of
participants in each group.
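To make the formulas concrete, here is a hedged sketch in Python (assuming NumPy and SciPy; the two equal-sized groups are invented data) that computes the statistic directly and uses 2n - 2 degrees of freedom.

    import numpy as np
    from scipy import stats

    g1 = np.array([23.1, 21.8, 24.5, 22.0, 23.7, 22.9])
    g2 = np.array([21.0, 20.4, 22.1, 19.8, 21.5, 20.9])
    n = len(g1)  # both groups have the same size n

    # Grand (pooled) standard deviation for equal-sized groups
    s_pooled = np.sqrt((g1.var(ddof=1) + g2.var(ddof=1)) / 2)
    t = (g1.mean() - g2.mean()) / (s_pooled * np.sqrt(2 / n))
    df = 2 * n - 2

    # Two-tailed p-value; stats.ttest_ind(g1, g2) gives the same answer here
    p = 2 * stats.t.sf(abs(t), df)
    print(t, df, p)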
Independent two-sample t-test: unequal sample sizes, equal variance
The t statistic to test whether the means are different is

    t = \frac{\bar{X}_1 - \bar{X}_2}{s_{X_1 X_2} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}

where

    s_{X_1 X_2} = \sqrt{\frac{(n_1 - 1) s_{X_1}^2 + (n_2 - 1) s_{X_2}^2}{n_1 + n_2 - 2}}.

Note that the formulae above are generalizations of the case where both samples have equal sizes.
Here s_{X_1 X_2} is an estimator of the common standard deviation of the two samples: it is defined in this
way so that its square is an unbiased estimator of the common variance whether or not the
population means are the same. In these formulae, n = number of participants, 1 = group one, 2 =
group two. n - 1 is the number of degrees of freedom for either group, and the total sample size
minus two (that is, n_1 + n_2 - 2) is the total number of degrees of freedom, which is used in
significance testing.
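A sketch of the same computation for unequal group sizes, again in Python with invented data, follows; it mirrors the pooled-variance formula above and can be cross-checked against scipy.stats.ttest_ind.

    import numpy as np
    from scipy import stats

    g1 = np.array([12.1, 11.8, 12.9, 12.4, 11.5])              # n1 = 5
    g2 = np.array([10.9, 11.2, 10.5, 11.0, 10.7, 11.3, 10.8])  # n2 = 7
    n1, n2 = len(g1), len(g2)

    # Pooled estimator of the common standard deviation
    s_p = np.sqrt(((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1))
                  / (n1 + n2 - 2))
    t = (g1.mean() - g2.mean()) / (s_p * np.sqrt(1 / n1 + 1 / n2))
    df = n1 + n2 - 2  # total degrees of freedom

    p = 2 * stats.t.sf(abs(t), df)  # matches stats.ttest_ind(g1, g2)
    print(t, df, p)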
Independent two-sample t-test: unequal sample sizes, unequal variance
The t statistic to test whether the means are different is

    t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}

where

    s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.

Here s_i^2 is the unbiased estimator of the variance of each of the two samples, n = number of participants,
1 = group one, 2 = group two. Note that in this case, s_{\bar{X}_1 - \bar{X}_2}^2 is not a pooled variance. For use in
significance testing, the distribution of the test statistic is approximated as being an ordinary
Student's t distribution with the degrees of freedom calculated using

    \mathrm{d.f.} = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\left(s_1^2/n_1\right)^2/(n_1 - 1) + \left(s_2^2/n_2\right)^2/(n_2 - 1)}.

This is called the Welch-Satterthwaite equation. Note that the true distribution of the test statistic
actually depends (slightly) on the two unknown variances.
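The following minimal Python sketch (invented data) computes Welch's statistic and the Welch-Satterthwaite degrees of freedom directly; scipy.stats.ttest_ind with equal_var=False performs the same test.

    import numpy as np
    from scipy import stats

    g1 = np.array([4.2, 4.8, 5.1, 3.9, 4.5, 5.0])
    g2 = np.array([6.1, 7.9, 5.5, 8.2, 6.7])

    # Per-group variance of the mean: s_i^2 / n_i
    v1 = g1.var(ddof=1) / len(g1)
    v2 = g2.var(ddof=1) / len(g2)
    t = (g1.mean() - g2.mean()) / np.sqrt(v1 + v2)

    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(g1) - 1) + v2 ** 2 / (len(g2) - 1))

    p = 2 * stats.t.sf(abs(t), df)
    print(t, df, p)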
Dependent t-test
This test is used when the samples are dependent; that is, when there is only one sample
that has been tested twice (repeated measures) or when there are two samples that have been
matched or "paired". This is an example of a paired difference test.
F-test
An F-test (Snedecor and Cochran, 1983) is used to test if the standard deviations of two
populations are equal. This test can be a two-tailed test or a one-tailed test. The two-tailed version
tests against the alternative that the standard deviations are not equal. The one-tailed version only
tests in one direction, that is, the standard deviation from the first population is either greater than or
less than (but not both) the second population standard deviation. The choice is determined by the
problem. For example, if we are testing a new process, we may only be interested in knowing if the
new process is less variable than the old process. The test statistic for F is simply

    F = \frac{s_1^2}{s_2^2}

where s_1^2 and s_2^2 are the sample variances, with N_1 and N_2 observations respectively. The
more this ratio deviates from 1, the stronger the evidence for unequal population variances.
We use the F-test in the same way as the Student's t-test, only we are testing for significant differences in the
variances.
The hypothesis that the two standard deviations are equal is rejected if

    F > F_{\alpha,\, N_1 - 1,\, N_2 - 1}

where F_{\alpha, N_1 - 1, N_2 - 1} is the critical value of the F distribution with N_1 - 1 and N_2 - 1 degrees of freedom and a
significance level of \alpha (for the two-tailed version, \alpha/2 is used in each tail).
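SciPy has no single-call version of this F-test, so a sketch has to form the variance ratio and compare it to the critical value itself; the two samples below are invented, with the one-tailed question being whether the new process is less variable than the old one.

    import numpy as np
    from scipy import stats

    old = np.array([12.3, 11.9, 12.8, 12.1, 13.0, 11.7, 12.5, 12.2])
    new = np.array([12.1, 12.2, 12.0, 12.3, 12.1, 12.2, 12.0, 12.3])

    F = old.var(ddof=1) / new.var(ddof=1)  # ratio of sample variances
    df1, df2 = len(old) - 1, len(new) - 1

    alpha = 0.05
    # Upper one-tailed critical value F_(alpha, N1-1, N2-1)
    F_crit = stats.f.ppf(1 - alpha, df1, df2)
    print(F, F_crit, F > F_crit)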
Chi-square test
The chi-square test may be used both as a test of goodness-of-fit (comparing frequencies of one
attribute variable to theoretical expectations) and as a test of independence (comparing frequencies
of one attribute variable for different values of a second attribute variable). The underlying
arithmetic of the test is the same; the only difference is the way the expected values are calculated.
Goodness-of-fit tests and tests of independence are used for quite different experimental designs
and test different null hypotheses, so we will consider the chi-square test of goodness-of-fit and the
chi-square test of independence to be two distinct statistical tests.
When to use it
The chi-square test of independence is used when you have two attribute variables, each with
two or more possible values. A data set like this is often called an "R X C table," where R is the
number of rows and C is the number of columns. For example, if you surveyed the frequencies of
three flower phenotypes (red, pink, white) in four geographic locations, you would have a 3 X 4
table. You could also consider it a 4 X 3 table; it doesn't matter which variable is the columns and
which is the rows.
It is also possible to do a chi-square test of independence with more than two attribute
variables, but that experimental design doesn't occur very often and is rather complicated to analyze
and interpret, so we won't cover it.
Hypothesis
The chi-square test is defined for the hypothesis:
H0 : The data follow the specified distribution.
H1 : The data do not follow the specified distribution.
Test statistic
For the chi-square goodness-of-fit computation, the data are divided into k bins and
the test statistic is defined as

    \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}

where O_i is the observed frequency for bin i and E_i is the expected frequency.
The hypothesis (H0) that the data are from a population with the specified distribution is rejected if

    \chi^2 > \chi^2_{\alpha,\, k-1}

where \chi^2_{\alpha, k-1} is the critical value of the chi-square distribution with k - 1 degrees of freedom at
significance level \alpha. Degrees of freedom can be calculated as the number of categories in the problem minus 1.
Calculating Chi-Square

                          Green    Yellow
    Observed (o)            639       241
    Expected (e)            660       220
    Deviation (o - e)       -21        21
    Deviation^2 (d^2)       441       441
    d^2/e                 0.668       2.0

    \chi^2 = \sum d^2/e = 2.668
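The same calculation in Python (assuming SciPy) reproduces the table; the slight difference from 2.668 comes from the table rounding 441/220 to 2.0.

    from scipy import stats

    observed = [639, 241]
    expected = [660, 220]  # 3:1 ratio of the 880 total

    # Goodness-of-fit with k - 1 = 1 degree of freedom
    chi2, p = stats.chisquare(observed, f_exp=expected)
    print(chi2, p)  # about 2.67, p about 0.10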
For a contingency table that has r rows and c columns, the chi-square test can be thought of
as a test of independence. In a test of independence the null and alternative hypotheses are:
H0 : The two variables are independent.
H1 : The two variables are associated (not independent).
When a comparison is made between one sample and another, a simple rule is that the
degrees of freedom equal (number of columns minus one) x (number of rows minus one), not
counting the totals for rows or columns.
Example:
Suppose you conducted a drug trial on a group of animals and you hypothesized that the
animals receiving the drug would survive better than those that did not receive the drug. You
conduct the study and collect the following data:
H0 : The survival of the animals is independent of drug treatment.
H1 : The survival of the animals is associated with drug treatment.
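A sketch of the corresponding test of independence in Python follows. The 2 x 2 survival counts are hypothetical, since the original data table did not survive in this copy; substitute the real counts.

    import numpy as np
    from scipy import stats

    #                 survived  died
    table = np.array([[36, 14],    # received the drug (hypothetical counts)
                      [30, 25]])   # did not receive the drug

    # df = (rows - 1) x (columns - 1) = 1 for a 2 x 2 table
    chi2, p, df, expected = stats.chi2_contingency(table)
    print(chi2, p, df)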