
 Independent Samples t Test

The Independent Samples t Test compares the means of two independent groups in order to
determine whether there is statistical evidence that the associated population means are
significantly different. The Independent Samples t Test is a parametric test.

This test is also known as:

o Independent t Test
o Independent Measures t Test
o Independent Two-sample t Test
o Student t Test
o Two-Sample t Test
o Uncorrelated Scores t Test
o Unpaired t Test
o Unrelated t Test

The variables used in this test are known as:

o Dependent variable, or test variable


o Independent variable, or grouping variable
Common Uses

The Independent Samples t Test is commonly used to test the following:

o Statistical differences between the means of two groups


o Statistical differences between the means of two interventions
o Statistical differences between the means of two change scores

Note: The Independent Samples t Test can only compare the means for two (and only two)
groups. It cannot make comparisons among more than two groups. If you wish to compare the
means across more than two groups, you will likely want to run an ANOVA.

Data Requirements

Your data must meet the following requirements:

o Dependent variable that is continuous (i.e., interval or ratio level)


o Independent variable that is categorical (i.e., two or more groups)
o Cases that have values on both the dependent and independent variables
o Independent samples/groups (i.e., independence of observations)
o There is no relationship between the subjects in each sample. This means that:
o Subjects in the first group cannot also be in the second group
o No subject in either group can influence subjects in the other group
o No group can influence the other group
o Violation of this assumption will yield an inaccurate p value
o Random sample of data from the population
o Normal distribution (approximately) of the dependent variable for each group
o Non-normal population distributions, especially those that are thick-tailed or heavily
skewed, considerably reduce the power of the test
o Among moderate or large samples, a violation of normality may still yield
accurate p values
o Homogeneity of variances (i.e., variances approximately equal across groups)
o When this assumption is violated and the sample sizes for each group differ, the p value is
not trustworthy. However, the Independent Samples t Test output also includes an
approximate t statistic that is not based on assuming equal population variances. This
alternative statistic, called the Welch t Test statistic, may be used when equal variances
among populations cannot be assumed. The Welch t Test is also known as an Unequal
Variance t Test or Separate Variances t Test.
o No outliers

Note: When one or more of the assumptions for the Independent Samples t Test are not met, you
may want to run the nonparametric Mann-Whitney U Test instead.

Researchers often follow several rules of thumb:

o Each group should have at least 6 subjects, ideally more. Inferences for the population
will be more tenuous with too few subjects.
o A balanced design (i.e., same number of subjects in each group) is ideal. Extremely
unbalanced designs increase the possibility that violating any of the
requirements/assumptions will threaten the validity of the Independent Samples t Test.

Hypotheses

The null hypothesis (H0) and alternative hypothesis (H1) of the Independent Samples t Test can
be expressed in two different but equivalent ways:

H0: µ1 = µ2 ("the two population means are equal")

H1: µ1 ≠ µ2 ("the two population means are not equal")

OR

H0: µ1 - µ2 = 0 ("the difference between the two population means is equal to 0")

H1: µ1 - µ2 ≠ 0 ("the difference between the two population means is not 0")

where µ1 and µ2 are the population means for group 1 and group 2, respectively. Notice that the
second set of hypotheses can be derived from the first set by simply subtracting µ2 from both
sides of the equation.

Levene’s Test for Equality of Variances

Recall that the Independent Samples t Test requires the assumption of homogeneity of variance  --
i.e., both groups have the same variance. SPSS conveniently includes a test for the homogeneity of
variance, called Levene's Test, whenever you run an independent samples t test.

The hypotheses for Levene’s test are: 


H0: σ1² - σ2² = 0 ("the population variances of group 1 and 2 are equal")

H1: σ1² - σ2² ≠ 0 ("the population variances of group 1 and 2 are not equal")

This implies that if we reject the null hypothesis of Levene's Test, it suggests that the variances of
the two groups are not equal; i.e., that the homogeneity of variances assumption is violated.
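For readers curious about the mechanics, Levene's test is essentially a one-way ANOVA performed on the absolute deviations of each score from its own group mean. The following Python sketch (an outside-of-SPSS illustration; the function name and data are hypothetical) computes the Levene W statistic for two groups:

```python
# Hypothetical sketch of Levene's test for two groups (illustration only):
# replace each score with its absolute deviation from the group mean, then
# compute a one-way ANOVA F statistic on those deviations.

def levene_w(group1, group2):
    groups = [group1, group2]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # absolute deviations from each group's own mean
    z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    z_bar_i = [sum(zi) / len(zi) for zi in z]        # per-group mean deviation
    grand_z = sum(sum(zi) for zi in z) / n_total     # overall mean deviation
    between = sum(len(zi) * (zb - grand_z) ** 2 for zi, zb in zip(z, z_bar_i))
    within = sum((x - zb) ** 2 for zi, zb in zip(z, z_bar_i) for x in zi)
    return ((n_total - k) / (k - 1)) * between / within

# a low-spread group vs. a high-spread group should give a large W
w = levene_w([10, 11, 9, 10, 10, 11], [5, 20, 2, 18, 1, 14])
```

A large W (compared against an F distribution with k - 1 and N - k degrees of freedom) leads to rejecting the null hypothesis of equal variances.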

The output in the Independent Samples Test table includes two rows: Equal variances
assumed and Equal variances not assumed. If Levene’s test indicates that the variances are
equal across the two groups (i.e., p-value large), you will rely on the first row of output, Equal
variances assumed, when you look at the results for the actual Independent Samples t Test (under
the heading t-test for Equality of Means). If Levene’s test indicates that the variances are not equal
across the two groups (i.e., p-value small), you will need to rely on the second row of
output, Equal variances not assumed, when you look at the results of the Independent
Samples t Test (under the heading t-test for Equality of Means). 

The difference between these two rows of output lies in the way the independent samples t test
statistic is calculated. When equal variances are assumed, the calculation uses pooled variances;
when equal variances cannot be assumed, the calculation utilizes un-pooled variances and a
correction to the degrees of freedom.
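As a rough illustration of this difference, both calculations can be sketched in Python (an outside-of-SPSS sketch with hypothetical helper functions; SPSS performs the equivalent computation internally):

```python
# Sketch of the two ways the t statistic is computed: pooled ("equal
# variances assumed") vs. Welch ("equal variances not assumed").
import math

def pooled_t(x, y):
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)  # pooled variance
    t = (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2                                  # df = n1 + n2 - 2

def welch_t(x, y):
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    se2 = vx / nx + vy / ny            # un-pooled standard error, squared
    t = (mx - my) / math.sqrt(se2)
    # Welch-Satterthwaite correction to the degrees of freedom
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df

# with equal group sizes and equal variances, the two t values coincide
tp, dfp = pooled_t([1, 2, 3, 4], [2, 3, 4, 5])
tw, dfw = welch_t([1, 2, 3, 4], [2, 3, 4, 5])
```

When the group variances differ, the two t statistics diverge and the Welch degrees of freedom shrink below n1 + n2 - 2.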


Run an Independent Samples t Test

To run an Independent Samples t Test in SPSS, click Analyze > Compare Means >
Independent-Samples T Test.

The Independent-Samples T Test window opens where you will specify the variables to be used in
the analysis. All of the variables in your dataset appear in the list on the left side. Move variables
to the right by selecting them in the list and clicking the blue arrow buttons. You can move a
variable(s) to either of two areas: Grouping Variable or Test Variable(s).

A Test Variable(s): The dependent variable(s). This is the continuous variable whose means will
be compared between the two groups. You may run multiple t tests simultaneously by selecting
more than one test variable.

B Grouping Variable: The independent variable. The categories (or groups) of the independent
variable will define which samples will be compared in the t test. The grouping variable must have
at least two categories (groups); it may have more than two categories but a t test can only
compare two groups, so you will need to specify which two groups to compare. You can also use a
continuous variable by specifying a cut point to create two groups (i.e., values at or above the cut
point and values below the cut point).

C Define Groups: Click Define Groups to define the category indicators (groups) to use in
the t test. If the button is not active, make sure that you have already moved your independent
variable to the right in the Grouping Variable field. You must define the categories of your
grouping variable before you can run the Independent Samples t Test procedure.

D Options: The Options section is where you can set your desired confidence level for the
confidence interval for the mean difference, and specify how SPSS should handle missing values.

When finished, click OK to run the Independent Samples t Test, or click Paste to have the syntax
corresponding to your specified settings written to an open syntax window. (If you do not have a
syntax window open, a new window will open for you.)

DEFINE GROUPS

Clicking the Define Groups button (C) opens the Define Groups window:
1 Use specified values: If your grouping variable is categorical, select Use specified values.
Enter the values for the categories you wish to compare in the Group 1 and Group 2 fields. If
your categories are numerically coded, you will enter the numeric codes. If your group variable is
string, you will enter the exact text strings representing the two categories. If your grouping
variable has more than two categories (e.g., takes on values of 1, 2, 3, 4), you can specify two of
the categories to be compared (SPSS will disregard the other categories in this case).

Note that when computing the test statistic, SPSS will subtract the mean of Group 2 from the
mean of Group 1. Changing the order of the subtraction affects the sign of the results, but does not
affect the magnitude of the results.
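A tiny numeric illustration (in Python, with hypothetical values rather than the sample dataset) of how reversing the subtraction order flips only the sign:

```python
# Reversing the order of subtraction changes the sign of the mean
# difference (and hence of t), but not its magnitude.
group1 = [12.0, 10.0, 11.0]
group2 = [8.0, 9.0, 7.0]

mean1 = sum(group1) / len(group1)   # 11.0
mean2 = sum(group2) / len(group2)   # 8.0

diff_12 = mean1 - mean2             # Group 1 minus Group 2
diff_21 = mean2 - mean1             # reversed order: same magnitude, opposite sign
```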

2 Cut point: If your grouping variable is numeric and continuous, you can designate a cut
point for dichotomizing the variable. This will separate the cases into two categories based on the
cut point. Specifically, for a given cut point x, the new categories will be:

o Group 1: All cases where grouping variable >= x
o Group 2: All cases where grouping variable < x

Note that this implies that cases where the grouping variable is equal to the cut point itself will be
included in the "greater than or equal to" category. (If you want your cut point to be included in a
"less than or equal to" group, then you will need to use Recode into Different Variables or use DO
IF syntax to create this grouping variable yourself.) Also note that while you can use cut points on
any variable that has a numeric type, it may not make practical sense depending on the actual
measurement level of the variable (e.g., nominal categorical variables coded numerically).
Additionally, using a dichotomized variable created via a cut point generally reduces the power of
the test compared to using a non-dichotomized variable.
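The cut point rule described above can be sketched as follows (a Python illustration with made-up values; SPSS performs this split internally):

```python
# Dichotomizing at a cut point x: values at or above x go to Group 1,
# values below x go to Group 2 (mirroring the behavior described above).
cut_point = 10
scores = [4, 10, 15, 9, 22, 10]   # hypothetical grouping-variable values

group1 = [s for s in scores if s >= cut_point]  # "greater than or equal to"
group2 = [s for s in scores if s < cut_point]   # "less than"
```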

OPTIONS

Clicking the Options button (D) opens the Options window:


The Confidence Interval Percentage box allows you to specify the confidence level for a
confidence interval. Note that this setting does NOT affect the test statistic or p-value or standard
error; it only affects the computed upper and lower bounds of the confidence interval. You can
enter any value between 1 and 99 in this box (although in practice, it only makes sense to enter
numbers between 90 and 99).

The Missing Values section allows you to choose if cases should be excluded "analysis by
analysis" (i.e. pairwise deletion) or excluded listwise. This setting is not relevant if you have only
specified one dependent variable; it only matters if you are entering more than one dependent
(continuous numeric) variable. In that case, excluding "analysis by analysis" will use all
nonmissing values for a given variable. If you exclude "listwise", it will only use the cases with
nonmissing values for all of the variables entered. Depending on the amount of missing data you
have, listwise deletion could greatly reduce your sample size.
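The two missing-value strategies can be sketched with plain Python lists, using None as a hypothetical missing-value marker (an illustration, not SPSS's internal handling):

```python
# "Analysis by analysis" (pairwise) vs. listwise deletion, sketched on
# hypothetical rows of two dependent variables.
rows = [
    (5.0, 7.0),
    (None, 6.0),   # missing on the first variable only
    (4.0, None),   # missing on the second variable only
    (3.0, 8.0),
]

# pairwise: each analysis keeps every nonmissing value for its own variable
var1_pairwise = [a for a, b in rows if a is not None]
var2_pairwise = [b for a, b in rows if b is not None]

# listwise: keep only rows complete on every variable entered
complete_rows = [(a, b) for a, b in rows if a is not None and b is not None]
```

Here pairwise deletion keeps three values for each variable, while listwise deletion keeps only the two fully complete rows.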

Example: Independent samples T test when variances are not equal


PROBLEM STATEMENT

In our sample dataset, students reported their typical time to run a mile, and whether or not they
were an athlete. Suppose we want to know if the average time to run a mile is different for athletes
versus non-athletes. This involves testing whether the sample means for mile time among athletes
and non-athletes in your sample are statistically different (and by extension, inferring whether the
means for mile times in the population are significantly different between these two groups). You
can use an Independent Samples t Test to compare the mean mile time for athletes and non-
athletes.

The hypotheses for this example can be expressed as:

H0: µnon-athlete - µathlete  = 0 ("the difference of the means is equal to zero")

H1: µnon-athlete - µathlete  ≠ 0 ("the difference of the means is not equal to zero")

where µathlete and µnon-athlete are the population means for athletes and non-athletes,
respectively.

In the sample data, we will use two variables: Athlete and MileMinDur. The variable Athlete has
values of either "0" (non-athlete) or "1" (athlete). It will function as the independent variable in
this t test. The variable MileMinDur is a numeric duration variable (h:mm:ss), and it will function
as the dependent variable. In SPSS, the first few rows of data look like this:

BEFORE THE TEST

Before running the Independent Samples t Test, it is a good idea to look at descriptive statistics
and graphs to get an idea of what to expect. Running Compare Means (Analyze > Compare
Means > Means) to get descriptive statistics by group tells us that the standard deviation in mile
time for non-athletes is about 2 minutes; for athletes, it is about 49 seconds. This corresponds to a
variance of 14803 seconds² for non-athletes, and a variance of 2447 seconds² for athletes¹. Running
the Explore procedure (Analyze > Descriptives > Explore) to obtain a comparative boxplot
yields the following graph:

If the variances were indeed equal, we would expect the total length of the boxplots to be about
the same for both groups. However, from this boxplot, it is clear that the spread of observations
for non-athletes is much greater than the spread of observations for athletes. Already, we can
estimate that the variances for these two groups are quite different. It should not come as a
surprise if we run the Independent Samples t Test and see that Levene's Test is significant.

Additionally, we should also decide on a significance level (typically denoted using the Greek
letter alpha, α) before we perform our hypothesis tests. The significance level is the threshold we
use to decide whether a test result is significant. For this example, let's use α = 0.05.

¹When computing the variance of a duration variable (formatted as hh:mm:ss or mm:ss or
mm:ss.s), SPSS converts the standard deviation value to seconds before squaring.
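The footnote's arithmetic can be checked directly (a Python sketch using the guide's approximate standard deviations, converted to seconds):

```python
# SPSS converts a duration's standard deviation to seconds before squaring.
# Using the approximate SDs reported above:
sd_nonathlete_sec = 121.7   # roughly 2 minutes, expressed in seconds
sd_athlete_sec = 49.5       # roughly 49 seconds

var_nonathlete = sd_nonathlete_sec ** 2   # close to the reported 14803
var_athlete = sd_athlete_sec ** 2         # close to the reported 2447
```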

RUNNING THE TEST


To run the Independent Samples t Test:

o Click Analyze > Compare Means > Independent-Samples T Test.


o Move the variable Athlete to the Grouping Variable field, and move the
variable MileMinDur to the Test Variable(s) area. Now Athlete is defined as the
independent variable and MileMinDur is defined as the dependent variable.
o Click Define Groups, which opens a new window. Use specified values is selected by
default. Since our grouping variable is numerically coded (0 = "Non-athlete", 1 =
"Athlete"), type “0” in the first text box, and “1” in the second text box. This indicates
that we will compare groups 0 and 1, which correspond to non-athletes and athletes,
respectively. Click Continue when finished.
o Click OK to run the Independent Samples t Test. Output for the analysis will display in
the Output Viewer window. 
OUTPUT
Tables

Two sections (boxes) appear in the output: Group Statistics and Independent Samples Test. The
first section, Group Statistics, provides basic information about the group comparisons, including
the sample size (n), mean, standard deviation, and standard error for mile times by group. In this
example, there are 166 athletes and 226 non-athletes. The mean mile time for athletes is 6 minutes
51 seconds, and the mean mile time for non-athletes is 9 minutes 6 seconds.

The second section, Independent Samples Test, displays the results most relevant to the
Independent Samples t Test. There are two parts that provide different pieces of information: (A)
Levene’s Test for Equality of Variances and (B) t-test for Equality of Means.

A Levene's Test for Equality of Variances: This section has the test results for Levene's Test.
From left to right:

o F is the test statistic of Levene's test


o Sig. is the p-value corresponding to this test statistic.

The p-value of Levene's test is printed as ".000" (but should be read as p < 0.001 -- i.e., p very
small), so we reject the null of Levene's test and conclude that the variance in mile time of
athletes is significantly different than that of non-athletes. This tells us that we should look at
the "Equal variances not assumed" row for the t test (and corresponding confidence
interval) results. (If this test result had not been significant -- that is, if we had observed p > α --
then we would have used the "Equal variances assumed" output.)

B t-test for Equality of Means provides the results for the actual Independent Samples t Test.
From left to right:

o t is the computed test statistic


o df is the degrees of freedom
o Sig (2-tailed) is the p-value corresponding to the given test statistic and degrees of
freedom
o Mean Difference is the difference between the sample means; it also corresponds to the
numerator of the test statistic
o Std. Error Difference is the standard error; it also corresponds to the denominator of the
test statistic

Note that the mean difference is calculated by subtracting the mean of the second group from the
mean of the first group. In this example, the mean mile time for athletes was subtracted from the
mean mile time for non-athletes (9:06 minus 6:51 = 02:14). The sign of the mean difference
corresponds to the sign of the t value. The positive t value in this example indicates that the mean
mile time for the first group, non-athletes, is significantly greater than the mean for the second
group, athletes.

The associated p value is printed as ".000"; double-clicking on the p-value will reveal the un-
rounded number. SPSS rounds p-values to three decimal places, so any p-value too small to round
up to .001 will print as .000. (In this particular example, the p-values are on the order of 10⁻⁴⁰.)

C Confidence Interval of the Difference: This part of the t-test output complements the
significance test results. Typically, if the CI for the mean difference contains 0, the results are not
significant at the chosen significance level. In this example, the 95% CI is [01:57, 02:32], which
does not contain zero; this agrees with the small p-value of the significance test.

DECISION AND CONCLUSIONS

Since p < .001 is less than our chosen significance level α = 0.05, we can reject the null
hypothesis, and conclude that the mean mile time for athletes and non-athletes is significantly
different.

Based on the results, we can state the following:

o There was a significant difference in mean mile time between non-athletes and athletes
(t(315.846) = 15.047, p < .001).
o The average mile time for athletes was 2 minutes and 14 seconds faster than the average
mile time for non-athletes.
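As a check on the arithmetic, the reported t and df can be approximately recovered from the group summary statistics using the Welch ("equal variances not assumed") formulas. This is a Python sketch, not SPSS output; the small discrepancies come from rounding in the reported means and variances:

```python
# Reproducing the Welch t statistic and df from the summary statistics
# reported above (means converted to seconds).
import math

n1, mean1, var1 = 226, 546.0, 14803.0   # non-athletes: 9:06 = 546 s
n2, mean2, var2 = 166, 411.0, 2447.0    # athletes: 6:51 = 411 s

se2 = var1 / n1 + var2 / n2                       # squared standard error
t = (mean1 - mean2) / math.sqrt(se2)              # close to the reported 15.047
df = se2 ** 2 / ((var1 / n1) ** 2 / (n1 - 1)
                 + (var2 / n2) ** 2 / (n2 - 1))   # close to the reported 315.846
```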
One Sample t Test

The One Sample t Test examines whether the mean of a population is statistically different from a
known or hypothesized value. The One Sample t Test is a parametric test.

This test is also known as:

o Single Sample t Test

The variable used in this test is known as:


o Test variable

In a One Sample t Test, the test variable's mean is compared against a "test value", which is a
known or hypothesized value of the mean in the population. Test values may come from a
literature review, a trusted research organization, legal requirements, or industry standards. For
example:

o A particular factory's machines are supposed to fill bottles with 150 milliliters of product.
A plant manager wants to test a random sample of bottles to ensure that the machines are
not under- or over-filling the bottles.
o The United States Environmental Protection Agency (EPA) sets clearance levels for the
amount of lead present in homes: no more than 10 micrograms per square foot on floors
and no more than 100 micrograms per square foot on window sills (as of December
2020). An inspector wants to test if samples taken from units in an apartment building
exceed the clearance level.
Common Uses

The One Sample t Test is commonly used to test the following:

o Statistical difference between a mean and a known or hypothesized value of the mean in
the population.
o Statistical difference between a change score and zero.
o This approach involves creating a change score from two variables, and then comparing
the mean change score to zero, which will indicate whether any change occurred between
the two time points for the original measures. If the mean change score is not significantly
different from zero, no significant change occurred.

Note: The One Sample t Test can only compare a single sample mean to a specified constant. It
cannot compare sample means between two or more groups. If you wish to compare the means of
multiple groups to each other, you will likely want to run an Independent Samples t Test (to
compare the means of two groups) or a One-Way ANOVA (to compare the means of two or more
groups).

Data Requirements

Your data must meet the following requirements:

o Test variable that is continuous (i.e., interval or ratio level)


o Scores on the test variable are independent (i.e., independence of observations)
o There is no relationship between scores on the test variable
o Violation of this assumption will yield an inaccurate p value
o Random sample of data from the population
o Normal distribution (approximately) of the sample and population on the test variable
o Non-normal population distributions, especially those that are thick-tailed or heavily
skewed, considerably reduce the power of the test
o Among moderate or large samples, a violation of normality may still yield
accurate p values
o Homogeneity of variances (i.e., variances approximately equal in both the sample and
population)
o No outliers
Hypotheses

The null hypothesis (H0) and (two-tailed) alternative hypothesis (H1) of the One Sample t Test
can be expressed as:

H0: µ = µ0 ("the population mean is equal to the [proposed] population mean")

H1: µ ≠ µ0 ("the population mean is not equal to the [proposed] population mean")

where µ is the "true" population mean and µ0 is the proposed value of the population mean.

Run a One Sample t Test

To run a One Sample t Test in SPSS, click Analyze > Compare Means > One-Sample T Test.

The One-Sample T Test window opens where you will specify the variables to be used in the
analysis. All of the variables in your dataset appear in the list on the left side. Move variables to
the Test Variable(s) area by selecting them in the list and clicking the arrow button.

A Test Variable(s): The variable whose mean will be compared to the hypothesized population
mean (i.e., Test Value). You may run multiple One Sample t Tests simultaneously by selecting
more than one test variable. Each variable will be compared to the same Test Value. 

B Test Value: The hypothesized population mean against which your test variable(s) will be
compared.

C Options: Clicking Options will open a window where you can specify the Confidence
Interval Percentage and how the analysis will address Missing Values (i.e., Exclude cases
analysis by analysis or Exclude cases listwise). Click Continue when you are finished making
specifications.

Click OK to run the One Sample t Test.

Example
PROBLEM STATEMENT

According to the CDC, the mean height of U.S. adults ages 20 and older is about 66.5 inches (69.3
inches for males, 63.8 inches for females).

In our sample data, we have a sample of 435 college students from a single college. Let's test if the
mean height of students at this college is significantly different than 66.5 inches using a one-
sample t test. The null and alternative hypotheses of this test will be:

H0: µHeight = 66.5 ("the mean height is equal to 66.5")

H1: µHeight ≠ 66.5 ("the mean height is not equal to 66.5")

BEFORE THE TEST

In the sample data, we will use the variable Height, which is a continuous variable representing each
respondent’s height in inches. The heights exhibit a range of values from 55.00 to 88.41
(Analyze > Descriptive Statistics > Descriptives).

Let's create a histogram of the data to get an idea of the distribution, and to see if our hypothesized
mean is near our sample mean. Click Graphs > Legacy Dialogs > Histogram. Move variable
Height to the Variable box, then click OK.
To add vertical reference lines at the mean (or another location), double-click on the plot to open
the Chart Editor, then click Options > X Axis Reference Line. In the Properties window, you
can enter a specific location on the x-axis for the vertical line, or you can choose to have the
reference line at the mean or median of the sample data. Click Apply to
make sure your new line is added to the chart. Here, we have added two reference lines: one at the
sample mean (the solid black line), and the other at 66.5 (the dashed red line).

From the histogram, we can see that height is relatively symmetrically distributed about the mean,
though there is a slightly longer right tail. The reference lines indicate that the sample mean is slightly
greater than the hypothesized mean, but not by a huge amount. It's possible that our test result
could come back significant.

RUNNING THE TEST

To run the One Sample t Test, click Analyze > Compare Means > One-Sample T Test. Move
the variable Height to the Test Variable(s) area. In the Test Value field, enter 66.5.

Click OK to run the One Sample t Test.

Syntax

T-TEST
  /TESTVAL=66.5
  /MISSING=ANALYSIS
  /VARIABLES=Height
  /CRITERIA=CI(.95).

OUTPUT
Tables

Two sections (boxes) appear in the output: One-Sample Statistics and One-Sample Test. The
first section, One-Sample Statistics, provides basic information about the selected
variable, Height, including the valid (nonmissing) sample size (n), mean, standard deviation, and
standard error. In this example, the mean height of the sample is 68.03 inches, which is based on
408 nonmissing observations.

The second section, One-Sample Test, displays the results most relevant to the One
Sample t Test. 
A Test Value: The number we entered as the test value in the One-Sample T Test window.

B t Statistic: The test statistic of the one-sample t test, denoted t. In this example, t = 5.810. Note
that t is calculated by dividing the mean difference (E) by the standard error mean (from the One-
Sample Statistics box).

C df: The degrees of freedom for the test. For a one-sample t test, df = n - 1; so here, df = 408 - 1
= 407.

D Sig. (2-tailed): The two-tailed p-value corresponding to the test statistic.

E Mean Difference: The difference between the "observed" sample mean (from the One Sample
Statistics box) and the "expected" mean (the specified test value (A)). The sign of the mean
difference corresponds to the sign of the t value (B). The positive t value in this example indicates
that the mean height of the sample is greater than the hypothesized value (66.5).

F Confidence Interval for the Difference: The confidence interval for the difference between the
specified test value and the sample mean.
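The arithmetic connecting these output values can be sketched in Python. Note the assumptions: the standard error is not printed in this excerpt, so the value below is inferred from the reported results, and 1.966 is an approximate t critical value for df = 407 (close to the normal value 1.96):

```python
# Sketch of the one-sample t test arithmetic behind the output described above.
import math

n = 408
sample_mean = 68.03
test_value = 66.5
se = 0.264                      # ASSUMED standard error of the mean (inferred)

mean_diff = sample_mean - test_value        # about 1.53
t = mean_diff / se                          # close to the reported 5.810
df = n - 1                                  # 407

# 95% CI for the difference; 1.966 is an approximate critical value for df = 407
ci = (mean_diff - 1.966 * se, mean_diff + 1.966 * se)
```

The computed interval is close to the reported 95% CI [1.013, 2.050].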

DECISION AND CONCLUSIONS

Recall that our hypothesized population value was 66.5 inches, the [approximate] average height
of the overall adult population in the U.S. Since p < 0.001, we reject the null hypothesis that the
mean height of students at this college is equal to the hypothesized population mean of 66.5 inches
and conclude that the mean height is significantly different than 66.5 inches.

Based on the results, we can state the following:

o There is a significant difference in the mean height of the students at this college and the
overall adult population in the U.S. (p < .001).
o The average height of students at this college is about 1.5 inches taller than the U.S. adult
population average (95% CI [1.013, 2.050]).
Paired Samples t Test

The Paired Samples t Test compares the means of two measurements taken from the same
individual, object, or related units. These "paired" measurements can represent things like:

o A measurement taken at two different times (e.g., pre-test and post-test score with an
intervention administered between the two time points)
o A measurement taken under two different conditions (e.g., completing a test under a
"control" condition and an "experimental" condition)
o Measurements taken from two halves or sides of a subject or experimental unit (e.g.,
measuring hearing loss in a subject's left and right ears).
The purpose of the test is to determine whether there is statistical evidence that the mean
difference between paired observations is significantly different from zero. The Paired
Samples t Test is a parametric test.

This test is also known as:

o Dependent t Test
o Paired t Test
o Repeated Measures t Test

The variable used in this test is known as:

o Dependent variable, or test variable (continuous), measured at two different times or for
two related conditions or units
Common Uses

The Paired Samples t Test is commonly used to test the following:

o Statistical difference between two time points


o Statistical difference between two conditions
o Statistical difference between two measurements
o Statistical difference between a matched pair

Note: The Paired Samples t Test can only compare the means for two (and only two) related
(paired) units on a continuous outcome that is normally distributed. The Paired Samples t Test is
not appropriate for analyses involving the following: 1) unpaired data; 2) comparisons between
more than two units/groups; 3) a continuous outcome that is not normally distributed; and 4) an
ordinal/ranked outcome.

o To compare unpaired means between two independent groups on a continuous outcome
that is normally distributed, choose the Independent Samples t Test.
o To compare unpaired means between more than two groups on a continuous outcome that
is normally distributed, choose ANOVA.
o To compare paired means for continuous data that are not normally distributed, choose
the nonparametric Wilcoxon Signed-Ranks Test.
o To compare paired means for ranked data, choose the nonparametric Wilcoxon Signed-
Ranks Test.
Data Requirements

Your data must meet the following requirements:

o Dependent variable that is continuous (i.e., interval or ratio level)


o Note: The paired measurements must be recorded in two separate variables.
o Related samples/groups (i.e., dependent observations)
o The subjects in each sample, or group, are the same. This means that the subjects in the
first group are also in the second group.
o Random sample of data from the population
o Normal distribution (approximately) of the difference between the paired values
o No outliers in the difference between the two related groups

Note: When testing assumptions related to normality and outliers, you must use a variable that
represents the difference between the paired values - not the original variables themselves.
Note: When one or more of the assumptions for the Paired Samples t Test are not met, you may
want to run the nonparametric Wilcoxon Signed-Ranks Test instead.
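The note above is worth emphasizing: the normality and outlier checks apply to the computed difference scores, not to either original variable. As a rough sketch of an outlier screen (in Python rather than SPSS, with made-up scores, using a conventional 1.5 × IQR rule):

```python
import statistics

# Hypothetical pre/post scores for 8 subjects (illustrative values only).
pre  = [10, 12, 9, 14, 11, 13, 10, 12]
post = [12, 15, 11, 15, 13, 16, 12, 30]   # last case looks suspect

# Assumption checks use the *differences*, not the original variables.
diffs = [b - a for a, b in zip(pre, post)]

# Quartiles of the difference scores (default 'exclusive' method).
q1, q2, q3 = statistics.quantiles(diffs, n=4)
iqr = q3 - q1

# Flag differences beyond 1.5 * IQR from the quartiles as potential outliers.
outliers = [d for d in diffs if d < q1 - 1.5 * iqr or d > q3 + 1.5 * iqr]
```

Here the eighth subject's difference of 18 is flagged, even though neither of that subject's raw scores is extreme on its own.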

Hypotheses

The hypotheses can be expressed in two different ways that express the same idea and are
mathematically equivalent:

H0: µ1 = µ2 ("the paired population means are equal")

H1: µ1 ≠ µ2 ("the paired population means are not equal")

OR

H0: µ1 - µ2 = 0 ("the difference between the paired population means is equal to 0")

H1: µ1 - µ2 ≠ 0 ("the difference between the paired population means is not 0")

where

o µ1 is the population mean of variable 1, and
o µ2 is the population mean of variable 2.
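To see why the two formulations are mathematically equivalent, note that for paired data the difference of the means equals the mean of the per-subject differences. A quick Python illustration (the scores are invented, not from the sample dataset):

```python
# Hypothetical paired scores for 5 subjects.
x1 = [72.0, 65.0, 80.0, 58.0, 90.0]   # variable 1
x2 = [75.0, 70.0, 78.0, 66.0, 95.0]   # variable 2

mean1 = sum(x1) / len(x1)
mean2 = sum(x2) / len(x2)

# Per-subject differences -- the quantity the paired t test actually analyzes.
diffs = [a - b for a, b in zip(x1, x2)]
mean_diff = sum(diffs) / len(diffs)

# mean(x1) - mean(x2) == mean(x1 - x2), so H0: µ1 = µ2 and
# H0: µ1 - µ2 = 0 state the same hypothesis.
assert abs((mean1 - mean2) - mean_diff) < 1e-9
```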
Data Set-Up

Your data should include two continuous numeric variables (represented in columns) that will be used in the analysis. The two variables should contain the paired measurements for each subject (row). If your data are arranged differently (e.g., each repeated measurement appears as its own row), you will need to restructure the data into this format before running the test.
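The required layout can be sketched in Python: starting from "long" data (one row per measurement), we pivot to one row per subject with the paired values in two separate columns. The subject IDs and scores here are invented for illustration:

```python
# Long format: one row per (subject, test) measurement.
long_rows = [
    ("s1", "English", 82), ("s1", "Math", 64),
    ("s2", "English", 77), ("s2", "Math", 70),
]

# Wide format expected by the paired t test: one row per subject,
# with the two paired measurements in separate columns (variables).
wide = {}
for subject, test, score in long_rows:
    wide.setdefault(subject, {})[test] = score

# wide == {"s1": {"English": 82, "Math": 64},
#          "s2": {"English": 77, "Math": 70}}
```

In SPSS itself this restructuring is done with Data > Restructure rather than code.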

Run a Paired Samples t Test

To run a Paired Samples t Test in SPSS, click Analyze > Compare Means > Paired-Samples T Test.
The Paired-Samples T Test window opens where you will specify the variables to be used in the
analysis. All of the variables in your dataset appear in the list on the left side. Move variables to
the right by selecting them in the list and clicking the blue arrow buttons. You will specify the
paired variables in the Paired Variables area.

A Pair: The “Pair” column represents the number of Paired Samples t Tests to run. You may
choose to run multiple Paired Samples t Tests simultaneously by selecting multiple sets of
matched variables. Each new pair will appear on a new line.

B Variable1: The first variable, representing the first group of matched values. Move the variable
that represents the first group to the right where it will be listed beneath the “Variable1” column.

C Variable2: The second variable, representing the second group of matched values. Move the
variable that represents the second group to the right where it will be listed beneath
the “Variable2” column.

D Options: Clicking Options will open a window where you can specify the Confidence Interval Percentage and how the analysis will address Missing Values (i.e., Exclude cases analysis by analysis or Exclude cases listwise). Click Continue when you are finished making specifications.
o Setting the confidence interval percentage does not have any impact on the calculation of
the p-value.
o If you are only running one paired samples t test, the two "missing values" settings will
produce the same results. There will only be differences if you are running 2 or more
paired samples t tests. (This would look like having two or more rows in the main Paired
Samples T Test dialog window.)
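The difference between the two missing-value settings can be illustrated outside SPSS. In this Python sketch (hypothetical scores, with None marking a missing value), analysis-by-analysis exclusion keeps a different set of cases for each pair, while listwise exclusion drops any case that is missing on any variable involved:

```python
# Three variables with some missing values (None); suppose we request two
# paired tests: (eng, math_) and (eng, write).
eng   = [80, 75, None, 90]
math_ = [60, None, 70, 85]
write = [70, 72, 68, None]

def complete_pairs(x, y):
    """Keep only cases with non-missing values on both variables."""
    return [(a, b) for a, b in zip(x, y) if a is not None and b is not None]

# "Exclude cases analysis by analysis": each test drops only its own
# missing cases, so the two tests can use different sample sizes.
pairwise_n = [len(complete_pairs(eng, math_)), len(complete_pairs(eng, write))]

# "Exclude cases listwise": drop any case missing on *any* variable used,
# so every test runs on the same (smaller) set of cases.
listwise = [row for row in zip(eng, math_, write) if None not in row]
```

Here each pairwise analysis keeps 2 cases, but listwise exclusion keeps only 1, which is why the settings matter only when two or more pairs are requested.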
Example
PROBLEM STATEMENT

The sample dataset has placement test scores (out of 100 points) for four subject areas: English,
Reading, Math, and Writing. Suppose we are particularly interested in the English and Math
sections, and want to determine whether English or Math had higher test scores on average. We
could use a paired t test to test if there was a significant difference in the average of the two tests.

BEFORE THE TEST

Variable English has a high of 101.95 and a low of 59.83, while variable Math has a high of 93.78
and a low of 35.32 (Analyze > Descriptive Statistics > Descriptives). The mean English score is
much higher than the mean Math score (82.79 versus 65.47). Additionally, there were 409 cases
with non-missing English scores, and 422 cases with non-missing Math scores, but only 398 cases
with non-missing observations for both variables. (Recall that the sample dataset has 435 cases in
all.)

Let's create a comparative boxplot of these variables to help visualize these numbers.
Click Analyze > Descriptive Statistics > Explore. Add English and Math to
the Dependents box; then, change the Display option to Plots. We'll also need to tell SPSS to put
these two variables on the same chart. Click the Plots button, and in the Boxplots area, change the
selection to Dependents Together. You can also uncheck Stem-and-leaf. Click Continue. Then
click OK to run the procedure.

We can see from the boxplot that the center of the English scores is much higher than the center of
the Math scores, and that there is slightly more spread in the Math scores than in the English
scores. Both variables appear to be symmetrically distributed. It's quite possible that the paired
samples t test could come back significant.

RUNNING THE TEST


o Click Analyze > Compare Means > Paired-Samples T Test.
o Select the variable English and move it to the Variable1 slot in the Paired Variables box.
Then select the variable Math and move it to the Variable2 slot in the Paired Variables
box.
o Click OK.

OUTPUT
Tables

There are three tables: Paired Samples Statistics, Paired Samples Correlations, and Paired
Samples Test. Paired Samples Statistics gives univariate descriptive statistics (mean, sample
size, standard deviation, and standard error) for each variable entered. Notice that the sample size
here is 398; this is because the paired t-test can only use cases that have non-missing values for
both variables. Paired Samples Correlations shows the bivariate Pearson correlation coefficient
(with a two-tailed test of significance) for each pair of variables entered. Paired Samples
Test gives the hypothesis test results.

The Paired Samples Statistics output repeats what we examined before we ran the test. The Paired
Samples Correlation table adds the information that English and Math scores are significantly
positively correlated (r = .243).

Why does SPSS report the correlation between the two variables when you run a Paired t Test?
Although our primary interest when we run a Paired t Test is finding out if the means of the two
variables are significantly different, it's also important to consider how strongly the two variables
are associated with one another, especially when the variables being compared are pre-test/post-
test measures. For more information about correlation, check out the Pearson Correlation tutorial.
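The Pearson correlation that SPSS reports in the Paired Samples Correlations table can be computed by hand from the paired scores. A minimal Python sketch, using invented values rather than the tutorial's dataset:

```python
import math

# Hypothetical paired scores (illustrative only).
x = [72.0, 65.0, 80.0, 58.0, 90.0]
y = [75.0, 70.0, 78.0, 66.0, 95.0]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Pearson r: covariance of the deviations over the product of their norms.
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
r = cov / math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
```

A strong positive r, as in this sketch, is typical of pre-test/post-test data, and it is exactly what makes the paired design more powerful than treating the two columns as independent samples.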

Reading from left to right:

o First column: The pair of variables being tested, and the order the subtraction was carried
out. (If you have specified more than one variable pair, this table will have multiple
rows.)
o Mean: The average difference between the two variables.
o Standard deviation: The standard deviation of the difference scores.
o Standard error mean: The standard error (standard deviation divided by the square root
of the sample size). Used in computing both the test statistic and the upper and lower
bounds of the confidence interval.
o t: The test statistic (denoted t) for the paired t test.
o df: The degrees of freedom for this test.
o Sig. (2-tailed): The p-value corresponding to the given test statistic t with degrees of
freedom df.
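Every column of the Paired Samples Test table can be reproduced from the difference scores. A pure-Python sketch with made-up differences (the value 2.776 is the tabled critical t for df = 4 at the 97.5th percentile, used here only to build the 95% CI; SPSS computes the exact p-value instead):

```python
import math

# Hypothetical difference scores (Variable1 - Variable2) for 5 subjects.
diffs = [4, 7, 3, 6, 5]
n = len(diffs)

mean_d = sum(diffs) / n                                   # "Mean" column
sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs)
                 / (n - 1))                               # "Std. Deviation"
se_d = sd_d / math.sqrt(n)                                # "Std. Error Mean"
t_stat = mean_d / se_d                                    # "t"
df = n - 1                                                # "df"

# 95% confidence interval of the mean difference.
t_crit = 2.776    # tabled t(.975, df=4)
ci = (mean_d - t_crit * se_d, mean_d + t_crit * se_d)
```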
DECISION AND CONCLUSIONS

From the results, we can say that:

o English and Math scores were weakly and positively correlated (r = 0.243, p < 0.001).
o There was a significant average difference between English and Math scores (t(397) = 36.313, p < 0.001).
o On average, English scores were 17.3 points higher than Math scores (95% CI [16.36,
18.23]).
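These reported numbers can be cross-checked against one another. The Python snippet below uses only the values quoted above; it is a consistency check on the output, not a re-analysis of the data:

```python
# Values taken from the tutorial's descriptives and test output.
mean_english, mean_math = 82.79, 65.47
t_stat, mean_diff = 36.313, 17.3
ci_low, ci_high = 16.36, 18.23

# The mean difference equals the difference of the means (within rounding).
assert abs((mean_english - mean_math) - mean_diff) < 0.05

# Since t = mean_diff / SE, the implied standard error is:
se = mean_diff / t_stat

# The CI half-width should be about t_crit * SE, with t_crit near 1.97
# for df = 397 at the 95% level.
half_width = (ci_high - ci_low) / 2
assert abs(half_width / se - 1.97) < 0.05
```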
