Statistics Translated: A Step-by-Step Guide to Analyzing and Interpreting Data, Second Edition
STEVEN R. TERRELL
Acknowledgments

First, I would like to thank C. Deborah Laughton, Publisher, Methodology and Statis-
tics, at The Guilford Press for her untiring support, ideas, critique, and insight during
this project; there’s nothing to say except she’s the editor that every author dreams
of. My job was made far easier by her professional and caring staff. I offer a special
thanks to Katherine Sommer, Paul Gordon for the creative cover, and others at Guil-
ford for their tremendous help along the way. William Meyer—all I can say is “Thank
you so much.” The book looks great and I truly appreciate what you’ve done! I am
also most grateful to the following reviewers: Brian Withrow, Department of Crimi-
nal Justice, Texas State University; Tammy Kolb, Center for Education Policy Analy-
sis, Neag School of Education, University of Connecticut; Cyndi Garvan, College of
Education, University of Florida; Chris Ohana, Mathematics and Science Education
Groups, Western Washington University; Robert Griffore, Human Development and
Family Studies Department, Michigan State University; Christine McDonald, Depart-
ment of Communication Disorders and Counseling, School, and Educational Psychol-
ogy, Indiana State University; Val Larsen, James Madison University; Adam Thrasher,
Department of Health and Human Performance, University of Houston; Melissa
Gruys, Department of Management, Wright State University; Lynne Schrum, Elemen-
tary Education Program, George Mason University, retired; Dave Edyburn, School of
Education, University of Wisconsin–Milwaukee; M. D. Roblyer, Instructional Technol-
ogy and Distance Education, Nova Southeastern University, retired; Robin A. Barry,
Department of Psychology, University of Wyoming; Andrew H. Rose, Master of Social
Work Program, Texas Tech University; Charles M. Super, Center for the Study of Cul-
ture, Health, and Human Development, University of Connecticut; and Todd D. Little,
Department of Educational Psychology and Leadership, Texas Tech University.
Many of my friends played a role in this book without knowing it. When I was
stuck for an idea for a case study, all I had to do was think of what they do in their
professional lives, and the ideas flowed from there. Thank you; you guys are the best.
Finally, thank you to my students at Nova Southeastern University. By using ear-
lier drafts of this book, you provided me the feedback that helped me write what you
needed to know—statistics can be simplified.
Brief Contents
APPENDIX A. Area under the Normal Curve Table (Critical Values of z)
APPENDIX B. Critical Values of t
APPENDIX C. Critical Values of F When Alpha = .01
APPENDIX D. Critical Values of F When Alpha = .05
APPENDIX E. Critical Values of F When Alpha = .10
APPENDIX F. Critical Values of Chi-Square
APPENDIX G. Selecting the Right Statistical Test
Glossary
Answers to Quiz Time!
Index
About the Author
Extended Contents
Pie Charts
Bar Charts
Graphically Describing Quantitative Data
Scatterplots
Histograms
Don't Let a Picture Tell You the Wrong Story!
Summary of Graphical Descriptive Statistics
The Normal Distribution
Things That Can Affect the Shape of a Distribution of Quantitative Data
Summary of the Normal Distribution
Do You Understand These Key Words and Phrases?
Quiz Time!
Introduction: You Do Not Need to Be a Statistician to Understand Statistics!
As I said in the first edition of this book, there aren’t a lot of students who take a
statistics class on purpose; most of them are there only to fulfill the requirements of
their chosen major. It is even rarer to find a student pleasantly anticipating what the
semester will bring; they’ve heard the horror stories, they have no clue how statistics
affect their lives, and many aren’t much interested in finding out! I’ve had a lot of
readers tell me the approach I took in the first edition really helped them. Although
I really appreciate their compliments, I know there are always new opportunities for
learning—both for me and my students! Before I explain what I’ve done to hopefully
make that happen, let me talk a little about how I got to where I am.
A Little Background
I took my first statistics class as a prerequisite to a graduate program. I am the first to
admit that I didn’t understand a thing. The formulas were mysterious, the professor
even more so, and, for the life of me, I couldn’t tell the difference between a t-test and
a teacup. I struggled through, made a passing grade, and went on to graduate school.
As part of my curriculum, I had to take another statistics class. Much like the first,
I couldn’t make heads or tails out of what the teacher was saying, and the C grade I
earned was proof of that. “At least I passed,” I thought. “I will never have to take a
statistics course again!”
Never say never, I soon learned, for I had barely finished my master’s degree when I
decided to continue for the doctorate. Faced with four required research and statistics
classes, I resolved to do my best and try to make it through. I figured I had already
passed the prior two classes and could prevail again. Besides, I thought, if I couldn’t
do the work, I didn’t deserve the degree.
Although I entered my first doctoral class looking and feeling much like the stu-
dents I described earlier, things soon took a change for the better. During the class I
discovered the teacher was a “real person,” and this “real person” was explaining sta-
tistics in a way I could understand. Not only could I understand what she was saying,
I could understand the largely unintelligible textbook! I came out of the class feeling
great. In fact, not only did I learn what I needed to learn, but I became so interested
that I wound up taking quite a few extra statistics courses.
When I finished my doctorate, I liked what I was doing so much I convinced
the dean where I worked to allow me to teach statistics in our graduate school. I was
excited about teaching this subject for the first time, and, although that was over 25
years and many classes ago, I still love what I am doing. People look at me funny when
I say this, but most of them are like the people I have described earlier. What they do
not know is I’ve learned a few things that make it far easier to use and understand
statistics; I will tell you what they are.
The science of statistics is nothing more than the use of arithmetic tools to
help us examine numeric data we have collected and make decisions based
on our examination. The first of these tools, descriptive statistics, helps
us understand our data by giving us the ability to organize and summarize
it. Inferential statistics, the second tool, helps us make decisions or make
inferences based on the data.
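If you would like to see that distinction in action, here is a minimal Python sketch; the scores are invented purely for illustration, and the test it uses is one we will cover much later in the book:

```python
import statistics
from scipy import stats  # requires the SciPy package

# Invented exam scores for two classes (illustration only)
morning = [89, 92, 85, 90, 88, 91]
afternoon = [82, 84, 80, 86, 83, 81]

# Descriptive statistics: organize and summarize the data we collected
print("Morning mean:", statistics.mean(morning))
print("Afternoon mean:", statistics.mean(afternoon))

# Inferential statistics: help us decide whether the difference between
# the groups is likely "real" or just chance (much more on this later)
t_stat, p_value = stats.ttest_ind(morning, afternoon)
print("p-value:", p_value)
```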
Second, I’ve found that students greatly overestimate the proficiency in math they
need in order to calculate or comprehend statistics. Whenever I find people worry-
ing about their math skills, I always ask them, “Can you add, subtract, multiply, and
divide? Can you use a calculator to determine the square root of a number?” Often,
their answer is “yes” so I tell them, “That is great news! You have all the math back-
ground you need to succeed!” I’ve found that many people, when I tell them this,
breathe a big sigh of relief!
Next, I’ve discovered that a class in statistics isn’t nearly as bad as people antici-
pate if they approach it with an open mind and learn a few basic rules. Since most
people want to be able to use statistics to plan research, analyze its results, and make
sound statistically based decisions, they want to become consumers of statistics rather
than statisticians per se. As consumers, they want to be able to make decisions based
on data, as well as understand decisions others have made. With that in mind, this
book will help you; all you must do is approach it with an open mind and learn a few easily understood steps.
Fourth, I have found that students get confused by all the terminology thrown
around in a statistics course. For instance, for a given statistical problem you could
find the correct answer by using procedures such as the Scheffé test, Tukey’s HSD test,
or the Bonferroni procedure. The truth is, you could use any one of these techniques
and get very similar answers. I was recently at a meeting where a very prominent stat-
istician jokingly said we do this “to confuse graduate students”; there certainly seems
to be a lot of truth in what he said! These types of terms, as well as the ambiguity of
having many terms that mean basically the same thing, seem to intimidate students
sometimes. Knowing that, I like to stick to what students need to know most of the
time. We can deal with the exceptions on an “as needed” basis.
Along these same lines, I’ve discovered many instructors try to cover far too much
in an introductory statistics class, especially when they are dealing with people who
want to be consumers of statistics. Research shows that 30 to 50% of studies published
in prestigious research journals use a small set of common descriptive and inferential
statistics. My experience, however, is that the average student tends to use the same
basic statistical techniques about 90% of the time. Knowing that, we are left with a
manageable problem: we need to be able to identify when to use a given statistical test,
use it to analyze data we have collected, and interpret the results. To help you do that,
I have developed a set of six steps to guide you. Using these steps, you will be able to
work with the statistics you need most of the time.
After we get through the basics, we will move into the “real statistics.” For
now, just read through these steps and the brief description of what each step entails.
STEP 2: State a Hypothesis
A hypothesis is a researcher’s beliefs about what will be found
or what will occur when he or she investigates a problem. For
example, let's suppose that our school is faced with lower-than-average scores on a statewide math test. We have heard that the problem may be based on gender; apparently males do not perform as well in math as their female classmates, and this may cause the classes' overall average to be lower. Since we have considerably more males than
females in our school, we then must ask ourselves, “How can I find out if this is true?”
Or, if we plan on using some type of intervention to increase the achievement of our
male students, we could ask, “What do I think will occur based on what I do?” These
questions lead us to make a statement regarding our beliefs about what will occur. As
I said earlier, stating the hypothesis is the first step in understanding “how” we can
investigate the problem.
Summary
As I’ve said, if you understand these six steps and what we are about to discuss,
you can become a proficient consumer of statistics. Let me remind you that this
isn’t going to be an exhaustive overview of statistics. Again, about 90% of the time,
students use a given set of statistics over and over. That is what this text is about—
the common set of statistical tools that you, as a consumer of statistics, need to be
familiar with.
In the next few chapters, we are going to go through each of these steps, one by one, and explain, in detail, what is necessary to accomplish the step. After we have
looked at each step in detail, we will then apply the knowledge we have gained by
analyzing data and making decisions based on our analysis. Believe me, it is going to
be far easier than you think, so let’s get started!
Introduction
The first two steps in the six-step model should help good consumers of statistics
understand exactly what they intend to investigate. As I said earlier, Step 1 focuses on
“what” will be investigated by clearly defining the problem; Step 2 starts the process
of describing “how” it will be investigated. Unfortunately, far too many people try to
get to Step 2 without understanding how it relates to the problem at hand. Because
of that, let’s see what constitutes a good problem statement.
My argument is, if you're a consumer of statistics, most of the time the problem statement
will be clearly stated for you; if not, in most instances it’s clearly implied.
I’ll admit that this point is arguable, but the bottom line is this: whether we’re con-
sumers of statistics or conducting actual research, we still need to be able to recognize
a good problem statement or develop one when necessary. Either way, keep in mind
that, in Step 1, we are identifying “what” we are going to investigate; the remaining
five steps will help us understand “how” to investigate it. Knowing that, we’re going
to move forward by discussing the characteristics of a good problem statement; we’ll
then learn how to correctly write a problem statement.
Our school has a lower than average passing rate on the state licensure test.
This type of problem is generally called a practical research problem in that it focuses
on an issue within our own school or organization. We may also investigate research-
based problems: those that come from conflicts or contradictions in previous findings
or a desire to extend the knowledge about a problem area. For example, let’s say that,
in our readings, we have found conflicting research on the effectiveness of using age-
appropriate popular press material, such as magazines, to teach reading in elemen-
tary schools; we could easily write a problem statement such as:

Conflicting research findings leave it unclear whether using age-appropriate popular press materials, such as magazines, is an effective way to teach reading in elementary schools.
Whatever the source of the problem we're investigating or the type of research it represents, we must keep the six characteristics of a good problem in mind. As you read these characteristics, keep in mind that we're not going to look at them in any particular order; all of them must be met before we can move forward.
One characteristic, for example, is choosing a problem “with a scope you can manage.” Look at the following “problem statement” presented to me by one of my students:
One of the most dynamic affordances of the Internet is the ability to offer
educational programs to students throughout the globe. This has led to
nearly 90% of all universities and colleges in the United States offering
Internet-based courses. While convenient for students, research has shown
that attrition from Internet-based classes is significantly higher than that
from classes taught in a traditional environment.
The attrition rate in our school’s Internet-based classes is higher than the
attrition rate from our school’s traditional classes.
By narrowing down the first statement, the student was able to create a problem
statement that could be easily investigated.
conducted in a valid manner; therefore, the study seemed to show that the intervention
didn’t work!
Let’s look at a few more examples and determine if their investigation is either
theoretically or practically significant:
Youth who receive their driver’s license at age 16 have a higher number of
accidents than youth who obtain a license at age 18 or older.
This problem would seem to have some practical significance, but there are obvi-
ously two reasons we might not want to investigate it. First, is it really age that affects
the number of accidents, or is it the amount of experience the driver has? Doesn’t it
stand to reason that youngsters who receive their license at 18 would have, percentage-
wise, the same number of accidents as those who started earlier? As beginning driv-
ers, they would have the same level of experience as their 16-year-old peers. Second,
this problem doesn’t warrant a full-scale investigation; it would be easy to simply look
at the percentage of drivers in each age group who have had accidents to determine
whether there is a difference.
Let’s look at one last problem:
Nurses whose first job is in a private practice are far more likely to change
jobs within three years than nurses whose first job is in a hospital or other
health care setting.
While this problem seems like it might bear investigating, I would ask the ques-
tion, “Why is it important how often a nurse changes location? Does this have a nega-
tive effect on the facility they are leaving?” In short, this seems like only half of a
practically significant problem statement:
Nurses whose first job is in a private practice are far more likely to change
jobs than nurses whose first job is in a hospital or other health care setting.
Replacement and training of new personnel is a burden and the burden has
been shown to negatively affect patient care.
Writing the problem statement in this manner clearly shows its practical signifi-
cance.
What does this mean? Is what the author is trying to say clear and concise? In this
case, it is not. In order to state the problem more clearly, the author needs to establish
the relationship between the reduced or free lunches and student self-esteem. It could
be presented in this manner:

Students at our school who receive free or reduced-price lunches have lower levels of self-esteem than students who do not receive free or reduced-price lunches.
Thinking back, does it meet the six criteria of a good problem? First, we can
only hope that it’s interesting to the researchers and that they have the necessary time,
skills, and resources; if not, why would they want to investigate the problem? Next,
the problem can certainly be analyzed through the collection and analysis of numeric
data, and investigating it has practical significance. It appears that it would be ethical
to investigate the problem, but, again, an institutional review panel could confirm
that for us. Finally, the scope of the problem is manageable since it is limited to one
school. How about this?
In this case, we do not know much about what the researchers are proposing to
investigate, but it could be reworded to make it clearer:
Children diagnosed with epilepsy and whose diets contain high levels of fat
and sugar have a greater number of seizures than children diagnosed with
epilepsy who eat healthier diets.
Here again, we’re assuming the researcher is interested in the problem and has
the necessary knowledge, skills, and time. Investigating this problem has both theo-
retical and practical significance, and numeric data can be collected. There is a bit
of an issue with the ethics of investigating this problem in that (1) the subjects are
children and (2) they are being treated for a medical condition; both of these raise a
red flag with review boards and would be closely examined to ensure the health and
well-being of the children while in the study. The scope of the problem, however, seems
to be our biggest concern. Who are the children the author wants to work with? All of
those in the United States? The state they are in? Who? In short, this problem state-
ment should reflect a much narrower scope—one the researcher has access to and can
work with.
In this case, we have included a variable that describes the cause—different types
of incentives—but instead of a variable measuring morale, we have included one that
measures motivation—these are two completely different things. Employees could be
highly motivated for fear of losing their jobs, but morale could be very low. While this
problem statement looked good at first glance, remember the first criterion: a prob-
lem statement must be clear and concise.
Finally, is this a good problem statement if we are interested in investigating the ability of a dog to be trained and whether the dog was purchased from a breeder or from a pet store?

Dogs purchased from a breeder are easier to train not to bark than dogs purchased from a pet store.
This statement seems to be OK; the location where the dog was purchased is the
cause we want to investigate and the dog’s ability to be trained is the effect. Thinking
back in this chapter, however, are there other issues with this problem statement that
should concern us? I would argue that there is no practical or theoretical significance
in investigating the problem; after all, what difference does it make? When it comes to
purchasing a dog, aren’t there a lot more factors involved in selecting one than their
potential to be trained not to bark? Besides, if the dogs are well trained, who cares
where they came from?
That is a large assumption on my part, isn’t it? Suppose I do conduct research and
show that there is increased morale and enthusiasm after obtaining new technology?
Have I proven anything? Absolutely not! There are far too many things that affect
morale: the employees themselves, a change in the work requirements or responsibili-
ties because of the new technology, unrealistic administrative expectations and so on.
At the same time, what would happen if levels of morale went down? Would we go
back to Apple, Dell, or Hewlett-Packard and ask for our money back? Of course not;
we do not know if the technology had any effect on what we’re measuring. There are
far too many factors that influence morale to let us assume we can “prove” anything!
Once you are sure you have met these criteria and begin writing the actual prob-
lem statement, you must ensure that you are clear and concise, that all variables to be
investigated are included, and that you do not interject your personal bias. Following
all these rules ensures that we have a problem statement we can use to state a hypoth-
esis in Step 2 of our six-step model.
Science achievement scores of the morning students are higher than science
achievement scores of the afternoon students.
Yes, this is very clear, and all the variables to be considered are included; we’re
comparing the achievement of students in the morning to that of students in the
afternoon. I have not interjected my bias; I have simply stated what I have observed.
Throughout the rest of this chapter, we're going to focus on writing hypotheses. While we won't state the problem each time, as I said in the last chapter, it will be clearly implied.
Given that, I decide to follow up on my observations by going to the library and trying to determine whether there have been studies investigating the effect of time of day on
achievement. Much to my surprise, I discover a lot of information suggesting that stu-
dents studying science in the morning seem to outperform students taking science in
the afternoon. Based on this finding, my observations are supported; the time of day
a student takes science might influence his or her class achievement.
Now, based on what I have observed and read, I can make a statement about what
I believe is occurring in my school. In other words, I can state a hypothesis since, by
definition, a hypothesis is nothing more than a statement expressing the researcher’s
beliefs about an event that has occurred or will occur. A well-stated hypothesis has
four requirements:
1. It must provide a reasonable explanation for the event that has occurred or
will occur.
2. It must be consistent with prior research or observations.
3. It must be stated clearly and concisely.
4. It must be testable via the collection and analysis of data.
Knowing these four things, I can state a formal hypothesis based on what I have seen:

Students taking science in the morning have higher science achievement scores than students taking science in the afternoon.
By reading this hypothesis, we can see it meets our four criteria. First, it is stated
clearly and is consistent with both our prior observations and reading. It is a reason-
able explanation for what has occurred, and we are able to collect data for statistical
analysis. Given these things, it appears we have met the criteria—we have a well-stated
hypothesis.
When we are not comparing something against an exact value, we can state
hypotheses that compare data collected from two or more groups. For example, with
the hypothesis we stated concerning the difference in achievement between science
students in the morning and afternoon, we expect the morning group to have higher
scores than the afternoon group. We aren't stating an exact value; we are just saying that one group will do better than the other:

Students taking science in the morning will have higher achievement scores than students taking science in the afternoon.
We can also make a comparison to an exact value. For example, imagine we are
investigating whether the five-year recidivism rate for inmates released from prisons
in our state is less than the national average of 76%. In order to conduct our study, we
would state the following hypothesis:
The recidivism rate of inmates released from prisons in our state is less than
the national average of 76%.
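We'll see later which statistical test fits a hypothesis like this; as a preview, here is a hedged Python sketch using hypothetical counts and a one-sample proportion z-test, one reasonable choice for comparing an observed rate to an exact value:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical data: 140 of 200 released inmates returned within five years
count, nobs = 140, 200  # an observed recidivism rate of 70% in our state

# Directional ("less than") test against the national rate of 76%
z_stat, p_value = proportions_ztest(count, nobs, value=0.76,
                                    alternative='smaller')
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```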
Nondirectional Hypotheses
In the preceding examples, the hypotheses were consistent with the literature or our
experience; it would not make sense to state them in any other manner. What hap-
pens, though, when the prior research or our observations are contradictory?
To answer this question, let’s use the debate currently raging over block sched-
uling as an example. For anyone who hasn’t heard of block scheduling, it involves
students in middle and high schools attending classes every other day for extended
periods. For example, a student might be enrolled in six classes, each of which meets
in a traditional class for one hour each day. In a school with block scheduling, the
student meets for three two-hour classes one day and meets for two hours with each of
the other three classes the next day. Proponents feel that longer periods allow teach-
ers to get into more detail and make for more meaningful class sessions. Opponents
of block scheduling criticize it because they believe many teachers do not know how
to effectively use a longer class period or, as is the case with many math teachers, they
believe students need to be exposed to their subject matter every day.
Instead of both sides of the issue giving us their personal beliefs or feelings, wouldn't it be a good idea to see what the literature says? Unfortunately, the reviews of block scheduling are just as mixed in the academic research journals: some articles suggest that block scheduling negatively affects student achievement, and others seem to show that it increases student achievement. Knowing this, how do we write our hypothesis? The answer is simple: we must state a nondirectional hypothesis:

There will be a difference in algebra achievement between students in block-scheduled classes and students in traditionally scheduled classes.
In this example, the hypothesis implies no direction; instead, we are stating that
we believe a difference in algebra achievement will exist between the two groups. No
“greater than” or “less than” direction is implied.
Just as was the case with the directional hypotheses, we can use a nondirectional
hypothesis to make a comparison to an exact value. Suppose, for example, someone
told me the average retirement age in the United States is 63. In order to investigate
this, I could ask a group of retirees how old they were when they retired. I could then
use that data to test the following hypothesis:
The average retirement age of workers in the United States is not 63.
Notice here, again, I have not said “less than” and I have not said “greater than.”
All I care about is if a difference exists. Later in the book we will see that different sta-
tistical tests are used when we compare data we have collected against a specific value
or compare data we have collected from two or more groups.
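As a small preview of that idea, here is a sketch of a nondirectional, exact-value comparison in Python; the ages are hypothetical, and the one-sample t-test shown is just one test that fits this situation:

```python
from scipy import stats

# Hypothetical retirement ages reported by a sample of retirees
ages = [61, 65, 62, 64, 66, 60, 63, 67, 62, 65]

# Nondirectional test: is the average retirement age different from 63?
t_stat, p_value = stats.ttest_1samp(ages, popmean=63)  # two-sided by default
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```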
Obviously if something is not greater than 100, it could be less than 100 or exactly
equal to 100. Since, in our research hypothesis, we are only interested in the “greater
than” scenario, we can state our null hypothesis in this manner:
In this case, the opposite of “higher levels” could either be less than or equal to:
Using our example concerning school uniforms, our research hypothesis was

Children attending schools where uniforms are required have fewer disciplinary problems than children attending schools where uniforms are not required.
Here we are doing exactly the opposite of the prior two research hypotheses;
now we are using a “less than” research hypothesis. Given that, we will state our null
hypothesis in the following manner:
Children attending school where uniforms are required do not have fewer
disciplinary problems than children attending school where uniforms are not
required.
Finally, if we were investigating whether the average number of years a sailor stays in the navy is greater than 20, we could state:

The average number of years a sailor stays in the navy is greater than 20.

Because we are only worried about the greater than scenario, our null hypothesis would read:

The average number of years a sailor stays in the navy is not greater than 20.

Now let's look more closely at the null hypothesis from our uniform example:
Children attending schools where uniforms are required do not have fewer
disciplinary problems than children in schools where uniforms are not
required.
In thinking about this null hypothesis, it is stating that one of two things will
occur:
1. Children attending schools where uniforms are required have exactly the
same number of disciplinary problems as children in schools where uniforms
are not required.
2. Children attending schools where uniforms are required will have a greater number of disciplinary problems than children in schools where uniforms are not required.
Given this, as well as other issues we will discuss shortly, it is better to state the null hypothesis for the directional hypothesis in the following manner:

There will be no difference in the number of disciplinary problems between children attending schools where uniforms are required and children attending schools where uniforms are not required.
Stating the null hypothesis in this manner ignores the “greater than” or “less
than” condition stated in the research hypothesis by saying that no difference exists;
doing so better reflects what we are trying to hypothesize. No difference would mean
we have an equal number of disciplinary problems in the two groups. If we subtracted
the average of one group from the average of the other group, the answer would be
zero or “null.” This may not seem logical right now, but it is the correct form for sta-
tistical decision making, and we will be stating it in this manner for the remainder of
this book.
Stating the null hypothesis for a nondirectional hypothesis is very logical; the
exact opposite of “there will be a difference” is “there will be no difference.” Putting
that into a null format, you would write:

There will be no difference in algebra achievement between students in block-scheduled classes and students in traditionally scheduled classes.
This one is not as straightforward, but think about it this way. Suppose our research hypothesis is:

The average age of my students is not equal to the average age of students in the population.

Since we have the words "is not equal to" in the research hypothesis, the opposite of that is "is equal to." We can put our null hypothesis into the correct form by stating:

The average age of my students is equal to the average age of students in the population.
Again, in this case, if the average age of my students is 42 and the average age
of students in the population is 42, when I subtract one from the other, the answer is
zero. Zero obviously corresponds to null, again the name of the type of hypothesis.
This would be easy to test; we could administer tests to the students, look at the scores for the boys and the girls, and determine whether there is a difference between the two groups. That would be the correct thing to do, right?
Unfortunately, it is not that simple and leads us to briefly talk about sampling error, one
of the basic building blocks of inferential statistics.
Sampling Error
Any time we are comparing data we have collected to an exact value or to another set
of data we have collected, there is only a very small chance they would be exactly the
same. This, of course, could wreak havoc with our decision making unless we recog-
nize and control for it. Let’s look at a couple of examples to help get a better under-
standing of this problem.
Suppose before we collect any data, we already know that the boys and girls have
exactly the same ability level in math. This means, theoretically, that if we gave them a
math exam, the average score for the boys should be exactly equal to the average score
for the girls. While this could occur, there is a very high probability that the average
scores of the two groups will not be exactly the same. This is because, although math
knowledge would be the highest contributor to an individual score, many, many other
factors might affect individual scores. Factors such as how well students slept the night
before an exam, their mental attitude on the day of the exam, whether they had break-
fast, or even how well they guessed on some answers might affect their score. Because
of this, we could give the students the same exam repeatedly and two things would
occur. First, the average score for both groups would change slightly each time, and
second, in only a very few instances would the scores for the two groups be exactly
equal. The fact that the overall average changes with each administration is caused by sampling error, as illustrated in Table 1.1.
The same type of problem exists when we take samples of data from a large group.
For example, suppose we have a group of 900 students and we are interested in find-
ing the average reading ability for the entire group (we call all possible participants in
a group the population and any value we know about the population is called a popula-
tion parameter). Giving a reading examination to all 900 of them is possible but testing
a representative sample would be far easier (a sample is a subset of the population, and
any value we know about it is called a sample statistic). Theoretically, if you randomly
pick a sample of adequate size from a population, any measure you take of the sample
should be about the same as the measure of the entire population. In this case, if we
randomly chose a group of 30 students and tested them, their average reading ability
should reflect that of the entire population. Unfortunately, again, our results will be
affected by sampling error; let’s use an example to demonstrate this.
Let’s use a traditional range of zero to 100 for our reading scores. Suppose, how-
ever, that I already know the average reading ability is 70 for the 900 students I want
to survey. I also know that the average score of 70 represents 300 students who scored
50 on the exam (we will refer to them as Group A), 300 who scored 70 on the exam
(Group B), and 300 who scored 90 on the exam (Group C). All our data are shown in
Table 1.2.
This means that, when I select 30 students as my random sample, I hope to get a
set of scores with an average of 70, exactly that of the population. Because of the ran-
dom nature by which I choose the students, however, the sample average will usually
not be exactly equal to the population average. Most of the time, the average score of
my sample will be about 70, but at other times I might randomly select more students
with lower scores, resulting in an average less than 70. In other samples, a higher num-
ber of students with scores greater than 70 would result in a sample average larger
than the population average. In short, the luck of the draw would affect the average
of my sample; sometimes it would be exactly equal to the population average, but, in
many instances, it would be higher or lower.
This idea of sampling error is further demonstrated in Table 1.3. In this table, I
have created five samples by randomly selecting 30 scores from the population. In the
first sample, I have 8 scores of 50 from Group A, 11 scores of 70 from Group B, and
11 scores of 90 from Group C. This gives me a sample average of 72, slightly higher
than the population average of 70. In the second sample, however, you can see that I
have more scores from the A and B groups than I did in the first sample; this causes
my sample average to fall below 70. The number of values in each group is about the same in the third and fourth samples, so those averages stay at or near 70, but the average drops further in the fifth sample; many scores from the A group have pulled the sample average down to 64. Again, this will even out over time, with most of the sample averages clustering around 70; with enough samples, the average of the sample averages would approach the population mean of 70. Despite that, it's clear that randomly selecting samples can cause misleading results.
TABLE 1.3. Five Samples of Average Reading Scores for Three Groups

Sample   A (50)   B (70)   C (90)   Average
1           8       11       11        72
2          12       12        6        66
3           9        9       12        72
4          10       10       10        70
5          14       11        5        64
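If you want to watch sampling error happen on your own computer, a few lines of Python will do it. This sketch rebuilds the population described above and draws five random samples of 30; because the draws are random, your averages will wobble around 70 but won't match Table 1.3 exactly:

```python
import random

# The population described above: 300 scores of 50, 300 of 70, 300 of 90
population = [50] * 300 + [70] * 300 + [90] * 300  # population mean is exactly 70

random.seed(1)  # fixed seed so the illustration is repeatable
for i in range(5):
    sample = random.sample(population, 30)  # draw one random sample of 30
    print(f"Sample {i + 1} average: {sum(sample) / 30:.1f}")
```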
In order to make sure we’re getting the hang of this, look at Table 1.4. In the
first column we have a population of scores where the average is 10. If I take random
samples of five of the scores, shown in the remaining four columns, I compute sample
averages ranging from 7 to 13. Again, this shows that random sampling can dramati-
cally affect our calculations.
Going back to our example, suppose we now found the average grade for students
taking science in the morning was 89% and the average score for students taking sci-
ence in the afternoon was 83%. Although we would need to verify these findings using
the appropriate statistical test, we would feel safe in saying the morning students had
higher achievement than the afternoon students and it wasn’t due to sampling error.
Knowing this, we would reject the null hypothesis; the difference between the groups
does not seem to be due to chance. By saying we reject the null hypothesis, we are say-
ing we support our research hypothesis. We must take it a step further though.
For our example of children wearing uniforms versus those children not wearing uniforms, our research hypothesis would be

Children attending schools where uniforms are required have significantly fewer disciplinary problems than children attending schools where uniforms are not required.
By adding the word “significant” to our hypotheses, we are supporting our desire
to ensure that any difference we find when we are analyzing data is “real” and not due
to chance. This idea is shown in Table 1.6.
TABLE 1.6. Actions Regarding the Null and Research Hypotheses When Differences Are Significant

Action regarding the null hypothesis:
- When we reject the null hypothesis, the groups appear to be significantly different. Any differences appear to be due to reasons other than sampling error.
- When we fail to reject the null hypothesis, the groups appear not to be significantly different. Any differences appear to be due to sampling error.

Action regarding the research hypothesis:
- When we reject the null hypothesis, we support the research hypothesis; what we have hypothesized appears to be accurate. Any differences appear to be "real" and due to reasons other than sampling error.
- When we fail to reject the null hypothesis, we fail to support the research hypothesis; what we have hypothesized does not appear to be accurate. Any differences appear to be due to sampling error.
When you are calculating inferential statistics, always test the null
hypothesis. When you test the null hypothesis, you are not trying to “prove”
anything. You are simply trying to determine one of two things. First, if
your results cause you to reject your null hypothesis, you will support your
research hypothesis. Second, if your results cause you to fail to reject the null
hypothesis, you fail to support your research hypothesis.
These are statistical “words of wisdom.” The task of learning and understanding
statistics will be much easier if you do not forget them.
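The decision rule itself is simple enough to write as a few lines of Python. This is only a sketch; the alpha value of .05 is a conventional cutoff we will discuss later:

```python
def decide(p_value, alpha=0.05):
    """Translate a test's p-value into one of the two possible decisions."""
    if p_value < alpha:
        return "Reject the null -> support the research hypothesis"
    return "Fail to reject the null -> fail to support the research hypothesis"

print(decide(0.03))  # difference unlikely to be due to sampling error
print(decide(0.40))  # difference could easily be due to sampling error
```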
Quiz Time!
Before we move forward, let’s make sure we have got a good handle on what we have covered
by working on the problem statements and case studies below. Once you are finished with both
sections, you can check your work at the end of the book.
Problem Statements
Evaluate each of the problem statements below; are all six criteria for a good problem state-
ment met? Are they clear, do they include all variables, and do they avoid interjecting personal
bias? If the problem statement does not meet the criteria, what is wrong?
4. The tremendous growth of the beef industry in Central Iowa has left many smaller,
family-owned ranches facing the possibility of bankruptcy. This study will investi-
gate the attitudes of small ranchers toward three options: going into bankruptcy,
merging with larger ranching conglomerates, or attempting to stay lucrative by
developing specialty beef products aimed at niche markets.
7. This study will investigate average miles per gallon between three grades of gaso-
line in imported cars.
Case Studies
State both the research and null hypothesis for each of the following scenarios:
for them to find time to meet with you. Since we are in a wealthier part of town, we experience
exactly the opposite. When we send out a note asking to meet with them, most of our parents
contact us within a day or two.” The young teacher, not having much experience, then told her
friend: “I cannot believe that; parents are parents, and they should all want to hear what the
teacher says about their child. I would like to collect data to help determine if there really are
fewer parent–teacher conferences in my school than in yours.” In order to do that, how would
she state her null and research hypotheses?
consulting firm, asked several questions about the content of her advertisements and then asked
where they were published. The Vice-President quickly answered, “They are in the newspaper,
of course; isn’t that where all job announcements are placed?” The consultant explained to
the Vice-President that many job applicants look for positions advertised on social media sites
such as LinkedIn, Twitter, and Facebook. Many young people, she explained, do not bother to
read the newspaper, and they certainly do not look for jobs there.
The Vice-President decided it couldn’t hurt to try advertising on the Internet and was
interested in seeing if the number of applicants really increased. If she wanted to test the effec-
tiveness of the advertisement by comparing the number of applicants in the month prior to the
advertisement to the number of applicants in the month after the advertisement was placed on
the Internet, how would she state her research and null hypotheses?
Introduction
Now that we understand how to state our hypothesis, let’s take a closer look at it.
By doing so, we will find much of the information that will ultimately help us select
the appropriate statistical tool we will use to test it. Let's refer to one of the research hypotheses we stated in Chapter 1 to get a better feel for what we are going to do:

Students taking science in the morning have significantly higher achievement scores than students taking science in the afternoon.
In this hypothesis, we can see that we are looking at the relationship between the
time of day students take science and their performance in the course. Although we
ultimately must rely on statistical analysis to confirm any relationship, for now we
can think of our independent variable as the “cause” we are interested in investigat-
ing and our dependent variable as the “effect” we want to measure. In this chapter
we’ll do several things; first we’ll talk about identifying the independent and depen-
dent variables, and from there we’ll move to a discussion of different data types and
how we can statistically describe them.
Independent variables can be developed in two different ways. First, if we are interested in determining the effect of something that occurs naturally and does not require intervention by the researcher, the variable is called a nonmanipulated independent variable (these are sometimes called quasi-independent variables). In cases where the researcher must actively manipulate the variable by randomly creating groups or assigning participants to a condition, the independent variable is manipulated or experimental. For example:
Male employees miss significantly more days of work than female employees.
The independent variable is gender and, since we do not assign employees to one
gender or the other, it isn’t manipulated. We are interested in seeing the relationship
between gender (the independent variable or cause) and the number of workdays
missed.
How about this hypothesis?

There is a significant difference in grade point average based on a student's year in college.
We are not assigning individual students to the various classes; we are using their
current status. In this case, “year in college” is the nonmanipulated independent vari-
able, and we are trying to determine its effect on grade point average.
What is the independent variable in this hypothesis?
People with pets in their home have significantly lower levels of depression
than people who do not have pets in their home.
In this case, pet ownership is the independent variable; either a person has a pet
at home, or they do not. The dependent variable is a person’s level of depression.
What is the independent variable in the following hypothesis?
Canadian citizens take significantly more vacation days each year than
citizens of the United States.
Here our dependent variable is the number of vacation days taken each year;
our independent variable is country of citizenship: the United States or Canada. As
you can see, the citizens are already living in either Canada or the United States; this
means the independent variable is nonmanipulated.
This is pretty much a “no-brainer” hypothesis, but it serves to illustrate the point.
Because we are going to compare two groups, low achievers and high achievers, our
independent variable must be achievement. Our only problem is, how do we deter-
mine which students we are going to label as low achieving and which we are going to
label as high achieving?
This can be easily done. Suppose, for instance, we have the results of a recent
administration of a standardized math test. We can label those as falling into the
lower 50% as low achievers and those scoring in the upper 50% as high achievers.
While this may sound like we are manipulating the independent variable, we really
are not. Although we have decided the parameters of group membership, the students
naturally fell into either the low-achieving or high-achieving group by their own per-
formance.
It is important to understand that we can divide the students by whichever standard we choose, as long as it logically makes sense. For example, we could have also divided the
students into four ability groups or arranged them so that anyone in the top 30% was
labeled as a high achiever and everyone else was labeled as low achievers. Regardless
of how we do it, setting up the independent variable in this manner is justified.
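A sketch of that 50% split in Python might look like this; the scores are hypothetical, and changing the cutoff (say, to the top 30%) only changes one line:

```python
import statistics

# Hypothetical standardized math test scores
scores = [55, 72, 88, 64, 91, 47, 79, 83, 60, 95]

cutoff = statistics.median(scores)  # the 50% split described above
groups = ["high" if score > cutoff else "low" for score in scores]
print(list(zip(scores, groups)))
```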
In this case, we have stated a directional research hypothesis and, in order to test
it, we could randomly assign participants to one group or the other. Since we would be
actively intervening, this is a manipulated, or as it is sometimes called, an experimen-
tal independent variable. Remember, though, if the two groups had already existed, it
would be a nonmanipulated independent variable.
Now, suppose we are interested in looking at the effect of taking an after-lunch
nap on the behavior of children in kindergarten. Further, suppose that we have six
classes and decide to randomly assign three classes to the “nap” group and three class-
es to the “no nap” group. In this case, we are dealing with existing classrooms, but we
are still randomly assigning them to either
the “nap” or “no nap” group. This meets
the criteria for a manipulated independent
variable, and our progress so far is shown
in Figure 2.1.
Customer satisfaction for airlines that charge customers for checking their
bags will be significantly lower than customer satisfaction of airlines that do
not charge for bags that are checked in.
In this case, we are trying to investigate cause and effect by looking at the differ-
ence in customer satisfaction toward airlines that charge for baggage and those that
do not. We could label our independent variable “airline,” and there are two levels:
those airlines that charge customers to check their baggage and those that do not. The
dependent variable is customer satisfaction.
Again, in some cases we may have more than two levels of the independent variable:

There is a significant difference in the amount of rainfall between the four seasons of the year.
Here, our independent variable is “season,” and there are four levels: fall, win-
ter, spring, and summer. Our dependent variable would require that we measure the
amount of rainfall during each of the four seasons.
Let’s look at one more hypothesis before moving forward:
In this case, the independent variable is “school district” and there are three lev-
els: rural, suburban, and urban. What about the dependent variable? Wait a minute:
this looks odd, doesn’t it? We seem to have two dependent variables: absences and
dropouts. How do we handle this? This is a very common problem. As I've said, just as in the case of multiple independent variables, any time we are dealing with more than one dependent variable we will use multivariate inferential statistics to test our hypotheses. We won't concern ourselves with that idea just yet, so let's move forward;
our next task involves understanding the type of data our dependent variable repre-
sents.
Nominal Data
Nominal data, sometimes called categorical or discrete, are data that are measured in
categories. For example, if we are interested in determining the number of males and
females in our class, the data value “gender” is nominal in nature. In doing this, we
only need to count the number of persons falling into each category. The important
thing to remember is that the categories are mutually exclusive because the items
being counted can only fall into one category or another. For example, a person can-
not be counted as both a male and a female. Other examples of nominal data include
ethnicity, college class, or just about any other construct used to group persons or
things. In Table 2.1, we can see where we have asked 100 people the political party to
which they belong. Each of the parties represents a distinct category; a person cannot
belong to more than one.
In Table 2.2, we have asked 500 students to tell us the primary method they use
to get to school. There are five possible choices and, since we have asked for only the
primary method of transportation, a student could only choose one. Since all we want
to know is how many students fall into each group, it meets the definition of nominal
data.
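Counting category membership is equally simple in software. Here is a minimal Python sketch with made-up responses:

```python
from collections import Counter

# Made-up answers to "What is your primary way of getting to school?"
answers = ["bus", "car", "walk", "bus", "bike", "car", "bus", "walk"]

# With nominal data, counting how many fall into each category is the point
print(Counter(answers))  # Counter({'bus': 3, 'car': 2, 'walk': 2, 'bike': 1})
```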
Ordinal Data
The second type of data is called ordinal data or, sometimes, rank data.
When we use ordinal data, we are interested in determining what is important or
significant while investigating non-numeric concepts such as happiness, discomfort,
or agreement. For example, I recently took my car to the dealership for scheduled
maintenance. Within a day of picking it up, I received an email asking me to rank their
service using a form like the one in Table 2.3. You can see that a scale ranging from
“Very Dissatisfied (1)” to “Very Satisfied (5)” was used. I was asked to assign a ranking
to each of the statements shown therein.
There are things about ordinal data that are essential to remember. First, the
answers are subjective; there is no one standard for the choices. You could have the
same service performed, and pay the same cost, as another customer. The rank you
give is your opinion; you might be satisfied with the price you paid while another
customer is very dissatisfied. Because of that, even if we assign numeric values to the
scores (e.g., Very Dissatisfied = 1 and Satisfied = 4), we can’t compute an average level
of satisfaction for the service performed simply because the scores might not be rela-
tive to one another. Again, all the answers are subjective. In short, the properties of
the actual variable are not known. You will see, later in the book, that we can use other
statistical tools to look at the mid-point of the data distribution.
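For example, here is a minimal Python sketch with hypothetical rankings; notice that we report the median (the midpoint), not the mean:

```python
import statistics

# Hypothetical rankings: 1 = Very Dissatisfied ... 5 = Very Satisfied
rankings = [4, 5, 3, 4, 2, 5, 4, 1, 3, 4]

# For ordinal data we look at the midpoint of the distribution
print("Median ranking:", statistics.median(rankings))  # 4.0
```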
Interval Data
Interval data are the first of two types of data that are called quantitative or continuous.
By this, we mean that a data value can hypothetically fall anywhere on a number line
within the range of a given dataset. Test scores are a perfect example of this type of
data since we know, from experience, that test scores generally range from 0 to 100
and that a student taking a test could score anywhere in that range.
As suggested by its name, the number line shown in Table 2.4 represents the pos-
sible range of test scores and is divided into equal increments of 10. Given that, the
10-point difference between a grade of 70 and a grade of 80 is the same as the differ-
ence between a grade of 10 and a grade of 20. The values they represent, however, are not relative to one another. For example, if a person scores 80 on the exam,
their grade is four times larger than that of a person who scores 20 on the exam. We
cannot say, however, that the person who scores 80 knows four times more than the
person who scores 20; we can only say that the person with the 80 answered 80% of
the questions correctly and that the person scoring 20 answered 20% of the questions
correctly. We also cannot say that a score of zero means they know nothing about the
subject matter, nor does a score of 100 indicate absolute mastery. These scores only
indicate a complete mastery or ignorance of the questions on the examination.
The point is that even though the score data are measured in equal intervals on
this type of scale, the scale itself is only a handy way of presenting the scores with-
out indicating the scores are in any way related to one another. Another example of
interval-level data is temperature, as shown in Table 2.5.
Ratio Data
Ratio data are also classified as quantitative or continuous data. Ratio data differ from
interval data because they do have an absolute zero point and the various points on
the scale can be used to make comparisons between one another. For example, weight is measured on a ratio scale because we know that the relative difference between 20 pounds and 40 pounds is the same as the relative difference between 150 pounds and 300 pounds; in both cases the second value is twice as large as the first.
There are, however, three important distinctions between interval and ratio data.
First, a value of zero on a ratio scale means that whatever you are trying to mea-
sure doesn’t exist; for example, a swimming pool that is zero feet deep is empty. Sec-
ond, ratio data allow us to establish a true ratio between the different points on a
scale. For example, a person who owns 600 shares of a company has twice as many
shares as a person owning 300 shares; the same person would own six times as many
shares as a person owning 100 shares. Third, this added degree of precision allows us
to use ratio scales to measure data more accurately than any of the other previously
mentioned scales. Other examples of ratio-level data include distance and elapsed
time; one of my favorite examples, annual income, is shown in Table 2.6.
FIGURE 2.4. The dependent variable and types of data that may be collected.
TABLE 2.7. Relationship between Data Type and Appropriate Statistical Tests

Type of data            Quantitative?   Parametric or nonparametric statistics?
Nominal (categorical)   No              Nonparametric
Ordinal (rank)          No              Nonparametric
Interval                Yes             Parametric
Ratio                   Yes             Parametric
TABLE 2.8. Dataset for Nominal, Ordinal, Interval, and Ratio Data

Student number   Gender   Ethnicity   Math Attitude   Math Exam   Absences
1                F        H           5               92          7
2                F        NHW         3               91          4
3                M        AA          3               90          0
4                M        NHW         3               89          8
5                M        AA          5               88          3
6                F        NHW         1               87          3
7                F        H           2               86          3
As you can see by looking at the headers, the contents of the columns are self-
explanatory, but let’s take a closer look and decide what type of data they represent.
First, look at the column labeled Gender. Since the valid values for this field are M
for male students and F for female students, what type of data are we collecting? That
is easy; since the value of the field represents a category a given student falls into,
we know it is nominal-level data. What about the field called Ethnicity? It is also an
example of nominal data since we generally say a student falls into only one ethnic
category. In this case, the categories are AA (African American), NHW (non-Hispanic
White), and H (Hispanic).
The next column, Math Attitude, contains ordinal data. Students are asked to
rate their attitude toward math on a scale of 1-Dislike, 2-Somewhat Dislike, 3-Neutral,
4-Somewhat Like, and 5-Like. Keeping in mind that this is subjective and represents
each student’s opinion, the values are not relative to one another. In Table 2.9, I’ve
added subtitles under each of the variable names showing their data type; I’ve also
added a second ordinal variable, College Importance, which asks students to rate their interest in going to college on a scale of 1-No Interest, 2-Undecided, and 3-Interested.
Again, this is a given student’s personal opinion; the factors that influence one stu-
dent’s interest in college may not be relevant to another student.
TABLE 2.9. Expanded Dataset for Nominal, Ordinal, Interval, and Ratio Data

Student   Gender      Ethnicity   Math Attitude   Math Exam    Absences   College Importance
number    (Nominal)   (Nominal)   (Ordinal)       (Interval)   (Ratio)    (Ordinal)
1         F           H           5               92           7          1
2         F           NHW         3               91           4          2
3         M           AA          3               90           0          3
4         M           NHW         3               89           8          2
5         M           AA          5               88           3          2
6         F           NHW         1               87           3          2
7         F           H           2               86           3          3
Now, what type of data does “math exam” represent? In order to answer that
question, let’s first assume that achievement scores range from the traditional 0 to
100. Given that, we can narrow down our choices to interval or ratio level data. The
next question is, “Does a math exam have an absolute zero point?” From our previous
discussion, we know the answer is “no.” A student may make a zero on the exam, but
that does not mean she knows absolutely nothing about math; nor does 100 indicate
absolute mastery of the topic. We also cannot form meaningful ratios using the data; a person scoring 80 on a test does not necessarily know twice as much about math as a person who scores 40; they simply scored twice as many points on that given exam.
Knowing that, the data are interval in nature.
Finally, let's look at the field titled Absences. We know these data are quantitative in nature, but this time an absolute value of zero is possible; a student may have never missed class. We can also make meaningful comparisons between the values; a student who has been absent four times has been absent four times as often as a student who has been absent once; this means the data are measured on a ratio scale. Remember, however, it is generally not important to know whether the data are interval or ratio level. Just knowing whether the data we are collecting are quantitative is enough.
$\bar{x} = \frac{\sum x}{n}$
Now, we can use the values in our table above to fill in the formula:
$\bar{x} = \frac{92 + 91 + 90 + 89 + 88 + 87 + 86}{7}$
This gives us:
$\bar{x} = \frac{623}{7}$
By dividing 623 by 7, we can see that the average math exam score is 89, just as
we computed above:
$\bar{x} = 89$
Remember, we very rarely know the population mean but, on any of the rare occasions where you might know all the values in a population, we can compute it using the same formula. The only difference is that we would substitute the lower-case Greek letter mu (i.e., μ) for $\bar{x}$ and an upper-case N for n as the number of members in the population. The equation would then look like this:
$\mu = \frac{\sum x}{N}$
Before we move forward, let me point out something here that might easily be overlooked. Did you notice that we used $\bar{x}$, built on a lower-case English letter, to refer to the value we computed for a sample and then used a Greek letter to refer to that same value for a population? That idea holds true throughout. If we are calculating a value for a population, we'll use Greek letters; for samples, we'll use English letters. The ability to recognize and understand these seemingly insignificant things is a big part of becoming a good consumer of statistics!
Calculating the average for quantitative data is commonplace, but it does not make
sense to calculate it for nominal or ordinal data. For example, can we calculate the
average gender? The real question is, “Why would we want to calculate an average
gender? A person is either a male or a female!” Despite already knowing that, let’s try
calculating the mean for gender and see where it takes us.
First, we already know that we must have numeric values to compute the mean of
data we have collected. In this case, since gender is not represented by numeric values,
let’s go so far as to replace all the values of M and F with numeric values of 1 and 2,
respectively; we can see this in Table 2.10.
We could then use these values to enter into our equation for a sample mean:
$\bar{x} = \frac{2 + 2 + 1 + 1 + 1 + 2 + 2}{7}$
Using these data, we would compute an average of 1.57. Is this meaningful? No, because it doesn't tell us anything useful about our data.
The same holds true if we try to calculate the mean for ordinal data. For example,
let’s use the data for College Importance from Table 2.9 and calculate the mean. Just
as before, we would add all the values of importance leaving us with 15. When we
divide that by the 7, the number of values in the dataset, it leaves us with an average of
2.14. Keeping in mind that each student’s attitude is subjective, this value doesn’t tell
us anything about the overall group. A college importance ranking of 2 by one student
could mean something entirely different than a ranking of 2 by another student.
The Median
The median is the score that falls in the exact center of a sorted dataset and can be
used with ordinal, interval, or ratio data. Because there is no commonly agreed upon
symbol for either the population or sample median, we will use “Mdn” in this book.
To see an example of a median, let’s use the absences from our original table shown
again in Table 2.11; if we sorted them, lowest to highest, we could see that the 3 is right
in the middle.
Let’s expand our findings using the median score of the math exam. To do that,
we would sort the scores, from highest to lowest, as shown in Table 2.12. From that,
we can see that the median score belongs to a non-Hispanic White male with a math
exam grade of 89.
In cases where you have a larger dataset and cannot readily see the middle value, there are formulas to help us locate the median; the formula you use depends on whether you have an even or odd number of values in your set of data.
In order to find the midpoint of a dataset with an odd number of values, add 1 to the
total number of items in the dataset and then divide the total by 2. Using the data
above showing ethnicity and math exam information, we can find the midpoint of the
data using the following steps:
1. Midpoint = (Number of values + 1) / 2
2. Midpoint = (7 + 1) / 2
3. Midpoint = 8 / 2
4. Midpoint = 4
Remember, this simply means that 4 is the midpoint of the dataset; we must go
to that point in the table to determine the actual median value. For example, in
this case, if you look at the Math exam column in the table and count 4 up from the
bottom or 4 down from the top, you will see, based on the math scores, that the actual
median value is 89.
When we had an odd number of values in our dataset, it was easy to find the middle;
it is the value where there are just as many data values above it as there are below it. In
cases where there is an even number of values, our logic must change just a bit. For
example, let’s use the data from Table 2.9 and add information to the bottom, shown
in Table 2.13, about a student who scored 50 on the math exam.
Looking at this table, we can see that there is no middle record, so we cannot
readily point out the median of the dataset. In order to get around that, we have to
use the two records that fall in the middle of the dataset—in this case the two records
with the scores of 89 and 88. In order to find the median we would add these values
together and take the average; doing so would give us a median of 88.5.
We can use the same formula as before to find our midpoint but will wind up with
a fractional number:
1. Midpoint = (Number of values + 1) / 2
2. Midpoint = (8 + 1) / 2
3. Midpoint = 9 / 2
4. Midpoint = 4.5
The resulting answer, 4.5, shows the position in the dataset of the median. Since
there are no fractional values in our table, it’s necessary to average the two numbers
in the fourth and fifth position. That again means you use 88 and 89. The resultant
average, just like before, is 88.5; this is our median. Another thing we need to notice
is that, while the test scores are ranked highest to lowest, the actual scores are not rel-
evant to one another. For example, look at the difference between the students ranked
first and second; we can see a difference of one point; the difference between the
seventh and eighth is 36 points. This means that, although the data are ranked, the
differences are not relative and computing a mean score tells us nothing about rank.
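If you would rather not count positions by hand, Python's statistics module applies exactly this logic, averaging the two middle values whenever n is even; a quick sketch:

    import statistics

    odd_scores = [92, 91, 90, 89, 88, 87, 86]   # seven values; the middle one is the median
    even_scores = odd_scores + [50]             # the eighth student's score
    print(statistics.median(odd_scores))        # 89
    print(statistics.median(even_scores))       # 88.5, the average of 88 and 89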
The median is also a good tool to use if our data are quantitative but are badly skewed;
this simply means that the values on one side of the mean are more spread out than
the values on the other side of the mean. We will talk about skewness in detail later,
but here is an example to help us get a basic understanding. Suppose we are interested
in moving to a neighborhood where a realtor tells us there are five homes and the aver-
age home price is $100,000. Upon hearing that, we decide to visit the neighborhood
where we find there are, indeed, five houses. Four of the houses are worth $10,000
each and one is worth $460,000. This is shown in Table 2.14.
While we might be disappointed with what the realtor told us, it was true; the
average home in the neighborhood is worth $100,000. However, since there are many
more values below the mean than there are above the mean, it would have been better
if the realtor had told us that the median value was $10,000. That would have been a
more meaningful indicator of what the average home in the neighborhood is worth.
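A short sketch makes the realtor's pitch concrete; with four $10,000 homes and one $460,000 home, the mean and median tell two very different stories:

    import statistics

    prices = [10_000, 10_000, 10_000, 10_000, 460_000]
    print(statistics.mean(prices))     # 100000: technically true, but misleading
    print(statistics.median(prices))   # 10000: a far better summary of this skewed data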
The median is the best measure of central tendency for ordinal-level data, as well as
badly skewed quantitative data, but we cannot use it with nominal data. For example,
Table 2.15 presents a list showing the highest college degree earned by employees of a
marketing company. In this case, we can see that the midpoint in the list represents
a person with an MBA, but what does that tell us? Absolutely nothing; it tells us the center point of an ordered list and that is it. Based on the median alone, for all we know, everyone on the list has an MBA.
TABLE 2.15. Position of Median Degree Earned
Degree earned
BS
BS
BS
BS
MBA
MBA
MBA
MS
PhD
The Mode
Up to this point we have talked about the mean and the median; both can be used
to help us examine the central tendency of a dataset. As we saw, the mean cannot be
used with nominal and ordinal data, and the median cannot be used with nominal
data. Fortunately, we have another tool, the mode, which can be used with all types
of data. The mode is nothing more than the value that occurs most often in a dataset.
Like the median, there is no one standard symbol, so we will use Mo.
If we look at the Absences field from our original dataset (shown as Table 2.16),
we can see that our mode is 3 since it occurs most often; we can further see that F is
our modal value for gender since we have four females and three males.
The idea of the mode is straightforward, but we must be aware of a small issue
that arises when we have an equal number of any given value. For example, suppose
we added another student, whose ethnicity is AA, to the data; what then would our
mode be for Ethnicity? Since we would then have three NHWs, three AAs, and two
Hs, it appears we do not have a mode since no one value occurs more often than the
others. Since NHW and AA occur an equal number of times, we have two modes; we
call this a bimodal dataset. In other instances, you might find three, four, five, or even
more modes in a dataset. Instead of getting complicated and trying to use a different
name for each distribution, we just refer to any dataset having more than one mode
as multimodal.
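Python mirrors this idea directly; statistics.multimode returns every value tied for the highest count, so bimodal and multimodal datasets are easy to spot. A sketch using our data:

    from statistics import multimode

    absences = [7, 4, 0, 8, 3, 3, 3]        # from Table 2.16
    print(multimode(absences))              # [3]

    # Ethnicity after adding one more AA student: NHW and AA now tie at three each
    ethnicity = ["H", "NHW", "AA", "NHW", "AA", "NHW", "H", "AA"]
    print(multimode(ethnicity))             # ['NHW', 'AA'], a bimodal dataset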
So, after all of that, we have three measures of central tendency: the mean, the
median, and the mode. Table 2.17 helps clarify exactly when to use each.
TABLE 2.17. Relationship between Data Type and Measure of Central Tendency

          Nominal   Ordinal   Interval   Ratio
Mean      No        No        Yes        Yes
Median    No        Yes       Yes        Yes
Mode      Yes       Yes       Yes        Yes
Within Table 2.18 you see four columns; the first column shows valid values for Gender—F and M. Knowing that this is nominal data, we can easily see that the mode is F, which occurs seven times in the table. The second column represents a patient's self-perception of their health. Values range from 1 (Poor) to 3 (Very
Good). From our earlier discussion, we know it is appropriate to compute both the
mode and median for ordinal data; here the mode is 2 since it occurs four times. We
could sort the patients based on the self-perception of their health and, since our data-
set contains an even number of cases, we would average the fifth and sixth values to
determine the median; in this case, it is also 2.
The third column, labeled Anxiety, represents a person’s score, from zero to 100,
based on a survey they completed. Knowing that, is this an interval or ratio-scale item?
We can determine that by answering two questions. First, does anxiety have an abso-
lute zero point? In this case, it does not. Although some of the patients have a score of
zero, it doesn’t mean they do not experience anxiety in their lives; it only means they
have no anxiety as measured by the survey or scale used as a measurement instru-
ment. Second, is a person with a score of 80 twice as anxious as a person with a score
of 40? They are not; the scores only represent each person’s anxiety level relative to the
scale. Now, knowing the answers to both of those questions leads us to determine that
the data are on an interval scale. Because of that, we can compute a mean of 56.5, a
median of 60, and a mode of 100.
Next, we have the number of minutes that each person exercised daily. Using this
data, can ratios be made? Yes, a person who exercised for 60 minutes exercised twice
as long as a person who exercised for 30 minutes. There is also an absolute zero—if
the person did no exercise at all, their score would be zero. Knowing both things, we
can feel safe in saying that the data are ratio level. As we did with interval data, we can
compute a mean, in this case, 35.5 and a median of 42.5. We can also see that the data
are bimodal since we have two occurrences each of 45 and 60.
Everything we have computed so far is shown in Table 2.19.
We will use the Name column to identify each variable and then use the remainder
of the columns in that row to define the characteristics of that variable. The first row
represents Gender. It is a string variable Type with a maximum Width of one char-
acter; obviously there are no decimal points. The label “Gender” will be used when
referring to this variable on output produced by SPSS. In this case, by clicking in the Values column and using the Value Labels dialog box, we have created two Values. If the actual data value is F, it will show as “Female” on any reports we create; a
value of M will show as “Male.” We have no entry in the Missing column, which means
the system will not automatically enter values if a data value is missing; Column and
Align indicate that there are eight spaces, with the value left justified, on any output
we create. The measure, of course, is nominal, and the Role label indicates this field
will be used as input.
The next variable, Perception, represents a person's self-perception of their health and is a numeric value between 1 and 3; no decimal places are allowed. The variable
Label is “Health Perception” and there are three possible values: 1 (Poor), 2 (Good),
and 3 (Very Good). The Missing, Column, and Align categories work the same as
for the Gender value, but, in this case, since the patients are ranking themselves, the
Measure is ordinal.
The remaining two rows define the levels of Anxiety and the amount of Exercise.
In this case, the data are quantitative, either interval or ratio; SPSS simply labels them
both as “scale” data. The remaining columns are self-explanatory. From here, we need
to analyze our data, so let’s go back to the Data View screen and enter the data from
above.
In Figure 2.8 you can see that we’ve entered the data from Table 2.18, so we can go
ahead with our analysis. First, as seen in Figure 2.9, we’ll select the Analyze command,
Descriptive Statistics and Frequencies. This will be followed by the screen shown in
Figure 2.10. In this case, in the Frequencies: Statistics box we will select Mean, Median, and Mode, followed by Continue; we will then select OK in the left box. SPSS will
then produce the output shown in Figure 2.11.
It’s readily apparent that the values shown in this box are exactly equal to those
we computed earlier. Three things bear mentioning, however. First, as we just said,
Perception is ordinal but since the values are numeric, SPSS computes the mean,
median, and mode. Second, we can see there is a lowercase “a” next to the Mode for
Exercise. This only occurs when the data are multimodal, and the software displays
the lowest value; this is explained in the note below the table. Finally, since Gender is nominal, none of the descriptive statistics other than N (i.e., the number of values) are produced.

FIGURE 2.9. Selecting the Descriptives and Frequencies commands on the SPSS Data View spreadsheet.

FIGURE 2.10. Using the Frequencies command on the Data View spreadsheet.
FIGURE 2.11. Statistics output from the Frequencies command (N Valid = 10 and Missing = 0 for Gender, Health Perception, Anxiety, and Exercise).
Mean of a population:

$\mu = \frac{\sum x}{N}$

Mean of a sample:

$\bar{x} = \frac{\sum x}{n}$
Quiz Time!
Let’s start by looking at the following hypotheses. Read each one and then identify the inde-
pendent variable and its levels; then explain why the levels are either manipulated or nonma-
nipulated. Following that, identify the dependent variable and the type of data it represents.
2. There will be a significant difference in the number of males and females working
as computer programmers in corporate America.
3. There will be a significant difference in the number of females in computer science
classes between students in the United States, France, and Russia.
4. There will be a significant difference in weight gained during their freshman year
between students who live at home and students who live away from home.
5. There will be a significant difference in first-year salaries between graduates of Ivy
League schools and graduates of state universities.
6. Administrative assistants who work in cubicles are significantly less productive
than administrative assistants who work in enclosed offices.
7. Primary-care patients who are treated by an osteopathic physician will have sig-
nificantly fewer health problems than primary-care patients who are treated by an
allopathic physician.
8. Truck drivers who check their tire pressure frequently will have significantly higher
miles-per-gallon than truck drivers who do not check their tire pressure frequently.
9. Insurance companies that use computer-dialing services to call prospective clients
will have a significantly lower number of sales than insurance companies that use
live agents to call prospective clients.
10. The rankings of favorite sporting activities will be significantly different between
Mexico, the United States, and Canada.
Let’s check to make sure you understand how to compute the basic measures of central ten-
dency using the data in Table 2.20. Let me warn you, be careful! Some of these answers are not
as obvious as they might appear! You will be able to check your answers at the end of the book.
CHAPTER 3

Measures of Dispersion
and Measures of Relative Standing
Introduction
In this chapter we’ll continue with Step 4 of our six-step model by learning how to
compute and understand measures of dispersion and measures of relative standing.
When we measure dispersion, we are simply looking at how spread out our dataset
is. Measures of relative standing allow us to determine how far given values are from
each other as well as from the center of the dataset. As usual, we’ll learn how to
compute everything manually, but then we’ll use SPSS to show us how easy it really
is.
Measures of Dispersion
Measures of dispersion are only used with quantitative data and help us answer the
question, “How spread out is the dataset we plan to use?” For example, I might tell
you I own paintings that average about $300 each; I might also tell you that I paid as
little as $25 for some of my paintings and as much as $1,000 for others. Based on that,
you could say that the amount I’ve paid for paintings is widely dispersed around the
mean. In fact, if we subtract $25 from $1,000, we can see that the amount I've paid for paintings could fall anywhere between these two values; in this case a range of $975.
Computing the range is the most common of the measures of dispersion, but we’ll
also look at the standard deviation and variance. All three measures are critical for the
accurate computation of inferential statistics.
The Range
As we just saw, the range tells us how far apart the largest and smallest values in a
dataset are from each other; it is determined by subtracting the smallest value in the
dataset from the largest value in the data. For example, in Table 3.1, when we subtract
25 from 75, we have a range of 50; we call this our computed range. In this case, we also
have a mean score of 50.
While this is very easy to compute, we must be careful in its interpretation when we
are interested in the dispersion of the data. Remember, the purpose of the measures
of dispersion is to understand how the data are spread out around the center of the
dataset. Using the range to help us understand this is very imprecise and can be very
misleading. For example, what does a range of 50 tell us? What can we learn about how
the data are dispersed around the center of the data by computing it? Again, let’s use
the data in Table 3.1 and look at some of the issues we might encounter.
We already know that both our mean and range are 50. In this case, all we can
tell by using the range as a measure of dispersion is that the data are evenly spread
out around the average. This can be seen very clearly; the smallest value is 25 points
below the mean, the largest value is 25 points above the mean, and the data values are
spaced at equal increments of 5 points each.
We must be careful, however, because we can often compute the same range for
two datasets that have the same mean score and range, but their dispersion will be
completely different. For example, as in the previous example, the lowest value in
Table 3.2 is 25 and the largest is 75; this leaves us with a range of 50.
TABLE 3.2. Range of Data Values from 25 to 75 Clustered around the Mean
25 42 43 44 45 50 51 52 53 54 75
In this case, although we can see that the range is somewhat large, most of the
values are clustered very closely around the mean. This isn’t to say that the range can-
not be used effectively; it is perfectly acceptable for a given task while remembering
its limitations. For example, investors might want to look at the range of values that a
stock sold for on a given day, or a college professor might be interested in knowing the
overall spread of test scores for an examination. In both cases the results are meaning-
ful, but the practical application of the range is often limited.
to 800 and, since there are 100 senators in the United States Senate, the possible range of the number of “yes” votes on any piece of legislation would be 0 to 100. We
can also establish our own possible range by simply deciding the boundaries we are
looking for; we’ll use shopping mall sales pitches as an example.
We have all experienced those people who approach you in the mall asking for
you to try one product or the other; if you’re like me, you pretend you don’t notice
them and walk away as quickly as possible! Other people, who I guess are more will-
ing to part with their hard-earned money than I am, stop to talk and try out the new
product. In this case, suppose our marketers are interested in knowing the appeal of
a new cologne to males in the 20 to 40 age bracket; when we subtract 20 from 40, we
have a possible range of 20. While that isn’t terribly interesting by itself, we can use the
possible range and the computed range together to help in decision making.
For example, if the employee of the marketing company asked each male trying
the new product their age, the marketing company could use the data to compute the
range of mall patrons who tried the product. In this case, let’s say the youngest person
who rated the product was 20 and the oldest was 32; this would result in a computed
range of 12. If we were to compare our computed range of 12 to our possible range
of 20, we would see that our computed age range is somewhat spread out since it
accounts for 60% of the possible range. At the same time, since our sample only repre-
sents the opinions of the younger men, in order to really understand the appeal of the
cologne to their targeted age group, the marketing staff would need to get opinions
from the remaining 40%, all of whom would fall in the 33 to 40 age range. We can see
this entire idea demonstrated in Table 3.3.
Let’s use the same possible range but, this time, the youngest person trying the
product was 28 and the oldest was 34; this would give us a computed range of 6. Since
our possible range is 20, we’ve only accounted for 30% of our possible range—not a
very wide dispersion. In order to get responses from the entire possible range, the
marketers would need to focus their efforts on males in the 20 to 27 age range as well
as those in the 35 to 40 age range. As you can see, the range only tells us how spread out the data are and what percentage of the possible range can be accounted for by the computed range. This is shown in Table 3.4.
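The arithmetic behind Tables 3.3 and 3.4 takes only a few lines of Python; a minimal sketch:

    possible_range = 40 - 20                        # the target 20-to-40 age bracket: 20
    computed_range = 32 - 20                        # first sample (youngest 20, oldest 32): 12
    print(computed_range / possible_range * 100)    # 60.0 percent of the possible range

    second_computed = 34 - 28                       # second sample (youngest 28, oldest 34): 6
    print(second_computed / possible_range * 100)   # 30.0 percent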
Here the data are sorted and range from the smallest value on the left (i.e., 64
inches), to the largest value on the right (i.e., 72 inches). The mean, 68, falls right in
the center of the dataset. Although we won’t go into the calculations now, the stan-
dard deviation for this data is 2.74; again, this approximates the average distance each
person’s height is from the average height. How do we apply this to the overall idea of
dispersion? Simple: the larger the standard deviation, the more spread out the heights
are from the mean; the smaller the standard deviation, the closer each score is to the
mean.
In order to better demonstrate this, in Table 3.6 let’s change the data we just used
so that the values above and below the mean are more widely spread out; I have illus-
trated this with the arrows moving away from the mean in both directions. While the
mean remains the same, the standard deviation increases to 5.48. This tells us that the
approximate average distance any value in the dataset is from the mean is larger than
when the data values were grouped closer to the mean.
The exact opposite happens if we group the data closer to the mean of the data-
set. In Table 3.7, the mean is again 68, but the values are much more closely clustered
around the mean. This results in a standard deviation of 1.58, much smaller than the
original standard deviation of 2.74.
TABLE 3.8. Using the Standard Deviation to Compare Values in One Set of Data

Annual sick days:   2   5   5   6   6   6   7   7   10   (average = 6)
Herein, the average employee calls in sick 6 days each year. If we computed the
standard deviation, we would find that the average number of sick days taken is 2.12
days away from the mean. For clarity’s sake, we’ll round that off to 2 meaning that the
average number of days employees call in sick ranges from 4 (the mean of 6 minus
one standard deviation) to 8 (the mean of 6 plus one standard deviation). Anything
outside of this is not in the average range. In this case, the person who used only two
sick days is below average but is still acceptable. Obviously, this is a good thing unless
they are coming to work sick when they should be staying at home! On the other end,
we might want to talk to the person taking 10 days a year; this might be a bit excessive.
We can also make comparisons of datasets using the standard deviation. For
example, suppose we are a new fan of professional football in the United States and
want to know as much about it as possible. One thing that interests you is the equality
between the two divisions—the National Football Conference (NFC) and the Ameri-
can Football Conference (AFC). Let’s use Table 3.9 to represent the total points scored
by each team during a season in order to compare the two conferences. Yes, I know
there are more than nine teams in each conference, but this page is only so wide!
As shown in Table 3.10, the NFC teams score, on average, 330 points per season
while the AFC teams score 380 points per season; their range is 56 and 610, respectively. At this point, we might decide to head to Las Vegas; it's apparent that the
AFC teams are much better than the NFC teams—maybe there is money to be made!
Before we bet our hard-earned cash, however, let’s look a little deeper into how the
team scores are dispersed within the two leagues. I've already computed the mean, range, and standard deviation for each league. Let's use these to compare their
performance.
First, the standard deviation for the NFC teams is 19.4 and 213.11 for the AFC
teams. This means, on average, the NFC teams score between 310.6 and 349.4 points
per season while the AFC teams score, on average, between 166.9 and 593.11 points
per season. What does this tell us? The large standard deviation means the AFC
scores are very spread out—there are some very good AFC teams and some underper-
forming AFC teams. NFC teams, while they have a lower overall average, are more
evenly matched. What does this mean? When an AFC team plays an NFC team, do
we always bet on the AFC team because of their higher overall average? No, it would
probably be better to look at the individual teams when we are placing our wagers!
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Although there are quite a few symbols in this equation, it is actually very easy to compute. First, let's make sure we know what we are looking at: s is the sample standard deviation; x represents each individual value in the dataset; $\bar{x}$ is the mean of the sample; Σ means “the sum of”; and n is the number of values in the sample.
Now that we understand all of the symbols, let’s go step by step through the compu-
tation; after that, we’ll use it with our data to see what the actual standard deviation is:
1. $(x - \bar{x})^2$ means to subtract the average score of the entire dataset from each
of the individual scores; this is called the deviation from the mean. Each of
these values is then squared.
2. Σ is the Greek letter sigma and indicates we should add together all of the
squared values from Step 1.
3. n – 1 is the number of values in the dataset minus one. Subtracting one from
the total number helps adjust for the error caused by using only a sample of
data as opposed to the entire population of data.
Let’s go back and verify the computations for the NFC scores above by going
through Table 3.11.
In the first column, we see each of the NFC scores in our sample. When we sub-
tract the sample mean, shown in the second column, from each of those values, we see
the difference shown in column three; this is our deviation from the mean. Notice, if
the observed value is less than the mean, the difference will be negative. The fourth
column shows the result when we square each of the values in the third column
(remember Math 101; when we square a negative value, it becomes a positive value).
The sum of all the squared values, 3012, is shown at the bottom of column four.
Remembering that n – 1 is the total number of values in our dataset, minus 1, we can
now include all these values in our formula:
$s = \sqrt{\frac{3012}{8}}$
When we divide 3012 by 8, we get 376.5; taking the square root of 376.5 gives us a standard deviation of 19.4—believe it or not, it's really that easy! We will verify what
we’ve done using SPSS in only a few pages, but first let’s discuss a minor inconvenience.
When we first began our discussion of the standard deviation, we said we could think
of it as the average distance that each value is from the mean of the dataset. Given
that, when we look back at columns three and four in the table above, the obvious
question becomes, “If we are interested in knowing the average distance each value
in our dataset is from the mean, why do we even need the fourth column? Why don’t
we just add the deviations from the mean and divide them by the number of values in
the sample?”
Unfortunately, it’s not that easy. As you can see in the column labeled “Deviation
from the Mean,” if we computed the sum of these values, the answer would be zero.
Because the sum of $x - \bar{x}$ is an integral part of our equation, that means the standard
deviation would always be zero. We avoid that by squaring each of the values; this,
of course, means that any deviation from the mean less than zero becomes a positive
value. After squaring those values, we then divide the sum of those values by n – 1,
giving us an approximation of the average distance from the mean.
At this point, we run into yet another problem! Since we have squared each of the
values, we are no longer measuring the standard deviation in terms of the original
scale. In this case, instead of a scale representing the points scored by each division,
we have the squared value of each division. In order to transform this value back into
the scale we want, all we need to do is take the square root of the entire process; that
gets us back to where we want to be.
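Here is the whole n – 1 recipe as a Python sketch, using the sick-day data from Table 3.8; statistics.stdev applies the same sample formula, so the two answers agree:

    import math
    import statistics

    sick_days = [2, 5, 5, 6, 6, 6, 7, 7, 10]
    x_bar = sum(sick_days) / len(sick_days)                   # mean = 6.0
    squared_devs = [(x - x_bar) ** 2 for x in sick_days]      # square each deviation
    s = math.sqrt(sum(squared_devs) / (len(sick_days) - 1))   # divide by n - 1, then take the root
    print(round(s, 2))                                        # 2.12
    print(round(statistics.stdev(sick_days), 2))              # 2.12, the same answer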
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
As you can see, since we are now computing a population parameter, we have
changed the symbol for the standard deviation to the lower-case Greek letter sigma
(i.e., σ). We have also included the Greek letter mu (i.e., μ), which represents the mean
of the population. Since we are no longer worried about the inherent error caused by
sampling, we can drop the idea of subtracting 1 from our denominator and replace
it with an uppercase N. This means that we are using all the values in the population.
The overall concept, however, is the same. The population standard deviation simply
shows the dispersion of the dataset when we know all the possible values in the popu-
lation.
The Variance
The variance is a third tool that is used to show how spread out a dataset is and inter-
preting it is easy. If the variance value is large for a given dataset, the data values are
more spread out than if the variance value is small. You are probably thinking, “That
sounds too simple, what’s the catch?” The good news is there’s not a catch. Let’s look
at the formula for calculating the variance of a sample and then put some numbers
into it to show you what I mean:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
If it seems that we’ve seen some of this before, you’re right. Other than a couple
of minor changes, this is nothing more than the formula for the standard deviation
for a sample. We are now using s2, the symbol for the sample variance, instead of s,
which is the symbol for the sample standard deviation. The fact that we are squaring the s value only means that we are no longer taking the square root of our computations; because of that, we must remove the square-root symbol. In other words, if we know the standard deviation, all we need to do is multiply it by itself to compute the variance. For example, since we know the standard deviation for our NFC football scores is 19.4, multiplying that by itself leaves a variance of about 376.4 (the exact value, 3012/8, is 376.5; the small difference comes from rounding the standard deviation).
That means, of course, that if we know the variance of a set of data, its square root is
the standard deviation. The obvious question, then, is, “Why do we even need to com-
pute the variance since it’s just another measure of dispersion? Can’t we just use the
standard deviation in its place?”
The answer to those questions is simple; in most cases we can use the standard
deviation in our statistical inferential decision making. Certain statistical tests, how-
ever, use the variance as the tool for measurement.
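Because the variance is just the standard deviation squared, the round trip between the two is trivial in code; a sketch using the sick-day data again:

    import statistics

    sick_days = [2, 5, 5, 6, 6, 6, 7, 7, 10]
    s = statistics.stdev(sick_days)          # 2.1213...
    print(statistics.variance(sick_days))    # 4.5
    print(round(s ** 2, 1))                  # 4.5: squaring s gives the variance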
TABLE 3.12. Computing Distances from the Mean with a Small Amount of Dispersion

Values:                               3      4      5      6      7      8      9
Distance from the mean:              –3     –2     –1      0      1      2      3     (Sum = 0)
Standard deviations from the mean:   –1.39  –.926  –.463   0.00   .463   .926   1.39  (Sum = 0)

Average = 6; Standard deviation = 2.16; Variance = 4.67
In order to interpret these data, you must look at them from several perspectives.
First, just as is the case with the range and standard deviation, the greater the
variance, the more spread out the data are. Unlike the range and standard deviation,
however, the variance can be smaller than, equal to, or larger than the actual range of
the data. In this case, our range (6) is greater than our variance (4.67); this indicates
that the values are not very spread out around the mean, but that’s not always the case.
For example, suppose the following data represent the number of times an hour
our neighbor’s dogs bark; our imaginary neighbor has seven dogs, therefore we have
seven values. If we computed the range and variance for the data, we would find that
the variance of 1108.33 is much larger than the range of 85. As shown in Table 3.13,
this means the number of barks per dog is highly dispersed around the mean of 50;
some dogs bark quite a bit, others much less so.
TABLE 3.13. Computing Distances from the Mean with a Large Amount of Dispersion

Values:                               5      15     30     60     70     80     90
Distance from the mean:              –45    –35    –20     10     20     30     40    (Sum = 0)
Standard deviations from the mean:   –1.35  –1.05  –.601   .300   .601   .901   1.20  (Sum = 0)

Average = 50; Standard deviation = 33.29; Variance = 1108.33
Second, the variance can never be negative simply because it is the standard
deviation squared. Unlike the situation when we use the standard deviation while
computing inferential statistics, this means we will not be canceling out like values
above and below the mean of a dataset. This will allow us, later in the text, to use
specific statistical tests based on the idea of comparing variance between two groups
to test a hypothesis. For now, however, just remember that the larger the variance, the
more spread out the data values are away from the mean.
Third, we must remember that both the standard deviation and variance measure
dispersion around the center of the dataset, but there’s a distinct difference. Since the
variance is the squared standard deviation, this means it is no longer representative of
the original scale. For example, if we weigh a group of students and compute an aver-
age of 100 and a standard deviation of 10, it means a person weighing 90 pounds would
be one standard deviation, measured in pounds, below the mean. The variance of the
dataset, since it’s the standard deviation squared (i.e., 100 in this case), simply shows
relative position from the center of the dataset and it is not in the original scale. Again,
all we know is that a larger variance value means that the data are more spread out.
FIGURE 3.1. Selecting the Frequencies command on the Data View spreadsheet.
FIGURE 3.2. Using the Frequencies command to generate the standard deviation, variance,
and range.
Percentiles
Many of us are used to hearing the word percentile, especially when discussing test
scores, ability, or aptitude. For example, we might hear that Bob, one of our friends,
scored in the 80th percentile on a physical fitness test. What does that mean though?
Is this good or bad? While this sounds good, we wouldn’t really know anything about
Bob’s physical fitness unless we looked at his actual score on the physical fitness test.
His percentile score is simply the percentage of people scoring below that value. With that definition, all we know is that Bob scored higher than 80% of the others taking the same fitness test. Bob might be in bad physical shape, but he's better off than 80% of the others! Let's look at a detailed example.

FIGURE 3.3. Output from the Frequencies command (N Valid = 9, Missing = 0, Std. Deviation = 19.40361, Variance = 376.500, Range = 56.00).

Suppose we have a group of 11 newly hired attorneys in our company and, while they all passed the bar exam, there were some very low scores. One of the young attorneys, a real go-getter, knew his score of 79 was fairly low and was trying to determine a way to “stand out in the crowd.” Having heard
that statisticians are wizards at making bad things look better, the young attorney
approached me and told me his dilemma. While I disagreed with the “wizard” part, I
did tell him there’s another way he could look at his score. He could compare himself
to the rest of the newly hired attorneys and determine, using a percentile score, where
he ranked in relation to them. Although he would still have exactly the same score on
the bar exam, his percentile score might make him look better than he really is.
Knowing that, let’s use the following scores to represent the bar scores of all our
newly hired attorneys. Not knowing any better, I will just assume the scores on the bar
exam range from 0 to 100, with our group of attorneys having scores ranging from 70
to 100 (Table 3.14).
TABLE 3.14. Bar Scores, Highest to Lowest

75
75
74
74
72
70
70
Percentile = [(Below + 1/2 Same)/N] * 100

In this equation, “Below” represents the number of scores less than the one we are interested in; “Same” means all values equal to the one we are interested in; and N, as usual, means the total number of values in our dataset. Notice, N is upper-case here; the 11 scores we have refer to the entire population of newly hired attorneys.
In this case, our attorney’s score is 79; there are nine values less than 79 and only
one greater than 79. We can substitute those values, along with N, into our formula,
and a few simple calculations will lead us to our percentile score:
1. Percentile = [(9 + 1/2(1))/11] * 100
2. Percentile = (9.5/11) * 100
3. Percentile = .8636 * 100
4. Percentile = 86.36
At this point, it’s necessary to drop anything to the right of the decimal point;
we don’t round it off, but rather just drop it. This means things are looking better for
our young friend. Although his actual score was 79, he represents the 86th percentile.
This means his score is greater than 86% of the people taking the bar exam. This
could possibly make him look better in the eyes of his new employer; that’s what he is
betting on.
Some people find one case perplexing: a person makes a perfect score yet is in the 99th percentile; what happened? It's easy when you think about our defini-
tion. What is the only score that is greater than 99% of the other scores? Easy, it’s 100.
There’s no way that a person could outscore 100% of people taking the bar exam; that
means they would have to outscore themselves. Impossible!
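Since the formula only needs the three counts, it is easy to wrap in a small Python function; here is a sketch checked against the attorney's numbers (9 scores below his, 1 the same, 11 in all):

    def percentile(below, same, n):
        # [(Below + 1/2 Same) / N] * 100, with the decimals dropped rather than rounded
        return int((below + 0.5 * same) / n * 100)

    print(percentile(below=9, same=1, n=11))   # 86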
We can make this clearer with an example, but, before we do, let me warn you that
there are at least two different ways to compute quartiles. Statisticians tend to disagree
over which one is most representative; for our purposes, we’ll use these guidelines:
• Our definition of the median, represented by Q2, will remain the same; it's the 50th percentile.

• Quartile 1, represented by Q1, is the median of the lower half of the data, not including the value Q2.

• Quartile 3, or Q3, is the median of the upper half of the data, again not including the value Q2.
Let’s demonstrate starting with Table 3.16. Therein we have an odd number of
values in our dataset, so our median (Q2) is 5.
As we just said, we are going to compute Q1 as the median of all values less than
the median; remember, this is the same as the 25th percentile. In this case, the median
will be the average of 2 and 3, or 2.5. Q3 will be computed in the same manner; it is
the median of all values above the median; in this case the average of 7 and 8, or 7.5.
These are marked in Table 3.17.
TABLE 3.17. Computing Q1, Q2, and Q3 for an Odd Number of Values

1   2   3   4   5   6   7   8   9

2.5 is the 25th percentile (Q1); 5 is the 50th percentile (the median, Q2); 7.5 is the 75th percentile (Q3).
In this table, we had an odd number of values; Table 3.18 shows us a dataset with an even number of values.

Knowing we have an even number of values, the median, Q2, is the average of the middle two numbers; remember that 10 is the number of values in the dataset:

Midpoint = (10 + 1) / 2
Our results would show the median is in the 5.5th position. In this case, the median,
Q2, is 43 [i.e., (40 + 46)/2 = 43]. This would also mean that Q1 is the median of all
values to the left of 43. Remember, the value 40 is in our original data, and it is less
than 43, so we must include it. Given that, we have an odd number of values; 10, 15,
20, 30, and 40; we simply identify the one in the middle, 20, as our 25th percentile or
Q1. In order to compute Q3, the 75th percentile, we would use 46, 52, 58, 60, and 62
and locate the middle value. Here Q3 is equal to 58. All of this is shown in Table 3.19.
TABLE 3.19. Computing Q1, Q2, and Q3 for an Even Number of Values

10   15   20   30   40   46   52   58   60   62

20 is the 25th percentile (Q1); 43 is the 50th percentile (the median, Q2); 58 is the 75th percentile (Q3).
$z = \frac{x - \bar{x}}{s}$
Here we are using the symbols for the sample mean (i.e., $\bar{x}$) and the sample standard deviation (i.e., s) within the formula. Let's use the data in Table 3.20. We have a mean of 10 and, to make life simple, let's use a standard deviation of 1 to calculate the z score for an observed value of 12.
1. z = (12 − 10) / 1
2. z = 2 / 1
3. z = 2
In this case, our z score is 2; this means our value, 12, is two standard deviations
above the mean. We might also find that the z score we compute is negative. For example, using the same standard deviation and mean score, along with an observed score
of 7, we would compute a z score of –3. This means our observed score is 3 standard
deviations below the mean of the dataset. We can see this as follows:
1. z = (7 − 10) / 1
2. z = −3 / 1
3. z = −3
In most instances, our z score will not be a whole number; it’s quite common to
compute a fractional z score. For example, let’s use a mean of 15, a standard deviation
of 3, and an observed value of 20:
1. z = (20 − 15) / 3
2. z = 5 / 3
3. z = 1.67
This would indicate that the observed value, 20, is 1.67 standard deviations above
the mean.
When we get into the section of graphical descriptive statistics, we will revisit the
subject of z scores. We will find that we can use a predefined table to assign actual
percentile values to our z scores, thereby eliminating the problem of sorting all the
test scores and then determining overall percentile values.
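The z-score formula itself is a one-liner; here is a Python sketch reproducing the three examples above:

    def z_score(x, mean, sd):
        return (x - mean) / sd

    print(z_score(12, 10, 1))             # 2.0
    print(z_score(7, 10, 1))              # -3.0
    print(round(z_score(20, 15, 3), 2))   # 1.67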
T = (z * 10) + 50
You can see that all we need to do to compute T is to multiply our z value by 10 and then add 50 to it. If we used a z score of +1 (e.g., a math score one standard deviation above the mean), the T-score for that value would be 60; when our z score was –3, our T-score would be 20 (i.e., –30 + 50).
When you compute the T-score, the average will always be 50, with a standard deviation of 10. For example, in the prior example we had an average math score of 50. This means that if we had an observed value of 50, the z score would be zero. Multiplying that z score of zero by 10 still gives us zero and, when we add zero to 50, the mean of our T-score distribution is 50. This is shown in Table 3.23.
Now, let’s use our observed score of 65. Since it is three standard deviations above
the mean, our z score is +3; this means our T-score is 80. If we have an observed value
of 35, our z score is –3; this translates into a T-score of 20. Both examples are shown
in Table 3.24.
TABLE 3.24. Using the Mean, z Score, and T-Score to Compare Values in Two Datasets

Mean   Observed value   Standard deviation   z score   T-score
50     65               5                    +3        80
50     35               5                    –3        20
At this point, if we wanted to show these test scores to a parent, we could explain
that the T-scores range from 20 to 80, with an average of 50. This would provide them
with information in a format they are more used to seeing.
Stanines
Stanine (short for “standard nine”) scores divide the set of values we are looking at
into nine groups. Stanine scores are frequently used in education, especially to report
standardized test scores and to compare groups of students. Computing a stanine
score is very easy; simply multiply the z score by 2 and add 5:
Stanine = (z * 2) + 5
For example, let’s use a dataset with a mean score of 40 and a standard deviation
of 8. If we have an observed value of 56, our z score is 2.
1. z = (56 − 40) / 8
2. z = 16 / 8
3. z = 2
When we insert our z score of 2 into the stanine formula above, the result is a
stanine score of 9:
1. Stanine = (2 * 2) + 5
2. Stanine = 9
What if the dataset had a mean of 50 and a standard deviation of 7, with the same observed value of 56? First we compute the z score:

1. z = (56 − 50) / 7
2. z = 6 / 7
3. z = .857

We then insert that z score into the stanine formula:

1. Stanine = (.857 * 2) + 5
2. Stanine = 6.714
This would result in a stanine score of 6.714 which, when we removed everything
to the right of the decimal point, would result in an actual score of 6. By looking at
this example, educators can see that a student with a high stanine number is doing
better than a student with a lower stanine number and, in many instances, will use
these scores for ability grouping. For example, it is common to see children with stan-
ine scores of 1 through 3 included in a low-ability group, students with scores of 4
through 6 in an intermediate group, and students with the highest scores placed in an
advanced group.
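Both conversions are simple linear transformations of z, so they chain together naturally; here is a sketch using the last example (mean of 50, standard deviation of 7, observed value of 56):

    def t_score(z):
        return z * 10 + 50

    def stanine(z):
        return int(z * 2 + 5)    # drop everything right of the decimal point

    z = (56 - 50) / 7            # 0.857...
    print(round(t_score(z), 1))  # 58.6
    print(stanine(z))            # 6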
1. When an observed value is greater than the mean, the z score will be greater
than zero. That will cause the T-score to be greater than 50 and the stanine
to be greater than 5.
2. When an observed value is equal to the mean, the z score will equal zero. This
will cause the T-score to equal 50 and the stanine to equal 5.
3. When an observed value is less than the mean, the z score will be less than
zero. This will cause the T-score to be less than 50 and the stanine to be less
than 5.
FIGURE 3.4. Using the Explore option of the Descriptive statistics command.
FIGURE 3.5. Using the Explore option to calculate descriptive statistics and percentiles.
Because of that, SPSS uses two different computations: weighted average and Tukey's hinges. Although I didn't label it as such, in our manual calculations we used Tukey's formula. Given that, you can see that the values we computed for the median, as well as the first and third quartiles (i.e., the 25th and 75th percentiles), match exactly. Besides this formula, what else was Tukey famous for? If you said that he coined the word “software,” you're right!
Finally, let’s see how SPSS computes z scores for us. Let’s start with the same
dataset but change the variable name to z score to avoid confusion; we’ll then select
Analyze and Descriptive Statistics (Figure 3.7).
Percentiles

              Weighted Average (Definition 1)   Tukey's Hinges
Percentile    Median and Quartile               Median and Quartile
5             10.0000
10            10.5000
25            18.7500                           20.0000
50            43.0000                           43.0000
75            58.5000                           58.0000
90            61.8000
95            .
Again, we’ll see the Descriptives screen, but this time we will do something a bit
differently. First, within the Descriptives box, we’ll check the box in the lower left that
says “Save standardized values as variables”; then we’ll select any of the statistics we
want in the Options box. In this case, we’ve asked for the mean score and the standard
deviation (Figure 3.8).
Once we select Continue and OK, SPSS will do two things. First, it will compute
and display a mean score of 39.3 and a standard deviation of 19.45964 for the variable
we called z score; this is shown in Figure 3.9.
More importantly, when we return to the Data View screen, shown in Figure 3.10,
we’ll see that SPSS has saved the actual computed z score for each value in the column
labeled Z score.
In short, it has done nothing more than put a “z” in front of whatever we named
the original variable. For example, we can see that the z score for 62 is 1.16652. This
means that 62 is 1.16652 standard deviations above the mean of 39.3. We can check
that with a bit of basic math:
1. z = (62 − 39.3) / 19.45964
2. z = 22.7 / 19.45964
3. z = 1.17
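We can also replicate SPSS's saved standardized values with a short sketch of our own:

    import statistics

    data = [10, 15, 20, 30, 40, 46, 52, 58, 60, 62]
    m = statistics.mean(data)        # 39.3
    s = statistics.stdev(data)       # 19.45964...
    z_scores = [(x - m) / s for x in data]
    print(round(z_scores[-1], 5))    # 1.16652, the z score SPSS computed for 62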
FIGURE 3.8. Using the Descriptive statistics option to compute the mean and standard
deviation.
Once again, we have shown that our calculations match those of the software, so
let’s wrap up this chapter.
Summary
In this chapter, we continued talking about Step 4, learning to understand how spread
out our dataset is, using measures of dispersion; we also learned how to determine
the relative position of a value within a dataset when compared to a given measure of
central tendency. In the next chapter, we'll continue with Step 4 of our six-step model by using graphical tools to examine our data. Just like everything else we have talked about up to this
point, these tools are essential in deciding which inferential statistical tool we will use
to analyze our data.
FIGURE 3.10. Computed z scores created as a new column in the Data View spreadsheet.
Computed range:
Largest observed value minus the smallest observed value
Percentile:
[(Below + 1/2 Same)/N] * 100
Possible range:
Largest possible value minus the smallest possible value
Sample variance:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Stanine:
Stanine = (z * 2) + 5
T-score:
T = (z * 10) + 50
Quiz Time!
Table 3.26 shows a dataset representing the ages of employees working for three different com-
panies. Assuming a minimum working age of 18 and mandatory retirement at 65:
7. What does the standard deviation tell you about the dispersion of age in each com-
pany?
1. You can see in Table 3.27 that there are 10 sets of mean scores, observed scores,
and standard deviations. For each of these, compute the z score and T-score. Follow-
ing that, you should rank the mean values and compute the stanines.
2. Intelligence tests have shown that the average IQ in the United States is 100 with a
standard deviation of 15. What z score would enable students to qualify for a pro-
gram where the minimum IQ was 130?
3. Let’s assume that average income of families in the United States is $22,000; we
also know that families whose annual income is less than 2 standard deviations
below the mean qualify for governmental assistance. If the variance of the incomes
in America is $9,000, would a family making $19,500 a year qualify?
4. We’ve found that the average number of states that citizens of the United States
have visited is 10. If we have a sample standard deviation of 3, what are the z scores for a person who has visited 14 states? 5 states? 20 states?
CHAPTER 4
Graphically Describing
the Dependent Variable
Introduction
As we have already said, there are two ways of describing data—numerically and
graphically. We spent the last two chapters talking about the numeric methods: mea-
sures of central tendency, measures of dispersion, and measures of relative standing.
In this chapter we will focus on basic graphical tools, many of them already familiar
to you. Let me say, right off the bat, that we are only going to look at examples of
graphical statistics that were generated using our SPSS software. While many may
disagree, I think creating the graphs “by hand” is a waste of time. I learned to do that in
a basic statistics class years ago, and I’ve not had to do it since. We have the software
to create the graphs for us; why not use it?
Although a table like this one is a perfectly acceptable way to describe our data
graphically, it doesn’t concisely show what we want to see. In order to try to get a better
understanding, we could break it down further by creating a frequency table (Table
4.2) showing the count of each value as well as the percentage of the total that each
frequency represents.
Figure 4.1 shows how we could use SPSS to verify this: we enter the data into the SPSS spreadsheet, then select Analyze, Descriptive Statistics, and Frequencies.
Selecting Frequencies brings us to the next step (Figure 4.2). By selecting Class as our
Variable, checking the box titled “Display Frequency Tables” in the lower left, and
clicking on OK, we get the output shown in Figure 4.3; as expected, it matches what
we computed manually in Table 4.2.
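Outside of SPSS, the same frequency table takes only a few lines of Python; the counts below are inferred from the percentages in the output (6 each of freshmen, juniors, and seniors, and 18 sophomores), so treat them as an illustration:

    from collections import Counter

    classes = ["FR"] * 6 + ["SO"] * 18 + ["JR"] * 6 + ["SR"] * 6
    counts = Counter(classes)
    for cls, n in counts.items():
        print(cls, n, round(n / len(classes) * 100, 1))   # e.g., FR 6 16.7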
Pie Charts
Another graphical tool that we’ve all seen used quite often is a pie chart. It is noth-
ing more than a picture that looks like a pie, with each slice representing how often
a given value occurs in the dataset. Using the same data from the frequency table, if
you look closely at Figure 4.4, you will see options for many types of graphs. We will
look at many of these graphs in the following pages but, for now, select Graph, Legacy
Dialogs, and Pie. After selecting Pie, the dialog box shown in Figure 4.5 will appear.
We want our Slices to represent the number of cases and we have defined our cases as
Class. Entering OK results in the pie chart shown in Figure 4.6.
FIGURE 4.2. Using the Frequencies command for class (nominal) data.
FIGURE 4.3. Frequencies output for Class. The first row shows FR: Frequency 6, Percent 16.7, Valid Percent 16.7, Cumulative Percent 16.7.
Doesn’t this really make a difference? It becomes immediately clear, with just
a glance, that there are as many sophomores as there are all the other classes put
together. It is also obvious that there are an equal number of students in each of the
other three classes.
This verifies what we just saw in our frequency table. The percentage for fresh-
men, juniors, and seniors is the same (i.e., 16.7% each), and the number of sopho-
mores is the same as the other three groups combined (i.e., 50%). Obviously, the table
showing the frequency count is easy to read, but if you want to get your point across,
the picture is far more dynamic.
Bar Charts
Another easy way to present data is using a bar chart. As we saw when we looked at the
box where we indicated we wanted to create a pie chart, there were options for creat-
ing several other types of graphs. Rather than present all those input screens again,
Figure 4.7 shows us what SPSS would generate if we had selected Bar instead of Pie.
FIGURE 4.5. Setting up the Pie command to use class (nominal) data.
On the bar chart, the actual count for each of the variables is displayed; you can
tell that by looking at the values on the left side of the graph. We can verify what we
saw on the pie chart and in the descriptive statistics; the number of freshmen, juniors,
and seniors is the same, while the number of sophomores is equal to the number in
each of the other three groups combined.
Had we wanted to, we could have created a bar chart that showed the percent-
ages each of these values represented. If we had, the values on the left side of the
chart would have ranged from 0 to 100
and the values for each group would
have represented their percentage out
of 100 rather than their actual count. Of
course, the relative sizes of the bars for
each group would have been the same.
As was the case with the pie chart, this
bar chart really makes the data “come
to life.”
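If you want to experiment outside SPSS, here is a rough matplotlib sketch of the same bar chart; switching the heights from counts to percentages would rescale the left side of the chart exactly as described above, without changing the relative sizes of the bars.

import matplotlib.pyplot as plt

labels = ["FR", "SO", "JR", "SR"]
counts = [6, 18, 6, 6]          # hypothetical class frequencies

plt.bar(labels, counts)         # bar heights are the raw counts
plt.ylabel("Count")
plt.title("Class")
plt.show()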
There is one additional thing to
note here. Later in this chapter we’ll talk
about a histogram, and you will see that
it’s very similar to the bar chart. The dif-
ference between the two is one of scale;
the bar chart is used to present nominal
data, while the histogram will be used
when we are working with quantitative data.
FIGURE 4.6. A pie chart created from the class data.
Scatterplots
A scatterplot is used to show the relationship between paired values from two datasets.
In order to create the scatterplot, we will prepare the SPSS spreadsheet shown in Fig-
ure 4.8 to include the three variables.
Creating a scatterplot takes several steps, so let’s go through it carefully to ensure
FIGURE 4.8. Data for year, tuition spent, and resignations in the Data View spreadsheet.
we understand each one. First, although not shown, we would select Graphs, Legacy
Dialogs, and Scatter/Dot. Then, as shown in Figure 4.9, we’ll ask for a Simple Scatter-
plot. We then can select Tuition to be on the x-axis (i.e., the horizontal axis) and Res-
ignations to be on the y-axis (i.e., the vertical axis). Notice, in Figure 4.10, that we’re
not including Year because it is not pertinent to the scatterplot itself. After clicking on
OK, SPSS creates the scatterplot shown in Figure 4.11.
The left and bottom sides of the box are labeled with the possible range of Resig-
nations and Tuition. Since our data didn’t indicate to the software what the possible
range is, SPSS created it by including a small amount above and below the actual
values. The possible range of values for Resignations goes from about 15 to 100, while
the possible range of Tuition goes from about $30,000 to $90,000.
SPSS then paired the variables, by year, and plotted them on the chart. For exam-
ple, in the first year of the program, there were 20 resignations and the company
spent $32,000 on tuition reimbursement. To plot that point on the chart, SPSS went
out to the right on the x-axis to a point equivalent to $32,000 and then up the y-axis to
a point equivalent to 20. The intersection of these two values is marked with a small
circle. This process is then repeated for each of the sets of values.
Although it is not necessary, SPSS can plot a line of best fit through the data points.
This line, shown in Figure 4.12, shows the trend of the relationship between the plotted
variables. Later in the book we will compute the actual correlation coefficient between
the two variables, but just by looking at Figure 4.12, we can see that the company
might have a reason to be concerned. It does seem that the amount of tuition money
spent and the number of resignations are related.
This plot represents a positive relationship; as one value goes up, the other goes
up. If the line of best fit was exactly 45 degrees, we would know that the relationship
between the two sets of variables is perfect in that, if we knew one value, we could
accurately predict the other. This doesn’t occur in most instances, however; we will
see lines that look nearly perfect, and, in other instances, the line will be nearly flat.
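To play with this idea outside SPSS, here is a sketch in Python that draws a scatterplot and adds a least-squares line of best fit. The tuition and resignation figures are invented stand-ins for the data in Figure 4.8, chosen only to show a positive trend; they are not the book’s values.

import numpy as np
import matplotlib.pyplot as plt

tuition = np.array([32000, 41000, 48000, 56000, 63000, 71000, 80000, 88000])
resignations = np.array([20, 28, 35, 41, 55, 62, 74, 90])

slope, intercept = np.polyfit(tuition, resignations, 1)   # least-squares fit

plt.scatter(tuition, resignations)                 # one circle per year
plt.plot(tuition, slope * tuition + intercept)     # line of best fit
plt.xlabel("Tuition")
plt.ylabel("Resignations")
plt.show()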
For example, using the data in Table 4.4, we could create a scatterplot and see
exactly the opposite. As one value goes up, the corresponding value tends to go down;
FIGURE 4.10. Setting up the Scatterplot command to use the tuition and resignations data.
this is known as a negative relationship. We can use these data to create Figure 4.13.
Here, the line of best fit goes slightly down from left to right; this indicates we have a
somewhat negative relationship. Let’s change the data in Table 4.4 to that in Table 4.5.
SPSS shows a flat, but slightly positive, line of best fit in Figure 4.14.
FIGURE 4.14. A slightly positive relationship between resignations and tuition paid.
Histograms
A histogram is really nothing more than a bar chart used to chart quantitative data.
We’ll use the data in Table 4.6, collected from our coworkers, when we asked them
how many minutes they usually take for lunch. Using these data, SPSS would produce
the histogram shown in Figure 4.15.
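As a rough stand-in for the SPSS output, the sketch below draws a comparable histogram in Python. The lunch times are illustrative values consistent with the text (30 appearing once, 50 appearing five times), not the exact contents of Table 4.6.

import matplotlib.pyplot as plt

minutes = [30, 35, 40, 45, 45, 50, 50, 50, 50, 50,
           55, 55, 55, 60, 60, 65, 70, 75, 80]

plt.hist(minutes, bins=range(30, 90, 5))   # one bar per 5-minute interval
plt.xlabel("Minutes taken for lunch")
plt.ylabel("Frequency")
plt.show()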
In a histogram, the numbers across the bottom of the chart represent the actual
values in our dataset; it begins with 30 on the left and goes to 80 on the right. The
values on the left side of the chart are the number of times a given value appears in
the dataset. As you can see, the value 30 appears once in the dataset, and the bar rep-
resenting it stops at the line showing the value of 1. We could double-check ourselves
by seeing that the value 50 occurs 5 times in both the table and the bar chart. We’ll
use the histogram extensively in the next chapter to discuss the normal distribution, but
for now, let me give you a word of warning about graphical statistics.
Don’t Let a Picture Tell You the Wrong Story!
It probably won’t surprise you that statistics have been used, in many cases, to try to mislead, mis-
inform, or manipulate an unsuspecting victim. This seems to be especially true when
graphical statistics are used.
For example, let’s imagine a local environmental group denouncing the effect
of humans on the natural habitat of brown bears in the Rocky Mountains. To make
their point, they compare the life expectancy of the brown bears to that of their cousins above
the Arctic Circle. They first explain that the possible life span of all bears ranges
from 0 to 30 years but adamantly believe that the following graph clearly shows that brown
bears have a far shorter life expectancy than polar bears. Based on this, they demand
the government spend more money on environmental conservation; their results are
shown in Figure 4.16.
Although the environmentalists certainly seem to have a case, do you notice any-
thing suspicious about this graph? If not, look on the left side of the chart. As you can
see, the possible values range from 23 to 27. This means that the bars showing average
life expectancy are not actually proportionate to the possible range of 0 to 30; instead
they are proportionate to a much shorter possible range. This makes the difference
between the average life expectancy of brown bears and the average life expectancy of
polar bears look much more dramatic than it really is. If we were to plot the average
life expectancy using the true possible range, the chart would look like that shown in
Figure 4.17.
That makes quite a bit of difference, doesn’t it? While the first graph might sup-
port the environmentalists’ concerns, the second graph shows there is really very little
difference in the average life span of brown and polar bears. This is not to say that we
should not worry about a shorter life expectancy of brown bears, but we should base
any such concern on an accurately scaled picture of the data.
The Normal Distribution
In a normal distribution, the mean, median, and mode are all equal, and the other values are distributed symmetrically
around the center. It would be nice to have a normal distribution every time we collect
data; that would allow us to use a discrete set of statistical procedures, called paramet-
ric statistics, to analyze our data. You may already be familiar with the names of many
of these statistical tests: the t-test and the analysis of variance (ANOVA), for example.
Although using these parametric tools will be the primary focus of the latter part of
this book, we must keep two things in mind.
First, remember that histograms are used to plot quantitative data. As we talked
about earlier, when we’re working with nominal or ordinal-level data, we’re forced to
use a less powerful set of tools called nonparametric statistics. Like the parametric sta-
tistics, you may already be familiar with some of them; the chi-square test, for example,
is the most widely recognized.
Second, the use of parametric statistics is based on quantitative data that are
normally distributed. In most cases the distribution doesn’t have to be perfectly bell-
shaped since the inferential statistics are powerful enough to allow for some latitude
within the data. In certain instances, however, a researcher might find a set of quan-
titative data where the distribution is so distorted that parametric statistics will not
work. Right now, we will not get into the underlying reasons, so let’s just leave it at
this. In this book, when we have nominal or ordinal data, we will use nonparametric
statistics; if we have quantitative data, we will use parametric statistics.
Things That Can Affect the Shape of a Distribution of Quantitative Data
Imagine, for example, that we measured the height of every student in a school and computed
the average. How often, however, would the heights less than average and the heights
greater than average be exactly equally distributed? You might have a few more short-
er students than taller students or vice versa; either way it would affect how the distri-
bution would look. This type of distortion is very common and very rarely leads to hav-
ing a data distribution where both sides of the curve are perfectly symmetrical. This is
not a big concern in most instances because, like I said, the inferential tests we will use
can compensate for problems with the shape of a distribution up to a certain degree.
Knowing that, let’s look at a few things that can affect the way a distribution looks.
Skewness
Skewness means that our data distribution is “stretched out” to one side or the other
more than we would expect if we had a normal distribution. When we have more val-
ues than expected greater than the mean, we say the distribution is positively skewed or skewed to the
right. When we have more values than expected less than the mean, we say that it is
negatively skewed or skewed to the left. Let’s look at each.
POSITIVE SKEWNESS
Let’s add these values—90, 90, 90, 100, 100, and 105—to the lunch-time data we have
been using. This means we have more data points on the high end of the scale than we
do on the lower end. In Figure 4.21, you can see that this causes the normal curve to
be far more stretched out on the right side than it is on the left side. When this occurs,
we say the dataset is positively skewed or skewed to the right.
In a dataset that is positively skewed, the mean score will be larger than the medi-
an score. Since we already know how to use SPSS to create graphs, let’s just look at
the SPSS printout shown in Figure 4.22; in it you can see that the mean is 60.83 while
the median is 57.5. Larger degrees of skewness would cause this difference to become
even greater. Besides just looking at the data distribution and the difference between
the mean and median, we also have statistical formulas to help us to determine the
degree of skewness. For example, the output below includes a field labeled “skewness.”
Its value, either positive or negative, indicates how skewed the data distribution is; the
further away from zero it is, the greater the skewness is in one direction or the other.
In this case, we have a skewness statistic of .734. Since this value is greater than zero, it
verifies that the dataset is positively skewed. This only becomes problematic, however,
if the value is greater than +2. Anything larger would tell us that the distribution is so
skewed that it will interfere with our ability to use parametric inferential statistics. In
those cases, we would have to use a nonparametric alternative.
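If you want to check a skewness statistic without SPSS, here is a minimal sketch using scipy. Keep in mind that scipy’s default formula differs slightly from the one SPSS uses, so the value will be close to, but not exactly, the .734 shown in the output; the data below are illustrative lunch times with the high values added.

import numpy as np
from scipy.stats import skew

lunch = np.array([30, 40, 45, 50, 50, 50, 55, 55, 60, 60, 65, 70,
                  90, 90, 90, 100, 100, 105])   # high values stretch the right tail

print(skew(lunch))   # positive value -> positively skewed (skewed right)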
Before we move on, let me point out a couple of other things in Figure 4.22 you
might have already noticed about the SPSS output. First, for some reason, it doesn’t
include the modal value. This is not usually an issue as we can easily determine it by
looking at the graphical data distribution, and there are also other commands in SPSS
that will compute the mode for you. Second, the printout includes many descriptive
statistics that we haven’t discussed up to this point. A lot of these statistics, such as the
Minimum and Maximum, are common sense; others will be covered later in the text.
NEGATIVE SKEWNESS
To demonstrate negative skewness, let’s add these values to the original lunch-time
dataset: 5, 10, 10, 20, 20, and 20. By doing that, we can see in Figure 4.23 that the dis-
tribution is more spread out on the left side than it is on the right.
In Figure 4.24, you can see the mean, 49.17, is less than the median, 52.50. As you
might have expected, the relationship between the mean and the median is opposite
that of the positively skewed distribution. The more negatively skewed the distribution
is, the smaller the mean value will be when compared to the median value.
The negative skewness in the diagram is supported by a skewness index of –.734
which, although it is negative, is not less than –2 (i.e., it is between –2 and 0). As was
the case with a positive skewness value less than +2, this means the normality of the
data distribution is not significantly affected; while this distribution is not perfectly
normal, it is still close enough to use parametric statistics.
Kurtosis
Kurtosis refers to problems with the data distribution that cause it to look either more
peaked or more spread out than we would expect with a normal distribution. Platykur-
tosis occurs when we have more values than expected in both ends of the distribution,
causing the frequency diagram to “flatten” out. Leptokurtosis happens when we have
more values in the center of the dataset than we might expect, causing the distribution
to appear more peaked than it would in a normal distribution.
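As with skewness, a kurtosis index is easy to compute outside SPSS. The sketch below uses scipy, which reports excess kurtosis (zero for a perfectly normal curve) using a formula that differs slightly from SPSS’s; the data are an illustrative, deliberately flat set of values.

from scipy.stats import kurtosis

data = [30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80]   # evenly spread -> flat

print(kurtosis(data))   # negative value -> platykurtic (flatter than normal)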
PLATYKURTOSIS
Once again, let’s modify the original dataset we used. This time, however, instead of
adding values to either end of it, let’s remove one each of the values 50, 55, and 60 in
order to help us understand platykurtosis. If we look at our descriptive statistics in
Figure 4.25, we can see that the mean and median both equal 55, just as we saw in the
normal distribution in Figure 4.19. In this case, however, the data values are not sym-
metrical around the mean. Instead, we can see in Figure 4.26 that the plotted values sit
below the overlaid normal curve at its peak. That means our dataset is flatter than would
be expected if the distribution was perfectly bell-shaped.
As was the case with skewness, SPSS will calculate an index of kurtosis which, as
seen in Figure 4.26, is –.719. This value, since it is negative, verifies what we saw in the
graph. Much like the skewness numbers, however, the kurtosis number would have to
be less than –2.00 to significantly affect the normality of the dataset.
LEPTOKURTOSIS
In other instances, there may be more values around the center of the distribution
than would be expected in a perfectly normal distribution. This causes the distribu-
tion to be taller in the center than we would expect; this condition is called leptokurto-
sis. In Figure 4.27, we again have the mean, median, and mode equal to 55 (remember,
we have to look at the actual diagram in Figure 4.28 to determine the mode), but
the distribution is not normal; you can see that we have added several occurrences
of the value 55 as well as taken away some of the lower and higher values. As odd as
it looks, however, the index of kurtosis for this distribution is only .334, which is still
well within the range allowed by parametric statistics; these descriptive statistics are
shown in Figure 4.27.
than 60 inches or taller than 78 inches. Knowing that explains why we tend to really
notice men that are very, very short or very, very tall; there are not too many of them.
You probably noticed that, in the last section, I kept saying “about” while referring
to a certain percentage from the mean. I did that to get you used to the concept of
the empirical rule; now let’s look at the exact percentages. What the empirical rule
actually says is that, in a mound-shaped distribution, at least 68.26% of all values fall
within ±1 standard deviation from the mean, at least 95.44% of all values fall within
the mean ±2 standard deviations, and nearly all values (99.74%) fall within the mean
±3 standard deviations. Using our example of height, this now means only .26%
of all men fall outside the mean ±3 standard deviations; they’re rarer than we thought!
Let’s look at Figure 4.30. There you can see that I used SD for standard deviation
instead of a symbol. I did this because the empirical rule applies to both mound-
shaped samples and populations, so it would not be appropriate to use either of the
symbols discussed earlier.
It makes sense, by looking at this figure, that we can see percentage differences
between the mean and any value above or below the mean. Although it is not as clear
conceptually, we can also look at percentages between any two values in the distribu-
tion, be they both above the mean, both below the mean, or one on either side of
the mean. For example, we know that 68.26% of all values fall within ±1 standard
deviation of the mean. It naturally follows that half of these values, 34.13%, would fall
between the mean and +1 standard deviation, and half of the values, 34.13%, would
fall between the mean and –1 standard deviation. You can see that in Figure 4.31.
If we wanted to determine the difference between the mean and –3 standard devi-
FIGURE 4.31. Mean ±1 standard deviation.
FIGURE 4.32. Mean ±3 standard deviations.
values that lie between the mean and a given z score, go down this column until you
find the row where the value corresponds to your z score. For example, if we wanted to
verify that the percentage for a z score of 1 is 34.13%, go down that column until you
get to the row containing 1.0. Immediately to the right of that, you will see the value
.3413; this is the decimal notation for 34.13%. All you must do is move the decimal
point in the table two places to the right to get the percentage. In this case, you can
see we are in the column where the heading is 0.00; this means we have a z score of
exactly 1.00.
FIGURE 4.34. The area between z scores of 1.10 and 2.21.
FIGURE 4.35. The area between z scores of –2 and +2.2.
Since these two z scores fall on opposite sides of the mean, we add their two areas
together (47.72% + 48.61%); this tells us that 96.33% of the values under the curve are
contained in the range from a z score of –2 to a z score of +2.2. This is shown in Figure 4.35.
Finally, let’s look at a case where both z values are negative; let’s use –1 (34.13%)
and –3 (49.87%). In this case, since both values are on the same side of the mean, we
must again subtract the smaller number from the larger. We can plot them on our
graph, shown in Figure 4.36, and we find that 15.74% of values fall between our two
z scores.
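Rather than thumbing through the table at the back of the book, we can get the same areas from software. Here is a sketch using scipy’s normal distribution: norm.cdf gives the area to the left of a z score, so subtracting one cdf from another gives the area between any two z scores.

from scipy.stats import norm

print(norm.cdf(1) - norm.cdf(0))       # ~.3413, mean to z = +1
print(norm.cdf(2.2) - norm.cdf(-2))    # ~.9633, z = -2 to z = +2.2
print(norm.cdf(-1) - norm.cdf(-3))     # ~.1574, z = -3 to z = -1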
Quiz Time!
Since we said that we always use a computer to help us when we are graphically describing our
dependent variable, we’re in somewhat of a jam. Knowing we can’t do that, let’s use the follow-
ing questions to understand conceptually where we are. When you are finished, check your
answers at the end of the book.
1. If I wanted a quick and easy way to present the frequency of values in a dataset
containing nominal data, which graphical tool(s) would I use? Why?
2. Using math data on the y-axis and the reading data on the x-axis, as shown in Table
4.8, what trend would the line of best fit show us if we plotted the data on a scatter-
plot?
3. If we plotted the salary data in Table 4.9 on the y-axis and years of education on the
x-axis, what would our plot look like?
4. Explain the difference between a histogram and a bar chart. Can each be used
with nominal data? Why? Quantitative data? Why?
5. What are the requirements for a dataset to be called a perfect normal distribution?
6. If a dataset had a mean of 45 and a median of 40, would it be skewed? Why?
7. If we plot a dataset with a mean of 60 and a median of 55, would it be either
platykurtotic or leptokurtotic? Why?
8. Does a dataset with an equal mean and median always mean the data are normally
distributed? Why?
9. Do we always have to use nonparametric statistics when we’re working with datasets
that are skewed or kurtotic?
10. Discuss the relationship between the mean, median, and skewness. How would we
know if it is problematic for inferential decision making?
11. On each row in Table 4.10 there are two z values. Using these, determine the per-
centage of the curve for each and then calculate the difference between the two.
CHAPTER 5
Choosing the Right Statistical Test
Introduction
We are to the point where we will choose the statistical test to investigate our hy-
pothesis. Before we can, though, we must get past a roadblock. In order to
move forward, we need to go into a lot more detail about hypothesis testing. First, we
will discuss the overall concept of inferential decision making and learn to manually
compute a few of the basic statistical tests necessary for hypothesis testing; we will
also use SPSS to verify our results. Once we have done that, we will be ready to move
forward, select the test we need, and go on to Step 6.
In order to start answering these questions, let’s start by talking about the central
limit theorem and the standard error of the mean.
1. If you knew the final exam scores for the population of all students in your
university, you would not have a problem. You could easily add up the scores
for all 1,000 students in the population, divide that by 1,000, and wind up
with the average score for the population.
2. Your class of 50 students is only a sample of the population of 1,000 students,
and your class’s average score may or may not be exactly equal to the mean
score of the population.
3. Based on what we already know, the mean scores from the other samples (i.e.,
classes of 50 students) are going to be exactly equal to the population mean
or, much more likely, they are going to fluctuate around the mean.
4. The more samples you select and plot, the closer the overall mean of the
samples will be to the mean of the population. If you take an infinite number
of samples, the overall mean of the samples will be exactly equal to the popu-
lation mean.
As we can see in Figure 5.1, because the distribution is flatter than a normal
distribution, it is also platykurtotic. Let’s continue by using the data in Table 5.2 to
represent the mean scores for 24 samples.
In Figure 5.2, by using a larger number of mean scores, the distribution is starting
to look different. The curve is more bell-shaped, and the standard deviation (3.46) is
getting smaller. Since the standard deviation is a measure of dispersion, the data dis-
tribution is less spread out, less skewed, and less platykurtotic.
Now, let’s use the set of 49 sample means shown in Table 5.3, plot them, and see
what happens. This is really starting to make a difference, isn’t it? We can see that the
curve is becoming very bell-shaped and the standard deviation, in this case, 2.86, is
getting smaller (Figure 5.3).
We could keep adding data values and plotting them on our distribution but, suf-
fice it to say, the process we are following is shown in Figure 5.4.
By looking at the distribution of sample means in Figure 5.4, you can see it looks
like a normal distribution. It is, but unlike the normal distributions we have seen so
far, this one has three special qualities.
1. The distribution has a special name. Since we are plotting the mean scores
from repeated samples, it is called the sampling distribution of the means.
2. The mean of the sampling distribution of the means is called the mean of means
and is shown as μ_x̄. The more samples you plot on a histogram, the closer
the mean of means gets to the mean of the population from which we take
the samples.
3. When we compute a standard deviation for our sampling distribution of the
means, we must keep in mind the error inherent in the sampling process.
Because of that, instead of calling it the standard deviation, we now call it the
standard error of the mean, abbreviated SEM and shown as σ_x̄.
Figure 5.5 gives us the overall feel for the central limit theorem.
1. When we plotted only 10 sample means, we saw a flat, spread out distribu-
tion.
2. When we plotted 24 sample means, we saw a distribution that looked more
like a normal distribution.
3. When we plotted 49 sample means, we had a nice, bell-shaped distribution
with most scores clustering around the mean of the population. Many of the
scores were exactly equal to the mean, but most were a little higher or a little
lower; this we know is due to sampling error. Had we continued plotting
mean scores, eventually we would have a perfectly normal distribution with a
mean of means exactly equal to the population mean.
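If you would like to watch the central limit theorem happen, here is a small simulation sketch; the uniform population and the sample size of 50 are my assumptions, chosen only to show a decidedly non-bell-shaped population producing a well-behaved distribution of sample means.

import numpy as np

rng = np.random.default_rng(0)
population = rng.uniform(0, 100, size=100_000)   # flat, not bell-shaped

for n_samples in (10, 24, 49, 10_000):
    means = [rng.choice(population, size=50).mean() for _ in range(n_samples)]
    # the mean of means settles near 50 (the population mean), and the
    # spread of the means settles near the true standard error of the mean
    print(n_samples, round(np.mean(means), 2), round(np.std(means), 2))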
Third, although it was implied that the distribution of scores for the entire popu-
lation of PSY101 students was normally distributed, it does not have to be a normal
distribution for the central limit theorem to apply. If enough samples are taken and
the samples are large enough, when the means of the samples are plotted, a normal
curve will emerge.
FIGURE 5.5. Comparison of the mean between a population and a sampling distribution of means.
For example, if the population standard deviation is 14 and each sample contains 49
people, the standard error of the mean is computed like this:
1. σ_x̄ = 14/√49
2. σ_x̄ = 14/7
3. σ_x̄ = 2
Keep in mind that we probably will not know the population standard deviation;
we simply used 14 in this case to illustrate what we are doing. If the population stan-
dard deviation is not known, we can use the sample standard deviation in its place.
When we do this, it is no longer called the standard error of the mean; instead we
call it the sample standard error of the mean. This means our formula changes slightly
because we have to replace the symbol for the population standard deviation (i.e., σ)
with the symbol for the sample standard deviation (i.e., s), as well as replace the
symbol for the population SEM (i.e., σ_x̄) with that of the sample standard error of
the mean (i.e., s_x̄):
s_x̄ = s/√n
You will find that the sample standard error of the mean is a very good estimate
of what we would have computed had we known the population value.
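In code, both versions of the formula are one line each. Here is a sketch verifying the worked example (a population standard deviation of 14 and samples of 49 people):

import math

sigma, n = 14, 49
print(sigma / math.sqrt(n))   # 2.0, the standard error of the mean

# If sigma were unknown, we would substitute a sample standard deviation s
# and call the result the sample standard error of the mean.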
z = (x̄ − μ_x̄)/σ_x̄
We already know that the mean of means (i.e., μ_x̄) is 92. We also know, from our
example, that the population standard error of the mean (i.e., σ_x̄) is 2. If we have an
observed sample mean that is one SEM above the mean of means (i.e., 94), then our
z score is 1:
1. z = (x̄ − μ_x̄)/σ_x̄
2. z = (94 − 92)/2
3. z = 2/2
4. z = 1
1. z = (42 − 50)/3
2. z = −8/3
3. z = −2.67
In this case, what does a z score of –2.67 tell us? Simple: the sample mean we are
interested in is 2.67 SEMs below the mean of the sampling distribution of the means
(i.e., the mean of means).
Now, let’s really start tying this together. Based on what we learned earlier, what
is the area under the curve between the mean and our z value? If we go to the back
of the book and look at the area under the normal curve table, we can see that our z
value is equivalent to .4962. This tells us that 49.62% of the values under the curve lie
between 42 and 50. Remember, even though our observed value is negative, that does
not affect the percentage we are interested in; a z score of 2.67 above the mean would
result in the same percentage under the curve. All we are interested in is measuring
the distance between two points.
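Here is the same computation as a sketch in Python, with the table lookup done by scipy instead of the table at the back of the book:

from scipy.stats import norm

mean_of_means, sem, observed = 50, 3, 42
z = (observed - mean_of_means) / sem
print(round(z, 2))                       # -2.67
print(round(norm.cdf(abs(z)) - 0.5, 4))  # ~.4962, area between the two means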
TABLE 5.5. The Process for Creating a Sampling Distribution of the Means
To create the sampling distribution, an infinite number of random samples is drawn
from the population; the mean of each sample is computed and plotted.

                               Original data distribution   Sampling distribution of the means
Type of data                   Quantitative                 Quantitative
Shape of distribution          Any shape                    Bell-shaped
Measure of central tendency    Mean                         Mean equal to the population mean (i.e., the mean of means)
Measure of dispersion          Standard deviation           Standard error of the mean
Measure of relative standing   Empirical rule applies       Empirical rule applies
1. For each of the following sample means, compute the z score using the descrip-
tive statistics given in Figure 5.7. Note that the column labeled “Mean Statistic” is
the mean of the sampling distribution of the means and the column labeled “Std.
Deviation Statistic” is the population standard error of the mean.
a. 109
b. 77
c. 113
d. 101
e. 95
FIGURE 5.7. Descriptive statistics for computing z scores (Valid N = 100).
f. 88
g. 96
h. 90
2. Using a SEM of 2 and a population mean of 92, complete Table 5.6.
FIGURE 5.8. The population mean and the mean of the sampling distribution of means.
cutoff point?” If we are trying to test a hypothesis, how do we decide which sample
statistics are different owing to sampling error and which are significantly different
from the population parameter?
First, we accept the fact that any decision we make may be wrong; after all, we are
dealing with probability. Luckily, we have some leeway at this point. We are able, by
stating an alpha value, to state a priori (i.e., beforehand) the degree of risk we are will-
ing to take when deciding. The alpha value, often abbreviated using the Greek letter
α, is sometimes called the level of significance. Its use is probably best explained using
an example.
Suppose we are interested in determining if the average final exam score for the
students in our Management class (i.e., our sample) is significantly different from the
average score of other Management students throughout the university (i.e., the popu-
lation of Management students). In order to determine this, we would have to test the
following null hypothesis:
There will be no significant difference between the average final exam score of
the students in our class and the average final exam score of Management
students throughout the university.
Now, let’s imagine we have computed an average score of 92 for our students and
found the population average to be 80. Because the difference between these mean
scores is somewhat large, it appears we should reject our null hypothesis (i.e., there
appears to be a significant difference between the two groups). Before we make that
assumption, though, let’s think it through.
We have already said that statistics from a sample, the mean for example, cluster
around the same measure in the population. Knowing that, the mean score of our
class (i.e., our sample) is nothing more than one of the multitude of mean scores fluc-
tuating around the population mean. In our case, all we know about our sample mean
is that it is different from the population mean. This difference is either due to sam-
pling error, or it is so large it cannot be attributed to chance; it represents a significant
difference between the sample mean and the population mean.
If the first is true, we will fail to reject our null hypothesis—although there is a
difference, it is due to chance and is not significant. If the second is true, then we
will reject our null hypothesis—it is unlikely we can attribute our results to chance;
they represent a significant difference between our sample mean and the population
mean. The question that remains, then, is, “This sounds good on the surface but how
do we know when the sample mean and the population mean are different due to
chance and when they are significantly different from one another?” In answering
that question, I have some bad news and some good news. The bad news is, since we
are dealing with a sample of data, we are never 100% sure if the differences are due
to chance or represent a significant difference. The good news is, by using our alpha
value we can control the risk of making the wrong decision.
Our alpha value allows us to do four things:
The counterpart of a Type I error is called a Type II error and occurs when, because
of sampling error, we fail to reject a null hypothesis when we actually should reject
it (i.e., we say the values are not significantly different when they really are). In some
texts, you’ll see the Type II error rate referred to as beta; the two are synonymous. A
mistake that many people make is to assume that, if our alpha value is .05, then our
beta value must be .95—that is not true! Calculating beta is a very complicated process,
but, luckily for us, it is often computed as part of the output when using a statistical
software package.
Although beta is not often a topic for beginning statisticians, it does play an inte-
gral part in computing the statistical power of an inferential test. Statistical power,
defined as the probability of not making a Type II error (i.e., we will reject the null
hypothesis when it is false), is computed by subtracting beta from 1. This value, of
course, is affected by the alpha value we use, the actual difference between values we
are comparing and the sample size. Generally speaking, any power value over .80 is
considered acceptable. Again, this is something that consumers of statistics are not
frequently faced with, but we must be able to recognize and understand terms like this
when we see them!
In this case, a Type I error (i.e., rejecting the null hypothesis when you should not)
would mean that an innocent person would be sentenced to jail because you would be
supporting the research hypothesis. Failing to reject the null and not supporting the
research hypothesis due to sampling error (i.e., a Type II error) would mean a criminal
would be found not guilty and be allowed to go free. Which is worse, an innocent man
in jail or a criminal on the street? Of course, we could argue circumstances here, but
suffice it to say the first scenario is worse!
Having said all of that, Table 5.7 shows the relationship between null hypotheses
and the two types of errors.
A range of numbers around a sample statistic within which the true value of
the population is likely to fall.
As we have said, any time we have a sample from a population, the sample statistic
may or may not be equal to the corresponding population parameter. Because of that,
we can never be 100% sure of any estimates made based on the sample data. We can
build a range around the sample statistic, called the confidence interval, within which
we predict the population parameter will fall.
To compute the confidence interval, you need two things: a sample from the pop-
ulation and an alpha value. We will stick with the standard alpha value of .05 (i.e., 5%).
This means that you want to create a range that, based on the data from the sample,
will have a 95% (i.e., 100% minus your alpha value of 5%) probability of including the
population average. Obviously, it also means there is a 5% probability that the popula-
tion mean will not fall into the confidence interval. Let’s use the following data to put
this all together:
Using this information you can use the following formula to compute the confidence
interval you need:
Confidence interval = x̄ ± z_(α/2)(σ/√n)
The individual parts are easy to understand:
• x̄ stands for the mean of the sample we are dealing with (i.e., 92).
• z_(α/2) is the z value for our alpha level divided by 2 (i.e., 1.96); we will talk
more about why we divide alpha by 2 in a minute.
• α is our alpha value (i.e., .05).
• √n is the square root of the sample size (n = 50) we are dealing with (i.e., 7.07).
• σ is our sigma value and represents the population standard deviation.
Here it seems we have a potential problem. The last step in the equation, shown
below, is the formula for the standard error of the mean, but we still do not know the
population standard deviation (i.e., σ) in order to compute it:
σ/√n
As we have said, if we do not know the population standard deviation we can
replace it with the sample standard deviation and get a very good estimate. In this
case, however, we’ll make it easy and use a population standard deviation of 5 so we
can move forward. That will give us everything we need for our formula:
Confidence interval = 92 ± (1.96)(5/7.07)
At first glance, everything here is pretty straightforward except for the whole idea
of the z score for alpha divided by 2. We do that because, if you are trying to create
a range that contains a population parameter, obviously you will have fluctuation
around that parameter due to sampling error. Since you know this is going to occur,
you have to be willing to make a mistake in either direction—you may estimate too
high, you may estimate too low, or you may estimate just right.
If we set alpha to .05 and then divide it by 2, we get a value of .025 or 2.5%. That
means you are willing to make a 2.5% mistake on either side of the mean. When you
subtract this value from the total percentage (50%) on either side, you are now saying
your range will include from 47.5% below the mean to 47.5% above the mean (remem-
ber 47.5% + 47.5% = 95%, the size of our confidence interval). When you find 47.5 (or
.4750) in our area under the normal curve table, you find the corresponding z value of
1.96. To verify this, here is the row in the table showing these values. As you can see
in Table 5.8, .4750 lies at the intersection of the 1.9 row and the .06 column; obviously,
1.9 and .06 equals 1.96. All of this is shown in Figure 5.9.
We can now compute the actual confidence interval using what we know up to
this point:
1. When alpha = .05, we are looking for a confidence interval that contains 95%
of the values (i.e., 1.00 – .05 or 100% – 5%).
2. Since we must establish a range on both sides of the value we are predicting,
it is necessary to divide the size of our confidence interval by 2. This leaves us
with 95%/2 or 47.5%.
3. The area under the normal curve table shows a z value of 1.96 for 47.5%.
4. The population standard deviation (5) divided by the square root of the sam-
ple size (n = 50) is .707.
5. Confidence interval = 92 ± (1.96)(.707)
6. Confidence interval = 92 ± 1.39
7. Confidence interval = 90.61 to 93.39
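Here is a short sketch that reproduces those steps in Python; norm.ppf is simply a software version of reading the area under the normal curve table in reverse.

import math
from scipy.stats import norm

mean, sigma, n, alpha = 92, 5, 50, 0.05
z = norm.ppf(1 - alpha / 2)                # 1.96 for alpha = .05
margin = z * sigma / math.sqrt(n)
print(round(mean - margin, 2), round(mean + margin, 2))   # 90.61 93.39

Now let’s work through a second example, this time with a mean of 70, a population standard deviation of 10, and a sample size of 100: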
1. When alpha = .05, we are looking for a confidence interval that contains 95%
of the values (i.e., 1.00 – .05 or 100% – 5%).
2. Since we must establish a range on both sides of the value we are predicting,
it is necessary to divide the size of our confidence interval by 2. This leaves us
with 95%/2 or 47.5%.
3. The area under the normal curve table shows a z value of 1.96 for 47.5%.
4. The population standard deviation (10) divided by the square root of the
sample size (n = 100) gives us 10/10 or 1.
5. Confidence interval = 70 ± (1.96)(1.00)
6. Confidence interval = 70 ± 1.96
7. Confidence interval = 68.04 to 71.96
In looking at the table, there is not an entry for 45% (i.e., .4500) but we do have
entries for 44.95% (i.e., .4495 in the table and a z score of 1.64) and 45.05% (i.e., .4505
in the table and a z score of 1.65). Since the value we are looking for, 45.00%, is in the
exact middle between 44.95% and 45.05%, we must compute a z score that represents
the average between the two. This process is as follows:
1. When alpha = .10, we are looking for a z score for 45% (i.e., 90%/2).
2. The table shows scores of 44.95% (z = 1.64) and 45.05% (z = 1.65).
3. 1.64 + 1.65 = 3.29
4. z = 3.29/2
5. z = 1.645
Let’s use the same data as from the first example above; our mean is 92, our
population standard deviation is 5, and we have a sample size of 50. This leads us to a
confidence interval ranging from 90.84 to 93.16 (i.e., 92 ± (1.645)(.707) = 92 ± 1.16).
The same holds true when we have an alpha value of .01. In this case, we want
to compute a 99% confidence interval that represents 49.5% of the values above the
mean and 49.5% of the values below the mean. That means we are looking for a value
of .4950 in the area under the normal curve table; let’s use Table 5.10 to help us.
Here we can see there is no entry for 49.5% (.4950), but there are entries for the
two values that surround it, .4949 and .4951. Because 49.50% is exactly in the middle
between 49.49% and 49.51%, we must again use the z score that represents the average
of the two (i.e., (2.57 + 2.58)/2 = 2.575). Again, this process is
1. When alpha = .01, we are looking for a z score for 49.5% (i.e., 99%/2).
2. The table shows scores of 49.49% (z = 2.57) and 49.51% (z = 2.58).
3. 2.57 + 2.58 = 5.15
4. z = 5.15/2
5. z = 2.575
Using this value, we go through the same process to compute our confidence
interval:
1. Confidence interval = 92 ± (2.575)(.707)
2. Confidence interval = 92 ± 1.82
3. Confidence interval = 90.18 to 93.82
Let me reiterate just one more time. This only means that we are 99% confident
that our population average lies somewhere between 90.18 and 93.82. Remember,
although it is close, 99% is not 100%; because of random error, we can never be abso-
lutely certain.
The results, shown in Figure 5.11, indicate there is a 95% probability that the aver-
age IQ in the school is between 104.63 and 115.37.
FIGURE 5.11. 95% confidence interval around a mean of 110.
Confidence interval = 110 ± (2.575)(15/5.48)
In this case, there is a 99% probabil-
ity that the average IQ in the school is
between 102.94 and 117.06. By using the same data and decreasing the size of our
alpha value, we can see that we have widened the confidence interval (Figure 5.12).
This means that, in order to give ourselves the greatest opportunity to create a range
that includes the population mean, we have to use a smaller value for alpha.
Let’s do the same thing but increase our alpha value to .10. Again, our formula
would look the same except our z value now is 1.645.
Confidence interval = 110 ± (1.645)(15/5.48)
In this case, we have narrowed our confidence interval down to 105.50–114.50.
Here in Figure 5.13, we can see, by using a larger alpha value, we have decreased the
range of the confidence interval.
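A quick sketch makes the pattern easy to verify. The book divides 15 by 5.48, which is the square root of 30, so a sample size of 30 is my assumption here; small rounding differences from the figures above are expected.

import math
from scipy.stats import norm

mean, sigma, n = 110, 15, 30
for alpha in (0.10, 0.05, 0.01):
    z = norm.ppf(1 - alpha / 2)               # 1.645, 1.96, 2.576
    margin = z * sigma / math.sqrt(n)
    print(alpha, round(mean - margin, 2), round(mean + margin, 2))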
TABLE 5.13. Learning to Compute the Width and Limits of a Confidence Interval
Mean    Alpha   n     s     Lower limit of CI   Upper limit of CI   Width of CI
100     .10     25    5
500     .05     50    25
20      .01     15    3
55      .05     20    7
70      .01     22    6
220     .10     40    10
Can you tell me why we made the hypothesis two-tailed (i.e., nondirectional)?
That is easy—we do not know if our students are doing better or worse than other stu-
dents in the country. Because our scores could be greater than or less than the popula-
tion’s scores, we must make the hypothesis nondirectional.
In our case, let’s suppose we have 100 students in our graduate program. Knowing
that, as we just said, we can use the z statistic to test our hypothesis. This means using
the following values to compute our z value.
z = (x̄ − μ)/σ_x̄
For the sake of our example, let’s assume the population mean is 800 (i.e., the
average score for graduate students at all universities), the SEM is 10, and the mean
value for our students is 815. We can substitute those into our formula:
z = (815 − 800)/10
This would result in a z score of 1.5. From this point forward, this will be called
our computed value of z. In order to test our hypothesis, we have to compare this value
to a critical value of z. This is nothing more than a value from the area under the nor-
mal curve table we used earlier. The value we choose will be based on the alpha
value we’re using and the type of hypothesis we’ve stated (i.e., directional or nondirec-
tional). Let’s use an example to better understand this.
First, we subtract our alpha value (i.e., 5%) from the total area under the curve
(i.e., 100%) to give us the range of possible z values we can use to test our hypothesis.
This means our possible range is 95% of all possible z values (i.e., 100% – 5% = 95%).
Next, we must use the type of hypothesis we have stated to determine the distribu-
tion of all the possible critical values of z. If we have a two-tailed hypothesis, we must
distribute the z scores equally under the curve; if we have a one-tailed hypothesis we
must distribute them according to the direction of the hypothesis. For now, let’s look
at the nondirectional (i.e., two-tailed) hypothesis; we will do the same for a one-tailed
hypothesis shortly.
If we already know that we want to build a range that includes 95% of all possible z
values and that range must be equally distributed on either side of the mean, it is easy
to determine what the range would be. We would equally divide the 95%, meaning
47.5% of the values would fall on one side of the mean and 47.5% of all values would
fall on the other. We would then have 2.5% of all possible values left over on either
side. We can see that in Figure 5.16.
At this point, things start coming
together. If we have 2.5% of all possible
values left over on either side of the
mean, it gives us a total of 5% remain-
ing. This is the same as our alpha value
(i.e., .05 or 5%). This tells us that, when
we have a two-tailed hypothesis, all we
must do is divide alpha by 2 to deter-
mine the percentage of all possible val-
ues on either side of the mean.
Now that we have identified an area
that contains 95% of all values of z, our
job is to determine exactly which value
we want to use as our critical value. In
order to do that, we need to use the area under the normal curve table.
FIGURE 5.16. Equally dividing 95% of z values under the curve.
As you’ll remember, this table only
shows values for one side of the curve. That is fine. If we know values for one side of
the curve, it is easy to determine what the corresponding value would be on the other
side; it is just the numeric opposite. Although we have gone through all of this before
back when we talked about confidence intervals, let’s look at Table 5.14; this is a small
part of the critical value of the z table and will help us determine the critical z value
we need.
We can see that the value .475 (i.e., 47.5%) lies at the intersection of the row
labeled 1.9 and the column labeled .06. If you add these two values together, it gives
us a z value of 1.96. This means, for 47.5%, our critical value of z on the right side of
the mean is 1.96 and the critical z value on the left side of the mean is –1.96. Putting
all of this together, we can see that a range of 95% (i.e., 47.5% on either side of the
mean) encompasses a range of critical z scores from –1.96 to +1.96. This is shown in
Figure 5.17.
z = (780 − 800)/10
This gives us a z score of –2. Since this doesn’t fall between –1.96 and +1.96, we
will reject the null hypothesis. In this case, our score of 780 is significantly less than
the national average of 800. Given this, perhaps your critics are right; maybe it is time
to implement the GRE.
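For the curious, here is the whole two-tailed test as a sketch in Python:

from scipy.stats import norm

sample_mean, pop_mean, sem, alpha = 780, 800, 10, 0.05
z = (sample_mean - pop_mean) / sem          # computed z = -2.0
critical = norm.ppf(1 - alpha / 2)          # critical z = 1.96
print(abs(z) > critical)                    # True -> reject the null hypothesis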
In Figure 5.18, I have used a population mean of 800 and SEM of 10 and plotted
a few more mean values for our students. This, along with the corresponding z scores,
will help solidify this idea.
FIGURE 5.18. Using the z score to reject or fail to reject a null hypothesis.
harder, teach more classes, and have a larger dissertation load than faculty at other
universities. Unless you can show us we are making significantly more than the aver-
age faculty salary at those other places, we are going on strike!” After saying this, the
professor leaves your office in a huff, slamming the door behind him. After careful
consideration, you decide he might have a point. He is a good professor and he does a
lot of fine work; perhaps an investigation is called for.
What is the first thing you do? It is easy. Since he wants you to show that your
faculty is earning more than the national average, you state the following research
hypothesis:
Your faculty’s average annual salary will be significantly higher than the
national average salary of faculty members.
In order to begin investigating this, you call the human resources department and
find that the average salary in your department is $74,000. After a little investigation,
they tell you the national average is $70,000 with a SEM of $2,000. This means your
faculty’s salary, $74,000, has a z score of +2:
1. z = (74,000 − 70,000)/2,000
2. z = 4,000/2,000
3. z = +2
Is that large enough to consider the difference significant? Let’s compare the
computed z score to the critical z score and find out. Before we do, however, let’s take
a minute to consider where we are going conceptually.
In the prior example, when we were using a two-tailed hypothesis, we had to
divide our alpha value by 2 and mark off 2.5% on each end of the distribution. In
order to determine the critical z score, we had to take the remaining 47.5% (remem-
ber, 50% on either side of the distribution, minus 2.5%) and consult the area under
the normal curve table. In the table, we found that a z score of 1.96 corresponded to
47.5%, so we marked off a range from –1.96 to +1.96 on our curve. We then compared
our computed value of z to that range in order to test our hypothesis.
In this case, we must do something a bit different. Since we have a one-tailed
hypothesis but still have an alpha value of .05, we must mark the entire 5% off on one
end of the distribution or the other. Here we have a “greater than” hypothesis so we
must mark the 5% on the positive end of the distribution. You can see this in Figure
5.19.
Now, it is time to determine the critical value of z we are going to use. Using the
excerpt from the area under the normal curve table (Table 5.15), we can see that the
critical value for z for 45% of the area under the curve lies between 1.64 and 1.65; this
leaves us with a critical value of 1.645. That can be seen in Figure 5.20.
Now, let’s compare our computed z to the critical value of z that we have plotted. Is
our computed z (i.e., +2) greater than our critical value of z (i.e., 1.645)? Since it is, we
will reject the null hypothesis. Your faculty members are making, on average, $74,000
per year. This is significantly more than the national average of $70,000. We can see
this in Figure 5.21.
FIGURE 5.19. 5% on the positive side of the mean.
FIGURE 5.20. Critical z for 5% above the mean.
FIGURE 5.21. Rejecting the null hypothesis with a computed z of 2.
FIGURE 5.22. Examples of rejecting and failing to reject based on computed z scores.
In Figure 5.22, I have shown a couple of other examples using different values
of z. When our computed value of z is equal to 1.5, we would not reject the null
hypothesis; there is no significant difference between our faculty’s salaries and the
national average. We do, however, reject the null when z is equal to 1.7. This means our
research hypothesis stating that our faculty members are already making more than
the national average is supported.
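The one-tailed version differs only in where the 5% goes. Here is a sketch of the salary test; note that the critical value comes from norm.ppf(.95), because the entire alpha sits in one tail.

from scipy.stats import norm

z = (74_000 - 70_000) / 2_000     # computed z = +2
critical = norm.ppf(1 - 0.05)     # 1.645, all 5% in the upper tail
print(z > critical)               # True -> reject the null hypothesis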
Your faculty’s average annual salary will be significantly less than the
national average salary of faculty members.
First, let’s compute our z score using our faculty member’s salary along with our
population mean and standard error:
z = (66,500 − 70,000)/2,000
Here the computed z score is –1.75. We can plot this on our distribution to see if
a significant difference exists but, before we do, let’s consider what our distribution
should look like.
1. Determine the alpha value we want to use. In this example, let’s use a = .05.
2. If we have a two-tailed hypothesis (i.e., nondirectional), divide our alpha value
by 2; in this case that would leave us with .05/2 = .025. If we have a one-tailed
(i.e., directional) hypothesis, we will use the entire alpha value.
3. We then subtract the result from Step 2 from .50. In this case, if alpha is equal
to .05 and we have a two-tailed hypothesis, we would be left with .475 (i.e., .50 – .025).
FIGURE 5.23. 5% on the negative side of the mean.
FIGURE 5.24. Critical z for 5% below the mean.
FIGURE 5.25. Rejecting the null hypothesis with a computed z of –1.75.
FIGURE 5.26. Examples of rejecting and failing to reject based on computed z scores.
TABLE 5.16. Tying Together the z Score, Alpha Value, Confidence Interval, and Error Probability
Confidence interval   Alpha   z score   Lower end of CI   Mean   Upper end of CI   Width of CI   Hypothesis error probability
90%                   .10     1.645     105.50            110    114.50            9.0           Higher probability of Type I error; lower probability of Type II error
95%                   .05     1.96      104.63            110    115.37            10.74         Acceptable probability of Type I and Type II errors
99%                   .01     2.575     102.94            110    117.06            14.12         Higher probability of Type II error; lower probability of Type I error
Remember, using this table we demonstrated that higher alpha values lead to
smaller z scores and confidence intervals. Let’s tie those ideas together with the idea of
hypothesis testing so we can understand how these same values affect the probability
of Type I and Type II errors.
When we test a hypothesis and use a larger alpha value, we create a narrower con-
fidence interval. This means we are creating a smaller range of values that we would
consider not significantly different from the value we are interested in. In this case,
if we have a two-tailed hypothesis and use an alpha value of .10, we say that any value
from 105.50 to 114.50 is not significantly different from our average of 110 and we
would fail to reject the null hypothesis based on that value. Anything outside of that
range would be considered significantly different and would cause us to reject our
null hypothesis. It is easy to see, then, that a larger alpha gives us a far better chance
of rejecting the null hypothesis. Unfortunately, this also means that we are increasing
our Type I error rate; we may be rejecting a null hypothesis when we should not.
At the other end of the spectrum, if we decrease our alpha value to .01, we are
greatly widening the range of values we would consider not significantly different
from our mean score. In this case, we would consider anything from 102.94 to 117.06
to be different from 110 strictly due to chance. This, of course, lowers our probability
of rejecting the null hypothesis and greatly increases the probability of a Type II error;
we might fail to reject the null hypothesis when we should.
As we said earlier, at this point a good consumer of statistics knows two things.
First, we cannot count on being 100% right in our statistical decision making because
of random error. Second, because of that, we can control our level of risk by using an
appropriate alpha value. As we just saw, having too large or too small of an alpha value
creates problems; a good consumer of statistics will usually use an alpha value of .05
and its acceptable probability of making either a Type I or Type II error.
So far, we have tested our hypotheses using information from both the population
and the sample. In upcoming cases we will compare two samples, three samples, and
so on. Although the different tests we will use
(t-tests, ANOVAs, etc.) have different formulas, the basic premise of hypothesis testing
remains the same.
As easy as this is, I am the first to admit that calculating all these values, using
the tables, and plotting the results can be somewhat tedious. Given that, let’s look at
an alternate way of making decisions about our hypotheses. I believe you’ll find this
way a lot easier.
Probability Values
Instead of working through all those calculations, statisticians use statistical software
to compute a p value. This p value is the probability that a particular outcome is due to
chance. In the case we just discussed, another way of describing the p value is to call it
the probability of our university’s salaries being equivalent to the salaries from other
universities. These p values range from 0.00 (no probability that the sample mean
came from the population being considered) to 1.00 (an absolute certainty that the
sample mean came from the population being considered). Obviously, if we are trying
to reject the null hypothesis, we want as small a p value as possible. This will help us
ensure that any differences we find are “real” and not due to chance.
Rather than testing a hypothesis by computing an observed value of z, finding a
critical value of z, and then comparing the two, statisticians can just compare their
computer-generated p value to their predetermined alpha value. For a particular case,
if the computed p value is less than the alpha value, a researcher will reject the null
hypothesis; this means the differences are significant and not attributable to chance.
If the p value is greater than or equal to the alpha value, the statistician will fail to
reject their null hypothesis. This means any differences found are not significant or
are attributable only to chance. Table 5.17 shows an example of these ideas.
At this point, we are going to make up a pretty far-fetched case study. Let’s sup-
pose we have a pool of depressed patients. We randomly select one group to receive
cognitive therapy and the other to receive behavioral therapy. After an appropriate
period, say 10 sessions with the psychologist, we ask each of the patients to complete
the Beck Depression Inventory. After computing the descriptive statistics, we see that
the cognitively treated patients have a mean score of 15 while the behaviorally treated
patients have a mean score of 30.
Things are looking pretty good for the cognitive therapists, aren’t they? Let’s
use our alpha value of .05, and, using the data we collected, our software computes
a p value of .01. Since this is less than our alpha value of .05, obviously the cognitive
therapists know something the behaviorists do not; their clients’ scores are signifi-
cantly less than those of the behaviorists’ clients.
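Because this decision rule is purely mechanical, it is easy to express in code. Here is a minimal sketch in Python (my choice of language; the book itself relies on SPSS), using the alpha and p values from the therapy example:

```python
# Compare the computed p value to the predetermined alpha value.
alpha = 0.05
p_value = 0.01  # the value our software computed for the therapy data

if p_value < alpha:
    print("Reject the null hypothesis; the difference is significant.")
else:
    print("Fail to reject the null hypothesis; differences may be due to chance.")
```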
Let’s look at one last research hypothesis:
TABLE 5.18. The Relationship between the p Value and a Decision Made about a Hypothesis

Computed p is greater than or equal to alpha: We fail to reject the null hypothesis; any observed differences may be due to random sampling error. We do not support the research hypothesis; any observed differences may be due to sampling error.

Computed p is less than alpha: We reject the null hypothesis; the observed differences are probably not due to sampling error. We support the research hypothesis; the observed differences are probably not due to sampling error.
Using the data provided, compute the mean and standard error of the mean. Use these values to compute the appropriate z score, obtain
the critical z score, and then determine whether you should reject the null hypothesis.
Finally, based on everything, determine if the p value would be less than .05. Again,
you do not have to compute a p value; just state whether it would be less than .05 based
on whether you would reject the null hypothesis based on the other computed values. In
the last line of the table, support your decision about the p value. I have supplied the
answers at the end of the book so you can check your work.
variables. While these names are different, they are used in a manner somewhat like
their independent and dependent cousins. We will discuss this in detail when we get
to that chapter.
Confidence interval:

\bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)

z score for sampling distribution of the means when the population mean is known:

z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}
Remember: In the z score formula, both the population mean and the standard
error of the mean can be replaced by the mean of means or the sample standard
error of the mean if needed.
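If you would like to check these formulas numerically, here is a minimal sketch in Python (my substitution for SPSS). It assumes the running example's mean of 110 with a standard error of about 2.74, backed out from the intervals shown earlier, and it closely reproduces both the 90% interval (105.50 to 114.50) and the 99% interval (102.94 to 117.06):

```python
from scipy import stats

# Assumed values: the chapter's example mean of 110 and a standard error
# of about 2.74, derived from the 99% interval shown in the table above.
mean, se = 110, 2.74

for conf in (0.90, 0.95, 0.99):
    alpha = 1 - conf
    z = stats.norm.ppf(1 - alpha / 2)          # critical z for a two-tailed test
    low, high = mean - z * se, mean + z * se   # x-bar +/- z * (sigma / sqrt(n))
    print(f"{conf:.0%}: z = {z:.3f}, interval {low:.2f} to {high:.2f}")
```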
Quiz Time!
Using the decision table, identify the statistical test that should be used to test each of the fol-
lowing hypotheses and then explain why you chose that test.
5. There will be a significant difference in height and weight between persons raised
in an urban setting and persons raised in a rural setting.
9. There will be a significant correlation between the number of siblings and the
annual income of their parents.
11. There will be no significant difference in the percentage of males and females in
our office building when compared to the national percentage.
12. There will be a significant difference in authors’ procrastination levels before being
called by their editor and after they have been called by their editor.
CHAPTER 6

The One-Sample t-Test

Introduction
We are finally up to the point where we can choose our statistical test, use SPSS to
analyze our data, and interpret our hypothesis based on the SPSS report. From here
through the end of the book, we will dedicate a chapter to each of the tests in our
table. We’ll work through a lot of cases, going through the entire six-step process, to
ensure we know when and how to use each test.
In the last chapter, we learned to use a statistic from a large sample of data to test
a hypothesis about a population parameter. In our case, using a z-test, we tested a
hypothesis about a population mean using the mean of one sample drawn from that
population. We agreed that by “large sample” we mean any sample that has 30 or
more data values. What happens, however, when we have a sample with less than
30 items? We cannot use the one-sample z-test, so what can we do? The answer
is, we can use its counterpart, the one-sample t-test. The logic underlying the one-
sample t-test is very similar to that of the one-sample z-test and is easily understood.
You’ll find that this material is critical to understanding the next two chapters, on
the independent-sample t-test and the dependent-sample t-test. For now, let’s take
a trip.
The t Distribution
We've seen in prior chapters that, when we had large (i.e., 30 or more values), mound-shaped
quantitative data distributions, the empirical rule showed us how to understand the
distribution of the data. We were able to locate the relative position of a data point
in terms of the standard deviation by computing z scores using the sample mean, the
population mean, and the standard error of the mean. If we did not know the stan-
dard deviation of the population and therefore could not compute the standard error
of the mean, we also saw that we could use the standard deviation of the sample to
compute the sample standard error of the mean. This is a very close approximation
of the same population parameter and can be used in our formula to compute the z
score.
While experimenting with possible solutions at the Guinness brewery, William Gossett noticed something interesting
about the mound-shaped distribution of data values when the sample size was
less than 30. Each time he decreased the sample size by one and plotted the means
of repeated samples of the same size, the shape of the distribution flattened out. You
can see the plots for 30 sample means, 25 sample means, and 20 sample means in
Figure 6.1.
By looking at the picture we can
see that, when 30 sample means are plotted, the distribution looks like a normal
curve. When we use only 25 sample
means, the sampling distribution gets
flatter and more spread out; this is even
more evident when we have only 20
sample means. In other words, the fewer
the number of data values you have,
the more spread out on both ends (i.e.,
platykurtic) our distribution is.
FIGURE 6.1. Changes in distribution shape based on number of samples plotted.

Using this idea, Gossett further discovered that the empirical rule applied to this distribution and, if you compensate for the number of data values less than 30, you can test a hypothesis by comparing a computed value of t to a critical value of t. This is exactly what we did when we compared a computed value of z to a critical value of z in the z-test. This was the key, he determined, to testing a population parameter using only a relatively few data values.
To better understand this, let's state a hypothesis and then use a dataset with 15 values
to test it:
Let’s use a test, designed to measure anxiety, with scores ranging from zero (no
anxiety) to 80 (high anxiety). The 15 scores in Table 6.1 serve as an example so we can
continue our discussion. Since we already know how to use SPSS to compute descrip-
tive statistics, we can move past that part; Figure 6.2 shows the output.
The formula for the computed value of t is like the formula for the computed
value of z; we just need to include different values:
t = \frac{\bar{x} - \mu}{s_{\bar{x}}}
As always, x̄ represents the mean of the dataset, in this case 32. From the descriptive statistics in Figure 6.2, we see the sample standard error of the mean (i.e., s_{\bar{x}}) is .74322.
In this case, let’s assume the mean of our population (i.e., μ) is 30; this is the value
that we want to compare our sample mean against. We can insert these values into our
equation and calculate a t value of 2.691 using the following three steps:
1. t = \frac{32 - 30}{.74322}

2. t = \frac{2}{.74322}

3. t = 2.691
Let’s use SPSS to verify our calculations. We have entered our data in Figure 6.3,
selected Analyze, Compare Means, and One-Sample T Test. We then select the Anxi-
ety value and set the Test Value to 30, shown in Figure 6.4; this represents the mean
value of our population. Clicking on OK causes SPSS to run the one-sample t-test,
comparing the mean score of our data to the population mean of 30, and will produce
Figure 6.5.
FIGURE 6.3. Selecting the Compare Means and One-Sample T-Test command.
We are familiar with these terms, so let’s not go into detail just yet; we will refer
to our p value, shown as Sig. 2-tailed by SPSS, and the mean difference later. For now,
the value that we are most interested in is 2.691, our computed value of t. This is the
value we will compare to a critical value of t in order to test our hypothesis.
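Before moving on, note that any statistics library can reproduce this arithmetic. Here is a minimal sketch in Python's scipy (my substitution for SPSS) that recomputes t from the summary statistics above and derives a two-tailed p value:

```python
from scipy import stats

# Reproduce the hand calculation from the summary statistics above.
sample_mean, pop_mean = 32, 30
std_error = 0.74322   # sample standard error of the mean from Figure 6.2
df = 15 - 1           # n - 1 degrees of freedom

t_computed = (sample_mean - pop_mean) / std_error
p_two_tailed = 2 * stats.t.sf(abs(t_computed), df)
print(round(t_computed, 3), round(p_two_tailed, 3))  # 2.691 and the Sig. 2-tailed value
```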
FIGURE 6.4. Identifying the Test Variable and Test Value for the One Sample T-Test.

Degrees of Freedom

Throughout the remainder of the book, most of the inferential statistics we compute will include degrees of freedom; this is the number of values in the dataset we've collected that are "free to vary" when you try to estimate any given value. For example,
suppose we have a one-sample t-test where we know the population mean is 10. I might
ask you to tell me five numbers that, when summed, would give me an average of 10.
You might start out by saying “6, 8, 10, 12 . . .” but then I would have to interrupt you
by saying, “That is enough, if I know the first 4 numbers, I can calculate the fifth.”
Here’s how I know. In order to compute a mean of 10 with a dataset of five values,
the sum of the values in the dataset would have to equal 50 (i.e., 50/5 = 10). We could
set up an equation to show the relationship between the values we know and the sum
we need; x is used in the equation to represent the value we do not know:
6 + 8 + 10 + 12 + x = 50
A little simple math would show the sum of the four values we know is 36:
6 + 8 + 10 + 12 = 36
We can subtract that value from 50 and see the value we need to complete our
equation is 14:
50 – 36 = 14
In order to double-check our work, we can then put these numbers together and
do the required math:
6 + 8 + 10 + 12 +14 = 50
And then:
50/5 = 10
In short, we’ve shown that, if we know the mean and all the values in the dataset
except one, we can easily determine the missing value. In this case, four of the values
in the dataset can vary, but the fifth one cannot. By definition then, our degrees of
freedom are 4, that is, (n – 1).
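The same bookkeeping can be written out in a few lines of code. This sketch (in Python, my choice of language) simply solves for the one value that is not free to vary:

```python
# The "free to vary" idea in code: with the mean fixed, the last value is forced.
known = [6, 8, 10, 12]
n, target_mean = 5, 10

missing = target_mean * n - sum(known)   # 50 - 36 = 14
print(missing)                           # 14
print((sum(known) + missing) / n)        # 10.0, the mean we required
```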
Although the t table looks a lot like the z table, there are a few differences. First,
you can see degrees of freedom abbreviated as df. Next, there are three highlighted
columns. The leftmost highlighted column shows degrees of freedom from 1 through
10, the middle, highlighted column shows degrees of freedom from 11 through 20,
and the rightmost column shows degrees of freedom from 21 through 30. To the right
of each of these values, you can see the critical value of t for that given degree of free-
dom; this value is shown for both alpha = .05 and alpha = .025.
Before we move forward, one thing you may have noticed is that there is no column
for α = .01; I left it out on purpose simply to get the idea across. This is just an example
of a t table of critical values. You'll find that other tables have all the values of t in one
column, include different alpha values, and so on. For now, we'll just use what
we have.
In order to use the t table, let us suppose we have a scenario with 12 degrees of
freedom and want the critical value of t for alpha = .05. All we need to do is find 12
in the table and look directly to the right in the column labeled alpha = .05. As you
can see, the critical value of t is 1.782. In another case, we might have 29 degrees of
freedom and want the critical value of t when alpha is .025. Again, we would find the
appropriate row and see that the critical value of t is 2.045.
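If you have software handy, you can skip the printed table entirely; critical values of t come straight from the t distribution's inverse cumulative distribution function. A small sketch using Python's scipy (an assumption on my part; any package with a t distribution works the same way):

```python
from scipy import stats

# One-tailed critical values of t; ppf is the inverse of the cumulative distribution.
print(round(stats.t.ppf(1 - 0.05, 12), 3))    # 1.782, matching the table
print(round(stats.t.ppf(1 - 0.025, 29), 3))   # 2.045, matching the table
```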
In order to address this issue, statisticians have developed tools, called effect size indi-
ces, which help us better understand the magnitude of any significant difference we’ve
uncovered. An easy way to think about the effect size is that it helps us better under-
stand the extent to which the independent variable affects the dependent variable;
this is sometimes called the practical significance. In this case, the effect size for the
one-sample t-test, called Cohen's delta (i.e., d), is computed by subtracting the mean
of the population from the mean of our sample; we then divide the difference by the
standard deviation of our sample.
d = \frac{\bar{x} - \mu}{s}
Let’s replace the symbols in our formula using the values from above:
1. d = \frac{32 - 30}{2.87849}

2. d = \frac{2}{2.87849}

3. d = .695
This leaves us with an effect size of .695, but what does that mean?
First, it’s plain to see that the computed effect size is nothing more than the per-
centage of the standard deviation that the difference in the mean scores represents.
In discussing this difference, Cohen defined effect sizes as being “small” (i.e., .2 or
smaller), “medium” (i.e., between .2 and .5), or “large” (i.e., greater than .5). In this
case, we have a large effect size; the groups are significantly different, and the levels
of the independent variable had a dramatic effect on the dependent variable. In short,
there’s a strong relationship between where a person lives and their level of anxiety.
As you will see in the following chapters, different inferential tests require effect
sizes to be computed using different values and different formulas. Even so, inter-
preting them will be about the same. Because they contribute so much to our goal
of becoming a good consumer of statistics, we will make computing the effect size a
normal part of our descriptive statistics from now on.
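Since SPSS does not report this effect size for us, it is handy to script. A minimal sketch in Python (my choice of language) using the anxiety example's values:

```python
def cohens_d(sample_mean: float, pop_mean: float, sample_sd: float) -> float:
    """Cohen's d for a one-sample t-test: mean difference over the sample SD."""
    return (sample_mean - pop_mean) / sample_sd

# The anxiety example: sample mean 32, population mean 30, sample SD 2.87849.
print(round(cohens_d(32, 30, 2.87849), 3))  # 0.695
```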
Men who use anabolic steroids will have significantly shorter life
expectancies than men who do not.
In this case, suppose we know the average life expectancy for men is 70 but we
only have access to information on 12 men who used steroids of this type. Their age
of death is shown in Table 6.3. Using this data, SPSS would compute the descriptive
statistics shown in Figure 6.7. Since SPSS does not compute the effect size for a one-
sample t-test, we can use these statistics, along with the formula, and compute an
effect size of –.535:
1. d = \frac{68 - 70}{3.742}

2. d = \frac{-2}{3.742}

3. d = -.535
We are interested only in the absolute value, so we drop the negative sign and
wind up with an effect size of .535. SPSS would then compute the inferential statistics
in Figure 6.8.
Here our computed value of t is –1.852, and our critical t value is 1.796 for alpha
= .05. Remember, though, the table doesn’t show negative values, so, if we are going
to test a one-tailed “less than” hypothesis, we need to put a negative sign in front of
our value; this means we wind up with –1.796. Since we have a one-tailed hypothesis,
as seen in Figure 6.9, we would then plot our critical value on the “less than” side of
the distribution.
Our computed t value, –1.852, is
less than the critical value of t (i.e.,
–1.796). Because of that, we reject
our null hypothesis and support our
research hypothesis. This is supported
by a large effect size of .535; the inde-
pendent variable does have a large effect
on the dependent variable. Apparently,
men who use anabolic steroids live significantly fewer years than their peers who do not.

FIGURE 6.9. Comparing the computed and critical values of t for a one-tailed alpha value of .05.
Just as was the case with the z-test,
this is very easy to understand. To make
things even better, just as we did with the large samples, we can use our p value to help
us test our null hypothesis.
Remember, our sample of steroid users had an average age of death of 68, two years less than the average life span of 70 and in the order we hypothesized.
If, however, the average life span of males using steroids was 72, we would still have a
difference of two years and we would reject the null hypothesis. In that case, however,
since the values are not in the order hypothesized, we would not support the research
hypothesis. We can see this in Table 6.4.
TABLE 6.4. Using the p Value to Test the Steroid Use Hypothesis

Average lifespan 70, steroid users 68, difference –2, p = .0455: Reject null hypothesis; support research hypothesis.

Average lifespan 70, steroid users 72, difference +2, p = .0455: Reject null hypothesis; fail to support research hypothesis.
Let’s use the data in Table 6.5 to test our hypothesis. Figure 6.10 shows the descrip-
tive statistics from SPSS. Based on these descriptive statistics, we would compute a
very large effect size of 1.4 and a t value of 7.426. To test our hypothesis, all we need to
do is compare our computed value of t, shown in Figure 6.11, to the critical value of t.
Unlike the nondirectional hypothesis, where we had to divide our alpha value
by 2, here we have a one-tailed hypothesis, so we will determine our critical value of
t using the entire alpha value of .05. We can determine our degrees of freedom by
subtracting 1 from the sample size and then, by looking back at Table 6.2, see that our
critical value of t is 1.711. Since our critical value of t is much less than our computed
value of t (i.e., 7.426), we must reject our null hypothesis and support our research
hypothesis. Based on this, it is apparent that our nursing students do score signifi-
cantly higher on the national certification exams than those students at other univer-
sities. This, again, is supported by the large effect size. The relationship between the
computed and critical values of t is shown in Figure 6.12.
Just as was the case earlier, we need
to ensure our mean scores are in the
direction hypothesized. In this case, the
average score of our nursing students
is 825, a difference from the mean of
+25. Let’s compare our current results
against the results had our students
averaged 775 (Table 6.6).
FIGURE 6.12. Comparing the computed and critical values of t for a one-tailed alpha value of .05.

Important Note about Software Packages

As we saw in our example output, SPSS only gives us the two-tailed p value.
When testing a directional hypothesis, that means we had to divide our p value by 2
prior to comparing it to our alpha value. Remember, this is not the case with all soft-
ware packages; some will supply both the one-tailed and two-tailed values. These ideas
are summarized in Table 6.7.
TABLE 6.6. Using the p Value to Test the Certification Exam Score Hypothesis

All universities 800, ABC 825, difference +25, p = .0000: Reject null hypothesis; support research hypothesis.

All universities 800, ABC 775, difference –25, p = .0000: Reject null hypothesis; fail to support research hypothesis.
3. Yes—city officials have the knowledge, time, and resources needed to investi-
gate the problem.
4. Yes—the problem can be researched through the collection and analysis of
numeric data.
5. Yes—investigating the problem has theoretical or practical significance.
6. Yes—it is ethical to investigate the problem.
This study will investigate whether ambulance response times vary between
different sections of the city.
Our problem statement is clear and concise, it includes all the variables we want
to investigate, and we have not interjected personal bias.
Just as the residents believed, the average response time for calls in their neigh-
borhood is greater than response times for the city in general. Let’s move on to the
next step to see if their complaints are justified by answering this question, “Is the
response time to their neighborhood higher or is it significantly higher?”
This is further substantiated by noting that our computed value of t, 2.326, is much
larger than our critical value of t, 1.729.
We can also collect numeric data showing the time it takes patients to stop sneezing. Finally, inves-
tigating the problem is ethical and very practically significant (at least to those of us
who suffer from allergies!).
Sneezing treated with the new medication will not end in significantly less
than 2 minutes.
There’s an interesting issue here, however, that comes up from time to time. In
this case, the new drug has an average response time of 117 seconds, while the estab-
lished drug’s average time is 120 seconds. Even though the difference is not signifi-
cant, the new drug’s time is lower. The question to the physicians then remains, “Do
we use the new drug and save 3 seconds or, is that so insignificant in the overall scope
of things that we continue with what we are used to?”
Steve wants to demonstrate to his neighbors that he can grow tomatoes that
are equal to or larger than the community’s historical average.
My null hypothesis is

My tomatoes will not weigh significantly less than the community's historical average.
Things are not looking too good, are they? I was determined to grow bigger toma-
toes, but mine are a little smaller than the average weight reported by my neighbors.
Maybe I can salvage some of my dignity, though; I can still hold my head up high if
they are not significantly smaller.
Here, the p value is much larger than the alpha value; this is reflected by only a moderate effect size of .213.
Because of that, I fail to reject my null hypothesis; this means the research hypothesis
is not supported. What does this mean for me? Easy—I get to sit around talking to my
neighbors and tell them that my tomatoes are just about average.
Summary
At the beginning of the chapter I told you that, because of the similarities to the one-sample z-test, it wouldn't take long to discuss the one-sample t-test. The key thing to
remember from this chapter is the t distribution; it is created when multiple samples,
each with less than 30 values, are collected and their means plotted. We then used
the concept of degrees of freedom to help understand the additional random error
introduced by smaller sample sizes. Again, just like the z-test, hypotheses can be
tested by computing a t value and comparing it to a critical value of t or by using a
computed p value and comparing it to alpha. Regardless of whether we reject or fail
to reject the null hypothesis, we can look at the strength of the relationship between
our independent and dependent variables using a computed effect size. In the next
two chapters we will look at two other tests that use the t distribution; the indepen-
dent and dependent-sample t-tests. While many of the underlying features are the
same, we will see that these tests are used to specifically control for the relationship
between levels of the independent variable.
Before we move forward with our quiz and into the next chapter, let me tell you
a short story about Mr. Gossett. When he was working for Guinness, Gossett was
not allowed to publish his findings because the brewery considered his work their
intellectual property. After all, they reasoned, the ideas were developed by their
employee while he was working in their brewery. Gossett, realizing the importance
of his work, felt it was imperative that he publish his findings. Rather than give
his employer a reason to fire him, he published the work under the pseudonym
"Student." Because of that, many of the older textbooks refer to this distribution
as “Student’s t distribution” and to the test as “Student’s t-test.” While you may not
think this is important, just think how much money you could win on the television
show Jeopardy if one of the categories was “Famous Statisticians”!
Computed value of t:

t = \frac{\bar{x} - \mu}{s_{\bar{x}}}

Effect size:

d = \frac{\bar{x} - \mu}{s}
Quiz Time!
Using the SPSS output provided, what decision would you make for each of the hypotheses
below? Manually compute the effect size for each hypothesis; what does that tell you in relation to the decision made?
1. The average number of cavities for elementary school children in areas without
fluoride in the water will be significantly higher than the national average of 2.0
(use Figures 6.19 and 6.20 to test this hypothesis).
2. Over the course of their careers, physicians trained in Ivy League schools will have
significantly fewer malpractice suits filed against them than the national average of
11 (use Figures 6.21 and 6.22 to test this hypothesis).
3. The graduation rate of universities that require students to spend their first two
years at a community college will be significantly greater than 82% (use Figures
6.23 and 6.24 to test this hypothesis).
4. Adult turkeys in the wild will weigh significantly less than 12 pounds (use Figures
6.25 and 6.26 to test this hypothesis).
5. Residents in lower socio-economic areas of town are concerned that the time it
takes to restore electricity after a storm-related power outage (i.e., 5.15) is signifi-
cantly longer than the town’s average restoration time (i.e., 4.0; use Figures 6.27
and 6.28 to test this hypothesis).
6. A local golf pro has noticed that the number of holes-in-one made on her course
during the current season seems to be significantly higher than in years before. By
looking at the numbers from prior years, she can see that some years are greater
than the current year and some less. She decided to compare her current year's
results to those of the past 20 years to see if there is a significant difference. If she
has seen 25 holes-in-one during the current year and using Figures 6.29 and 6.30,
what could she determine?
CHAPTER 7

The Independent-Sample t-Test

Introduction
We just learned how Gossett successfully used the t distribution, with a one-sample
t-test, to compare a sample of ale to the batch (i.e., the population) of ale it came
from. Gossett soon realized, however, that a problem arose if you have small sam-
ples from two different populations and need to determine if they are significantly
different from one another.
For example, suppose we are interested in looking at the difference in the number of
disciplinary referrals between children from single-parent homes and children who live
with both parents. Our independent variable would be “Home Environment,” and there
would be two levels, “Single Parent” and “Both Parents.” Our dependent variable
would be the number of times children were disciplined. In another case, we might be
interested in comparing the time it takes pigeons to learn via operant or classical con-
ditioning. Again, we would have one independent variable, type of reinforcement, with
two levels—operant conditioning and classical conditioning. Our dependent variable
would represent the amount of time it took pigeons to learn to do a particular task.
Both scenarios meet all the criteria for an independent or dependent sample
t-test. Both have one independent variable with two levels, one dependent variable,
and the data being collected is quantitative. Our only problem is, how do we decide
whether to use the independent-sample t-test or the dependent-sample t-test?
We can answer that question by looking closely at the relationship between the
two levels of each of the independent variables. Do you notice anything remarkable?
In the first case, you are measuring the number of referrals of two groups of stu-
dents—those from single-parent homes and those from homes where there are two
parents. These two groups are independent of one another; a child who is in one group
can’t be in the other. The same goes for the groups of pigeons—one group is being
trained with operant conditioning, and the other with classical conditioning. Being in
one group excludes the possibility of being in the other.
In both instances the researcher would collect data from two groups, with each
group representing a unique population. In Chapter 8, we will see that samples are
not always independent, so we will use the dependent-sample t-test. In the mean-
time, in Figure 7.1 we can use our example of students from different home environ-
ments to help explain what we’ve covered up to this point.
When we wanted to test a hypothesis about a population mean, it made sense to take single random samples from a population, calculate
the mean of the sample and then plot it. By doing this, we created the sampling distri-
bution of the means. Given both large enough sample sizes and number of samples, the
mean of the sampling distribution was equal to the population mean, and the main measure of dispersion
was called the standard error of the mean. We were then able to compare our sample
mean to the sampling distribution of the means to determine any significant difference.
In this case, however, we are not interested in determining if a single sample
is different from the population from which it was drawn. Instead we are interest-
ed in determining if the mean difference between samples taken from two like, but
independent, populations is significantly different from the mean difference of any
other two samples drawn from the same populations. This can be done by creating a
sampling distribution of mean differences (SDMD) using the following steps.
1. Take repeated samples from each of the two populations we are dealing with.
2. Calculate the mean of each sample.
3. Compute the difference between the two means.
4. Plot the difference between the two means on a histogram.
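If the four steps above feel abstract, a small simulation makes them concrete. This sketch (in Python with NumPy, my choice of tooling; the population parameters are hypothetical stand-ins) repeatedly samples from two populations with the same mean and shows that the mean differences center on zero, just as described next:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical populations: both with a mean of 3.5 referrals, per the null hypothesis.
diffs = [rng.normal(3.5, 1.0, size=10).mean() - rng.normal(3.5, 1.0, size=10).mean()
         for _ in range(10_000)]

print(round(float(np.mean(diffs)), 3))  # close to zero, the center of the SDMD
```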
To help us understand this, keep in mind that our hypothesis concerning disci-
plinary referrals involves samples from two populations—students from single-parent
homes and students who live with both parents. As we saw with the central limit theo-
rem, most of the values in each sample will cluster around the mean—a mean that
we’ve hypothesized is the same for both populations. When we take samples from
each population and then subtract the mean of one from the mean of the other, it
naturally follows that the majority of the mean differences from the repeated samplings are eventually going to cancel one another out, leaving a distribution mean of zero. Those differences that do not cancel out will leave negative and positive values that create the tails of
the distribution.
Let’s use the hypothesis about the number of parents and disciplinary referrals
to demonstrate this idea. Table 7.1 shows data for 10 samples representing the aver-
age number of referrals for each level of the independent variable (i.e., single parent
or both parents). After that, I’ve subtracted the mean number of referrals from two-
parent homes from the mean number of referrals from a single-parent home and have
included that value in the fourth column:
If we use a histogram to plot the difference values from the rightmost column,
SPSS will create Figure 7.3. If we continued calculating an infinite number of these
mean differences and plotted them on a histogram, we would wind up with the same
qualities of the t distribution discussed in Chapter 6.
1. The shape of the distribution will be different for different sample sizes. It
will generally be bell-shaped but will be “flatter” with smaller sample sizes.
2. As the number of samples plotted grows, the distribution will be symmetrical
around the mean.
3. The empirical rule applies.
4. The measure of dispersion, the standard error of mean differences (SEMD),
is conceptually the same as the standard error of the mean.
Some of this may look like Greek (no pun intended), so let’s use Table 7.2 to
refresh our memory of the symbols we’ve already covered as well as add a few new
symbols to our gray matter.
Now, let’s start putting these together to solve the equation. The first thing we see
in the denominator (the bottom of the fraction) is that we need to compute the sum of
squares for both groups. We have already talked about the idea of the sum of squares
earlier, but let’s compute it again using the following formulas:
1. SS_1 = \sum x_1^2 - \frac{(\sum x_1)^2}{n_1}

2. SS_2 = \sum x_2^2 - \frac{(\sum x_2)^2}{n_2}
First, we will modify Table 7.1 by squaring each of the values in our dataset; this is
shown in the second and fourth columns of Table 7.3 and is labeled Step 1. In Step 2,
we then total the values for each column. Finally, in Step 3, we will compute the mean
for our original two sets of values, shown in the bottom row of Table 7.3.
Step 2: \sum x_1 = 30, \sum x_1^2 = 122; \sum x_2 = 40, \sum x_2^2 = 224

Step 3: \bar{x}_1 = 3, \bar{x}_2 = 4
We can then compute the sum of squares by using the data from the table; this gives us SS_1 = 122 - (30^2/10) = 32 and SS_2 = 224 - (40^2/10) = 64.
We can now include our SS1 and SS2 values, along with the values we already
knew, into our equation and compute our actual t value:
1. t = \frac{3 - 4}{\sqrt{\left(\frac{32 + 64}{10 + 10 - 2}\right)\left(\frac{1}{10} + \frac{1}{10}\right)}}

2. t = \frac{-1}{\sqrt{\left(\frac{96}{18}\right)\left(\frac{2}{10}\right)}}

3. t = \frac{-1}{\sqrt{(5.33)(.2)}}

4. t = \frac{-1}{\sqrt{1.066}}

5. t = \frac{-1}{1.0325}

6. t = -.968
FIGURE 7.4. Data View spreadsheet including number of parents and referrals.
FIGURE 7.5. Using the Compare Means and Independent-Samples T Test commands.
FIGURE 7.6. Defining the Test and Grouping Variables for the Independent-Samples T Test.
The output is shown in Figure 7.8. The column to the far right is labeled "Referrals" and represents the inferential statistics for our dependent variable. Beneath that is a column titled "Equal Variances Assumed." If you look down that column to the row labeled "T," you'll see
the value –.968, exactly what we computed by hand. We can now use this information
to help us make a decision about the hypothesis we stated.
We can compute our degrees of freedom using the following formula; in it you can see we add the two sample sizes and then subtract the number of levels in our independent variable, in this case 2:
the number of levels in our independent variable, in this case 2:
df = (n1 + n2) – 2
Since both of our sample sizes are 10, our degrees of freedom value is 18 (i.e.,
10 + 10 – 2 = 18). We can check ourselves by looking at Figure 7.8, shown above.
With an alpha value of .025 and 18 degrees of freedom, we can use Table 7.4, an
exact copy of Table 6.2, and find our critical value of t is 2.101.
We can then plot both the critical and computed values of t and test our hypoth-
esis. Remember, in Figure 7.9, we must plot 2.101 as both a positive and a negative
value since we’re testing a two-tailed hypothesis.
Should we reject or fail to reject our
null hypothesis? In this case, we would
fail to reject the null since our computed
value of t (i.e., –.968) is within the range
created by our critical value of t. That
means, in this case, although the stu-
dents with both parents at home have a
higher number of referrals, the differ-
ence isn’t significant.
The p Value
We can also look at our p value to help us decide. Since we are testing a two-tailed hypothesis, we need to use the two-tailed p value. If we look in the column labeled "Equal Variances Assumed" in Figure 7.8, we can see that p (i.e., Sig. 2-tailed) of .346 is much greater than our alpha value, and we support our decision to not reject the null hypothesis.

FIGURE 7.9. Plotting the critical and computed values of t.
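For readers who want to reproduce Figure 7.8's key numbers without the raw data, scipy can run the same test from summary statistics alone. A minimal sketch in Python (my substitution for SPSS; the sums of squares come from Table 7.3):

```python
from scipy import stats

# SPSS-style results from the summary statistics in Table 7.3:
# n = 10 per group, means of 3 and 4, and SS1 = 32, SS2 = 64.
s1 = (32 / 9) ** 0.5   # sample SD, since s^2 = SS / (n - 1)
s2 = (64 / 9) ** 0.5

result = stats.ttest_ind_from_stats(mean1=3, std1=s1, nobs1=10,
                                    mean2=4, std2=s2, nobs2=10)
print(round(result.statistic, 3), round(result.pvalue, 3))  # -0.968 0.346
```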
To compute the effect size, we first need the pooled standard deviation:

1. s_{pooled} = \sqrt{\frac{3.56 + 7.11}{2}}

2. s_{pooled} = \sqrt{\frac{10.67}{2}}

3. s_{pooled} = \sqrt{5.34}

4. s_{pooled} = 2.31
This gives us an effect size of –.433. As we said earlier, we always use the absolute
value of an effect size; that means we just need to drop the negative sign and wind up
with an effect size of .433. According to Cohen, this is a medium effect size and, as
might be expected given the relatively large p value, indicates that the number of par-
ents in a home does not have a large effect on the number of referrals a given student
receives.
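The pooled standard deviation and effect size are easy to script as well. A short sketch (Python again, my choice) using the values above:

```python
# Pooled standard deviation and effect size from the referral data.
var1, var2 = 3.56, 7.11            # the two sample variances from above
s_pooled = ((var1 + var2) / 2) ** 0.5
d = abs(3 - 4) / s_pooled          # absolute value, as the chapter recommends

print(round(s_pooled, 2), round(d, 3))  # 2.31 0.433
```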
In order to test this hypothesis, we could ask a representative sample of males and
females to report how often they have dreams of this type in a month. Our indepen-
dent variable is gender and has two levels, male and female. We have one dependent
variable, frequency of nightmares, and the data collected is quantitative. Since a given
participant can fall into only one category, male or female, we will use an indepen-
dent-sample t-test.
Using the data in Table 7.5, we might see that males averaged 12 nightmares
and females averaged 7 nightmares (Figure 7.10). The mean difference between these
groups is 5, meaning that males have an average of 5 more nightmares per month
than females. We can use this information in Table 7.6 to compute the square of the
number of dreams for each participant.
x_1     x_1^2     x_2     x_2^2
12      144       2       4
13      169       3       9
17      289       4       16
10      100       9       81
10      100       8       64
10      100       9       81
9       81        10      100
12      144       8       64
15      225       8       64
12      144       9       81

\sum x_1 = 120, \sum x_1^2 = 1496; \sum x_2 = 70, \sum x_2^2 = 564

\bar{x}_1 = 12, \bar{x}_2 = 7
Let’s compute the t value by using the values in Table 7.6; I’ve already computed
the sum of squares for you.
t = \frac{12 - 7}{\sqrt{\left(\frac{56 + 74}{18}\right)\left(\frac{1}{10} + \frac{1}{10}\right)}}
Completing the computation gives us a t value of 4.16. To compute the effect size, we again need the pooled standard deviation; the two sample variances are 6.22 (i.e., 56/9) and 8.22 (i.e., 74/9):

1. s_{pooled} = \sqrt{\frac{6.22 + 8.22}{2}}

2. s_{pooled} = \sqrt{\frac{14.44}{2}}

3. s_{pooled} = \sqrt{7.22}

4. s_{pooled} = 2.69
We can include this into our effect size formula along with the values of the two
means resulting in an effect size of 1.86. Obviously, the independent variable has a
large effect on the dependent variable.
1. d = \frac{12 - 7}{2.69}

2. d = 1.86
This means, of course, that we’re going to have the same descriptive statistics
and we will compute the same value for t. We do need a different critical value of t,
however, because we are testing a directional hypothesis. Because we’re dealing with a
one-tailed hypothesis, however, we need to refresh our memories on using our alpha
value to determine the correct critical value of t we need to use.
As you know, when you state a directional hypothesis, you’re saying one of the
mean values you are comparing is going to be significantly higher OR significantly
lower than the other. Since you’re not looking at the probability of it being either, this
means you are going to have to use your entire alpha value on one end of the curve
or the other. Here, we’re hypothesizing that the mean of one group is going to be sig-
nificantly higher than the other. Because of that, we wouldn’t divide alpha by 2 as we
did with the two-tailed test; instead we would use the t table to look at the one-tailed t
value for an alpha level of .05 and 18 degrees of freedom. We would find this gives us
a critical value of 1.734; let’s plot this in Figure 7.13 and see what happens.
In this case, our computed value
of t (i.e., 4.16) is greater than our criti-
cal value of t (i.e., 1.734). There seems
to be a significant difference in the val-
ues, and we can use our p value to help
support our decision. We have to be
careful here, however. Remember, we
are looking at a one-tailed hypothesis,
and we used our entire alpha value to
determine the critical value of t. Having
changed those things, our p value is also
affected; it has dropped to .0005. When
we compare it to our alpha value of .05,
we do support the decision we made when we compared the computed and critical values of t. In Table 7.7, we can see the logical pattern of when to divide either alpha or p by 2 depending on the type of hypothesis.

FIGURE 7.13. Plotting the critical and computed values of t for a one-tailed test.
Because of that, the manager asks us to get involved; let’s see what we can do to help
him.
As we can see, this is a directional hypothesis since we believe one group will
perform significantly better than the other group. We stated it that way because that’s
the scenario the manager wants to investigate. We could have just as easily stated a
nondirectional hypothesis by saying there would be a significant difference between
the groups with no direction stated. That isn’t, however, what the scenario calls for.
The null hypothesis would be:

There will be no significant difference in achievement between students in the Old Library and students in the new Arts and Sciences building.

Our independent variable is location and there are two levels: students in the Old Library and students
in the new Arts and Sciences building.
We can see that the students in the Arts and Science building do have a higher
average score than their counterparts in the Old Library. While it is apparent that
the independent variable does have some effect on the dependent variable, that’s not
enough. We must determine if the difference between the two groups is significant or
if it is due to chance.
Up to this point, we’ve used only the “Equal Variances Assumed” column to test
our hypotheses. As I said earlier, now that we have a good feeling for how the inde-
pendent-sample t-test works, it is important to understand the meaning of these two
values.
There will be no significant difference between the variance of the Arts and
Sciences group and the variance of the Old Library group.
Notice this is a two-tailed hypothesis and it is always stated in this manner. To test
it, SPSS computes an analysis of variance (this is often called ANOVA, but more on that
later). The results of this ANOVA are shown in the top two rows of Figure 7.15 labeled
“Levene’s Test for Equality of Variance.”
To interpret the results, we first need to set our alpha level at .05. We then com-
pare our alpha value to the computed value of p for the Levene test. Since, in this
case, the computed value of p is .913 and our alpha value is .05, we fail to reject the
null hypothesis (be careful here, we are still talking about the values in the rows for
the Levene’s test, not the p values farther down the table). This means there’s no sig-
nificant difference in the variance between the two groups. Given that, we will use
the column reading “Equal Variances Assumed” for our independent-sample t-test. If
the computed p value for the Levene test was less than .05, we would reject the null
hypothesis and then have to use the column reading “Equal Variances Not Assumed.”
This idea is summarized in Table 7.9.
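This decision logic can be scripted end to end. Here is a minimal sketch in Python's scipy (my substitution for SPSS; the two score lists are hypothetical stand-ins, since the chapter's raw data isn't reproduced here):

```python
from scipy import stats

# Hypothetical achievement scores for the two locations.
old_library = [70, 72, 68, 75, 71, 69, 74, 73, 70, 72]
arts_sciences = [78, 80, 76, 82, 79, 77, 81, 83, 78, 80]

# Levene's test; center='mean' matches the classic statistic SPSS reports.
_, levene_p = stats.levene(arts_sciences, old_library, center='mean')

# Use the Levene result to pick the right t-test column, per Table 7.9.
result = stats.ttest_ind(arts_sciences, old_library,
                         equal_var=(levene_p >= 0.05))
print(round(levene_p, 3), round(result.statistic, 3), round(result.pvalue, 3))
```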
Determining the Critical Value of t When You Have More Than 30 Degrees of Freedom
We could also check ourselves by comparing the computed value of t to the critical
value of t. In this case, in order to do that, we need to look at an issue that arises when
our degrees of freedom value is greater than 30.
Up to this point, we’ve been dealing with cases where we have had 30 or fewer
degrees of freedom and each of those values has been listed in the t distribution table.
In this case, we have 48 degrees of freedom, but if you look at the complete t distri-
bution table in the back of the book, you’ll see the values skip from 30 to 40, 40 to
60, 60 to 120, and then 120 to infinity (that’s what the symbol ∞ means). You might
ask, “What happened to all of the other values?” The short answer is, once we get to
a certain number of degrees of freedom, the table can be abbreviated because the
area under the curve for t changes so little between the differing degrees of freedom,
there’s no need to have a value that is 100% correct.
For example, if we have a problem with 40 degrees of freedom and our alpha
value is .05, our critical value for t would be 1.684; if we had 60 degrees of freedom, it
would be 1.671. Since the difference between these two values is so small (i.e., .013) the
table does not break the critical values down any further. Instead, when we are manually computing statistics, we use the next highest df value in the table that's greater
than our actual df. In this case, we have 48 degrees of freedom so we would use the
value for df = 60 (or 1.671). Since our computed value of t is 2.211, we again know we
should reject our null hypothesis.
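Of course, software does not need the abbreviated table; it can produce the exact critical value for any degrees of freedom. A small sketch using scipy (an assumption on my part; the book itself uses the printed table):

```python
from scipy import stats

# Exact one-tailed critical values of t for alpha = .05; no abbreviation needed.
for df in (40, 48, 60):
    print(df, round(stats.t.ppf(1 - 0.05, df), 3))  # 1.684, 1.677, 1.671
```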
This investigation is consistent with current motivational theory and contributes back to the literature about what's known
about feedback and intrinsic motivation.
Research has shown that the frequency of report cards may affect student
motivation. In order to be able to provide feedback to students in the most
meaningful way possible, teachers will investigate the effect on motivation of
weekly report cards versus report cards given once per 9 weeks.
We can see the weekly group has a mean score (i.e., 74.2) considerably larger
than do those students getting the report cards every 9 weeks (i.e., 68.2). Remember,
though, the teachers are interested in a significant difference, not just a difference.
Knowing that, they need to look at the second table. Our job now is to select the
appropriate statistic to help us interpret our data.
Motivation: df = 18 (equal variances assumed) or 9.814 (equal variances not assumed).
As we said earlier, since the teachers are looking only for a significant differ-
ence in their hypothesis and no direction is implied, the hypothesis is two-tailed; that
means they should use the Sig. (2-tailed) value shown in the table. In this case, the p
value is .076, larger than the alpha value of .05. Because of this, the null hypothesis
is not rejected; there’s not a significant difference in levels of motivation between
students who receive report cards every week and those who receive them only every
9 weeks. We could verify that by checking the computed value of t (1.986) against our
critical value of t (2.101); again, this would show that the teachers would not reject the
null hypothesis.
A Point of Interest
We have already agreed that a good hypothesis is consistent with prior research or
observations. In this case, based on the literature, the teachers hypothesized there
would be a significant difference in motivation. They didn’t say significantly higher
or significantly lower, they just said “different.” Interestingly, had they opted for a
one-tailed hypothesis such as “Students receiving weekly report cards will have signifi-
cantly higher levels of intrinsic motivation than those receiving report cards each 9
weeks,” they would have rejected the null hypothesis. We know that because, had we
divided our two-tailed p value by 2, the resultant one-tailed p value of .038 would be
less than our alpha value. Again, based on the literature, the hypothesis is two-tailed
so the teachers have to live with it the way it is. Let’s look at another scenario and see
what happens.
So far it seems we are doing fine. We have two groups of quantitative data that
are fairly normally distributed. The means are 5.2 points apart, and we have a rather
large effect size of .650 so we may be on to something. Let’s move on to the next step
and see what happens.
Anxiety: df = 28 (equal variances assumed) or 25.995 (equal variances not assumed).
Wait a minute; are we jumping the gun here? We are right; we do have a p value
less than our alpha value, but look at the mean scores from the two groups. The stu-
dents involved in sports have a higher mean score (35.6) than their less active class-
mates (30.4). In this case, the researchers would technically be able to reject the null
hypothesis, but since the results weren’t in the hypothesized direction, there is no
support for the research hypothesis. For us, this shows we need to pay attention to
the relationship between our printout and the hypothesis we’ve stated. Many good
students have been led astray by not paying strict attention to what they were doing!
Summary
Using the independent-sample t-test is very straightforward. Once we state our hy-
pothesis and identify the statistical tool we need, interpreting the results of a sta-
tistical software package is easy. Remember, the key things to look for are Levene’s
variance test and the computed value of p for the data you’ve entered. Always keep
in mind, however, that if you reject the null hypothesis, ensure the means are in the
order you hypothesized prior to supporting the research hypothesis. Table 7.12 can
act as a guide to help you understand and use the t-test.
Decide which t-test is needed: If the levels of the independent variable are related, use the dependent-sample t-test. If the levels of the independent variable are not related, use the independent-sample t-test.

Once the test is run, check Levene's p value: If p is less than .05, equal variances are not assumed. If p is greater than or equal to .05, equal variances are assumed.

Use the appropriate p value to compare to the alpha value: If p is less than alpha, reject the null hypothesis; be sure to check that the mean values are in the order you've hypothesized prior to making any decisions. If p is greater than or equal to alpha, fail to reject the null hypothesis.
Degrees of freedom:
df = n1 + n2 – 2
Effect size:

d = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}}

Pooled standard deviation:

s_{pooled} = \sqrt{\frac{s_1^2 + s_2^2}{2}}

t-score:

t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left(\frac{SS_1 + SS_2}{n_1 + n_2 - 2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
Quiz Time!
Fearing this could lead to fewer sales, the owners of the company decided to put half of their
employees into private offices to see if it would lead to more calls per hour.
Marketing Calls: df = 14 (equal variances assumed) or 13.894 (equal variances not assumed); Sig. (2-tailed) = .000 in both cases.
Ticket Sales: df = 28 (equal variances assumed) or 27.967 (equal variances not assumed).
CHAPTER 8

The Dependent-Sample t-Test

Introduction
In the last chapter, we saw that when we have one independent variable with two
levels that are independent of one another, we used the independent-sample t-test.
What happens, however, when the levels of the independent variable represent two
different measurements of the same thing? For example, imagine comparing pretest
and posttest scores for one group of students. If we did, we would have two mea-
surements of the same group. The independent variable would be “Student Test
Scores” and there would be two levels, “Pretest Scores” and “Posttest Scores.” In
this case, a given student’s posttest score would be directly related to his or her
pretest score. Because of that, we would say the levels are dependent upon, or influ-
ence, one another.
As another example, suppose we are interested in determining if a particular drug
has an effect on blood pressure. We would measure a person’s blood pressure prior
to the study, have the patient take medicine for a period of time, and then measure
the blood pressure again. In this case, for a given set of measurements, the “Before
Blood Pressure” and the “After Blood Pressure” are the two levels of the independent
variable “Blood Pressure.” Again, these two levels are related to one another, so we
would use the dependent-sample t-test (sometimes called the paired-samples t-test)
to check for significant differences.
At this point you might be asking, "Why not just use the independent-sample t-test?" While that is
a logical question, unfortunately, we cannot do that. As we’ll soon see, the relation-
ship between the levels of the independent variable creates a problem if we try to do
it that way.
t = \frac{\bar{D}}{\sqrt{\dfrac{\sum D^2 - \frac{(\sum D)^2}{n}}{n(n-1)}}}
The average number of hours spent reading per week will be significantly
greater after allowing students to read books that appeal to them.
In Table 8.2, we have data for five students with the number of hours weekly
they read before the new books were introduced, as well as the number of hours they
read after the new books were introduced. In order to compute the t value, we need
to include two extra columns to create Table 8.3. The first new column, labeled D,
shows us the difference between the Before and After scores. The second new column,
labeled D2, shows the squared value of D. At the bottom of each of those columns, you
can see we have summed these values.
Looking back at the equation, you can see the first value we need to compute is
D̄, the average of the differences between scores in each dataset. We can see the sum of the differences (i.e., ∑D) is 10; to compute the average, we need to divide that by 5, the number of values in our dataset. This leaves us with an average of 2; we can put
that into our formula before moving on to the next step.
t = \frac{2}{\sqrt{\dfrac{\sum D^2 - \frac{(\sum D)^2}{n}}{n(n-1)}}}
We can now deal with the denominator of the equation. Again, let’s go through
step by step. First, in the third column of our table we have calculated the difference
between both values and summed them (i.e., 10). In the rightmost column, we have
taken each of the difference values and squared it. Adding these values gives us the
sum of the differences squared (i.e., ∑D2). We can insert these values, along with n
(i.e., 5) into the equation. We can finish computing the equation using the following
steps.
1. t = \frac{2}{\sqrt{\dfrac{28 - \frac{100}{5}}{5(5-1)}}}

2. t = \frac{2}{\sqrt{\dfrac{28 - 20}{5(5-1)}}}

3. t = \frac{2}{\sqrt{\dfrac{8}{20}}}

4. t = \frac{2}{\sqrt{.4}}

5. t = \frac{2}{.6325}

6. t = 3.16
This leaves us with a computed t value of 3.16. Before we can test the hypothesis,
we must determine the critical value of t from the same table we used with the independent-sample t-test. This time, however, we will compute our degrees of freedom
by subtracting one from the total pairs of data; when we subtract 1 from 5, we are left
with 4 degrees of freedom. Using the traditional alpha value of .05, we would refer to
our table and find that the critical value of t is 2.132.
We can then plot that on our t distribution shown in Figure 8.1; remember, we
have a one-tailed hypothesis, so the entire t value goes on one end. Here our com-
puted value of t is greater than our critical value of t. Obviously we have rejected the
null hypothesis and supported my wife’s research hypothesis: children do read more
when they are interested in what they are reading.
FIGURE 8.1. Comparing the computed and critical values of t.

In order to check our work using SPSS, we need to set up our spreadsheet, shown in Figure 8.2, to include two variables, Before and After; we would then include the data for each. Following that, in Figure 8.3, we select Analyze, Compare Means, and Paired Samples T Test. As shown in Figure 8.4, we would then identify the pairs of data we want to compare and click on OK. SPSS would first provide us with Figure 8.5; it verifies what we computed earlier in Table 8.3.
FIGURE 8.2. Before and after data in the Data View spreadsheet.
As we can see in Figure 8.6, the t value of 3.16 is exactly what we computed earlier,
and our p value is less than .05. This means we can support the research hypothesis;
kids reading books they enjoy actually did spend significantly more time reading per
week than they did when they had no choice in their reading material. What is the
bottom line? My wife is happy!
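As a cross-check, the whole test fits in a few lines of scipy (my substitution for SPSS). The before/after lists below are hypothetical, but their differences reproduce the chapter's sums, so the t value matches:

```python
from scipy import stats

# Hypothetical before/after weekly reading hours; the differences (0, 2, 2, 2, 4)
# reproduce the chapter's sums: sum(D) = 10 and sum of squared D = 28.
before = [2, 3, 4, 5, 6]
after = [2, 5, 6, 7, 10]

result = stats.ttest_rel(after, before)
print(round(result.statistic, 2))    # 3.16, matching the hand calculation
print(round(result.pvalue / 2, 3))   # one-tailed p for the directional hypothesis
```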
FIGURE 8.3. Using the Compare Means and Paired-Samples T Test command.
FIGURE 8.4. Creating the pairs of data to be analyzed using the Paired-Samples T Test.
Using the values from above and the following two steps, we can compute an effect size of 1.42:

1. d = \frac{2}{1.41}

2. d = 1.42
According to Cohen’s standards, this is very large and indicates that the treatment
had quite an effect on the dependent variable. Just as we did in the preceding chapter,
we will always include the effect size as part of our descriptive statistics.
A track athlete’s time in the 400-meter run will be significantly less after
following a high-protein diet for 6 weeks.
During the diet regimen, the coach collected data shown in Table 8.4; each of the
times is measured in seconds.
Just by looking, it seems that the “after” scores are lower, but let’s go ahead and
compute our t value. First, since we are looking at a one-tailed “less than” hypothesis,
we need to subtract the “Before” from the “After” value. This can be seen in Table 8.5.
To compute D̄, we are again going to divide the sum of the differences (–39) by
the number of values in the dataset (6); this gives us an average difference of –6.5.
Let’s go ahead and enter that into our equation:
t = \frac{-6.5}{\sqrt{\dfrac{\sum D^2 - \frac{(\sum D)^2}{n}}{n(n-1)}}}
We continue by inserting the sum of the differences squared (i.e., 273), the sum
of the differences (i.e., –39), and n (i.e., 6), into our equation:
t = \frac{-6.5}{\sqrt{\dfrac{273 - \frac{(-39)^2}{6}}{6(6-1)}}}

1. t = \frac{-6.5}{\sqrt{\dfrac{273 - \frac{1521}{6}}{30}}}

2. t = \frac{-6.5}{\sqrt{\dfrac{19.5}{30}}}

3. t = \frac{-6.5}{\sqrt{.65}}

4. t = \frac{-6.5}{.806}

5. t = -8.06
We can now help the coach test his hypothesis. First, using our alpha value of
.05 and our degrees of freedom of 5, we find that we have a critical t value of 2.015.
In order to plot this, keep in mind that we have a one-tailed hypothesis, so the entire
critical t value goes on one end of the distribution as shown in Figure 8.7. In this case,
since we have a one-tailed “less than” hypothesis, it needs to go on the left tail of the
distribution. Since our computed value of t is also negative, it needs to go on the left
side of the distribution as well.
We can clearly see our computed value of t of –8.06 is far less than our critical value of t; this means we must reject the null hypothesis. It appears that athletes really do run faster if they follow a high-protein diet.

In Figures 8.8 and 8.9, we can confirm what we have done using SPSS. We are further assured of our decision since the p value of .000 is less than our alpha value; we can also compute an effect size by dividing the mean difference of our scores by the standard deviation.

FIGURE 8.7. Using the computed and critical values of t to test the hypothesis.
Using the Pair 1 values from the SPSS output:

d = −6.5 / 1.97
This yields an effect size of –3.30, but as we did with the independent-sample
t-test, we must drop the negative sign and use the absolute value. When we do, we see
that our effect size means our intervention had a definite effect on our dependent
variable.
Just like before, we compute the mean difference, D, by taking the average of the difference scores (i.e., 37/10 = 3.7) and include it in our formula:
$$t = \frac{3.7}{\sqrt{\dfrac{\sum D^2 - \frac{(\sum D)^2}{n}}{n(n-1)}}}$$
We can insert the values for ∑D², ∑D, and n into our equation and use the following steps to compute t:
1. t = 3.7 / √((563 − 1369/10) / (10(10 − 1)))
2. t = 3.7 / √((563 − 136.9) / 90)
3. t = 3.7 / √(426.1 / 90)
4. t = 3.7 / √4.73
5. t = 3.7 / 2.18
Finally, we are left with a t value of 1.70; this matches Figures 8.10 and 8.11 that
would be created by SPSS.
[SPSS output, Figures 8.10 and 8.11. Pair 1: After the Medication − Before the Medication. Paired Differences: Mean = 3.70000, Std. Deviation = 6.88073, Std. Error Mean = 2.17588.]
You can see from the descriptive statistics that the average blood pressure before
the medication is 118.3, with a slightly higher blood pressure of 122 after the medica-
tion has been taken. If we compute our effect size of .538 based on these values, we
can see that the intervention had a moderate influence on our dependent variable.
We can now plot our computed t value and the appropriate critical value. Remem-
ber, since we are dealing with a two-tailed hypothesis, it is necessary to divide alpha by
2 and find the critical value for alpha = .025. Using the table, along with 9 degrees of
freedom, our critical value of t is 2.262. As shown in Figure 8.12, we would then mark
that off on both ends of our normal curve.
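Rather than reading critical values from the t table in the appendices, you can generate them with SciPy's t distribution; a quick sketch:

```python
from scipy import stats

# Two-tailed test: put alpha/2 in each tail (alpha = .05, df = 9).
print(stats.t.ppf(1 - 0.05 / 2, df=9))   # ~2.262, the value used here

# One-tailed test: the entire alpha goes in one tail (alpha = .05, df = 5).
print(stats.t.ppf(1 - 0.05, df=5))       # ~2.015, as in the runner example
```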
Since our computed value is within the range of the positive and negative critical values, we do not reject the null hypothesis; even though the average "After" blood pressure rose slightly, it wasn't a significant difference. This is verified by the two-tailed p value of .123. In other words, after all of this, we have shown that the drug manufacturers have nothing to worry about. Apparently, their new migraine medicine doesn't significantly affect the average systolic blood pressure.

FIGURE 8.12. Using the computed and critical values of t to test the hypothesis.
Math teachers have found many students are apathetic toward their subject
matter; this, they feel, may lead to lower achievement. This study will
investigate using a software package that tailors the text of word problems to
each student’s given interests and activities. This, they believe, will lead to
lower apathy and higher achievement.
STEP 2: State a Hypothesis
From the text of the scenario it is easy to develop the hypothesis
the teacher will be investigating:
As we can see, we have a directional hypothesis because we have stated that stu-
dents will have lower levels of apathy after the intervention than they did before it.
Again, we stated it in this manner because it reflects the situation that the teacher
wants to investigate. The corresponding null hypothesis would read:
SPSS would produce Figure 8.13, our descriptive statistics. In this case, we used
nine pairs of data representing “Starting Apathy” and “Ending Apathy” scores. In
the Mean column, we see two values. The Starting Apathy scores show that students
had an average score of 66.78 on the questionnaire administered at the start of the 10
weeks. The Ending Apathy scores, collected at the end of the 10 weeks, show an aver-
age of 60.89. For each of the mean values, we also see the standard deviation and the
standard error of the mean.
[SPSS output. Pair 1: Ending Apathy − Starting Apathy. Paired Differences: Mean = −5.88889, Std. Deviation = 6.88194.]
STEP 2: State a Hypothesis
Here is the research hypothesis that corresponds to the princi-
pal’s plan:
We can see this is a directional hypothesis in that we are suggesting the total num-
ber of absences will be less than they were before starting the study. We can state the
null hypothesis in the following manner:
Things are looking good; the average number of absences after the implementa-
tion of the party program is less than the average number before starting the pro-
gram. We have to be careful, though; the values are very close. We should run the
appropriate statistical test and see what the output tells us.
[SPSS output. Pair 1: Ending Absences − Starting Absences.]
(i.e., .744) and an even smaller mean difference score (i.e., –.375); this gives us an effect
size of .504; things are not looking good for the principal but let’s move forward.
Knowing that we have a one-tailed hypothesis, we would divide our Sig. (2-tailed) value by 2 and arrive at a one-tailed p value of .0985; this means we cannot reject our null
hypothesis. We could verify this by comparing our computed value of t, –1.416, to our
critical t value, 2.365. Although the principal tried, the offer of a party at the end of
the term just was not enough of a stimulus to get students to come to school.
STEP 2: State a Hypothesis
Based on this story, we can clearly see a hypothesis has formed.
Since the principal is not sure if attendance will go up or down,
the hypothesis must be stated as two-tailed (Step 1):
[SPSS output. Pair 1: Old Schedule − New Schedule.]
than our alpha value of .05. We have to reject our null hypothesis; there was a signifi-
cant difference in the number of absences. Unfortunately for the principal, this means
the new schedule caused a significant increase in the number of absences!
Summary
The dependent-sample t-test, much like its independent counterpart, is easy to un-
derstand, both conceptually and from an applied perspective. The key to using both
of these inferential techniques is to keep in mind that they can be used only when
you have one independent variable with two levels and when the dependent vari-
able measures quantitative data that is fairly normally distributed. Again, the labels
“independent” and “dependent” describe the relationship between the two levels of
the independent variable that are being compared.
As I said earlier, these two inferential tests are used widely in educational re-
search. What happens, though, when you have more than one independent variable,
more than two levels of an independent variable, or even multiple dependent vari-
ables? These questions, and more, will be answered in the following chapters.
Quiz Time!
As usual, before we wind up this chapter, let’s take a look at a couple of case studies. Read
through these and answer the questions that follow. If you need to check your work, the
answers are at the end of the chapter.
[SPSS output accompanying the case studies:
Pair 1: Laptops Used − No Laptops.
Pair 1: After Annex − Before Annex. Paired Differences: Mean = $−9,471.867.
Pair 1: SPAM After − SPAM Before. Paired Differences: Mean = 19.875.
Pair 1: Satisfaction Before − Satisfaction After. Paired Differences: Mean = 17.86667.]
Analysis of Variance
and Multivariate Analysis of Variance
Introduction
In the last two chapters we dealt with situations in which we have one independent
variable with two levels. In many instances, however, we may have an independent
variable with three or more levels. For example, suppose we are interested in deter-
mining if there is a significant difference in the per capita income between residents
of Canada, the United States, and Mexico. We could easily state a hypothesis to
investigate this question.
There will be a significant difference in the average per capita income between
citizens of Canada, the United States, and Mexico.
Our independent variable is “Country” and the three different countries represent
three levels. The dependent variable is the average income for each country’s citi-
zens. Given all of this, which statistical test can we use?
At about this point, a lot of beginning statisticians say, “This is easy; we will use
three separate independent-sample t-tests. We will compare Canada to the United
States, Canada to Mexico, and then the United States to Mexico. That covers all the
bases.” They are right. Those three comparisons are inclusive of all the comparisons
that could be made between the three countries but using three different t-tests cre-
ates a problem. Let me show you what I mean.
Let’s suppose we actually decided to use the three separate t-tests; by doing so,
we would have three separate computed values of t, up to three separate critical
values of t (depending on the number in each group), and we would be using alpha in
three separate tests. The first two of these things are not so important, but the three
alpha values create quite a problem. In order to understand what might go wrong,
think back to a very early chapter where we talked about Type I and Type II errors.
We agreed that Type I errors occur when we reject a null hypothesis when we
should not; the larger the alpha value, the greater the probability of making an error
of this type. We also said that, if we decrease alpha, we increase the probability of a
Type II error where we fail to reject a null hypothesis when we should. We settled on
a traditional alpha value of .05 because it is a good balance between the two types
of error.
Keeping that in mind, think what might happen if we used three independent
sample t-tests to test our hypothesis. In essence, we would be using an alpha value
of .05 three different times in order to test one overall hypothesis. Obviously, this is
going to greatly inflate our possibility of making a Type I error. In order to address this
problem, an English agronomist, Ronald Fisher, developed the analysis of variance
or, as it is most always abbreviated, ANOVA.
Fisher, originally an astronomer, began working in 1919 as a biologist for an ag-
ricultural station in England. During his tenure there, Fisher contributed greatly to
the fields of genetics and statistics. Like his friend Gosset and his t-test, Fisher was
interested in working with small data samples and eventually developed the analysis
of variance to help him in his studies. His list of accomplishments in statistics and
genetics is so great and his awards are so impressive that he was even knighted by
Queen Elizabeth in 1952!
Fisher felt that the only way to accurately investigate significant differences
between groups was to look at the variance of scores within each of the levels of the
independent variable as well as the variance between the different groups. We will
discuss the underlying logic and formulas in a bit, but for now let’s look at the differ-
ent types of ANOVAs.
One-Way ANOVA
The most elementary ANOVA is called the one-way, or simple, ANOVA. When we
say something is “one way,” we mean there is only one independent variable and one
dependent variable. The independent variable can have three or more levels, and the
dependent variable represents quantitative data.
Factorial ANOVA
When we say an ANOVA is “factorial,” we simply mean there is more than one inde-
pendent variable and only one dependent variable. One of the most common ways
we refer to factorial ANOVAs is by stating the number of independent variables. For
example, if we have a three-way ANOVA, we are saying we have a factorial ANOVA
with three independent variables. Sometimes we take it a step further and tell the
reader the number of levels within each of the independent variables. For example,
let’s say we had a two-way ANOVA where the independent variables were “college class”
and “gender” and we are interested in using it to determine if there is a significant dif-
ference in achievement. As we know, “college class” has four levels: freshman, sopho-
more, junior, and senior; obviously, gender has two levels, female and male. In order
to be as clear as possible, someone performing research using this ANOVA might call
it a 4 by 2 (sometimes you see it abbreviated as 4 × 2) ANOVA rather than just a facto-
rial ANOVA. This tells the reader two things.
First, there are two independent variables. The first independent variable is repre-
sented by the “4,” which also shows it has four levels (i.e., freshman, sophomore, junior,
and senior). The second independent variable is represented by the “2,” which means
it has two levels (i.e., male and female). The great thing about using this notation is we can tell exactly how many measurements we are going to make by multiplying the two values together; a 4 × 2 ANOVA, for example, has eight cells, as shown in Table 9.1.
TABLE 9.1. Cells for Gender and Class for Factorial ANOVA
Achievement of male freshmen | Achievement of male sophomores | Achievement of male juniors | Achievement of male seniors
Achievement of female freshmen | Achievement of female sophomores | Achievement of female juniors | Achievement of female seniors

TABLE 9.2. Cells for Gender and Number of Parents for Factorial ANOVA
Males attending kindergarten—one parent at home | Females attending kindergarten—one parent at home | Males not attending kindergarten—one parent at home | Females not attending kindergarten—one parent at home
Males attending kindergarten—two parents at home | Females attending kindergarten—two parents at home | Males not attending kindergarten—two parents at home | Females not attending kindergarten—two parents at home
The factorial ANOVA’s underlying logic, and particularly the computations, are
very complicated. We will touch briefly on its use at the end of the chapter, but we will
spend most of our effort focusing on the one-way ANOVA.
Random Samples
The first of these assumptions is that, if your dependent variable scores represent
a sample of a larger population, then it should be a random sample. Other types of
samples, especially where the researcher picks subjects because of ease or availability,
can make the results of the ANOVA not generalizable to other samples or situations.
Independence of Scores
The second assumption is that the values collected are independent of one another.
For example, if you are collecting data from school children, this means the score that
one child gets is not influenced by, or dependent on, the score received by another
child. This is the same idea as was the case with the independent-sample t-test.
Homogeneity of Variance
The fourth assumption requires that there be “homogeneity of variance.” This means
that the degree of variance within each of the samples should be about the same.
As we will see later, SPSS will provide us with this information, thereby making our
decision-making process far easier.
The first two assumptions do not have anything to do with the actual statistical
process; instead they are methodological concerns and should be addressed when the
data are being collected. For the third assumption, we have already seen that it is very
straightforward to check if data are normally distributed using various numeric and
graphical descriptive statistics. The last assumption, homogeneity of variance, is something we address with secondary statistical procedures appropriate for each test. We
saw the same thing when we used the Levene test when interpreting the results of the
independent-sample t-test; we must do the same thing when we are using the ANOVA.
In other words, if we are using the ANOVA and we find the p value of the Levene sta-
tistic is less than .05, we have decisions to make. For now, however, it is important to
discuss the math and logic underlying the one-way ANOVA.
1. The total variance in the data. This is the sum of the within-group variance and
the between-groups variance.
2. The total degrees of freedom. This is the sum of the within-group degrees of freedom
and the between-groups degrees of freedom.
3. The mean square values for the within-group and between-groups variance.
4. The F value that can be calculated using the data from the first three steps.
We can better understand these values by using the data in Table 9.3. In this case,
we are interested in looking at the effect of different types of medication on headache
relief. Here we have three levels: Pill 1, Pill 2, and Pill 3. The dependent variable,
which we will call “time to relief,” shows the number of minutes that elapse between
the time the pill is taken and the time the headache symptoms are relieved. This will
allow us to state the following hypothesis:
Descriptive Statistics
We can use the actual ANOVA procedure to calculate the descriptive statistics but,
before we do that, let’s look at how we would set up the data in SPSS. In Figure 9.1, you
FIGURE 9.1. Data for pill and time in the Data View spreadsheet.
can see a variable labeled Pill; there are three values representing the three types of
pills. There is also a variable labeled Time; this is the amount of time before relief.
Once we have entered our data, we will then tell SPSS to run the ANOVA for
us. In Figure 9.2, we begin by selecting Compare Means and then One-Way ANOVA.
At this point, as shown in Figure 9.3, we will be asked to do several things. First, we
FIGURE 9.2. Using the Compare Means and One-Way ANOVA commands.
Analysis of Variance and Multivariate Analysis of Variance | 251
FIGURE 9.3. Identifying the independent and dependent variables and the statistics to be
computed.
include our variable Time in the Dependent List (i.e., the dependent variable); we fol-
low that by including Pill in the Factor Type (i.e., the independent variable). In this
case, unlike the Independent Sample t-Test, we do not need to include the exact values
for the levels; SPSS simply uses each value entered under Pill as a level. We have also
selected Options and asked for Descriptives and the Homogeneity of variance test.
This will generate the output shown in Figure 9.4.
By looking at the mean scores, it appears that the time it takes Pill 2 to work may
be significantly less than the other two pills, but in order to make sure, we need to
continue by computing the component parts of the F statistic.
FIGURE 9.4. Descriptives output for Time.
The data are shown again in Table 9.4; I have substituted x1 for Pill 1, x2 for Pill 2, and x3 for Pill 3 and have gone ahead and totaled each column.
$$SS_{Total} = \sum x^2 - \frac{(\sum x)^2}{N}$$
First we need to determine the value ∑x². Since it represents each value in the dataset squared, we sum the second, fourth, and sixth columns (i.e., 211 + 117 + 217 = 545). We then:

1. Compute ∑x by adding the totals of the first, third, and fifth columns (i.e., 31 + 23 + 29 = 83).
2. Compute (∑x)² by squaring ∑x (i.e., 83 * 83 = 6889).
3. Divide (∑x)² by the total number of data values collected (i.e., 6889/15 = 459.27).
After we have completed these steps, we now have everything we need to include in the equation:

1. SS_Total = ∑x² − (∑x)²/N
2. SS_Total = 545 − 459.27
3. SS_Total = 85.73
1. SS_Between = (31)²/5 + (23)²/5 + (29)²/5 − (∑x)²/N
2. SS_Between = 192.2 + 105.8 + 168.2 − (∑x)²/N
3. SS_Between = 466.2 − (∑x)²/N

Now, in order to determine the value that we must subtract from 466.2, all we do is complete the following three steps. Notice, this is exactly what we did when we were computing the total sum of squares:

1. Add the totals for the first, third, and fifth columns (i.e., 31 + 23 + 29 = 83).
2. Square the value from Step 1 (i.e., 83 * 83 = 6889).
3. Divide the value from Step 2 by the total number of data values collected (i.e., 6889/15 = 459.27).

We finish by completing the subtraction:

4. SS_Between = 466.2 − 459.27
5. SS_Between = 6.93
Computing the Within Sum of Squares requires that we first compute the Sum
of Squares for each of the levels of the independent variable; they each take the same
general form:
$$SS_1 = \sum x_1^2 - \frac{(\sum x_1)^2}{n}$$
To complete this equation, we go through the following three general steps:
1. We square the sum of all of the values from the first level of the independent
variable shown in column one (i.e., 31 * 31 = 961).
2. We divide the result of Step 1 by the number of data values in that column (i.e., 5).
3. We then subtract that value from the sum of the squared values shown in
column 2 of the table (i.e., 211).
We can then include our values in the equation and compute SS1 using the fol-
lowing four steps:
1. SS1 = 211 − (31)²/5
2. SS1 = 211 − 961/5
3. SS1 = 211 − 192.2
4. SS1 = 18.8
We can then use the same formula to compute SS2 (i.e., 11.2) and SS3 (i.e., 48.8) and then insert all three values into our original equation:

SS_Within = 18.8 + 11.2 + 48.8 = 78.8
A SHORTCUT
There is an easier way to compute the Total Sum of Squares; all you need to do is add the Between Sum of Squares to the Within Sum of Squares. This is shown in the following formula:

SS_Total = SS_Between + SS_Within
That means if we know only two of the values, we can easily compute the third. For example, if we know the Total Sum of Squares and the Sum of Squares Between, we can insert them into the following formula:

85.73 = 6.93 + SS_Within
We then subtract 6.93 from both sides and arrive at a Within Sum of Squares value of 78.8:

SS_Within = 85.73 − 6.93 = 78.8
If you are manually computing these values, you might find that computing the Between Sum of Squares is the most tedious. Because of that, it is much easier to compute the Total Sum of Squares and the Within Sum of Squares and use those to compute the Between Sum of Squares value.
1. The Between Groups degrees of freedom are equal to the number of levels of
the independent variable, minus one. In this case, we have three levels, so we
have two degrees of freedom.
2. The Within Group degrees of freedom are equal to the total number of data
items minus the number of levels in the independent variable. In this case, we
have 12 degrees of freedom (i.e., 15 - 3 = 12).
3. The Total degrees of freedom are equal to the Within Group degrees of free-
dom plus the Between Group degrees of freedom (i.e., 12 + 2 = 14 degrees of
freedom). You can also compute this value by subtracting one from the total
number of data items (i.e., 15 – 1 = 14).
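If you want to verify all of these sums of squares and degrees of freedom at once, the arithmetic is easy to script. The sketch below uses hypothetical "time to relief" values chosen so the column totals match the ones used above (∑x of 31, 23, and 29; ∑x² of 211, 117, and 217); the function name is mine, not the book's.

```python
import numpy as np

def one_way_anova(groups):
    """Return the sums of squares, degrees of freedom, and F for a one-way ANOVA."""
    all_x = np.concatenate(groups)
    n_total, k = all_x.size, len(groups)
    correction = all_x.sum() ** 2 / n_total             # (sum x)^2 / N
    ss_total = (all_x ** 2).sum() - correction
    ss_between = sum(g.sum() ** 2 / g.size for g in groups) - correction
    ss_within = ss_total - ss_between                   # the shortcut from above
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_total, ss_between, ss_within, df_between, df_within, f

pill1 = np.array([3, 5, 7, 8, 8])     # hypothetical minutes to relief
pill2 = np.array([2, 4, 5, 6, 6])
pill3 = np.array([1, 4, 6, 8, 10])
print(one_way_anova([pill1, pill2, pill3]))
# SS_Total = 85.73, SS_Between = 6.93, SS_Within = 78.8, df = 2 and 12, F ~ 0.53
```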
FIGURE 9.5. The ANOVA summary table (Sum of Squares, df, Mean Square, F, and Sig.).
The F Distribution
At this point, we have our computed value of F. Now we need an F distribution so that
we can determine the critical value needed to test our hypothesis. This process is simi-
lar to how t distributions are created, in
that an infinite number of F values are
computed from random samples of data
and then plotted. This process is shown
in Figure 9.6.
This distribution is certainly a lot
different from the normal curves we
have been using up to this point. First,
there are no plotted values less than
zero; this is explained by the fact that
we are using the variance which, as you
know, can never be less than zero. As for
the strange shape of the distribution,
that takes a little more explaining.
When Fisher first developed the
F distribution, he discovered that the
shape of the distribution depends on
the “within” and “between” degrees of
freedom. In the following example, you
can see a plot of three distributions
with different degrees of freedom. As
you can see in Figure 9.7, the larger the
between-groups degrees of freedom,
shown on the horizontal axis, the more
the distribution is skewed to the right;
the more degrees of freedom you have
for within groups, shown on the vertical
axis, the more peaked the distribution.
FIGURE 9.6. Creating the F distribution.
Having said that, if you are like me, upon first seeing the F table, you are probably
thinking, “This isn’t so bad, it looks a lot like our t and z tables.” If so, you are right;
there is only one small difference. First, you can see the two degrees of freedom val-
ues we have talked about. The horizontal row across the top shows you the “between
groups degrees of freedom,” and the vertical column on the left shows you the “with-
in-group degrees of freedom.”
Now we are going to use this table, along with our data about the different head-
ache remedies, to test this hypothesis:
First, we must locate the critical value of F for situations where we have 2 between-
groups degrees of freedom and 12 within-group degrees of freedom. To do this, we go
across the top row until we get to the column showing 2 degrees of freedom; we then
go down the left column until we get to 12 degrees of freedom. The point where that
row and that column intersect shows us an F value of 3.89; we can mark that on our F
distribution shown in Figure 9.8. In Figure 9.9, we then plot our computed value of F.
In order to test our hypothesis, we compare these two values. If our computed value of F falls to the right of the critical value, we reject the null hypothesis. If our computed value of F is equal to or to the left of the critical value of F, then we do not reject the null hypothesis. This, of course, is the same logic we used with the z and t-tests. In this case it means we cannot reject the null hypothesis; obviously, there is no significant difference in headache relief time between the three types of pills.

FIGURE 9.8. The critical value of F equals 3.89.

The p Value for an ANOVA

The decision to fail to reject the null hypothesis is supported by a p value of .603, shown as Sig., back in Figure 9.5. This simply means there are no significant differences between any of the combinations of pills: Pill 1 compared to Pill 2, Pill 1 compared to Pill 3, or Pill 2 compared to Pill 3. You will see that when we reject a null hypothesis, we must take it a step further. We'll use tools called post-hoc tests to help us determine where the difference lies.
Substituting our values into the formula, we will compute an effect size of .08:

1. η² = 6.933/85.733
2. η² = .08
This supports our decision not to reject the null hypothesis. Apparently, the type
of pill you take does not make a lot of difference. At the same time, a person with a
really bad headache might look at the mean scores and choose Pill 2; while it is not
significantly faster, it is somewhat faster!
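For readers who would rather not flip to the F tables in the appendices, both the critical value and the p value are one-liners with SciPy, and the effect size is a simple division; a minimal sketch using the values computed above:

```python
from scipy import stats

f_critical = stats.f.ppf(1 - 0.05, dfn=2, dfd=12)   # ~3.89, as read from the table
p_value = stats.f.sf(0.528, dfn=2, dfd=12)          # ~.60, matching the Sig. of .603
eta_squared = 6.933 / 85.733                        # SS_Between / SS_Total, ~.08

print(f_critical, p_value, eta_squared)
```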
It seems that our problem meets all criteria; remember, as we write it, it must be
clear and concise, contain all variables to be considered, and not interject the bias of
the researcher:
This study will investigate the effect of technology on the achievement, both
alone and in combination with lecture, of students in a math class.
Descriptives: Grade
                          Lecture    CAI        Combination   Total
N                         10         10         10            30
Mean                      78.0000    86.4000    93.6000       86.0000
Std. Deviation            5.73488    3.09839    2.95146       7.61124
Std. Error                1.81353    .97980     .93333        1.38962
95% CI for Mean, Lower    73.8975    84.1835    91.4887       83.1579
95% CI for Mean, Upper    82.1025    88.6165    95.7113       88.8421
Minimum                   68.00      82.00      90.00         68.00
Maximum                   88.00      92.00      100.00        100.00
In this case, the mean scores between the three groups are different, but the same
question must be asked, “Are they significantly different?”
would affect the validity of our results. Because manually calculating these values is
something we would never do, SPSS creates the table shown in Figure 9.11.
Levene Statistic = 2.557, df1 = 2, df2 = 27, Sig. = .096

FIGURE 9.11. Homogeneity of variance statistics from the ANOVA.
We will use this table to test the null hypothesis: “There is no significant differ-
ence in the variance within the three sets of data.” Just like with the t-test, all we must
do is compare our p value (i.e., Sig.) to an alpha value of .05. Obviously, since our
p value is larger than alpha, there is no significant difference in the degree of variance
within the three sets of data. If the p value was less than .05, we might need to revert
to the ANOVA's nonparametric equivalent, the Kruskal–Wallis H test. Keep in mind,
however, that the difference in the variances between the groups must be very, very
large for that to ever happen. Given that, we can go ahead and use the SPSS output
shown in Figure 9.12.
FIGURE 9.12. The ANOVA summary table for Grade.
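As an aside, the same homogeneity check can be run outside SPSS. SciPy's levene function defaults to the median-centered (Brown–Forsythe) variant, so to mirror SPSS's classic Levene statistic you pass center='mean'. The grade lists below are placeholders, since the raw scores appear only in the book's dataset:

```python
from scipy import stats

# Placeholder grade lists; substitute the three groups' actual scores.
lecture = [68, 75, 78, 80, 82, 74, 79, 81, 85, 78]
cai = [82, 84, 86, 88, 90, 85, 87, 89, 86, 87]
combination = [90, 92, 94, 96, 95, 93, 91, 97, 94, 94]

stat, p = stats.levene(lecture, cai, combination, center='mean')
print(stat, p)   # if p > .05, the equal-variance assumption holds
```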
To begin with, since this is our first case with an ANOVA, let’s plot both our com-
puted value of F and our critical F value. Remember, we determine the critical value
of F by using both the within and between-groups degrees of freedom in combination
with our desired alpha value. In Figure 9.13, we would find that these values represent
a critical value of 3.35.
Our computed value of F (i.e., 35.719) falls very far to the right of our critical value of F; this means we must reject the null hypothesis.
Multiple-Comparison Tests
The multiple-comparison tests do exactly what their name implies. They make com-
parisons between the levels of an ANOVA and tell us which are significantly different
from one another. We won’t get into the manual calculations; suffice it to say that they
are a modified version of the independent-sample t-test that controls for the inflated
Type I error rate. We will always use computer software to compute the right value
for us.
There are several multiple-comparison tests we can choose from, and most of the
time they will provide us with the same results. There are situations that require a
specific multiple-comparison test, but these do not arise that often. For our purposes,
one of the most commonly used multiple-comparison tests is the Bonferroni test. The
procedure is easy to understand, and it gives the researcher a good estimate of the
probability that any two groups are different while, at the same time, controlling for
Type I and Type II errors. Typical output for the Bonferroni procedure is shown in
Figure 9.14.
You can see that there are four rows in the table; the bottom three rows represent
the three levels of our independent variable—the Lecture group, the CAI group, and
the Combination group. If we look at the lecture row, for example, we can see we are
given statistics representing it and its relationship to the other levels of the indepen-
Multiple Comparisons
Dependent Variable: Grade (Bonferroni)

(I) Method    (J) Method     Mean Difference (I-J)   Std. Error   Sig.    Lower Bound   Upper Bound
Lecture       CAI            -8.40000*                1.84752      .000    -13.1157      -3.6843
Lecture       Combination    -15.60000*               1.84752      .000    -20.3157      -10.8843
CAI           Lecture        8.40000*                 1.84752      .000    3.6843        13.1157
CAI           Combination    -7.20000*                1.84752      .002    -11.9157      -2.4843
Combination   Lecture        15.60000*                1.84752      .000    10.8843       20.3157
Combination   CAI            7.20000*                 1.84752      .002    2.4843        11.9157

(The last two columns give the 95% confidence interval; an asterisk marks a mean difference significant at the .05 level.)
dent variable. There is one quirk, however: because of the total number of possible
comparisons, part of the table is redundant.
For example, in the third row where you compare the CAI group to the Combina-
tion group, the mean difference is –7.2 points. In row 4, you compare the same two
groups; only, this time, you subtract the average CAI score from the average combina-
tion score. This gives you a difference, obviously, of +7.2 points. The order in which
you make the subtraction doesn’t matter; the point here is that there is a 7.2-point
difference between the groups.
We can also see an asterisk next to each of the mean difference values; this indi-
cates that the mean difference is significant when alpha equals .05. For example, we
see that the mean difference between the Lecture and the CAI group is –8.4; this
difference is statistically significant. In this case, we can see asterisks next to all our
mean difference values; each of the groups is significantly different from one another.
We can verify our results by looking at the p value in the fourth column (i.e., Sig.). Here, we can see the p value for each comparison is less than .05; this means the two groups in each given comparison are significantly different from one another. Again, the p value
supports the significant difference shown by the asterisks next to all mean difference
comparisons.
So, what do we know from all of this? Apparently, the use of a combination of
lecture and CAI results in achievement scores significantly higher than either lecture
alone or CAI alone. At the same time, the use of CAI is significantly more effective
than the use of lecture alone. The teacher should use this information to plan future
class sessions accordingly.
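There is no single "Bonferroni test" call in SciPy, but the idea is easy to approximate: run each pairwise t-test and multiply every p value by the number of comparisons (capping at 1). The sketch below is a simplified stand-in for the SPSS output above, using the same placeholder grade lists as the Levene sketch:

```python
from itertools import combinations
from scipy import stats

# Placeholder grade lists; substitute the three groups' actual scores.
groups = {
    'Lecture': [68, 75, 78, 80, 82, 74, 79, 81, 85, 78],
    'CAI': [82, 84, 86, 88, 90, 85, 87, 89, 86, 87],
    'Combination': [90, 92, 94, 96, 95, 93, 91, 97, 94, 94],
}
pairs = list(combinations(groups, 2))

for name_a, name_b in pairs:
    t, p = stats.ttest_ind(groups[name_a], groups[name_b])
    p_adjusted = min(p * len(pairs), 1.0)   # Bonferroni adjustment
    print(f"{name_a} vs. {name_b}: t = {t:.2f}, adjusted p = {p_adjusted:.3f}")
```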
The administration and faculty in the high school have noticed what they
believe is an inordinate number of absences by students in the senior class
when compared to students in other grades. Afraid that higher levels of
absenteeism could lead to lower grades and fewer graduates, personnel at
the school will attempt to determine if seniors are skipping more classes than
students in other grades.
Descriptives: Absences
                          Freshmen   Sophomores   Juniors    Seniors    Total
N                         18         18           18         18         72
Mean                      2.6667     2.7222       2.7778     5.1667     3.3333
Std. Deviation            1.45521    1.56452      1.06027    1.24853    1.69506
Std. Error                .34300     .36876       .24991     .29428     .19977
95% CI for Mean, Lower    1.9430     1.9442       2.2505     4.5458     2.9350
95% CI for Mean, Upper    3.3903     3.5002       3.3050     5.7875     3.7317
Minimum                   .00        .00          1.00       3.00       .00
Maximum                   5.00       5.00         4.00       7.00       7.00
Levene Statistic = 1.318, df1 = 3, df2 = 68, Sig. = .276

FIGURE 9.16. Homogeneity of variance statistics from the ANOVA.
ANOVA
Absences
Sum of Squares df Mean Square F Sig.
Between Groups 80.778 3 26.926 14.859 .000
Within Groups 123.222 68 1.812
Total 204.000 71
This table verifies what we suspected. Since there are asterisks beside each of the
mean difference values between the seniors and every other class, it is apparent they
have a significantly higher number of absences than those in the other classes. There
is no significant difference in the mean scores between any of the other classes. As
always, be careful; the same statistics would have been computed had the seniors had
significantly fewer absences.
[SPSS output: Multiple Comparisons for Absences (Bonferroni).]
Descriptives: Arrests
                          Born First   Born Second   Born Third   Born Fourth   Born Fifth   Total
N                         10           10            10           10            10           50
Mean                      3.0000       2.7000        3.4000       3.9000        1.1000       2.8200
Std. Deviation            1.24722      .82327        1.42984      1.10050       .73786       1.42414
Std. Error                .39441       .26034        .45216       .34801        .23333       .20140
95% CI for Mean, Lower    2.1078       2.1111        2.3772       3.1127        .5722        2.4153
95% CI for Mean, Upper    3.8922       3.2889        4.4228       4.6873        1.6278       3.2247
Minimum                   1.00         1.00          1.00         3.00          .00          .00
Maximum                   4.00         4.00          5.00         6.00          2.00         6.00
Levene Statistic = 2.218, df1 = 4, df2 = 45, Sig. = .082

FIGURE 9.20. Homogeneity of variance statistics from the ANOVA.
[SPSS output: ANOVA summary table for Arrests.]
Here we have a p value less than .05, so we will reject our null hypothesis. This is
supported by a moderate effect size of .454. Apparently, the social workers are right;
there is a significant difference in the number of arrests when you look at birth order.
We do need to look at our post-hoc tests in Figure 9.22 to see where those differences
lie. If they are right, the people born last will have a significantly higher number of
arrests.
What does this tell us? The first thing it tells us is that, for the social workers,
things can go from looking pretty good to looking pretty bad in just a matter of sec-
onds. First, if they had paid attention to the descriptive statistics in Step 3, they would
have already known that children born last have fewer arrests; the fact that they are
significantly less is supported by the p value. The birth order of children born first
through fourth makes no difference in the number of arrests, but children born fifth
have significantly fewer arrests than their siblings.
FIGURE 9.22. Multiple Comparisons for Arrests (Bonferroni): (I) Birth Order vs. (J) Birth Order, with Mean Difference (I-J), Std. Error, Sig., and the 95% confidence interval bounds.
The corresponding null hypothesis would simply state that no significant differ-
ence would exist.
[SPSS output: Descriptives for Units Produced.]
We can see, right away, that there is probably not a significant difference in pro-
duction rates between the four plants since the mean scores seem to be very, very
close. Using the correct statistical test will help us be sure.
[SPSS output: ANOVA summary table for Units Produced.]
Our p value, .996, is very large; obviously, we will not reject our null hypothesis.
We could also compute a very small effect size of .0012 that supports our decision.
Although we would not have to, we can check our multiple comparison tests in Figure
9.26 just to see how they look when no significant difference is detected.
In this case, there are no asterisks marking any of the mean differences; this veri-
fies what we saw with the large p value and small effect size. Apparently, the rumor
regarding low production was only that. Now all the president must worry about is if
they are all running up to capacity!
1. Although it may appear to be quite clear cut, you will find when you get into
scenarios where you have three or more independent variables that things
can get quite complicated in a hurry. For example, if you have a 4 × 2 × 3 facto-
rial ANOVA, you are dealing with 24 different comparisons.
2. Second, the math underlying this test is far beyond what we want to cover in
this book. Because of that, we will not manually calculate it.
3. If you violate any of the assumptions that underlie using an ANOVA, the best
solution is to run a Kruskal–Wallis H test, the nonparametric alternative for
the one-way ANOVA, for each of the independent variables. This, of course,
inflates the probability of making a Type I error. If you are ambitious, there is a test called Friedman's ANOVA that can be modified to work in this scenario.

FIGURE 9.26. Multiple Comparisons for Units Produced (Bonferroni): (I) Plant vs. (J) Plant, with Mean Difference (I-J), Std. Error, Sig., and the 95% confidence interval bounds.
personal computers to help in their work. The professor followed this plan through-
out the term and, after collecting final grades, decided to test whether the grades in
the two groups were equivalent. Noting there was one independent variable (class)
with two levels (the calculator section and the computer section) and one dependent
variable (course grade), the professor analyzed the data using an independent-sample
t-test. Upon running the analysis, the results showed a p value greater than the pre-
determined alpha value of .05. Given that, rejecting the null hypothesis was not fea-
sible—there was apparently no significant difference between the two groups.
As the young professor was presenting the results in a faculty meeting, one of
his colleagues noted that a lot of variables affect whether technology adoption is suc-
cessful. The young professor decided to investigate this and, after some additional
research, found that a student’s age is an important predictor of success with technol-
ogy. In fact, he read, students that are older than 30 appear to have more problems
when trying to use technology in a classroom. "That is true," the young professor
thought. “I did notice some of the older students looking puzzled when we started
using the computers. Maybe I should look at this.”
If we think about it, however, this is not really what we are interested in; there are
four groups that we want to look at. To make this clear, look at Table 9.10.
TABLE 9.10. Cells for Age and Technology Type for Factorial ANOVA
              People 30 or younger   People older than 30
Computers     Group 1                Group 2
Calculators   Group 3                Group 4
You might be saying to yourself, “What is the big deal? We have four groups; we
still have the two independent variables and have already stated hypotheses about
them—what is the problem?” The problem is that we cannot just look at the effects
of each of the independent variables, called the main effects; we also must look at the
interaction between the two variables. In other words, it is important to know if these
two variables interact to cause something unexpected to happen. For example, the
literature suggests that younger people will perform better than older people when
computers are involved; this is called an interaction of technology type and age. It is
imperative that we look at the interaction of these variables to see if achievement is
different based on the way these independent variables affect one another. Given that,
in addition to our two main effect hypotheses that we stated earlier, we also must state
an interaction effect hypothesis:
There will be no significant interaction between technology use and age and
their effect on achievement in a statistics class.
Obviously, this is radically different from any hypotheses we have stated up to this
point, but it will become clear as we work through the example.
We have two groups of 20 students: those who use calculators and those who use
computers. Each of these groups includes 10 students age 30 and younger; the remain-
ing 10 students are older than 30. In order to input this into the SPSS spreadsheet,
shown in Figure 9.27, we would identify the students using computers as 1 in the Tech-
nology column; those with calculators would be identified as 2. Students 30 or younger
would be identified as 1; students older than 30 would be identified with a 2.
Our software would generate the descriptive statistics shown in Figure 9.28. As
you can see, the average scores for the computer group (72.55) and the calculator
group (73.40) are about the same; so are the average scores for the 30 or younger
group (72.30) and the students older than 30 (73.65). We can take this a step further
and break the scores down by age group within the technology group; our software
would give us the following results:
Again, just by looking at the data, our young professor might be right; there
doesn’t appear to be any great difference between the age of a student and whether
they are using computers or calculators. Our analysis will help us make sure we have
made the right decision.
FIGURE 9.27. Technology type, age, and score entered into the Data View spreadsheet.
Descriptive Statistics (Dependent Variable: Scores)
Computer, 30 or Younger:   Mean 70.6000,  Std. Deviation 3.09839,  N 10
Computer, Over 30:         Mean 74.5000,  Std. Deviation 3.17105,  N 10
All Over 30:               Mean 73.6500,  Std. Deviation 2.81490,  N 20
Total:                     Mean 72.9750,  Std. Deviation 3.23829,  N 40

FIGURE 9.28. Descriptive statistics for the factorial ANOVA.
FIGURE 9.29. Selecting the General Linear Model and Univariate to run a factorial ANOVA.
FIGURE 9.30. Identifying the independent variables, dependent variable, and descriptive
statistics for the factorial ANOVA.
Levene Statistic = .878, df1 = 3, df2 = 36, Sig. = .462

FIGURE 9.31. Levene's test for the factorial ANOVA.
The table in Figure 9.32 is very similar to the ANOVA table we used earlier, but
there are a few differences. First, there is a lot of technical information given in the
first two rows, so move down to the third and fourth rows, labeled “Technology” and
“Age.” These lines, called the Main Effects, show what the results of a one-way ANOVA
would be if it were calculated using only one independent variable at a time. In this
case, if we separated the groups by either Age or Technology used, the differences are
not significant; our p value (i.e., Sig.) is greater than our alpha value of .05. This, of
course, validates the observation we made earlier when we noticed that the means for
each of these groups were fairly close to one another.
We have rejected the null hypothesis, but what does it mean? How do we interpret
these results so that they are meaningful to us? To do that, we will rely on that adage
“one picture is worth a thousand words”; many people find it easier to help interpret
the results of a factorial ANOVA by plotting the interactions using the average scores
from each subgroup. We would do this by selecting the Plots subcommand from the
earlier screen and identifying the x-axis and y-axis variables as shown in Figure 9.33.
SPSS would then create Figure 9.34, an interaction plot.
To interpret this chart, first look at the legend at the top right of the chart. It tells
us a solid line represents the scores for students 30 and younger and a dashed line rep-
resents the scores of students older than 30. When we look at the actual graph, we can
see that the bottom left of the graph represents those students in the computer group
and the bottom right side shows those students in the calculator group.
Given this, you can see that the dashed line on the left side of the graph (com-
puter group) starts at the point on that line which represents the average score of the
over-30 group (74.5). It then goes down to the point on the right side of the graph
(the calculator side) that is equal to the average score of that group (72.8). The solid
line for the 30 and younger group starts on the computer side at 70.6 and goes up to
the level of the calculator group (74.00). Based on this, it seems the young professor’s
colleague was right: age as an independent variable does interact with the type of
technology a student uses. We can look at this in even more detail.
When looking at the right side of the chart, the first thing we notice is that there is not a big difference in achievement, by age, for those students using calculators. We already knew this because, using the descriptive statistics, we saw students 30 or younger who used a calculator had an average score of 74.0 while students older than 30 using calculators had an average of 72.8. The more striking observation we can make is that students 30 and younger have somewhat lower achievement (i.e., 70.6) than their older classmates when using computers (i.e., 74.5). While we knew this already from looking at the data, plotting them on this graph makes the differences quite clear.

FIGURE 9.33. Selection of the dependent variable and fixed factors in a factorial ANOVA.
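If you want to reproduce the interaction plot without SPSS, the four cell means quoted above are all you need; a minimal matplotlib sketch:

```python
import matplotlib.pyplot as plt

technology = ['Computer', 'Calculator']
younger = [70.6, 74.0]   # cell means for students 30 or younger
older = [74.5, 72.8]     # cell means for students older than 30

plt.plot(technology, younger, 'o-', label='30 or Younger')
plt.plot(technology, older, 'o--', label='Over 30')
plt.ylabel('Mean Score')
plt.legend()
plt.show()   # the crossing lines show the interaction described in the text
```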
At this point, we can safely state that a significant interaction does exist (remember,
however, we already knew that because of the low p value). We can also see that when
considered by the type of technology used, the average scores change when we move
between levels of age. Before we move forward, we need to talk about the types of
interactions that might exist.
cross one another. For example, if our plotted data had looked like the following
graph, we would have had an ordinal interaction. If you look at Figure 9.35, the rea-
soning behind this is clear; the independent variable called Technology Type only
affects one level of the independent vari-
able called Age Group. As can be seen,
students whose ages are less than or
equal to 30 have a higher average than
their older classmates when calculators
are being used. The two age groups are
equal, however, when they are all using
computers.
This study will investigate the effect of combining different diets and exercise
regimens on the endurance of high school athletes.
[SPSS output: Descriptive Statistics, Dependent Variable: Endurance.]

Levene Statistic = 3.234, df1 = 3, df2 = 16, Sig. = .050

FIGURE 9.37. Factorial ANOVA Test of Equality.
In this case, we can see there is not a significant main effect for the type of exercise that a person performs because the p value is equal to .361. Of course, we suspected that before examining the p value because our mean scores were so close to one another for the two groups. A significant main effect does exist for the type of diet an athlete follows. In this case, the p value of .000 shows that athletes eating a diet high in protein can develop higher levels of endurance than athletes eating a diet high in carbohydrates. We can verify this by looking at the interaction plot and noting that the scores on the left side (protein) are dramatically higher than those on the right side (carbohydrates). Given all of this, the only null hypothesis we can reject is our first one; it seems that a diet rich in protein is better for developing superior levels of endurance.

FIGURE 9.39. Ordinal interaction of food and exercise.
1. Like the factorial ANOVA, when you have a scenario with more than one
dependent variable and two or more independent variables, your analysis can
get very complicated.
2. As was the case with the factorial ANOVA, the math underlying the MANOVA is far beyond what we want to cover in this book. Because of that, we will rely on our software to do the calculations for us.
3. Just as was the case with the ANOVA, if you violate any of the assumptions
that underlie using a MANOVA, you run the risk of making a Type I error.
Each additional independent or dependent variable worsens this problem.
Looking at these, you might ask, “Why not just use two ANOVAs, one for each of
the dependent variables?”
We could certainly do that, but it would create two problems. First, as we said ear-
lier, the use of multiple ANOVAs would increase our Type I error rate. Additionally,
we would not be able to accurately determine if there is a relationship between each of
the regions and a combination of the consumer’s taste preference and their intention
to buy coffee. Knowing if the combination does show a relationship could strongly
affect decisions on where and how to market the new coffee.
Keeping in mind that the purpose of the MANOVA is to test both dependent vari-
ables simultaneously, it creates a third variable that contains a linear combination of
the dependent variables. This results in the ability to compute an overall mean and, by
using it, we can determine if there are significant group differences, by region, on the
set of dependent variables. This will help increase the statistical power of our analysis.
In short, we can determine if there is a significant interaction between customers’
intention to buy and their taste preference by region. Knowing that, we can examine
all variables simultaneously with the following interaction hypothesis:
There are also the two main effect hypotheses we discussed earlier:
An example of the data we could use to test this hypothesis is shown in Table 9.12.
TABLE 9.12. Data for Coffee Drinker Region, Intention, and Taste
Region Intention Taste
Urban 2 3
Urban 3 4
Urban 4 2
Urban 4 3
Urban 3 5
Rural 1 5
Rural 1 5
Rural 2 5
Rural 3 4
Rural 2 4
The left-most column shows our two regions, Urban and Rural. The middle col-
umn shows a customer’s intention to buy coffee, ranked from 1 (Not Likely) to 5 (Very
Likely) and the right-most column shows Taste preference, ranked from 1 (Very Mild)
to 5 (Very Strong). Note that I'm only showing an example of 10 potential customers here; the actual dataset that we would use for our SPSS analysis would be much larger.
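Although the chapter runs this analysis in SPSS, the same multivariate test can be sketched in Python with statsmodels. Here the ten Table 9.12 rows stand in for the larger dataset, so the resulting statistics will not match the SPSS figures:

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# The ten example rows from Table 9.12.
data = pd.DataFrame({
    'Region': ['Urban'] * 5 + ['Rural'] * 5,
    'Intention': [2, 3, 4, 4, 3, 1, 1, 2, 3, 2],
    'Taste': [3, 4, 2, 3, 5, 5, 5, 5, 4, 4],
})

# Both dependent variables go on the left of the formula, the independent
# variable on the right.
manova = MANOVA.from_formula('Intention + Taste ~ Region', data=data)
print(manova.mv_test())   # reports Pillai's Trace, Wilks's Lambda, and the rest
```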
We begin our analysis by entering the larger dataset into our SPSS spreadsheet
(Figure 9.40). We follow that by selecting Analyze, General Linear Model, and Multi-
variate, shown in Figure 9.41. This would be followed by identifying the Dependent
Variable and Fixed Factors, as well as choosing to compute the Descriptive Statistics
and Estimate of Effect Size (Figure 9.42).
FIGURE 9.40. Data for multivariate analysis of variance (MANOVA). Reprint Courtesy of Interna-
tional Business Machines Corporation, © International Business Machines Corporation.
FIGURE 9.41. Selecting the General Linear Model and Multivariate to run a MANOVA. Reprint
Courtesy of International Business Machines Corporation, © International Business Machines
Corporation.
FIGURE 9.42. Selecting the Independent and Dependent Variables and Options Needed for the
MANOVA. Reprint Courtesy of International Business Machines Corporation, © International
Business Machines Corporation.
Figure 9.43 shows the descriptive statistics, and we can see that the mean scores
for Intention to Buy are very close for Urban and Rural customers (i.e., 3.3 and 3.1);
the same is true for Taste Preference (i.e., 2.8 and 2.7). Box’s Test of Equality of Covari-
ance Matrices (i.e., Box’s M), shown in Figure 9.44, tests the assumption of homogene-
ity of the covariance matrices. In this case, our Sig. (i.e., p) is not less than an alpha
of .05, so we are supporting the hypothesis that the variance in the groups is equal.
Because there is equal variance between groups, we can use the results shown in
Figure 9.45 to test our multivariate hypothesis. Therein we want to focus on the row
labeled "Region" and we can see that there are Sig. (i.e., p) values for four tests: Pillai's Trace, Wilks's Lambda, Hotelling's Trace, and Roy's Largest Root. As we have seen in
other places throughout this book, there are specific times, and researchers’ opinions,
as to which of these tests should be used for a given data set. It is agreed, however,
that Pillai’s Trace is the most robust of these tests, so we will use it in our journey to
become good producers and consumers of research.
In the section labeled Region, Pillai’s trace is not significant (i.e., F (2, 37) =
151.182, p = .895). This means we fail to reject the interaction null hypothesis; there
is no significant difference between urban consumers and rural consumers when con-
sidering intent to buy and taste preference simultaneously. At this point, we would
not go further with our analysis by testing the main effect hypotheses focusing on
each dependent variable individually (i.e., a one-way ANOVA). As we’ll see in the next
example, when we do see significant multivariate differences, we’ll take our analysis a
step further in trying to determine the exact cause of the difference.
sessions. In order to investigate ways of addressing these student concerns, one uni-
versity’s administrators want to determine if factors such as course satisfaction and
perceived value are directly related to whether a student’s class meets once per week,
or if they are required to be in class two days per week.
Before we do, however, we need to think back to the problem with power had we
chosen to use two separate ANOVAs rather than one MANOVA. The same problem
still exists with the MANOVA but to a much lesser extent. In order to control for
potential issues, we will adjust our alpha value using the Bonferroni correction. Our
new alpha value is computed by dividing our traditional alpha value of .05 by the num-
ber of hypotheses we have stated. In this case our corrected alpha is .025 (i.e., .05/2).
We will then use this new value in our comparisons to our computed p values when testing our univariate hypotheses.
In Figure 9.49, our Sig. (i.e., p) values for Satisfaction and Value are .105 and
.019 respectively. This means we would fail to reject the null hypothesis for Satisfaction (i.e., p of .105 > alpha of .025) but reject the null hypothesis for Value (i.e., p of .019 < alpha of .025). This means, if we look back at our descriptive statistics, we can determine that the mean Value rating of students taking the class once per week (i.e., 4.2308) is significantly greater than the mean Value rating of students taking the class twice per week (i.e., 3.3077). In seeing this, faculty and administrators might want to reconsider the twice-weekly option or find new or better ways to increase the value students perceive in their courses.
Summary
Although we have covered quite a lot of material in this chapter, the analysis of vari-
ance is straightforward. We use the simple (one-way) ANOVA when we have one
independent variable that has three or more levels and one dependent variable that is
represented by quantitative data. When we have more than one independent variable and only one dependent variable, we will use a factorial ANOVA; when we have two or more dependent variables, we will use a multivariate ANOVA (MANOVA). As I mentioned, there are
other types of ANOVAs but, for now, we have talked about as much as we need to; it
is time to move forward!
Quiz Time!
To test mastery of this topic, work through these examples and check your work at the end of
the book.
1. What are the null and research hypotheses the university is investigating?
[SPSS output: Descriptives and ANOVA tables for Time to Finish.]
1. What are the null and research hypotheses the psychologists are investigating?
Levene Statistic = 1.997, df1 = 3, df2 = 20, Sig. = .147

FIGURE 9.54. ANOVA tests of homogeneity of variance.
[SPSS output: ANOVA summary table for Anxiety Attacks.]
Multiple Comparisons
Dependent Variable: Anxiety Attacks (Bonferroni)

(I) Season   (J) Season   Mean Difference (I-J)   Std. Error   Sig.    Lower Bound   Upper Bound
Winter       Spring       2.33333*                 .56765       .003    .6718         3.9949
Winter       Summer       2.33333*                 .56765       .003    .6718         3.9949
Winter       Fall         2.33333*                 .56765       .003    .6718         3.9949
Spring       Winter       -2.33333*                .56765       .003    -3.9949       -.6718
Spring       Summer       .00000                   .56765       1.000   -1.6616       1.6616
Spring       Fall         .00000                   .56765       1.000   -1.6616       1.6616
Summer       Winter       -2.33333*                .56765       .003    -3.9949       -.6718
Summer       Spring       .00000                   .56765       1.000   -1.6616       1.6616
Summer       Fall         .00000                   .56765       1.000   -1.6616       1.6616
Fall         Winter       -2.33333*                .56765       .003    -3.9949       -.6718
Fall         Spring       .00000                   .56765       1.000   -1.6616       1.6616
Fall         Summer       .00000                   .56765       1.000   -1.6616       1.6616

(The last two columns give the 95% confidence interval; an asterisk marks a mean difference significant at the .05 level.)
1. What are the null and research hypotheses the teachers are investigating?
[Descriptives output for Times Late.]

Levene Statistic .385, df1 = 2, df2 = 27, Sig. = .684
FIGURE 9.58. Homogeneity of variance for the ANOVA.

[ANOVA and Bonferroni Multiple Comparisons output for Times Late.]
1. What are the three main-effect null hypotheses the climbers are investigating?
2. What are the three research hypotheses the climbers are investigating?
3. What are the independent variables?
4. What are the levels of the independent variables?
5. What is the dependent variable?
6. Using the information in Figures 9.61 and 9.62, what conclusion would the climbers
come to?
higher productivity than that of employees working in the traditional office environment.
In order to investigate this, administrators from one company collected data from 60 employ-
ees working as telephone technical support personnel, 30 working from home, and 30 from a
traditional workplace. The data collected measured employee satisfaction, on a scale from 1 to
5, and the average number of customers an employee was able to work with over time.
Introduction
Up to this point we have dealt with quantitative data and, because of that, we have
been using parametric statistical tools. As I said in the very beginning, this is normal
for introductory statistics students. Other than the topic of this chapter, the chi-square test, it is unusual for beginning students to use nonparametric statistical tests.
In order to begin understanding the purpose of the chi-square, as well as how it
works, think back to our chapter on the analysis of variance. In it, we saw the one-
way analysis of variance, where we investigated an independent variable with three
or more levels, and the factorial analysis of variance (e.g., a two-way ANOVA) where
we compared more than one independent variable, regardless of the number of lev-
els within each of the variables. The chi-square tests work using basically the same
principles but instead of each level representing a quantitative value (e.g., test scores),
the levels of the chi-square test contain counts of values (i.e., nominal data). Let’s look
at an example of both a “one-way” and “factorial” chi-square to help us understand
what we are getting into.
Of the 300 people surveyed, 95 supported the idea, 105 didn't like the idea, and 100 didn't care; these are called the observed values in each of the cells. This is shown in Table 10.1.
Once we have collected our observed values, the question then becomes, “We
just defined goodness of fit as the comparison of a set of observed values to a set of
expected values; how do we determine the expected values?” That’s easy; since we are
testing goodness of fit, we are interested in comparing the observed values to values
we expect. For example, if we are interested in comparing our results to national averages showing that 40% of people like the idea and the rest are evenly split (i.e., 30% each) between not liking the idea and not caring, this results in the expected counts of 120, 90, and 90 (i.e., 40%, 30%, and 30% of our 300 responses) shown in Table 10.2.
In order to test your hypothesis, you would compare the proportion of data values
in the table to determine if the differences between what you observed and what you
expected are significant.
Before we move forward, it is important to know that the chi-square test is not
designed to be used with a small number of expected values. Because of that, it is
important to ensure you expect at least five occurrences in each of the cells; trying to do any calculations with fewer than that will adversely affect your results.
really want to know is how many people with a given opinion fall into each category.
Given that, “opinion” is the independent variable, and the count of each opinion is the
dependent variable where we are collecting nominal data.
In testing your hypothesis, your feeling is that there will be an equal distribution
of “yes” and “no” answers (i.e., the expected values) between the genders. For the sake
of discussion, I have gone ahead and entered a set of equal expected values, along with
a set of observed values, in Table 10.3.
Now, in order to test our hypothesis, we will need to compare our observed values
to our expected values. Before we can do that, we first need to have a better under-
standing of the concepts underlying the chi-square.
In short, I have hypothesized that the observed distribution of responses will not
be different from the expected distribution of responses. Testing this hypothesis will
involve comparing a chi-square value computed from the observed responses to a
critical chi-square value based on a given degrees of freedom value. As we are used to
doing, we will reject our null hypothesis if the computed value of chi-square is greater
than the critical value of chi-square. In doing so, we are saying nothing more than that the distribution we observed is significantly different from the distribution we expected. The
formula for computing the chi-square statistic is shown below:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

First, χ, the Greek letter chi, indicates that we will be computing its squared
value. As you can see in Table 10.5, for each row we are going to:
1. Subtract the expected value (i.e., E) in each row from the observed value
(i.e., O) in each row. This is shown in the second column.
2. Square the value from Step 1. This is shown in the third column.
3. Divide the value from Step 2 by the expected value (i.e., E) for that row; this
gives us the chi-square value for that row. This is shown in the fourth column.
4. Add all of the chi-square values from each row (Step 3) to get the comput-
ed chi-square value. This is shown at the bottom of the fourth column of
Table 10.5.
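If you would like to check these four steps with software other than SPSS, here is a minimal sketch in Python; the scipy call is my own addition, not part of the book's procedure:

    from scipy import stats

    observed = [95, 105, 100]   # observed counts from Table 10.1
    expected = [120, 90, 90]    # expected counts from Table 10.2

    # Steps 1-4 by hand: subtract, square, divide by E, then sum across cells.
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    print(round(chi_square, 2))  # 8.82

    # The same computation, along with a p value, via scipy.
    statistic, p = stats.chisquare(f_obs=observed, f_exp=expected)
    print(round(statistic, 2), round(p, 3))  # 8.82 0.012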
Now all we must do is compare our computed value to the table (i.e., critical)
value. To do that, we need to know what the distribution looks like.
tion are positive. This distribution is also based on the number of degrees of freedom
from the data we have collected, but how we compute the degrees of freedom is going
to depend on the particular chi-square test we are using. The fewer the degrees of freedom, the more peaked the distribution; a larger number of degrees of freedom will result in a flatter distribution whose peak shifts farther to the right. This idea is shown in Figure 10.1.
As I said earlier, we will compute degrees of freedom differently for each of the chi-square tests, but let's focus on the goodness-of-fit test. For that test:

df = (number of categories) − 1

For our vote example, with three categories, df = 3 − 1 = 2.
freedom. Second, as you can see, the critical value of chi-square with only 10 degrees
of freedom is 18.31; that’s a really large value. In order to compute a chi-square value
larger than that, you would need observed and expected values that were totally out
of proportion. Needless to say, that does not happen very often.
Knowing that, let’s get back to testing our null hypothesis:
Looking at the table, since we have 2 degrees of freedom, our critical chi-square
value is 5.99. Since our computed value of 8.82 is greater than our critical value, we
reject our null hypothesis; there appears to be a significant difference in the observed
number of responses for each answer and the expected number of responses for each
value. We can see this in Figure 10.2.
In order to use SPSS, let’s first enter
our data. As you can see, we have one
variable, Vote. For the chi-square test to
work, we would need to enter an appro-
priate value for each of our 300 voters.
This means we would enter, for exam-
ple, 95 values of 1 for people who sup-
port the idea, 105 values of 2 for those
who do not like the idea, and 100 values
of 3 for people who do not care (Figure
10.3).
In order to run the actual test, we would select Analyze, Nonparametric Tests, Legacy Dialogs, and Chi-Square; this is seen in Figure 10.4.

FIGURE 10.2. Testing a hypothesis using the computed and critical values of chi-square.

We would then identify Vote as our Test Variable and include Expected Values of 120, 90, and 90. We have also asked for descriptive statistics to be computed as shown in Figure 10.5.
SPSS would generate two tables for us. In Figure 10.6, we can see the number of
votes we observed in each category, the number of votes we expected in each category,
and the difference (i.e., the Residual) between the two. In Figure 10.7, we can see a
chi-square value of 8.82, just as we calculated, and a p value (i.e., Asymp. Sig.) of .012,
which also supports our decision to reject the null hypothesis.
FIGURE 10.5. Identifying the Test Variable and the Expected Values.
that we are happy knowing that a significant difference does exist; where that differ-
ence lies is obvious.
TABLE 10.7. Observed and Expected Values of Vote Data with All Categories Equal

            It is all right with me.   I do not like the idea.   I do not care.
Observed             95                         105                   100
Expected            100                         100                   100
FIGURE 10.8. Identifying the Test Variable and the Expected Values when all categories are
equal.
FIGURE 10.9. Observed and expected values for the vote data.

Test Statistics
              Vote
Chi-square    .500
df            2
Asymp. Sig.   .779
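As a quick check on this output (my own aside, again in Python rather than SPSS): scipy assumes equal expected counts when none are supplied, so this version of the test takes one line:

    from scipy import stats

    # With no f_exp argument, scipy.stats.chisquare assumes equal expected counts.
    statistic, p = stats.chisquare([95, 105, 100])
    print(round(statistic, 3), round(p, 3))  # 0.5 0.779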
The school system pointed out to the drivers' union that they felt the number of absences should be about equally distributed over the course of the week (e.g., 20% of the absences should occur on Monday, 20% on Tuesday, etc.). The drivers' union agreed that the numbers seemed higher on Mondays and Fridays but felt it was just a coincidence. They dismissed the concerns of the school system and told the administrators they would have to support their allegations before they would take any action. At this point, the school system had no recourse other than to attempt to make its point.
data. Again, although the drivers may disagree, this is something the schools should
investigate.
FIGURE 10.11. Observed and expected values for day of the week data.
Since we are dealing with nominal data, it does not make sense to compute most
of the descriptive statistics; the most important value, the mode, was easily seen back
in Table 10.8—the most absences occurred on Fridays. We could, however, use SPSS
to create the bar chart in Figure 10.12 if we wanted; it verifies what we already know.
There doesn’t appear to be a lot of difference between our observed and expected
values but, as always, let’s use SPSS to check.
Test Statistics
              Gender
Chi-square    .364
df            1
Asymp. Sig.   .546
By looking at the chart, we can see where the parents might be concerned. There
are more males than females in the low- and middle-ability groups but there are nearly
three times as many males as females in the high-ability group. The parents may really
have something to complain about this time!
Since we have used two independent variables to identify nominal-level data, we
will use a two-way chi-square test, often called the chi-square test of independence. This test is used in situations where we want to decide whether the counts classified by one independent variable are influenced by, or are independent of, the counts classified by another independent variable. In this case, we want to see if the total number of
students, when we look at it from a gender perspective, is independent of the same
total number of students when we look at it from an ability group perspective. We can
investigate this question using the following research hypothesis:
By looking at these hypotheses you can see we have two independent variables:
ability group and gender. Ability group has three levels: low, middle, and high; and
gender has two: male and female. Just like with the factorial ANOVA, this means we
are dealing with a 3 × 2 chi-square table.
Our dependent variable is the actual number of occurrences of each of these val-
ues. As we just saw, there is a total of 200 students—131 males and 69 females broken
into the three separate ability groups. There are 40 males and 24 females in the low-
ability group, 35 males and 24 females in the middle-ability group, and 56 males and
21 females in the high-ability group.
For example, the expected count of males in the low-ability group is the product of its column total (64 low-ability students) and its row total (131 males) divided by the overall total of 200 students:

1. E = (64)(131) / 200
2. E = 8384 / 200
3. E = 41.9
The resulting table (i.e., Table 10.10) is sometimes called a contingency table since
we want to decide if placement in one group is contingent on placement in a second
group. If the expected proportions in each cell are not significantly different from the
actual proportions, then the groups are independent; in this case, it would mean that
gender does not have an effect on ability group. If the differences are significant, then
we will say the two variables are dependent on one another. That would mean gender
does influence ability group.
1. Subtract the expected value in each cell from the observed value in each cell
(column A in the table).
2. Square that value (column B).
3. Divide that value by the expected value for that cell; this gives us the chi-
square value for that cell (column C).
4. Add the chi-square values for each cell to get the overall chi-square value (the
bottom of column C).
For the test of independence, the degrees of freedom are based on the number of rows (R) and columns (C) in the table:

df = (R − 1)(C − 1)

For our 3 × 2 table, that gives us df = (3 − 1)(2 − 1) = 2.
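For readers who want to verify the whole test outside SPSS, here is a minimal Python sketch (my own illustration) built from the gender-by-ability counts in the text; it reproduces the expected cell value of 41.9 computed above as well as the chi-square and p values reported below:

    import numpy as np
    from scipy import stats

    # Rows: male, female; columns: low-, middle-, and high-ability groups.
    observed = np.array([[40, 35, 56],
                         [24, 24, 21]])

    chi2, p, dof, expected = stats.chi2_contingency(observed)
    print(round(expected[0, 0], 2))      # 41.92, the expected count of low-ability males
    print(dof)                           # 2, i.e., (R - 1)(C - 1)
    print(round(chi2, 2), round(p, 3))   # about 3.03 and .220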
Here, the actual number of boys and girls in each ability group has been replaced
by the percentage of the group they represent. For example, out of all of the students in
the low-ability group, 62.5% of them are males; obviously the rest are females. In the
high-ability group, 72.7% are males while only 27.3% are females. When you first look
at this, it does seem like there is some inequality. You cannot, however, go by just the
cell values; you also have to compare your cell percentages to the overall percentages.
For example, our sample of 200 kids is 65.5% male; that means you would expect
about 65.5% of the students in each group to be male. As you can see, the percentage
of males in each group is close to 65.5%. At the same time, the overall percentage
of females in the study is 34.5%; again, the cell percentages are close to this value.
Because of this, we can say that the two variables, gender and ability level, are inde-
pendent of one another and any differences we see are purely due to chance. In other
words, if we know something about a person’s gender, it doesn’t tell us a thing about
her ability level.
Suppose, however, the p value had been less than .05 and we were able to reject
the null hypothesis, thereby saying there was some dependence between the two vari-
ables. Have we seen a situation like that before? Sure we have. In essence, what we
would be stating is that the two variables interact with one another. This is the same
type of phenomenon we saw with the factorial ANOVA, but this time we are using
nominal data.
FIGURE 10.16. Selecting the Crosstabs command on the Data View spreadsheet.
We used values of 1 and 2 to represent male and female and values from 1 to 3 to represent low-, middle-, and high-ability groups. At that point, we select Analyze, Descriptive Statistics, and Crosstabs.
Following that, as shown in Figure 10.17, we select Gender as our Row and Ability
Group as our Column; we also ask for Descriptive Statistics as shown in Figures 10.18
and 10.19.
FIGURE 10.17. Identifying the rows and columns for the Crosstabs command.
                    Ability Group
           Low    Middle    High    Total
Male        40      35       56      131
Female      24      24       21       69
Total       64      59       77      200
FIGURE 10.18. Counts of gender within ability group.
Because of rounding issues, the chi-square value here is slightly different from
that we computed manually, but the p value of .220, when divided by 2 (i.e., .110), still
shows that we fail to reject our null hypothesis. This means, as we said earlier, that
gender and ability level are independent of one another; any differences we see are
purely due to chance.
FIGURE 10.19. Output of the Crosstabs command used for the chi-square test of independence.
If we wanted to, we could use a bar chart to display our data; in this case, we can use the bar chart in Figure 10.20 to help us see the relationship between the ethnic groups and their level of support.
FIGURE 10.20. Bar chart showing counts of punishment within ethnic group.
Our p value of .007 is well below our alpha value of .05, so we must reject our null hypothesis; the use of corporal punishment is not independent of ethnic group. This just means that corporal punishment is used disproportionately among the different ethnic groups.
Count
                     Punished    Not Punished    Total
Ethnic Group A           5            15           20
Group B                 14             6           20
Group C                  6            14           20
Total                   25            35           60
In an effort to meet the needs of students who cannot be time and place
dependent, distance education programs are offered at most institutions of
higher education throughout the United States. Many critics believe that a
student’s chance of failure from these programs may be higher because of a
conflict between their learning style and the manner by which the course is
delivered.
                              Learning Style
               Accommodator   Assimilator   Diverger   Converger   Total
Graduate             9             16          12          10        47
Non-graduate         5             12           6          10        33
Total               14             28          18          20        80
FIGURE 10.24. Bar chart showing counts of graduation within learning style.
I have collected from distance education students over the past 15 years. Believe it or
not, and contrary to what we might think, it seems that learning style doesn’t make a
difference in distance education.
Summary
The chi-square test, more than likely, is the only nonparametric test that most begin-
ning statisticians will ever use. The key points to remember are that we are analyzing
nominal (i.e., categorical) data and, unlike the earlier tests we have looked at, we are
not going to use the chi-square test to compare means or variances. Instead, we are
going to compare the frequency of values in a data distribution; we will compare this observed frequency either to an expected frequency we specify ourselves or to one that has been predetermined (e.g., national averages). As was the case with the analysis of variance,
we can use the chi-square test for either one independent variable (i.e., the goodness-
of-fit test) or multiple independent variables (i.e., the test of independence).
Quiz Time!
As you’ve just seen, working with the chi-square test is very easy. Before we move forward, how-
ever, take a look at each of the following scenarios. After you’ve read through these cases and
answered the questions, you can check your work at the end of the book.
1. What are the null and research hypotheses the professor is investigating?
2. What are the independent variables and their levels?
3. What is the dependent variable?
4. Which chi-square test would he need to test the hypothesis?
5. What does this tell the professor about undergraduate major and performance in a
graduate school psychology class?
                        Grade
                A     B     C     D     F    Total
Psychology     24    10     7     6     7     54
Other          10    15    13    11    12     61
Total          34    25    20    17    19    115
1. What are the null and research hypotheses the professor is testing?
2. What is the independent variable and its levels?
3. What is the dependent variable?
4. Which chi-square test would he need to test the hypothesis?
5. What does this tell his class about the distribution of grades according to his grad-
ing scheme?
Test Statistics
              Grade
Chi-square    5.327
df            2
Asymp. Sig.   .070
borhood. “After all,” they said, “that means we will have a lot of indigent people wandering
around; they are the ones with drug problems!” In order to investigate their concerns, let’s
imagine we have collected relevant data and have produced Figures 10.30 and 10.31.
Income
< $20,000   $20,000 to $39,999   $40,000 to $59,999   $60,000 to $79,999   $80,000 and higher   Total

[Chi-Square Tests output.]
1. What are the null and research hypotheses the manager is investigating?
2. What is the independent variable and its levels?
3. What is the dependent variable?
4. Which chi-square test would the manager use to test this hypothesis?
5. What would the manager learn about gender representation in the company?
Test Statistics
              Employee Gender
Chi-square    .646
df            1
Asymp. Sig.   .421
Introduction
When we discussed descriptive statistics, we talked about the idea of using scatter-
plots to look at the relationship between two sets of quantitative data. We saw cases
where two values, votes and campaign spending for example, might have a positive
relationship—higher levels of spending lead to a greater number of votes. In another
case, we might collect data about the amount of time people exercise daily and their
blood pressure. In this case, we might find a negative relationship: when one value
goes up (i.e., the amount of exercise) the other value goes down (i.e., blood pressure).
In this chapter we are going to expand on that idea by learning how to numerically
describe the degree of relationship by computing a correlation coefficient. We will
continue by showing how to use scatterplots and correlation coefficients together
to investigate hypotheses. The operative word here is “investigate”; that word will be
important in just a few minutes.
reimbursement spent by a company, in any given year, was correlated with the number
of resignations during that same time period. As a reminder, the management of the
company was concerned that more and more employees were taking advantage of that
benefit, becoming better educated and moving to another company for a higher sal-
ary. Using the following data, the company’s manager decided to investigate whether
such a relationship really existed. In Table 11.2, each row represents a given year, the
amount of tuition for that year, and the number of resignations for that same year.
You can see that there is no easily identified dependent variable. Since the correla-
tion is a descriptive tool, we are interested in using our numeric statistics to look at the
relationship between two variables, tuition and resignations. As we just said, management is worried that the more money they spend, the greater the number of resignations. That, though, is only one of three possibilities. First, if they are right,
as the amount of tuition reimbursement goes up, so will the number of resignations.
A second possibility is that, when the amount of tuition money goes up, the number
of resignations goes down. The final option is that there will be no obvious pattern
between the number of resignations and the amount of tuition money.
In situations like this, we can investigate these relationships with any of several
correlational statistics. The key in determining which correlation tool to use depends
on the type of data we have collected. In this case, both resignations and tuition spent
are quantitative (i.e., interval or ratio data). Because of that, we will use the Pearson
product–moment correlation coefficient (most people call it Pearson’s r). Karl Pearson was
interested in relationships of this type and developed a formula that allows us to calcu-
late a correlation coefficient that numerically describes the relationship between two
variables. Here is his formula; we will go through it step by step and soon be using it
with no problem.
$$r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
First, let's look at the data we will use in Table 11.3, specifically columns 2 and 3.
If you closely examine these data, you can see that, generally speaking, as tuition
goes up, so do resignations. In order to continue with our calculations, we will modify
Table 11.3 and create Table 11.4 and refer to tuition as x and resignations as y; we will
label these as columns A and C.
TABLE 11.4. Values Needed for Computing the Pearson Correlation Coefficient

  x (A)    y (C)
   32       20
   38       25
   41       28
   42       25
   50       30
   60       32
   57       40
   70       45
   80       45
  100       90
Let’s make another modification and create Table 11.5 by adding two columns
where we can enter the values when we square each of the x and y values and one
column where we can put the product when we multiply each value of x by each value
of y. We are also going to put a row in at the bottom where we can sum everything in
each of the columns; these new columns will be labeled B, D, and E.
TABLE 11.5. Values Needed for Computing the Pearson Correlation Coefficient

   x (A)     x² (B)     y (C)     y² (D)     xy (E)
    32        1024       20         400        640
    38        1444       25         625        950
    41        1681       28         784       1148
    42        1764       25         625       1050
    50        2500       30         900       1500
    60        3600       32        1024       1920
    57        3249       40        1600       2280
    70        4900       45        2025       3150
    80        6400       45        2025       3600
   100       10000       90        8100       9000
 Σx = 570   Σx² = 36562  Σy = 380  Σy² = 18108  Σxy = 25238
In this case, remember that n is the number of pairs of data in the equation, in
this case 10. Knowing that, we now have everything we need to include in the formula,
and we can compute Pearson’s r using the following steps. Notice we are just taking the
values from the bottom row of Table 11.5 and inserting them into the formula; again,
we have these columns labeled A–E.
1. r = [10(E) − (A)(C)] / √{[10(B) − (A)²][10(D) − (C)²]}
2. r = [10(25238) − (570)(380)] / √{[10(36562) − (570)²][10(18108) − (380)²]}
3. r = (252380 − 216600) / √[(365620 − 324900)(181080 − 144400)]
4. r = (252380 − 216600) / √(40720 × 36680)
5. r = 35780 / √1493609600
6. r = 35780 / 38647.25
7. r = .926
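If you would rather verify the arithmetic in code than by hand, here is a minimal Python sketch (my own illustration, not the book's SPSS workflow) using the same tuition and resignation values:

    from scipy import stats

    tuition = [32, 38, 41, 42, 50, 60, 57, 70, 80, 100]      # in thousands of dollars
    resignations = [20, 25, 28, 25, 30, 32, 40, 45, 45, 90]

    r, p = stats.pearsonr(tuition, resignations)
    print(round(r, 3))  # 0.926, matching the hand calculation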
We could, of course, use SPSS to analyze the same data; we would start by input-
ting our data and selecting Analyze, Correlate, and Bivariate (i.e., two variables) as
shown in Figure 11.1.
FIGURE 11.1. Using the Correlate and Bivariate options on the Data View spreadsheet.
In Figure 11.2, we identify the two fields we want to correlate—tuition and res-
ignations. Obviously, since we are dealing with two quantitative variables, we select
Pearson’s correlation.
These commands would produce the table in Figure 11.3. As you can see in the
figure, Pearson’s r is exactly what we calculated by hand. You can also see the software
has given us a p value (i.e., Sig. 2-tailed) of zero; this indicates a significant correlation.
We will discuss that shortly, but for now let’s focus on the r value itself.
Interpreting Pearson’s r
In this case our r value is .926 but it could have ranged from –1 to +1. When you are
computing a correlation, three things can happen:
FIGURE 11.2. Selecting the variables to be correlated on the Data View spreadsheet.
Correlations
                                                  Tuition (in thousands
                                                  of dollars)             Resignations
Tuition (in thousands    Pearson Correlation          1                      .926**
of dollars)              Sig. (2-tailed)                                     .000
                         N                           10                      10
Resignations             Pearson Correlation        .926**                   1
                         Sig. (2-tailed)            .000
                         N                           10                      10
**. Correlation is significant at the 0.01 level (2-tailed).
A Word of Caution
A positive correlation exists when the values of two variables both go
up or if both values go down. Only when one variable goes up and the
other goes down do we say there is a negative correlation. Do not make
the mistake of labeling a case in which both variables go down as a negative correla-
tion. The three possibilities are shown below in Table 11.6.
TABLE 11.6
One value goes up, the other value goes down (a negative correlation).
Both values go down (a positive correlation).
Both values go up (a positive correlation).
In the preceding case, we have a very high r value of .926, but what does that
really mean? How high does the r value have to be for us to consider the relationship
between the variables to be meaningful? The answer to this question is that the inter-
pretation of the r value depends on what the researcher is looking for.
Any time we analyze data, an r value of .90 or greater would be excellent for a pos-
itive correlation and a value less than –.90 would be great for a negative correlation.
In both instances we could clearly see that a relationship exists between the variables
being considered. At other times, we might consider an r value in the .60s and .70s (or
–.60 to –.70) to be enough; it would just depend on the type of relationship we were
looking for. Obviously, if our correlation coefficient is smaller than this, especially
anything between zero and .50 for a positive correlation or between –.50 and zero for
a negative correlation, it is apparent that a strong relationship does not exist between the two variables.
United States. While this is obviously a cause for great concern, and it is very impor-
tant information, at the same time it could be very misleading. Why? Doesn’t it stand
to reason that Florida has more cases than South Dakota, and New York has more
cases than Vermont? Of course it does; the number of cases of Covid-19 is positively
correlated to the number of residents of the state! States with more residents have
larger numbers of residents testing positive for the disease than do states with smaller
populations. Wouldn’t it be more meaningful if we read or heard information such as
the percentage of residents affected or the number of cases per 100,000 residents in
a given location? Yes, this would be more accurate, but it doesn’t take away from the
message—regardless of where you are and the number of people who are ill, always
protect yourself and those around you!
These are perfect examples of being a good consumer of statistics. You can see
quite a few of these noncausal correlations being presented in various newspapers,
magazines, and other sources; in a lot of cases, people are trying to use them to bol-
ster their point or get their way about something. What is the lesson? Simple, always
look at the big picture.
Now, let’s use the same dataset but create Table 11.7 by changing the resignation
numbers just a bit.
Again, just by looking, we can see that an apparent relationship exists. In this
case, it appears that the number of resignations goes down as the amount of tuition
money spent goes up. Let’s check that using the information given in Figure 11.4.
Here SPSS computed an r value of –.419 and, since it is between –.5 and zero, it
is a rather small negative correlation. While there is a correlation between the money
spent and the resignations, it is not very strong. We can also see that the r value is
negative. This indicates an inverse relationship: when one of the values goes up, the
other value goes down. In this case we can see that, as the amount of tuition money
spent gets larger, there are fewer resignations. Remember, although a correlation
exists here, there may or may not be a causal relationship; be careful how you inter-
pret these values.
When you look at Table 11.8; do you see a pattern emerging? Does there seem to
be a logical correlation between the two sets of values? If you look closely, you will see
Correlations
                                        Tuition    Resignations
Tuition         Pearson Correlation        1          -.419
                Sig. (2-tailed)                        .228
                N                         10            10
Resignations    Pearson Correlation     -.419            1
                Sig. (2-tailed)          .228
                N                         10            10

FIGURE 11.4. Negative Pearson correlation coefficients.
there does not appear to be. Sometimes a value in the left column will have a much
greater value in the right column (e.g., 41 and 58), and sometimes the value in the
right column will be much lower (e.g., 32 and 20). As you can see in the SPSS output
in Figure 11.5, because of the lack of a pattern, Pearson’s r is only .013; this supports
our observation that there does not seem to be a meaningful correlation between the
two sets of values. Remember, though, this might be purely coincidental; you cannot
use these results to infer cause and effect.
Correlations
                                        Tuition    Resignations
Tuition         Pearson Correlation        1           .013
                Sig. (2-tailed)                        .971
                N                         10            10
Resignations    Pearson Correlation      .013            1
                Sig. (2-tailed)          .971
                N                         10            10
In this case, let’s imagine we are members of the board of directors of a company
where several issues have arisen that seem to be affecting employee morale. We have
decided that our first step is to ask both the management team and employees to rank
these issues in terms of importance. We then want to determine if there is agreement
between the two groups in terms of the importance of the issues labeled A–J.
Here we have three columns: the first column represents the 10 issues facing the
company (i.e., A–J) that employees and management are asked to rank. The second
column shows how the issues were ranked by the employees. The third column shows
how the issues were ranked by the management. We can use the Spearman rank-dif-
ference correlation formula to help us determine if the employees and management
ranked them in a similar order:
$$r_s = 1 - \frac{6\left(\sum d^2\right)}{n\left(n^2 - 1\right)}$$
This formula is more straightforward than the Pearson formula, but there are
components we have not used up to this point. First, rs is the symbol for Spearman’s
rho; that is the correlation coefficient we are interested in computing. Notice, we are
using the same symbol as Pearson’s r except we are adding a lower-case s to differenti-
ate it. Second, the number 6 is exactly that: a constant value of 6. Next, the lower-case
n is the number of objects we are going to rank; in this case we have 10.
Computing the square value of all the rankings (i.e., d2) involves first determining
the difference between the employee and management rankings; this is shown in the
column labeled d in Table 11.10. For example, the employees’ ranking for issue D is
an 8; management’s ranking for that same issue is 1; the difference between the two
rankings is 7. The d2 value for that issue is 49 (i.e., 7 * 7); this is nothing more than
the difference value squared.
We can compute our Spearman’s rho coefficient using the following steps:
1. rs = 1 − 6(108) / [10(100 − 1)]
2. rs = 1 − 648/990
3. rs = 1 − .655
4. rs = .345
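Because the formula needs only the sum of the squared rank differences and n, it is easy to sketch in Python; the values below come straight from the steps above:

    def spearman_rho(sum_d_squared, n):
        # Spearman's rank-difference formula: rho = 1 - 6(sum of d^2) / (n(n^2 - 1))
        return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

    print(round(spearman_rho(108, 10), 3))  # 0.345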
[Correlations output for the Employees and Management rankings (N = 10).]
FIGURE 11.6. Spearman's rho correlation coefficient showing a moderate positive correlation.
The p value shown as part of the output for a correlation tests the hypothesis, “the
correlation coefficient in the population is significantly different from zero.” By using
this hypothesis, statisticians can make decisions about a population r value by using
sample data. This is very, very rarely done by beginning statisticians, so we won’t go
into any further detail. Knowing that, let’s move forward to our first case study.
Before we move forward, look at what I just wrote. Is this really a problem? No,
it may not be; since the correlation measures relationships that are not necessarily
causal, we are simply trying to determine if a problem may exist.
Right off the bat, you can see something different about this hypothesis in that
it does not include the word “significant.” This is because of three things. First, the
correlational procedures are descriptive statistics. When we learned to calculate the
correlations earlier, we saw that they are used to tell us whether a linear relationship
exists between two variables. Second, despite the fact they are descriptive in nature,
many people will state a hypothesis and then use the correlation to investigate rather
than test it. Third, because we are not involved in testing a null hypothesis, we are
going to subjectively evaluate our research hypothesis. Because of these things, the
wording and processes of our six-step model are going to change slightly.
Just by looking closely at this table, it appears that our hypothesis may be right
on target; it does seem as if the kids with the lower number of absences have higher
test scores. Since we have already seen, step by step, how to calculate the descriptive
statistics, let’s just look at the results from SPSS shown in Figure 11.7.
[Figure 11.7: Descriptive Statistics output for absences and final exam scores.]
Overall, these do not look too bad; we have an average of 5.2 absences and an aver-
age grade of 84.30. Remember, though, these descriptive statistics are not as meaning-
ful to us as when we were looking for significant differences. Let’s move forward and
try to resolve our hypothesis.
Correlations
                                           Absences    Final Exam Score
Absences            Pearson Correlation       1            -.836
                    N                        20              20
Final Exam Score    Pearson Correlation    -.836              1
                    N                        20              20
FIGURE 11.8. Negative Pearson correlation coefficients for absence and exam scores.
In Figure 11.9, the data values have been plotted on the scatterplot with grades
plotted across the x (bottom) axis and the corresponding number of absences plot-
ted along the y (left) axis. We instruct-
ed SPSS to plot a line of best fit and it
shows a nearly 45-degree angle upward
to the left. When we see a line such as
this, we immediately know the relation-
ship we have plotted is negative; when
one value goes up, the other goes down.
In this case, since the line is nearly at a
45-degree angle, we know the relation-
ship is fairly strong. This verifies what
we saw with Pearson’s r.
Always keep in mind, however, that while both the computed value of r and the scatterplot indicate a strong relationship, do not be fooled. Correlations are just descriptive measures of the relationship; that doesn't necessarily mean that one caused the other to happen.

FIGURE 11.9. Scatterplot showing the negative correlation between absences and exam scores.
FIGURE 11.10. Descriptive statistics for GPA and hours slept data.
Here we have a B grade point average (i.e., 3.0725). The average hours slept is
somewhat low, but remember, those two things aren’t important. We are interested
in the relationship between the two variables. By looking at the dataset, we can see
that there is no apparent relationship between the two. For example, some students
who slept 4 hours had a grade point average of 4.00, while others who slept the same
amount of time had a grade point average of 2.00. The lowest grade point average
(1.15) was shared by two students; one of them slept 4.5 hours, while the other slept
over 7 hours. Given this, we will be expecting a small correlation coefficient and a flat
line on the scatterplot. Let’s see what happens.
Correlations
                                     GPA    Hours Slept
GPA            Pearson Correlation     1        .084
               N                      32         32
Hours Slept    Pearson Correlation   .084         1
               N                      32         32
FIGURE 11.12. Scatterplot showing the weak correlation between GPA and hours slept.
even though correlations never prove cause and effect, the students will be using these
results as “scientific proof” that they should be able to stay up later.
The purpose of this study is to examine the relationship between height and
weight.
We can see that the 20 participants have an average height of 66.8 inches (i.e., 5′ 6.8″)
and an average weight of 151.5 pounds.
Correlations
                                          Weight    Height in Inches
Weight              Pearson Correlation      1           .629
                    N                       20            20
Height in Inches    Pearson Correlation   .629             1
                    N                       20            20
FIGURE 11.14. Pearson correlation coefficient showing a moderately high correlation between
weight and height.
The chart has three columns. The first column represents the 10 different for-
mulas we are interested in ranking. Just for the sake of our discussion, we will assume
that formula A is that from the United States. The second column shows the average
ranking for each brand by persons from the United States; the third column shows the
same thing for persons from outside the United States.
[Correlations output for US Ranking and Foreign Ranking (N = 10).]
FIGURE 11.16. Spearman’s rho showing a moderate negative correlation in rankings between
the United States and foreign countries.
Linear Regression
If we had a dataset of height and weight data and wound up with an r value of 1.00, we
would be able to state, with 100% accuracy, either a person’s height based on knowing
his weight or his weight based on knowing his height. Unfortunately, we rarely, if ever,
have that degree of accuracy.
At the same time, unless the r value is zero, we know there is some degree of
relationship between height and weight. Knowing that, and depending on the value
of r, we should be able to predict, with some accuracy, one of the variables based on
knowing the other variable. The key to doing so is a thorough understanding of the
line of best fit. Up to this point we have referred to it as a line that shows the trend
of a data distribution, but now we need to know exactly what it is, how to compute it,
and how to use it.
y = a + bx
First, y is the value we are trying to predict. For example, if we wanted to predict
weight based on height, we would change our equation to read:
Weight = a + bx
Lower-case x represents the value of the predictor variable. In this case, we know
a person’s height, so let’s enter that into our equation.
Weight = a + b(height)
The next symbol, b, represents the slope; this tells us how steep or flat the best-fit
line is. We can enter it into our equation.
Weight = a + slope(height)
Finally, a is the point where the line of best fit crosses the y-axis; we call this the intercept.
Since we do not know the value for weight but we do know the value for height,
all we need to get this equation to work is to determine how to compute the intercept
and the slope.
You can see that I have already done some of the basic math (e.g., squared each
of the height and weight values) that we will need to use in our equation. We can put
these into our regression equation and work through the following four steps:
1. slope = [20(204465) − (1336)(3030)] / [20(89880) − (1336)²]
2. slope = (4089300 − 4048080) / (1797600 − 1784896)
3. slope = 41220 / 12704
4. slope = 3.24
1. Intercept = ȳ − slope(x̄)
2. Intercept = 151.5 − 3.24(66.8)
This gives us an intercept value of –65.242. Notice that this number is negative;
that is perfectly normal and happens from time to time. Using it, we have everything
we need for our regression formula so let’s pick a value of x (i.e., the height), 65 for
example, and predict what our y value (i.e., the weight) should be.
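Since the slope and intercept are built entirely from these sums, a short Python sketch (my own illustration) reproduces both values and the prediction for a height of 65 inches:

    # Sums taken from the height and weight data in the text.
    n, sum_x, sum_y = 20, 1336, 3030
    sum_xy, sum_x2 = 204465, 89880

    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y / n) - slope * (sum_x / n)
    print(round(slope, 2), round(intercept, 3))   # 3.24 and -65.242

    # Predicted weight for a height of 65 inches.
    print(round(intercept + slope * 65, 2))       # about 145.66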
FIGURE 11.25. Creating the height and weight variables in the Variable View spreadsheet.
FIGURE 11.26. Height and weight data in the Data View spreadsheet.
FIGURE 11.28. Identifying the independent variable, the dependent variable, and the statis-
tics in the Linear Regression command.
[Coefficients output: Unstandardized Coefficients (B, Std. Error), Standardized Coefficients (Beta), t, Sig.]
If we computed Pearson’s r for the two values, as shown in Figure 11.30, this would
mean it would be exactly 1.00, indicating a perfect positive linear relationship. If we
look at Figure 11.31, we first see that our standard error is zero; given that, we do
not expect error to interfere with our ability to use the regression procedure to accu-
rately use the predictor variable to predict the criterion variable. We can also see our intercept (i.e., constant) and our slope (i.e., height). In this case, if we know a person's height, we can accurately state their weight.

[Figure 11.30: Correlations output showing a Pearson correlation of 1.000 between the two sets of values (N = 20). Figure 11.31: Coefficients output showing the constant (intercept) and the slope for height.]
If everything we have said up to this point is true, then we should be able to use the slope and intercept, along with a given value of the predictor variable, to accurately predict a criterion variable. Let's use a height of 64 and enter all these data into our equation:
perfect linear relationship between the predictor and criterion variables. This perfect
relationship, of course, happens very rarely. In most cases, we can use our regression
equation to help predict values, but we must be aware of the error inherent in the
process.
Safety groups have stated they believe that drivers reaching the age of 65
should be tested each year; their reasoning is that, once a driver reaches that
age, their ability to safely drive an automobile diminishes. Elderly rights
groups feel there is no manner by which the given number of accidents in a
year’s time can be predicted by age. This study will collect and analyze data
to see if such a relationship exists.
There is no relationship between the age of elderly drivers and the number of
automobile accidents they are involved in during a given year.
[Descriptive Statistics and Correlations output for Number of Accidents and Driver's Age (N = 20).]
An average of 3.2 accidents per year is somewhat disturbing. What is even more
worrisome is that our correlation coefficient is negative; it seems there is an inverse
relationship between age and the number of accidents. Remember, although we are
getting a good picture of what seems to be happening, these are descriptive statistics.
Let’s move forward and see what happens from here.
[Coefficients output: Unstandardized Coefficients (B, Std. Error), Standardized Coefficients (Beta), t, Sig.]
1. Accidents = 15.086 + (−.159)(82)
2. Accidents = 15.086 − 13.038
3. Accidents = 2.048

FIGURE 11.36. Scatterplot of the age and accident data.
Unfortunately for the highway safety personnel, the results were exactly the opposite of what they believed: the 82-year-old driver is estimated to be in about two accidents per year, while a 65-year-old driver will be in over four accidents during that same time period!
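To see where both estimates come from, here is the regression equation evaluated at each age in a small Python sketch, using the intercept and slope from the SPSS output:

    # Regression coefficients from the output: intercept 15.086, slope -.159.
    intercept, slope = 15.086, -0.159

    for age in (65, 82):
        print(age, round(intercept + slope * age, 3))
    # 65 -> 4.751 accidents; 82 -> 2.048 accidents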
Remember, although it is implied, we are not proving cause and effect here. Our
coefficient of determination is only .481. This tells us that only 48.1% of the change
in the criterion variable is attributable to age. This means there is error involved;
obviously, other things predict the number of accidents. In this case, we might have to
consider other factors such as number of miles driven per year, number of other driv-
ers on the road, and prevalent weather conditions in each driver’s location.
Summary
Most of my students agree that the correlational procedures are extremely easy to
understand. We used basic math skills to help us decide if a linear relationship exists
between two sets of data. If we do find a linear relationship, we then determine if we
can use one set of data to predict values in the other set. Although we did not talk
about them, we could consider many other tools of this type: correlations where there
are more than two variables; regression formulas with more than one predictor variable
(i.e., multiple regression); and regression equations where you can use your data to
predict a nominal value (i.e., logistic regression). Use of these tools is beyond the scope
of this book, but the good news is that once you understand the purpose and use of
the tools in this chapter, understanding the others is made a lot easier.
Intercept:

$$\text{Intercept} = \bar{y} - \text{slope}(\bar{x})$$

Slope of a line:

$$\text{slope} = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$$
Quiz Time!
We covered everything we need to cover in this chapter but, before we go, let’s make sure you
completely understand everything we have done. Answer each of these questions and then
turn to the end of the book to check your work.
FIGURE 11.37. Descriptive statistics for father and son age data.

FIGURE 11.38. Positive Pearson correlation coefficients between father and son age data.
[Coefficients output for the father and son age data.]
Correlations
                                         Democrat    Republican
Democrat      Correlation Coefficient     1.000        -.515
              Sig. (2-tailed)               .           .128
              N                            10            10
Republican    Correlation Coefficient     -.515        1.000
              Sig. (2-tailed)              .128           .
              N                            10            10
FIGURE 11.40. Spearman rho correlation coefficients for rankings between political parties.
FIGURE 11.41. Descriptive statistics for income and highest year of education data.
FIGURE 11.42. Moderate Pearson correlation coefficient between income and highest year of
education.
[Coefficients output for the income and highest year of education data.]
1. If you were asked to speak to a group of potential high school dropouts, how would
you interpret Pearson’s r?
2. In comparison to the overall average, what would you say to someone who was
thinking about dropping out in the ninth grade?
FIGURE 11.44. Descriptive statistics for the number of children and highest year of education.
FIGURE 11.45. Small negative Pearson correlation coefficient between number of children and
highest year of education.
[Coefficients output for the number of children and highest year of education data.]
4. Generally speaking, in which direction would the line of best fit flow?
5. Combining what you know from Scenario 3, what would you say to the students, in
terms of living in poverty?
Conclusion: Have We Accomplished
What We Set Out to Do?
When I wrote the first edition of this book, I had several goals in mind. First, I wanted
the reader to put any past statistical prejudices behind them and look at this topic
from a new perspective. Second, I wanted the reader to understand that, in most
cases, you are going to use a relatively easy-to-understand set of statistical tools. Third,
even though we worked through the manual calculations, we also saw we could use
SPSS, one of the many easily used software packages, for our calculations. Lastly, I
wanted the reader to understand there is a straightforward way they can approach
these cases. People who read that edition were very complimentary, and I was glad that I had accomplished what I set out to do. The question for this edition is the same: did the book accomplish the goals I set out for it? Let's take a moment and see.
A Straightforward Approach
As I said at the beginning of the book, I can remember being faced with situations
when I first started learning about statistics where I was not sure which statistical tech-
nique I should be using. This anxiety, combined with the other issues I have already
discussed, led to a great deal of reluctance on my part, as well as that of my classmates,
to get excited about what we were supposed to be doing. As you saw in this text, how-
ever, we were able to identify a set of six steps that helped us navigate through the sta-
tistical fog! As I have said before, these six steps do not cover every statistical situation
you might find yourself in. They should, however, cover most cases.
At Long Last
So, there we have it; it seems we have met our goals. I hope you are now able not only to identify and interpret the statistics you will need but also, just as importantly, to feel far more comfortable as a consumer of statistics. We have taken what many of you considered to be a huge, anxiety-provoking problem, and we have shown how easy it is to solve if we only approach it using small, logic-driven steps. Why can't the rest of life be so simple?
APPENDIX A
z score 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 .0000 .0040 .0080 .0120 .0160 .0199 .0239 .0279 .0319 .0359
0.1 .0398 .0438 .0478 .0517 .0557 .0596 .0636 .0675 .0714 .0753
0.2 .0793 .0832 .0871 .0910 .0948 .0987 .1026 .1064 .1103 .1141
0.3 .1179 .1217 .1255 .1293 .1331 .1368 .1406 .1443 .1480 .1517
0.4 .1554 .1591 .1628 .1664 .1700 .1736 .1772 .1808 .1844 .1879
0.5 .1915 .1950 .1985 .2019 .2054 .2088 .2123 .2157 .2190 .2224
0.6 .2257 .2291 .2324 .2357 .2389 .2422 .2454 .2486 .2517 .2549
0.7 .2580 .2611 .2642 .2673 .2704 .2734 .2764 .2794 .2823 .2852
0.8 .2881 .2910 .2939 .2967 .2995 .3023 .3051 .3078 .3106 .3133
0.9 .3159 .3186 .3212 .3238 .3264 .3289 .3315 .3340 .3365 .3389
1.0 .3413 .3438 .3461 .3485 .3508 .3531 .3554 .3577 .3599 .3621
1.1 .3643 .3665 .3686 .3708 .3729 .3749 .3770 .3790 .3810 .3830
1.2 .3849 .3869 .3888 .3907 .3925 .3944 .3962 .3980 .3997 .4015
1.3 .4032 .4049 .4066 .4082 .4099 .4115 .4131 .4147 .4162 .4177
1.4 .4192 .4207 .4222 .4236 .4251 .4265 .4279 .4292 .4306 .4319
1.5 .4332 .4345 .4357 .4370 .4382 .4394 .4406 .4418 .4429 .4441
1.6 .4452 .4463 .4474 .4484 .4495 .4505 .4515 .4525 .4535 .4545
1.7 .4554 .4564 .4573 .4582 .4591 .4599 .4608 .4616 .4625 .4633
1.8 .4641 .4649 .4656 .4664 .4671 .4678 .4686 .4693 .4699 .4706
1.9 .4713 .4719 .4726 .4732 .4738 .4744 .4750 .4756 .4761 .4767
2.0 .4772 .4778 .4783 .4788 .4793 .4798 .4803 .4808 .4812 .4817
2.1 .4821 .4826 .4830 .4834 .4838 .4842 .4846 .4850 .4854 .4857
2.2 .4861 .4864 .4868 .4871 .4875 .4878 .4881 .4884 .4887 .4890
2.3 .4893 .4896 .4898 .4901 .4904 .4906 .4909 .4911 .4913 .4916
2.4 .4918 .4920 .4922 .4925 .4927 .4929 .4931 .4932 .4934 .4936
2.5 .4938 .4940 .4941 .4943 .4945 .4946 .4948 .4949 .4951 .4952
2.6 .4953 .4955 .4956 .4957 .4959 .4960 .4961 .4962 .4963 .4964
2.7 .4965 .4966 .4967 .4968 .4969 .4970 .4971 .4972 .4973 .4974
2.8 .4974 .4975 .4976 .4977 .4977 .4978 .4979 .4979 .4980 .4981
2.9 .4981 .4982 .4982 .4983 .4984 .4984 .4985 .4985 .4986 .4986
3.0 .4987 .4987 .4987 .4988 .4988 .4989 .4989 .4989 .4990 .4990
APPENDIX B
Critical Values of t
df Alpha = .10 Alpha = .05 Alpha = .025 df Alpha = .10 Alpha = .05 Alpha = .025
1 3.078 6.314 12.706 18 1.330 1.734 2.101
2 1.886 2.920 4.303 19 1.328 1.729 2.093
3 1.638 2.353 3.182 20 1.325 1.725 2.086
4 1.533 2.132 2.776 21 1.323 1.721 2.080
5 1.476 2.015 2.571 22 1.321 1.717 2.074
6 1.440 1.943 2.447 23 1.319 1.714 2.069
7 1.415 1.895 2.365 24 1.318 1.711 2.064
8 1.397 1.860 2.306 25 1.316 1.708 2.060
9 1.383 1.833 2.262 26 1.315 1.706 2.056
10 1.372 1.812 2.228 27 1.314 1.703 2.052
11 1.363 1.796 2.201 28 1.313 1.701 2.048
12 1.356 1.782 2.179 29 1.311 1.699 2.045
13 1.350 1.771 2.160 30 1.310 1.697 2.042
14 1.345 1.761 2.145 40 1.303 1.684 2.021
15 1.341 1.753 2.131 60 1.296 1.671 2.000
16 1.337 1.746 2.120 120 1.289 1.658 1.980
17 1.333 1.740 2.110 ∞ 1.282 1.645 1.960
APPENDIX C
Within-group
degrees of Between-groups degrees of freedom
freedom 1 2 3 4 5 6 7 8 9 10
1 4052 5000 5403 5625 5764 5859 5928 5982 6022 6056
2 98.50 99 99.17 99.25 99.30 99.33 99.36 99.37 99.39 99.40
3 34.12 30.82 29.46 28.71 28.24 27.91 27.67 27.49 27.35 27.23
4 21.20 18.00 16.69 15.98 15.52 15.21 14.98 14.80 14.66 14.55
5 16.26 13.27 12.06 11.39 10.97 10.67 10.46 10.29 10.16 10.05
6 13.75 10.92 9.78 9.15 8.75 8.47 8.26 8.10 7.98 7.87
7 12.25 9.55 8.45 7.85 7.46 7.19 6.99 6.84 6.72 6.62
8 11.26 8.65 7.59 7.01 6.63 6.37 6.18 6.03 5.91 5.81
9 10.56 8.02 6.99 6.42 6.06 5.80 5.61 5.47 5.35 5.26
10 10.04 7.56 6.55 5.99 5.64 5.39 5.20 5.06 4.94 4.85
11 9.65 7.21 6.22 5.67 5.32 5.07 4.89 4.74 4.63 4.54
12 9.33 6.93 5.95 5.41 5.06 4.82 4.64 4.50 4.39 4.30
13 9.07 6.70 5.74 5.21 4.86 4.62 4.44 4.30 4.19 4.10
14 8.86 6.51 5.56 5.04 4.69 4.46 4.28 4.14 4.03 3.94
15 8.68 6.36 5.42 4.89 4.56 4.32 4.14 4.00 3.89 3.80
16 8.53 6.23 5.29 4.77 4.44 4.20 4.03 3.89 3.78 3.69
17 8.40 6.11 5.18 4.67 4.34 4.10 3.93 3.79 3.68 3.59
18 8.29 6.01 5.09 4.58 4.25 4.01 3.84 3.71 3.60 3.51
19 8.18 5.93 5.01 4.50 4.17 3.94 3.77 3.63 3.52 3.43
20 8.10 5.85 4.94 4.43 4.10 3.87 3.70 3.56 3.46 3.37
21 8.02 5.78 4.87 4.37 4.04 3.81 3.64 3.51 3.40 3.31
22 7.95 5.72 4.82 4.31 3.99 3.76 3.59 3.45 3.35 3.26
23 7.88 5.66 4.76 4.26 3.94 3.71 3.54 3.41 3.30 3.21
24 7.82 5.61 4.72 4.22 3.90 3.67 3.50 3.36 3.26 3.17
25 7.77 5.57 4.68 4.18 3.85 3.63 3.46 3.32 3.22 3.13
26 7.72 5.53 4.64 4.14 3.82 3.59 3.42 3.29 3.18 3.09
27 7.68 5.49 4.60 4.11 3.78 3.56 3.39 3.26 3.15 3.06
28 7.64 5.45 4.57 4.07 3.75 3.53 3.36 3.23 3.12 3.03
29 7.60 5.42 4.54 4.04 3.73 3.50 3.33 3.20 3.09 3.00
30 7.56 5.39 4.51 4.02 3.70 3.47 3.30 3.17 3.07 2.98
40 7.31 5.18 4.31 3.83 3.51 3.29 3.12 2.99 2.89 2.80
60 7.08 4.98 4.13 3.65 3.34 3.12 2.95 2.82 2.72 2.63
120 6.85 4.79 3.95 3.48 3.17 2.96 2.79 2.66 2.56 2.47
∞ 6.63 4.61 3.78 3.32 3.02 2.80 2.64 2.51 2.41 2.32
Within-group
degrees of Between-groups degrees of freedom
freedom 12 15 20 24 30 40 60 120 ∞
1 6106 6157 6209 6235 6261 6287 6313 6339 6366
2 99.42 99.43 99.45 99.46 99.47 99.47 99.48 99.49 99.50
3 27.05 26.87 26.69 26.60 26.50 26.41 26.32 26.22 26.13
4 14.37 14.20 14.02 13.93 13.84 13.75 13.65 13.56 13.46
5 9.89 9.72 9.55 9.47 9.38 9.29 9.20 9.11 9.02
6 7.72 7.56 7.40 7.31 7.23 7.14 7.06 6.97 6.88
7 6.47 6.31 6.16 6.07 5.99 5.91 5.82 5.74 5.65
8 5.67 5.52 5.36 5.28 5.20 5.12 5.03 4.95 4.86
9 5.11 4.96 4.81 4.73 4.65 4.57 4.48 4.40 4.31
10 4.71 4.56 4.41 4.33 4.25 4.17 4.08 4.00 3.91
11 4.40 4.25 4.10 4.02 3.94 3.86 3.78 3.69 3.60
12 4.16 4.01 3.86 3.78 3.70 3.62 3.54 3.45 3.36
13 3.96 3.82 3.66 3.59 3.51 3.43 3.34 3.25 3.17
14 3.80 3.66 3.51 3.43 3.35 3.27 3.18 3.09 3.00
15 3.67 3.52 3.37 3.29 3.21 3.13 3.05 2.96 2.87
16 3.55 3.41 3.26 3.18 3.10 3.02 2.93 2.84 2.75
17 3.46 3.31 3.16 3.08 3.00 2.92 2.83 2.75 2.65
18 3.37 3.23 3.08 3.00 2.92 2.84 2.75 2.66 2.57
19 3.30 3.15 3.00 2.92 2.84 2.76 2.67 2.58 2.49
20 3.23 3.09 2.94 2.86 2.78 2.69 2.61 2.52 2.42
21 3.17 3.03 2.88 2.80 2.72 2.64 2.55 2.46 2.36
22 3.12 2.98 2.83 2.75 2.67 2.58 2.50 2.40 2.31
23 3.07 2.93 2.78 2.70 2.62 2.54 2.45 2.35 2.26
24 3.03 2.89 2.74 2.66 2.58 2.49 2.40 2.31 2.21
25 2.99 2.85 2.70 2.62 2.54 2.45 2.36 2.27 2.17
26 2.96 2.81 2.66 2.58 2.50 2.42 2.33 2.23 2.13
27 2.93 2.78 2.63 2.55 2.47 2.38 2.29 2.20 2.10
28 2.90 2.75 2.60 2.52 2.44 2.35 2.26 2.17 2.06
29 2.87 2.73 2.57 2.49 2.41 2.33 2.23 2.14 2.03
30 2.84 2.70 2.55 2.47 2.39 2.30 2.21 2.11 2.01
40 2.66 2.52 2.37 2.29 2.20 2.11 2.02 1.92 1.80
60 2.50 2.35 2.20 2.12 2.03 1.94 1.84 1.73 1.60
120 2.34 2.19 2.03 1.95 1.86 1.76 1.66 1.53 1.38
∞ 2.18 2.04 1.88 1.79 1.70 1.59 1.47 1.32 1.00
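Readers who have access to Python can reproduce (and extend) the critical values of F tabled in Appendices C, D, and E rather than interpolating by hand. The following is a minimal sketch, assuming the SciPy library is installed; scipy.stats.f.ppf returns quantiles of the F distribution.

    from scipy import stats

    def f_critical(alpha, between_df, within_df):
        # The critical value cuts off the upper alpha proportion of the
        # F distribution, so we request the (1 - alpha) quantile.
        return stats.f.ppf(1 - alpha, between_df, within_df)

    print(round(f_critical(.01, 2, 13), 2))   # 6.70, matching Appendix C
    print(round(f_critical(.05, 4, 20), 2))   # 2.87, matching Appendix D
    print(round(f_critical(.10, 6, 30), 2))   # 1.98, matching Appendix E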
APPENDIX D
Within-group df    Between-groups degrees of freedom
                   1       2       3       4       5       6       7       8       9       10
1 161.4 199.5 215.7 224.6 230.2 234.0 236.8 238.9 240.5 241.9
2 18.51 19.00 19.16 19.25 19.30 19.33 19.35 19.37 19.38 19.40
3 10.13 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 8.79
4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96
5 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 4.74
6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06
7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.64
8 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.35
9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.14
10 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.98
11 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.85
12 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 2.75
13 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 2.67
14 4.60 3.74 3.34 3.11 2.96 2.85 2.76 2.70 2.65 2.60
15 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 2.54
16 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 2.49
17 4.45 3.59 3.20 2.96 2.81 2.70 2.61 2.55 2.49 2.45
18 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.41
19 4.38 3.52 3.13 2.90 2.74 2.63 2.54 2.48 2.42 2.38
20 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 2.39 2.35
21 4.32 3.47 3.07 2.84 2.68 2.57 2.49 2.42 2.37 2.32
22 4.30 3.44 3.05 2.82 2.66 2.55 2.46 2.40 2.34 2.30
23 4.28 3.42 3.03 2.80 2.64 2.53 2.44 2.37 2.32 2.27
24 4.26 3.40 3.01 2.78 2.62 2.51 2.42 2.36 2.30 2.25
25 4.24 3.39 2.99 2.76 2.60 2.49 2.40 2.34 2.28 2.24
26 4.23 3.37 2.98 2.74 2.59 2.47 2.39 2.32 2.27 2.22
27 4.21 3.35 2.96 2.73 2.57 2.46 2.37 2.31 2.25 2.20
28 4.20 3.34 2.95 2.71 2.56 2.45 2.36 2.29 2.24 2.19
29 4.18 3.33 2.93 2.70 2.55 2.43 2.35 2.28 2.22 2.18
30 4.17 3.32 2.92 2.69 2.53 2.42 2.33 2.27 2.21 2.16
40 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12 2.08
60 4.00 3.15 2.76 2.53 2.37 2.25 2.17 2.10 2.04 1.99
120 3.92 3.07 2.68 2.45 2.29 2.17 2.09 2.02 1.96 1.91
∞ 3.84 3.00 2.60 2.37 2.21 2.10 2.01 1.94 1.88 1.83
Within-group df    Between-groups degrees of freedom
                   12      15      20      24      30      40      60      120     ∞
1 243.9 245.9 248.0 249.1 250.1 251.1 252.2 253.3 254.3
2 19.41 19.43 19.45 19.45 19.46 19.47 19.48 19.49 19.50
3 8.74 8.70 8.66 8.64 8.62 8.59 8.57 8.55 8.53
4 5.91 5.86 5.80 5.77 5.75 5.72 5.69 5.66 5.63
5 4.68 4.62 4.56 4.53 4.50 4.46 4.43 4.40 4.36
6 4.00 3.94 3.87 3.84 3.81 3.77 3.74 3.70 3.67
7 3.57 3.51 3.44 3.41 3.38 3.34 3.30 3.27 3.23
8 3.28 3.22 3.15 3.12 3.08 3.04 3.01 2.97 2.93
9 3.07 3.01 2.94 2.90 2.86 2.83 2.79 2.75 2.71
10 2.91 2.85 2.77 2.74 2.70 2.66 2.62 2.58 2.54
11 2.79 2.72 2.65 2.61 2.57 2.53 2.49 2.45 2.40
12 2.69 2.62 2.54 2.51 2.47 2.43 2.38 2.34 2.30
13 2.60 2.53 2.46 2.42 2.38 2.34 2.30 2.25 2.21
14 2.53 2.46 2.39 2.35 2.31 2.27 2.22 2.18 2.13
15 2.48 2.40 2.33 2.29 2.25 2.20 2.16 2.11 2.07
16 2.42 2.35 2.28 2.24 2.19 2.15 2.11 2.06 2.01
17 2.38 2.31 2.23 2.19 2.15 2.10 2.06 2.01 1.96
18 2.34 2.27 2.19 2.15 2.11 2.06 2.02 1.97 1.92
19 2.31 2.23 2.16 2.11 2.07 2.03 1.98 1.93 1.88
20 2.28 2.20 2.12 2.08 2.04 1.99 1.95 1.90 1.84
21 2.25 2.18 2.10 2.05 2.01 1.96 1.92 1.87 1.81
22 2.23 2.15 2.07 2.03 1.98 1.94 1.89 1.84 1.78
23 2.20 2.13 2.05 2.01 1.96 1.91 1.86 1.81 1.76
24 2.18 2.11 2.03 1.98 1.94 1.89 1.84 1.79 1.73
25 2.16 2.09 2.01 1.96 1.92 1.87 1.82 1.77 1.71
26 2.15 2.07 1.99 1.95 1.90 1.85 1.80 1.75 1.69
27 2.13 2.06 1.97 1.93 1.88 1.84 1.79 1.73 1.67
28 2.12 2.04 1.96 1.91 1.87 1.82 1.77 1.71 1.65
29 2.10 2.03 1.94 1.90 1.85 1.81 1.75 1.70 1.64
30 2.09 2.01 1.93 1.89 1.84 1.79 1.74 1.68 1.62
40 2.00 1.92 1.84 1.79 1.74 1.69 1.64 1.58 1.51
60 1.92 1.84 1.75 1.70 1.65 1.59 1.53 1.47 1.39
120 1.83 1.75 1.66 1.61 1.55 1.50 1.43 1.35 1.25
∞ 1.75 1.67 1.57 1.52 1.46 1.39 1.32 1.22 1.00
APPENDIX E
Within-group df    Between-groups degrees of freedom
                   1       2       3       4       5       6       7       8       9       10
1 39.86 49.50 53.59 55.83 57.24 58.20 58.91 59.44 59.86 60.19
2 8.53 9.00 9.16 9.24 9.29 9.33 9.35 9.37 9.38 9.39
3 5.54 5.46 5.39 5.34 5.31 5.28 5.27 5.25 5.24 5.23
4 4.54 4.32 4.19 4.11 4.05 4.01 3.98 3.95 3.94 3.92
5 4.06 3.78 3.62 3.52 3.45 3.40 3.37 3.34 3.32 3.30
6 3.78 3.46 3.29 3.18 3.11 3.05 3.01 2.98 2.96 2.94
7 3.59 3.26 3.07 2.96 2.88 2.83 2.78 2.75 2.72 2.70
8 3.46 3.11 2.92 2.81 2.73 2.67 2.62 2.59 2.56 2.54
9 3.36 3.01 2.81 2.69 2.61 2.55 2.51 2.47 2.44 2.42
10 3.29 2.92 2.73 2.61 2.52 2.46 2.41 2.38 2.35 2.32
11 3.23 2.86 2.66 2.54 2.45 2.39 2.34 2.30 2.27 2.25
12 3.18 2.81 2.61 2.48 2.39 2.33 2.28 2.24 2.21 2.19
13 3.14 2.76 2.56 2.43 2.35 2.28 2.23 2.20 2.16 2.14
14 3.10 2.73 2.52 2.39 2.31 2.24 2.19 2.15 2.12 2.10
15 3.07 2.70 2.49 2.36 2.27 2.21 2.16 2.12 2.09 2.06
16 3.05 2.67 2.46 2.33 2.24 2.18 2.13 2.09 2.06 2.03
17 3.03 2.64 2.44 2.31 2.22 2.15 2.10 2.06 2.03 2.00
18 3.01 2.62 2.42 2.29 2.20 2.13 2.08 2.04 2.00 1.98
19 2.99 2.61 2.40 2.27 2.18 2.11 2.06 2.02 1.98 1.96
20 2.97 2.59 2.38 2.25 2.16 2.09 2.04 2.00 1.96 1.94
21 2.96 2.57 2.36 2.23 2.14 2.08 2.02 1.98 1.95 1.92
22 2.95 2.56 2.35 2.22 2.13 2.06 2.01 1.97 1.93 1.90
23 2.94 2.55 2.34 2.21 2.11 2.05 1.99 1.95 1.92 1.89
24 2.93 2.54 2.33 2.19 2.10 2.04 1.98 1.94 1.91 1.88
25 2.92 2.53 2.32 2.18 2.09 2.02 1.97 1.93 1.89 1.87
26 2.91 2.52 2.31 2.17 2.08 2.01 1.96 1.92 1.88 1.86
27 2.90 2.51 2.30 2.17 2.07 2.00 1.95 1.91 1.87 1.85
28 2.89 2.50 2.29 2.16 2.06 2.00 1.94 1.90 1.87 1.84
29 2.89 2.50 2.28 2.15 2.06 1.99 1.93 1.89 1.86 1.83
30 2.88 2.49 2.28 2.14 2.05 1.98 1.93 1.88 1.85 1.82
40 2.84 2.44 2.23 2.09 2.00 1.93 1.87 1.83 1.79 1.76
60 2.79 2.39 2.18 2.04 1.95 1.87 1.82 1.77 1.74 1.71
120 2.75 2.35 2.13 1.99 1.90 1.82 1.77 1.72 1.68 1.65
∞ 2.71 2.30 2.08 1.94 1.85 1.77 1.72 1.67 1.63 1.60
Within-group df    Between-groups degrees of freedom
                   12      15      20      24      30      40      60      120     ∞
1 60.71 61.22 61.74 62.00 62.26 62.53 62.79 63.06 63.33
2 9.41 9.42 9.44 9.45 9.46 9.47 9.47 9.48 9.49
3 5.22 5.20 5.18 5.18 5.17 5.16 5.15 5.14 5.13
4 3.90 3.87 3.84 3.83 3.82 3.80 3.79 3.78 3.76
5 3.27 3.24 3.21 3.19 3.17 3.16 3.14 3.12 3.10
6 2.90 2.87 2.84 2.82 2.80 2.78 2.76 2.74 2.72
7 2.67 2.63 2.59 2.58 2.56 2.54 2.51 2.49 2.47
8 2.50 2.46 2.42 2.40 2.38 2.36 2.34 2.32 2.29
9 2.38 2.34 2.30 2.28 2.25 2.23 2.21 2.18 2.16
10 2.28 2.24 2.20 2.18 2.16 2.13 2.11 2.08 2.06
11 2.21 2.17 2.12 2.10 2.08 2.05 2.03 2.00 1.97
12 2.15 2.10 2.06 2.04 2.01 1.99 1.96 1.93 1.90
13 2.10 2.05 2.01 1.98 1.96 1.93 1.90 1.88 1.85
14 2.05 2.01 1.96 1.94 1.91 1.89 1.86 1.83 1.80
15 2.02 1.97 1.92 1.90 1.87 1.85 1.82 1.79 1.76
16 1.99 1.94 1.89 1.87 1.84 1.81 1.78 1.75 1.72
17 1.96 1.91 1.86 1.84 1.81 1.78 1.75 1.72 1.69
18 1.93 1.89 1.84 1.81 1.78 1.75 1.72 1.69 1.66
19 1.91 1.86 1.81 1.79 1.76 1.73 1.70 1.67 1.63
20 1.89 1.84 1.79 1.77 1.74 1.71 1.68 1.64 1.61
21 1.87 1.83 1.78 1.75 1.72 1.69 1.66 1.62 1.59
22 1.86 1.81 1.76 1.73 1.70 1.67 1.64 1.60 1.57
23 1.84 1.80 1.74 1.72 1.69 1.66 1.62 1.59 1.55
24 1.83 1.78 1.73 1.70 1.67 1.64 1.61 1.57 1.53
25 1.82 1.77 1.72 1.69 1.66 1.63 1.59 1.56 1.52
26 1.81 1.76 1.71 1.68 1.65 1.61 1.58 1.54 1.50
27 1.80 1.75 1.70 1.67 1.64 1.60 1.57 1.53 1.49
28 1.79 1.74 1.69 1.66 1.63 1.59 1.56 1.52 1.48
29 1.78 1.73 1.68 1.65 1.62 1.58 1.55 1.51 1.47
30 1.77 1.72 1.67 1.64 1.61 1.57 1.54 1.50 1.46
40 1.71 1.66 1.61 1.57 1.54 1.51 1.47 1.42 1.38
60 1.66 1.60 1.54 1.51 1.48 1.44 1.40 1.35 1.29
120 1.60 1.55 1.48 1.45 1.41 1.37 1.32 1.26 1.19
∞ 1.55 1.49 1.42 1.38 1.34 1.30 1.24 1.17 1.00
APPENDIX F
Degrees of freedom    Alpha = .01    Alpha = .05    Alpha = .10
1 6.63 3.84 2.71
2 9.21 5.99 4.61
3 11.34 7.82 6.25
4 13.28 9.49 7.78
5 15.09 11.07 9.24
6 16.81 12.59 10.65
7 18.48 14.07 12.02
8 20.09 15.51 13.36
9 21.66 16.92 14.68
10 23.21 18.31 15.99
11 24.73 19.68 17.28
12 26.22 21.03 18.55
13 27.69 22.36 19.81
14 29.14 23.68 21.06
15 30.58 25.00 22.31
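As with the F tables, these chi-square critical values can be reproduced in software. A minimal sketch, assuming SciPy is available:

    from scipy import stats

    # Critical value of chi-square for alpha = .05 and 5 degrees of freedom.
    print(round(stats.chi2.ppf(1 - .05, 5), 2))   # 11.07, matching the table above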
APPENDIX G
Glossary
a priori: Before an event occurs. For example, we decide our alpha value a priori.
This means we decide our alpha value prior to the start of a study.
alpha value: The degree of risk we are willing to take when computing inferential
statistics. Sometimes referred to as the Type I error rate.
alternate hypothesis: See “research hypothesis.”
analysis of covariance (ANCOVA): A version of the analysis of variance where ini-
tial differences in the dependent variable are taken into account prior to final calcula-
tions.
analysis of variance (ANOVA): A statistical tool based on the F distribution. Varieties
of the ANOVA include the one-way ANOVA, the factorial ANOVA, and the multivari-
ate ANOVA (MANOVA).
area under the curve: A value ranging from 0 to 100% representing the percentage
of values in a given range under one of the data distributions (e.g., the z distribution
and the t distribution).
area under the normal curve table: See “z table.”
assumptions: Characteristics of a dataset we assume to be true prior to using a given
statistical procedure.
bar chart: A graph showing bars rising from the bottom that are proportionate to
the number of occurrences of a given value in a dataset.
beta: Notation used to designate the probability of making a Type II error.
between-groups variance: The total amount of variance between datasets represent-
ing the levels of an independent variable.
continuous data: A generic name for interval- or ratio-level data; also called “quan-
titative data.”
correlation coefficient: An output value from any of the correlational procedures. It
represents the strength of the relationship between two or more sets of data.
criterion variable: The value that is being predicted in a correlation procedure.
critical value: A table value to which one of the computed statistics (e.g., t, z, or F) is
compared in order to test a null hypothesis. There are specific tables for each of the
respective distributions.
degrees of freedom: The number of scores that are free to vary when describing a
sample. The method by which they are calculated changes depending on the statistical
test you are using, but for most calculations, the value is defined as one less than the
number of data values.
dependent-sample t-test: Statistical tool used when a study involves one indepen-
dent variable with two levels and one dependent variable that is measured with quan-
titative data. In this case, the levels of the independent variable must be related to or
correlated with one another.
dependent variable: The “effect” that is being measured in a study.
descriptive statistics: Numeric and graphical statistical tools that help us “describe”
the data so it can be better used for our decision making. Examples include the mean
of a dataset or a pie chart.
deviation from the mean: The amount any given value in a dataset is from the mean
of the dataset.
directional hypothesis: A hypothesis that implies a “greater than” or a “less than”
relationship between the variables being studied; also called a “one-tailed hypoth-
esis.”
disordinal interaction: An interaction of two independent variables in which the effect of one independent variable on the dependent variable reverses direction across the levels of the second independent variable.
effect size: In parametric statistics, a measure of the degree to which an indepen-
dent variable affects a dependent variable.
empirical rule: The rule stating that, in a normal distribution, approximately 68% of values are within ±1 standard deviation of the mean, 95% of values are within ±2 standard deviations of the mean, and nearly all values (99.7%) are within ±3 standard deviations of the mean.
equal variances assumed/not assumed: A test used in parametric statistics to ensure that the variability within the sets of data being compared is equivalent. Significant differences in variance call for modification of how computed values in parametric statistics are calculated.
ethical research: A research study in which the researcher ensures that participa-
tion by subjects is voluntary; they should not be harmed in any way—socially, physi-
cally, or mentally.
expected value: Value used in chi-square tests as a measure of the number of occur-
rences of each cell value that the researcher believes should appear. These expected
values are compared to the actual number of occurrences in each cell.
experimental independent variable: See “manipulated (experimental) independent
variable.”
F distribution: The plot of F values computed from repeated samples of data.
F value: The value computed in an analysis of variance. It is compared to a critical
value of F to interpret hypotheses.
factorial analysis of variance: Statistical tool used when a study involves more than
one independent variable as well as one dependent variable that represents quantita-
tive data.
fail to reject the null hypothesis: Your inability to reject the null hypothesis based
on the results of your statistical test. This means you are unable to support your re-
search hypothesis.
frequency distribution table: A table showing the number of occurrences of the
various values in a dataset.
goodness of fit: The degree to which an observed distribution of data values fits the
distribution that was expected.
graphical descriptive statistics: The use of tools such as pie charts, bar charts, and
relative frequency diagrams to illustrate the characteristics of a dataset.
histogram: A graph showing the number of occurrences of the various values in a
dataset.
hypothesis: A statement that reflects the researcher’s beliefs about an event that has
occurred or will occur.
independent-sample t-test: A statistical tool used when a study involves one inde-
pendent variable with two levels and one dependent variable that is measured with
quantitative data. In this case, the levels of the independent variable cannot be related
to or correlated with one another.
independent variable: The “cause” being investigated in a study.
inferential statistics: Statistical tools used to make decisions or draw inferences
about the data we have collected.
interaction effect: The simultaneous effect on a dependent variable of two or more
independent variables.
intercept: The point at which the line of best fit crosses the x- or y-axis on a scat-
terplot.
interquartile range: The range of values between the first and third quartiles in a
data distribution.
interval data: One of two types of data that are called quantitative or continuous.
Interval data can theoretically fall anywhere within the range of a given dataset. The
range can be divided into equal intervals, but this does not imply that the intervals
can be directly compared. For example, a student scoring in the 80s on an examina-
tion is not twice as smart as a student scoring in the 40s. There is no absolute zero
point in an interval-level dataset.
kurtosis: The degree to which a data distribution deviates from normal by being too
“peaked” or too “flat.”
latent independent variable: An independent variable that is examined “as is” and
is not manipulated by the researcher. Examples include gender and ethnic group.
leptokurtosis: A bell-shaped distribution that is too peaked (i.e., too many values
around the mean of a distribution) to be perfectly normally distributed.
levels of the independent variable: Different values of the independent variable that
are investigated to determine if there is a differing effect on the dependent variable.
Levene’s test: A statistical tool for testing the equality of variance in datasets being
compared.
line of best fit: A graphical technique used in correlational procedures to show the
trend of correlations being plotted. A line of best fit that emulates a 45-degree angle
demonstrates a strong positive correlation. A line of best fit that appears flatter indi-
cates a lesser degree of correlation.
logistic regression: A regression procedure used to predict the values of a nominal (categorical) dependent variable.
lower confidence limit: The smallest value in a confidence interval.
main effect: The effect of a single independent variable on a dependent variable.
manipulated (experimental) independent variable: An independent variable where the researcher defines membership in the factors or levels. For example, a researcher might place students in different groups to measure the effect of technology on learning. Sometimes called "experimental independent variable."
Mann–Whitney U test: Nonparametric alternative to the independent-sample t-test.
mean: A measure of central tendency reflecting the average value in a dataset. Only
used with quantitative data (interval and ratio).
mean of means: The average score of a dataset created by computing and plotting
the mean of repeated samples of a population.
mean square: A value used in the calculation of an F value, computed by dividing the sum of squares value for a specific group by the degrees of freedom for the same group.
measures of central tendency: Descriptive statistics that help us determine the mid-
dle of a dataset. Examples include the mean, the median, and the mode.
measures of dispersion: Descriptive statistics that help determine how spread out
a data distribution is. Examples include the range, the standard deviation, and the
variance.
measures of relative standing: Measures used to compare data points in terms of
their relationship within a given dataset. Examples include z scores and percentiles.
median: A measure of central tendency that describes the midpoint of data that are
ordinal, interval, or ratio level and have been sorted into ascending or descending
sequence.
mode: A measure of central tendency reflecting the value that occurs most often in
a dataset. Can be used with all types of data.
multimodal: A dataset having more than one mode.
multiple-comparison test: Tests used after the analysis of variance to determine
exactly which means are significantly different from one another. An example is the
Bonferroni test.
multiple regression: A regression procedure that uses more than one predictor vari-
able.
multivariate: Referring to a statistical test that has more than one independent or
dependent variable.
multivariate analysis of variance (MANOVA): Statistical tool used when a study
involves any number of independent variables and more than one dependent variable
that measures quantitative data.
negatively skewed: Skewed to the left (i.e., a quantitative dataset with a long tail of low values; the mean falls below the median).
nominal data: Data that are categorical in nature. Examples include gender, ethnic
group, and grade in school.
nondirectional hypothesis: A hypothesis that implies a difference will exist between
the variables being studied but no direction is implied. Also called a “two-tailed hy-
pothesis.”
nonmanipulated (quasi-) independent variable: An independent variable where the levels are preexisting. For example, if the independent variable is gender, the levels are male and female.
nonparametric statistics: Inferential statistical tools used with nominal or ordinal data, or with quantitative data whose distribution is markedly nonnormal (e.g., strongly skewed or kurtotic).
normal distribution: A distribution of quantitative data that is symmetrical and bell-shaped.
null hypothesis: A hypothesis that states there will be no relationship between the
variables being studied. The null hypothesis is the antithesis of the research hypoth-
esis.
numeric descriptive statistics: The use of tools such as measures of central ten-
dency, measures of dispersion, and measures of relative standing to illustrate the char-
acteristics of a dataset.
observed value: The actual count of occurrences in a cell of nominal value. It is used
in chi-square tests to compare to expected values.
one-sample chi-square test: The comparison of cell frequencies in a nominal distri-
bution to those that would be expected according to a previously defined criterion. See
also “chi-square goodness-of-fit test.”
one-tailed hypothesis: See “directional hypothesis.”
one-way analysis of variance (ANOVA): Statistical tool used when a study involves
one independent variable with three or more levels and one dependent variable that
is measured with quantitative data.
ordinal data: Data that are rank-ordered. Examples include order of finish in a race
and class standing.
ordinal interaction: An interaction effect where the influence of one independent
variable remains in the same direction but varies in magnitude across levels of an-
other independent variable.
p value: The probability that groups being compared came from the same popula-
tion.
paired-samples t-test: See “dependent-sample t-test.”
parameter: Any value known about a dataset representing an entire population.
parametric: Pertaining to the use of quantitative data that form a mound-shaped
distribution.
parametric statistics: Inferential statistical tools used with quantitative data.
Pearson’s r: Output from a correlation procedure using quantitative datasets. Pear-
son’s r can range from –1.00 to +1.00 with extremely high values indicating a strong
positive correlation; extremely low values indicate a strong negative correlation. Val-
ues around zero indicate a weak correlation.
percentile: A measure of relative standing that describes the percentage of other
values in the dataset falling below it. For example, a test score of 50 that falls into the
80th percentile means that 80% of the other scores are less than 50.
pie chart: A circular chart divided into segments, with each representing the per-
centage of a given value in a dataset.
platykurtosis: A bell-shaped distribution that is too flat (i.e., fewer values around the
mean of the distribution than is expected) to be perfectly normally distributed.
Answers to Quiz Time!
2. In this case, there are two issues. First, this study is not within the scope of the
teachers' expertise; mold inspection and the like are best left to experts. Second, not all of the variables are considered. The problem statement indicates that potential
health hazards are to be investigated but it does not mention mold and asbestos.
While they may be part of what could be considered health issues, there are quite a
few other problems that might be identified as well.
3. This appears to be a valid problem. It meets all of the criteria and is stated in an
appropriate manner.
4. This appears to be a good problem statement given that the attitudes of ranchers
could be measured by an instrument representing numeric data. All other charac-
teristics of the problem statement are met or assumed.
5. This has the potential to be a good problem statement, but there appears to be
one major flaw: what is the scope of the problem? It would be better if the author
included characteristics such as age, gender, and ethnic group that might be appli-
cable to the study.
6. The major issue with this problem statement is one of scope. While the problem
is clear, the goals of the lesson plan are clearly beyond their reach. Very few teach-
ers at that level would have access to the types of tools necessary to conduct the
research that is being suggested.
7. The problem statement seems to be very well written, but I worry about the scope.
It would be better if the scope was limited to different makes of foreign cars. For
example, the study might not be valid if the researcher were comparing large Euro-
pean luxury sedans to economical imports from Japan or Korea.
8. Again, there is the potential for a very good problem statement here. The author
might better state the relationship between the variables, however. Granted, weight gain is a part of prenatal lifestyle, but it is not the only component; other issues
such as exercise, alcohol consumption, and tobacco use could also contribute to the
health of newborn infants.
9. This problem statement begs the question, "What is the problem?" The scope is not men-
tioned, it is not clear and concise, and not all of the variables seem to be included.
In addition, it’s not clear whether the researcher is comfortable with the subject
area. In short, this is not a good problem statement.
There will not be a significant difference in the number of sessions for treating
depression between clients in a traditional setting and clients in a distance setting.
There will be a significant difference between the temperature of our process and
155°.
The null hypothesis would read:
There will be no significant difference between the temperature of our process and
155°.
There will be a significant difference in language skills after one year of instruction
between students taught in an immersion environment and students taught in the
classroom.
The null hypothesis would read:

There will be no significant difference in language skills after one year of instruction between students taught in an immersion environment and students taught in the classroom.
The number of customers who would like to take a cruise where no children are
allowed on board is significantly larger than the number of customers who feel that
children should be allowed on all cruises.
The null hypothesis would read:
There is no significant difference in the number of customers who would like to take
a cruise where no children are allowed on board, and the number of customers who
feel that children should be allowed on all cruises.
8. Hypothesis: Truck drivers who check their tire pressure frequently will have significantly higher miles per gallon than truck drivers who do not check their tire pressure frequently.
   Independent variable: Truck drivers (those who check their tire pressure frequently and those who do not).
   Dependent variable: Miles per gallon.
   Type of data: Quantitative (ratio). There is a real zero, and ratios can be made.
9. Class is multimodal; there are four each of freshmen, sophomores, and seniors.
10. Class is a nominal value; the median should not be computed.
11. Class rank is ordinal; the average should not be computed.
12. The average shoe size is 9.6.
3. The computed range of company A is 43; the computed range of company B is 25;
and the computed range of company C is 40.
6. The standard deviation of company A is 14.89, the standard deviation for company
B is 7.56, and the standard deviation for company C is 11.34.
1.
TABLE 3.27. Data for Computation of z Scores, T-Scores, Stanines, and Ranked Mean Scores
Mean    Observed value    Standard deviation    z score    T-score    Stanine    Ranked mean score
30      33                2.00                   1.50      65         8          100
48      52                5.00                    .80      58         7           81
55      54                3.00                   –.33      47         4           71
71      77                7.00                    .86      59         7           61
14       8                2.70                  –2.22      28         1           55
23      35                5.00                   2.40      74         9           48
61      48.6              2.90                  –4.28       7         1           47
100     114               6.33                   2.21      72         9           30
81      78.5              1.55                  –1.61      34         2           23
47      60.00             12.0                   1.08      61         7           14
25th percentile = 30
50th percentile = 51.5
75th percentile = 71
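The conversions in Table 3.27 are easy to check in software. Below is a minimal Python sketch, assuming the conventional formulas z = (x – mean)/sd, T = 50 + 10z, and stanine = 5 + 2z (rounded and held between 1 and 9):

    def z_score(x, mean, sd):
        return (x - mean) / sd

    def t_score(z):
        # T-scores rescale z to a mean of 50 and a standard deviation of 10.
        return round(50 + 10 * z)

    def stanine(z):
        # Stanines rescale z to a mean of 5 and a standard deviation of 2,
        # rounded and clamped to the 1-to-9 scale.
        return min(9, max(1, round(5 + 2 * z)))

    z = z_score(33, mean=30, sd=2.00)            # first row of Table 3.27
    print(round(z, 2), t_score(z), stanine(z))   # 1.5 65 8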
2. The z score for a score of 130 with a mean of 100 and a standard deviation of 15 is
2.00.
3. In this case, we would first have to determine the standard deviation. Since we
know that it is the square root of the variance, it is easy to determine. The square
root of $9,000 is $94.87. Since we are interested in a point 2 standard deviations
below the mean, we then multiply that by 2: $94.87 * 2 = $189.74. In order to deter-
mine our cutoff point, we subtract that from $22,000 and get $21,810.26. Since the
family we are interested in has an income of $19,500, they would qualify for govern-
mental assistance.
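The arithmetic in this answer can be verified with a few lines of Python (a sketch using only the numbers given in the question):

    import math

    variance = 9000
    sd = math.sqrt(variance)                 # 94.87
    cutoff = 22000 - 2 * sd                  # 2 standard deviations below the mean
    print(round(sd, 2), round(cutoff, 2))    # 94.87 21810.26
    print(19500 < cutoff)                    # True: the family qualifies for assistance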
4. We compute the z score by first subtracting the sample mean from the observed
value we are investigating; in this case 14 – 10 = 4; we then divide that by the sample
standard deviation (i.e., 3), giving us a z score of 1.33. For an observed value of 5,
the z score is –1.67 and for the observed value of 20, the z score is 3.33.
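The same computation, sketched in Python with the mean of 10 and standard deviation of 3 given in the question:

    # z = (observed value - mean) / standard deviation
    for x in (14, 5, 20):
        print(x, round((x - 10) / 3, 2))   # 1.33, -1.67, 3.33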
1. The easiest way to show the frequency of the values would be with a bar chart. Each
bar would represent the actual number of each value in the dataset.
2. In this case there would seem to be no consistent relationship between the data val-
ues (i.e., the line of best fit would tend to be flat). This is based on the observation
that, as some of the reading scores go up, their corresponding math scores do the
same. In other cases, the opposite is true; as some of the reading scores go up, their
corresponding math score goes down.
4. The histogram is used with quantitative data in order to allow for fractional values;
given this, nominal data are not plotted using a histogram. A histogram is mound-
shaped if the mean, median, and mode are approximately equal and has a sym-
metrical distribution of values on either side of the center. If there are more values
in either end of the distribution than would generally be found in a bell-shaped dis-
tribution, we say the distribution is skewed. If the mean is greater than the median,
the distribution is positively skewed; if the mean is less than the median, negative
skewness occurs. If there are more data values than expected in the middle of the
dataset, we say the distribution is leptokurtotic and the plotted distribution would
be more peaked than normal. Fewer values in the center cause the distribution to
“flatten out”; we call this distribution platykurtotic.
5. A perfect normal distribution has an equal mean, median, and mode with data
values distributed symmetrically around the center of the dataset.
6. Since the mean is greater than the median, the distribution would be positively
skewed.
7. The distribution will be leptokurtotic; it will appear more peaked than a normal
distribution.
8. No, there are many reasons a dataset with an equal mean and median does not
have to be normally distributed. One primary reason would be the possibility of
too few data values to give the symmetrical, mound-shaped appearance of a normal
distribution.
9. No, most parametric inferential statistical tests are powerful enough to work with
minor problems with skewness or kurtosis.
10. If the mean is less than the median, negative skewness occurs. Statistical software
can be used to determine if this is problematic by returning a skewness coefficient.
This value is only problematic if it is less than –2.00. If the mean is greater than the
median, positive skewness occurs. Again, this is detrimental to decision making
only if the software returns a value greater than +2.00.
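One way to obtain such a skewness coefficient is sketched below, assuming the SciPy library is available; the dataset is hypothetical, chosen to include one high outlier:

    from scipy import stats

    data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 14]   # hypothetical scores with one high outlier
    print(round(stats.skew(data), 2))         # about 2.1: problematic positive skew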
11.
TABLE 4.10. Values for Quiz Time Question 11
z value 1    Area under the curve for z 1    z value 2    Area under the curve for z 2    Difference
 3.01        49.87%                           2.00        47.72%                           2.15%
–2.50        49.38%                           3.00        49.87%                          99.25%
–1.99        47.67%                          –0.09         3.59%                          44.08%
 1.50        43.32%                          –1.50        43.32%                          86.64%
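The areas in Table 4.10 come from the normal curve, so they can be computed directly rather than looked up. A sketch using only Python's standard library (math.erf yields the normal curve area through a simple identity):

    import math

    def area_mean_to_z(z):
        # Percentage of the normal curve between the mean and z.
        return abs(math.erf(z / math.sqrt(2))) / 2 * 100

    a1 = round(area_mean_to_z(3.01), 2)   # 49.87
    a2 = round(area_mean_to_z(2.00), 2)   # 47.72
    # z values on the same side of the mean: subtract; opposite sides: add.
    print(round(a1 - a2, 2))              # 2.15, matching the first row of the table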
1. z score:
a. 1.347219
b. –2.57196
c. 1.837116
d. 0.367423
e. –0.36742
f. –1.22474
g. –0.24495
h. –0.9798
2.
TABLE 5.6. Computing the Distance between z Scores
Value 1    z 1     Area under the curve for z 1    Value 2    z 2     Area under the curve for z 2    Area between z 1 and z 2
90         –1      .3413                           95          1.5    .4332                           .7745
89         –1.5    .4332                           90         –1      .3413                           .0919
91.5       –.25    .0987                           93.5        .75    .2734                           .3721
92          0      0                               96          2      .4772                           .4772
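A sketch of the first row of Table 5.6 in Python. Note the z values shown imply a distribution with mean 92 and standard deviation 2; that is an assumption recovered from the table, not stated in it:

    import math

    def area_mean_to_z(z):
        # Proportion of the normal curve between the mean and z.
        return abs(math.erf(z / math.sqrt(2))) / 2

    mean, sd = 92, 2              # implied by the tabled z values
    z1 = (90 - mean) / sd         # -1.0
    z2 = (95 - mean) / sd         # 1.5
    # The scores fall on opposite sides of the mean, so the areas add.
    print(round(area_mean_to_z(z1) + area_mean_to_z(z2), 4))   # 0.7745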
TABLE 5.13. Learning to Compute the Width and Limits of a Confidence Interval
x      Alpha    n     s     Lower limit of CI    Upper limit of CI    Width of CI
100    .10      25    5     98.36                101.65               3.29
500    .05      50    25    493.07               506.93               13.86
20     .01      15    3     18.01                21.99                3.99
55     .05      20    7     51.93                58.07                6.14
70     .01      22    6     66.71                73.29                6.59
220    .10      40    10    217.40               222.60               5.20
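The limits and widths in Table 5.13 follow from the margin of error, z × s/√n. A minimal sketch for the second row, assuming z = 1.96 for alpha = .05:

    import math

    def confidence_interval(mean, s, n, z):
        margin = z * s / math.sqrt(n)    # margin of error
        return mean - margin, mean + margin, 2 * margin

    low, high, width = confidence_interval(500, s=25, n=50, z=1.96)
    print(round(low, 2), round(high, 2), round(width, 2))   # 493.07 506.93 13.86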
2. There is one independent variable with two levels—young females before the science program and the same young females after the science program. Their interest level is the dependent variable. If, for example, we measured interest on a 1-to-5 scale, the data would be quantitative. Because of this, we would use a dependent-sample t-test to test the hypothesis.
3. There is one independent variable, office, with two levels—well-lighted and dimly
lighted. The level of productivity is the dependent variable. Because you are collect-
ing quantitative data, you would use an independent-sample t-test.
5. In this case, you have one independent variable with two levels—persons raised in
an urban setting and persons raised in a rural setting. There are two dependent
variables, height and weight. Because of this, a MANOVA would be used to investi-
gate this hypothesis.
7. The number of children is the dependent variable; family is the independent vari-
able and has three levels—lower, middle, and high socioeconomic status. In this
case, you would use a one-way ANOVA.
8. The location of the show is the independent variable; the two levels are “on Broad-
way” and “touring.” The degree of audience appreciation is a quantitative depen-
dent variable; in this case, you would use an independent-sample t-test.
9. In this case, you have a predictor variable, number of siblings, and a criterion vari-
able, annual income. Because of this, you would use the Pearson correlation.
10. Here there is one independent variable, political party, with three levels—Repub-
licans, Democrats, and Liberals. Since we have one dependent variable that is
numeric, we would use a one-way ANOVA.
11. Here we have one independent variable, gender, with two levels—males and females.
We are comparing the number of each in our office (i.e., the observed value) to the
percentage of each nationwide (i.e., the expected value). Because of this, we would
use a one-way chi-square test.
12. Herein, we’re comparing the degree of procrastination of one group of authors, be-
fore and after a call from their editor. Because it’s the same group being compared
at two different times, we would use a dependent-sample t-test.
1. Because the one-tailed p value is .215 (remember, you must divide the two-tailed p value of .430 by 2 when you have a directional hypothesis), the difference is not significant. We will fail to reject the null hypothesis and will not support the research hypothesis.
2. Here we have a one-tailed “less than” hypothesis stating that Ivy League-trained
physicians will have significantly fewer malpractice lawsuits than the national mean
of 11. In this case, the Ivy League physicians have an average of 5.20, giving a mean
difference of –5.8. Our p value of .000 indicates this difference is significantly
lower. We will reject the null hypothesis and support the research hypothesis.
3. In this case, we have a one-tailed “greater than” hypothesis in that we believe the
mean graduation rate of students who went to community college for their first two
years will be greater than the national average of 82%. The graduation rate of our
sample is 79.6% and results in a p value of .0215; because of this, we will reject the
null hypothesis. However, since the mean difference is in the opposite direction from what we hypothesized, we will not support the research hypothesis.
4. In this case, we’ve hypothesized that adult turkeys in the wild will weigh significant-
ly less than 12.0 pounds. Given a mean difference of –1.467 and a p value of .008,
we reject the null hypothesis and support the research hypothesis.
5. We can see that the average time to restore power in lower socioeconomic sections of town is 5.15 hours, while the average time in more affluent neighborhoods is 4.0 hours. Our p value (.009) shows that we reject our null hypothesis and support our research hypothesis.
6. Here we tested the number of holes-in-one in the current season (i.e., 25) against those from past years. We can see there is hardly any difference, 25 vs. 24.85. This results in a very large p value of .901, meaning we fail to reject the null hypothesis and fail to support the research hypothesis.
1. The researchers do not state that they expect a difference between the groups, so
the research hypothesis must be nondirectional:

There will be a significant difference in the number of times male trainees and female trainees call home.
The p value for the Levene statistic is less than .05, so we have to use the "Equal variances not assumed" column. Since we have a nondirectional hypothesis, we use the two-tailed p value of .002 as it stands. This very low p value indicates there is a significant difference; male trainees call home a significantly greater number of times than female trainees.
Employees in private offices will make significantly more phone calls than
employees working in cubicles.
The null hypothesis would read:

There will be no significant difference in the number of phone calls made by employees in private offices and employees working in cubicles.
1. The owners are investigating the actresses' statement, so the research hypothesis must be a "greater than" directional research hypothesis:
Females, when placed on the marquee first, will lead to significantly higher
ticket sales than when males are placed first.
The null hypothesis would be:
There will be no significant difference in ticket sales when females are first on
the marquee or when males are first on the marquee.
2. The independent variable is “Name,” and the two levels are “Female” and “Male.”
3. The dependent variable is “ticket sales.”
4. Here the females do have a higher number of ticket sales, but we need to determine if the difference is significant. Using the "Equal variances assumed" column, the high p value (i.e., .439) shows that the difference between male and female ticket sales is not statistically significant.
1. In this case, I am hypothesizing that the gas stations on the way out of town have
prices that are significantly higher than those coming into town. Because of that, I
state a one-tailed directional research hypothesis:
Gas stations going out of town have significantly higher prices than gas
stations going into town.
The null hypothesis would read:
There will be no significant difference in gas prices between stations going into
town and those coming out of town.
2. The independent variable is “Gas Station,” and the levels are “Into town” and “Out
of town.”
1. In this case, the proponents of technology believe that computers are beneficial to
students. Because of that, they want to test a directional “greater than” research
hypothesis:
Students will have significantly higher grades after the use of technology in the
classroom than before they used technology in the classroom.
The null hypothesis will be:

There will be no significant difference in students' grades before and after the use of technology in the classroom.
1. Here the citizens are concerned that their property values will go down. Because of
that, we must state a directional “less than” research hypothesis:
1. The entrepreneur is attempting to sell his service by advertising fewer junk e-mails;
he is stating a one-tailed “less than” research hypothesis:
There will be significantly fewer junk e-mails after customers start using his
service than prior to using his service.
The null hypothesis would be:

There will be no significant difference in the number of junk e-mails before and after customers start using his service.
1. While the members of the management team were happy when the new President
was hired, apparently their satisfaction went down over time. That means they need
to state a directional “less than” research hypothesis to determine if the drop was
statistically significant:
Management satisfaction after the President has been employed for 3 months
is significantly lower than when the President was hired.
The null hypothesis is:

There will be no significant difference in management satisfaction between when the President was hired and after the President has been employed for 3 months.
The mean satisfaction score when the new President was hired is dramatically higher than the mean of 42.27 three
months later. We can see from the p value of .000 that this is a very significant dif-
ference.
1. I'm interested in looking at the amount of money spent by people who play a weekly lottery. I feel that a video showing these players that the odds of winning the lottery are far lower than they imagine will change their behavior. To investigate this, I will state a directional "less than" research hypothesis:
Money spent playing the lottery will be significantly less after watching the
video than was spent prior to watching the video.
The null hypothesis is:

There will be no significant difference in money spent playing the lottery before and after watching the video.
Because of that, the null hypothesis cannot be rejected, and the research hypothesis
cannot be supported.
1. Since the teachers are interested in determining if there is a difference in the num-
ber of absences among the three groups, the null hypothesis would read:

There will be no significant difference in the number of absences among the three groups.
4. “Guide Service Used” has two levels: “yes” and “no.” Route has four levels: “Ingra-
ham Direct,” “Gibraltar Rock,” “Disappointment Cleaver,” and “Nisqually Ice Cliff.”
gained between the four routes (p = .754). In short, climbers can sit around the fire
and debate all night, but it really doesn’t make a difference.
5. The dependent variables are employee satisfaction and the average number
of customers worked with.
6. The difference in mean scores in satisfaction and number of customers between
the two groups is small. Box’s M is not significant, so we can examine the results
of the Multivariate tests. Therein you can see that Pillai’s Trace is not significant
(i.e., p = .784). This means there is no significant multivariate effect; therefore, we do not need to evaluate the Between-Subjects Effects tests.
1. Since the professor is interested in determining the relationship between major and
grade, he would be investigating this null hypothesis:

There will be no significant relationship between undergraduate major and grade.
5. Because of the p value of .027, he would reject the null hypothesis. While there
isn’t a post-hoc test, it appears that those students with undergraduate degrees in
psychology are far more likely to make an A grade than their classmates with other
majors.
5. While there are fewer "A" grades and more "C" grades than expected, the p value of .07 shows there is not a significant difference between the number of each grade that was expected and the number of each grade that was actually received.
1. In this case, we are interested in looking at the relationship between drug use and
socioeconomic status. Our null hypothesis would be:

There will be no significant relationship between drug use and socioeconomic status.
3. The dependent variable is the number of each occurrence within the cells (e.g., the
number of drug users who make below $20,000).
4. Since there are two independent variables and we want to see if an interaction ex-
ists between socioeconomic status and drug use, we would use the chi-square test of
independence.
5. In this case, we would fail to reject the null hypothesis given the p value of .726.
Simply put, it doesn’t make any difference how much money someone makes when
it comes to whether or not they use drugs.
2. The independent variable is “Gender,” and the two levels are “Male” and “Female.”
3. The dependent variable is the actual count of males and females.
4. Because the manager is comparing the number of each gender to the expected
number of each gender, he will use the chi-square goodness of fit.
5. The p value of .421 indicates that the number of males and females in the company
is not significantly different from the national average.
1. Pearson’s r says there is a strong relationship (i.e., r = .790) between a father’s age at
death and the son’s age at death.
2. The coefficient of determination is .6241. This tells us that about 62.41% of change
in the criterion variable (i.e., the son’s age) is due to the predictor variable (i.e., the
father’s age).
3. Given a father’s age at death of 65, the son would be expected to live about 73 years.
4. The line of best fit would run from the lower left to the upper right at slightly less than a 45-degree angle.
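A sketch of the relationship between Pearson's r and the coefficient of determination, using a small hypothetical set of father/son ages at death (the quiz's actual data are not reproduced here):

    def pearson_r(x, y):
        # Pearson's r: the co-deviation of x and y divided by the
        # product of their separate deviations.
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / (sxx * syy) ** 0.5

    fathers = [62, 68, 71, 75, 80]          # hypothetical ages at death
    sons = [66, 70, 74, 74, 82]
    r = pearson_r(fathers, sons)
    print(round(r, 3), round(r ** 2, 4))    # r, and r squared (the coefficient of determination)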
1. We would use Spearman’s rho because we are dealing with rank (i.e., ordinal) data.
2. The rho value of –.515 means there is a moderate negative relationship between the
Democrats’ rankings and the Republicans’ rankings.
3. I would tell the president that I was surprised the correlation wasn’t closer to –1.00!
After all, aren’t these two parties supposed to vote exactly the opposite of one an-
other?
1. I would tell a group of potential high school dropouts that there is a fairly strong
relationship (i.e., r = .437) between the number of years they go to school and how
much money they can expect to make.
2. I would tell a student that the average income is nearly $15,000 annually, while a
ninth-grade dropout can only expect to make about $11,600.
3. The coefficient of determination is .1909. This tells us that about 19.09% of change
in the criterion variable (i.e., salary) is due to the predictor variable (i.e., the highest
grade completed). While the number of years of education is a predictor, more than 80% of the variability is due to other factors.
4. The line of best fit would rise, from left to right, at about a 25- to 30-degree angle.
1. In this case, Pearson’s r of –.259 shows a small negative correlation; people with
more education tend to have a slightly smaller number of children.
2. The coefficient of determination is .0670. This tells us that about 6.7% of change in
the criterion variable (i.e., number of children) is due to the predictor variable (i.e.,
the highest grade completed). While the number of years of education is a slight
predictor, other factors are much more likely to determine the number of children.
3. I would tell the potential dropouts that students who drop out after the eighth
grade have, on average, more children than the overall average.
4. The line of best fit would drop slightly from the left upper corner down to the right.
5. I would point out to the students considering leaving school that the combination
of lower wages and more children makes for a difficult life. Stay in school!
Index
Between sum of squares, 251–252, 253
Between-group degrees of freedom, 249, 257–258
Between-groups variance, 249, 387
Bimodal dataset, 51, 388. See also Mode
Bivariate, 339, 388
Bonferroni procedure
    analysis of variance (ANOVA) and, 263–264, 267, 268, 271, 275
    definition, 388
    overview, 3

C

Cases
    Absent Students, 346–349
    Advertising, 31–32, 399
    Affecting Ability, 275–284
    Age and Driving, 357–360
    Anxious Athletes, 206–209
    Balancing Time, 292–295
    Being Exactly Right, 31, 398–399
    Belligerent Bus Drivers, 313–315
    "Can't We All Just Get Along?" 372, 419
    Case Against Sleep, 350–353
    Cavernous Lab, 197–202
    Climbing, 300–301, 415–416
    Coach, 284–287
    Cold Call, 211–212, 409
    Corporal Punishment, 324–326
    Degree Completion, 296–297, 413–414
    Different Tastes, 355–357
    Distance Therapy, 30, 398
    "Does It Really Work?" 31, 399
    Driving Away, 299–300, 414
    Employee Productivity, 301–303, 416
    Equal Opportunity, 333–334, 418
    Flower Shop, 215–216, 410–411
    Getting What You Asked For, 332, 417
    Growing Tomatoes, 174–176
    Height versus Weight, 353–355
    Homesick Blues, 211, 408–409
    Irate Parents, 316–317
    Kids on Cruises, 32, 400
    Learning to Speak, 32, 399
    "Like Father, Like Son," 371–372, 419
    Money Meaning Nothing, 332–333, 418
    More Is Better, 372–373, 419
    More Is Better Still, 373–374, 419–420
    Multiple Means of Math Mastery, 259–265
    Never Saying Never, 234–237
    New Teacher, 30–31, 398
    Prerequisites and Performance, 331, 417
    Prima Donnas, 212–213, 409
    Quality Time, 268–271
    Regional Discrepancies, 271–275
    Report Cards, 202–206
    Seasonal Depression, 297–298, 414
    Seniors Skipping School, 265–268
    Slow Response Time, 168–171
    SPAM, 240–241, 412
    Stopping Sneezing, 171–173
    Technology and Achievement, 238–239, 411
    Type of Instruction and Learning Style, 327–330
    Unexcused Students, 232–234
    "We Can't Get No Satisfaction," 241–242, 412–413
    "Winning at the Lottery," 242–243, 413
    Workplace Satisfaction, 214–215, 410
    Worrying about Our Neighbors, 239–240, 411–412
    Wrong Side of the Road, 213–214, 410
Categorical data. See Nominal data
Causal relationships, compared to correlations, 341–342
Cell, 388
Central limit theorem
    definition, 388
    estimating population parameters using confidence intervals and, 125–128
    overview, 116–117
    sampling distribution of mean differences and, 184
    sampling distribution of the means and, 117–124
Central tendency measures. See Measures of central tendency
Chi-square test. See also Chi-square test of independence; Factorial chi-square test; Nonparametric statistics; One-way chi-square test
    chi-square distribution, 307–309, 310–311
    choosing the right statistical test and, 153
    computing, 306–307, 319–321
    definition, 388
    hypothesis testing and, 321–326
    normal distribution and, 101
    overview, 304–305, 330
    post-hoc tests and, 327
    quiz regarding, 331–334, 419–420
    six-step model and, 312–317, 322–326, 327–330
    types of data and, 42
Dependent variable. See also Dependent variable identification and description step
    choosing the right statistical test and, 152–154
    definition, 390
    overview, 5, 33
    quiz regarding, 57–58, 400–401
    relationship between independent variables and, 37–38
Dependent variable identification and description step. See also Dependent variable; Measures of dispersion; Measures of relative standing; Six-step model
    analysis of variance (ANOVA) and, 260–261, 266, 269, 272–273
    chi-square test and, 314–315, 317, 325, 328–329
    correlations and, 348, 351, 353–354, 356, 368
    dependent-sample t-test and, 230, 231, 233, 235, 236
    factorial ANOVA and, 277–278, 285, 286
    independent-sample t-test and, 199, 203–204, 207
    multivariate ANOVA (MANOVA) and, 293, 294
    one-sample t-test and, 170, 172, 173, 175
    overview, 5, 37–56
    quiz regarding, 57–58, 400–401
Dependent-sample t-test. See also t-test
    choosing the right statistical test and, 153
    computing the t value for, 218–221, 222, 223
    definition, 390
    effect size and, 222–223
    independent-sample t-test and, 218
    one-tailed hypothesis testing and, 223–226
    overview, 181–182, 217–218, 237
    quiz regarding, 238–243, 411–413
    six-step model and, 229–237
    two-tailed hypothesis testing and, 226–228
Descriptive statistics. See also Graphical descriptive statistics
    analysis of variance (ANOVA) and, 249–251
    definition, 390
    one-sample t-test and, 166, 167
    overview, 2–3, 5
Deviation from the mean, 67, 390. See also Standard deviation
Directional hypothesis. See also Hypothesis
    definition, 390
    independent-sample t-test and, 195–196, 202
    null hypotheses for, 20–22
    one-sample t-test and, 163–166
    overview, 16, 17–19
Disordinal interactions, 283–284, 390
Dispersion, measures of. See Measures of dispersion
Distance from the mean, 68–70
Distribution of data. See Data distribution

E

Effect size
    analysis of variance (ANOVA) and, 258–259, 274
    definition, 390
    dependent-sample t-test and, 222–223
    independent-sample t-test and, 192–196, 205
    t distribution and, 163
Empirical rule
    definition, 390
    overview, 107–112, 113
    sampling distribution of mean differences and, 184
    sampling distribution of the means and, 121–123, 124
    t distribution and, 157
Equal variances assumed, 190, 191–192, 210, 201, 390
Equal variances not assumed, 192, 201, 390
Error, sampling. See Sampling error
Error probability, 135–136, 146–147
Estimations, 64–65, 125–128, 129–136
Ethical research, 12–13, 390
Examples. See Cases
Expected values
    chi-square test and, 311–312
    chi-square test of independence and, 319–320
    definition, 390
    factorial chi-square test and, 306
    overview, 305
Experimental independent variables, 35–36, 390. See also Independent variables

F

F distribution, 256–258, 390
F value, 249, 255–256, 262–263, 390
Factorial ANOVA. See also Analysis of variance (ANOVA)
    choosing the right statistical test and, 153
    definition, 390
    overview, 246–247, 274–275, 295
    six-step model and, 275–287
Factorial chi-square test, 305–312. See also Chi-square test
Failure to reject the null hypothesis, 26–27, 127, 390. See also Null hypothesis
Frequency distribution table, 88, 390
Friedman's ANOVA, 275. See also Analysis of variance (ANOVA); Factorial ANOVA

G

Goodness of fit, 304–305, 306, 311–312, 317–318, 390. See also Chi-square test
Graphical descriptive statistics. See also Descriptive statistics; Scatterplots
    bar charts and, 90–91
    definition, 390
    empirical rule and, 107–112
    histograms, 97, 98, 99
    kurtosis and, 105–107
    nominal data and, 88, 89, 90
    normal distribution and, 99–113
    overview, 87, 97–99, 100, 112–113
    pie charts, 88, 90, 91
    quantitative data and, 92
    quiz regarding, 113–114, 403–404
    skewness and, 102–105
"Greater than" relationship testing, 18. See also One-tailed hypothesis testing

H

Histograms. See also Graphical descriptive statistics
    definition, 390
    normal distribution and, 100–101
    overview, 91, 97, 98, 99
Homogeneity of covariance matrices, 288, 291, 292
Homogeneity of variance, 248–249, 261–262, 267, 270, 273–274, 275
Hypothesis. See also Directional hypothesis; Hypothesis development/stating step; Hypothesis testing; Hypothesis testing step; Nondirectional hypothesis; Null hypothesis
    definition, 390
    overview, 4, 16–17
    rejecting and failing to reject, 26–27
    significance and, 27
Hypothesis development/stating step. See also Data analysis software; Hypothesis; Six-step model
    analysis of variance (ANOVA) and, 260, 265, 269, 272
    case studies to consider in, 30–32
    chi-square test and, 314, 316, 324–325, 328
    correlations and, 347, 350, 353, 356, 367
    dependent-sample t-test and, 229–230, 232, 235
    factorial ANOVA and, 276–277, 285
    independent-sample t-test and, 198, 203, 206–207
    multivariate ANOVA (MANOVA) and, 293
    one-sample t-test and, 169, 172, 174
    overview, 4, 7, 16–28
    quiz regarding, 29–30, 397–400
Hypothesis testing. See also Hypothesis; Hypothesis testing step
    changing alpha values and, 135–136, 146–147
    chi-square test and, 305, 321–326
    dependent-sample t-test and, 217–218
    independent-sample t-test and, 190–191
    population parameter based on a sample statistic, 137–146
    probability (p) values and, 148–152
    quiz regarding, 155, 404–407
    t distribution and, 157–163
    three or more comparisons and, 244–245
Hypothesis testing step. See also Hypothesis; Hypothesis testing; Six-step model
    analysis of variance (ANOVA) and, 261–265, 267, 268, 270, 271, 273–274, 275
    chi-square test and, 315, 317, 325–326, 329–330
    correlations and, 349, 352–353, 354–355, 357, 369–370
    dependent-sample t-test and, 231, 233–234, 236–237
    factorial ANOVA and, 280–284, 286–287
    independent-sample t-test and, 200–202, 204–206, 208–209
    multivariate ANOVA (MANOVA) and, 294–295
    one-sample t-test and, 170–171, 173, 175–176
    testing the null hypothesis and, 23–25

I

Independence of scores, 248
Independent variable identification step. See also Independent variables; Six-step model
    analysis of variance (ANOVA) and, 260, 266, 269, 272
    chi-square test and, 314, 316, 325, 328
Normal distribution (cont.)
    empirical rule and, 107–112
    graphical descriptions of, 99–113
    histograms and, 100–101
    hypothesis testing and, 138–139
    overview, 100–101, 112–113
    quiz regarding, 113–114, 403–404
    sampling distribution of the means and, 118–119
Null hypothesis. See also Hypothesis; Hypothesis testing
    case studies to consider in, 30–32
    definition, 393
    for directional research hypotheses and, 20–22
    for nondirectional research hypotheses and, 22–23
    one-tailed hypothesis testing and, 142–146
    overview, 16, 20–21
    probability (p) values and, 148–152
    rejecting and failing to reject, 26–28, 127
    research hypotheses and, 26–27
    significance and, 27, 28
    testing, 23–25
    Type I and II errors and, 128
Numeric descriptive statistics, 393

O

Observed values
    chi-square test and, 312
    chi-square test of independence and, 319
    definition, 393
    factorial chi-square test and, 306
    overview, 304–305
One-sample chi-square test, 393
One-sample t-test. See also t-test
    choosing the right statistical test and, 153
    directional hypothesis and, 163–166
    one-tailed hypothesis testing and, 166–168
    overview, 156–157, 176, 182
    quiz regarding, 177–180, 407–408
    six-step model and, 168–176
    t distribution and, 157–163
One-sample z-test, 153. See also z-test
One-tailed hypothesis testing. See also Directional hypothesis; Hypothesis testing
    dependent-sample t-test and, 219–221, 222, 223–226
    one-sample t-test and, 166–167, 168
    testing hypotheses about a population parameter based on a sample statistic and, 141–146
One-way ANOVA, 246, 393. See also Analysis of variance (ANOVA)
One-way chi-square test, 304–305. See also Chi-square test
Ordinal data. See also Data types; Nonparametric statistics
    choosing the right statistical test and, 152–153
    definition, 393
    graphical descriptions of, 101
    measures of central tendency and, 43–45, 46–47, 50–51
    overview, 39–40, 42, 43
    Spearman rank difference correlation and, 344
Ordinal interactions, 283–284, 287, 393

P

p values. See Probability (p) values
Paired-samples t-test. See Dependent-sample t-test
Parametric statistics. See also Analysis of variance (ANOVA); t-test
    choosing the right statistical test and, 152–153
    definition, 393
    normal distribution and, 101, 112–113
    overview, 24, 42
    types of data and, 43
Pearson product–moment correlation coefficient (Pearson's r). See also Correlations
    choosing the right statistical test and, 153
    definition, 393
    graphical descriptive statistics and, 92
    interpreting, 339–343
    linear regression and, 365–366
    overview, 336–339
    quiz regarding, 371–374, 417–418
Percentiles. See also Measures of relative standing; Quartiles
    definition, 393
    median as, 73–75
    overview, 71–75
    SPSS and, 80–83, 84
Pie charts, 90, 91, 393. See also Graphical descriptive statistics
Pillai's Trace Sig, 294, 295
Platykurtosis. See also Kurtosis
    definition, 393
    overview, 105, 106, 112–113
    sampling distribution of the means and, 117
Type I error. See also Alpha value; Sampling error
    analysis of variance (ANOVA) and, 263–264
    changing alpha values and, 146–147
    definition, 396
    multivariate ANOVA (MANOVA) and, 289
    overview, 127–128, 245
    three or more comparisons and, 244–245
Type I error rate, 396
Type II error. See also Beta value; Sampling error
    analysis of variance (ANOVA) and, 263–264
    changing alpha values and, 146–147
    definition, 396
    overview, 128, 245
    three or more comparisons and, 244–245
Type II error rate, 396

U

Univariate statistics, 36, 396
Upper confidence limit, 396

V

Variance. See also Analysis of variance (ANOVA); Measures of dispersion
    definition, 396
    independent-sample t-test and, 192
    overview, 67–70
    of a population, 69–70
    quiz regarding, 85–86, 401–403

W

Wilcoxon t-test, 237, 396
Within sum of squares, 251–252, 253–255
Within-group degrees of freedom, 249, 257–258, 396
Within-group variance, 249, 396

X

x-axis, 93, 282, 396

Y

y-axis, 93, 282, 396

Z

z scores. See also Measures of relative standing; T-scores
    changing alpha values and, 135–136, 146–147
    definition, 396
    empirical rule and, 111–112, 113
    overview, 75–78, 79, 80, 147–148
    predicting a population parameter based on a sample statistic using confidence intervals and, 131–136
    probability (p) values and, 148–152
    quiz regarding, 85–86, 124–125, 401–403
    SPSS and, 80–83, 84
    stanines and, 78–79
    testing hypotheses about a population parameter based on, 145, 146
    testing hypotheses about a population parameter based on a sample statistic and, 139–141, 142–146
z table, 396
z-test, 153–154. See also One-sample z-test
About the Author